Tuning and configuring enterprise site search

A good enterprise site search experience not only strengthens the brand but also compensates for navigational and UX deficiencies.
So, let’s talk enterprise site search and look at what you should consider when improving the search experience for your site and your users.

Understanding how your users search is, of course, paramount.

A couple of prerequisites first:

1. Partnering with the business is a must. Direct-loaders (people who go straight to your site and then use its search facility) need their searches looked after by the business: the people who understand the products and how they are marketed and sold to the customer.

2. Deep understanding of the target site. Rely on yourself. Familiarize yourself with the site. Get to know the users. Peruse the persona matrix.

Begin with a Search Term Analysis by extracting the analytics data associated with site search. (This data may come from Coremetrics, Omniture, etc., or from the search engine’s logs or reporting platform, depending on what your site search product is.)

Look for searches which retrieve zero results. This is the list of terms that users entered and that yielded nothing. Check for the following issues, which may be causing the zero results:

-Misspelled words
-Users entering terms that do not match the metadata
-Users searching for content which you do not have
-Technical issues with the way your search product is configured

Remedies may include:
-implementing a spelling correction feature
-optimizing site content (adding the content customers are searching for)
-optimizing metadata (for example, the site says Beauty when the customer means Makeup). The dreaded industry jargon: leave it at home.
-understanding and reacting to the language of the customer (hmm, personas anyone?)
-reconfiguring the way content is indexed (tweaking the relevance rankings and stop words; considering pure content, product pages, brand pages, video, images, etc.)
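As a rough illustration of the zero-results analysis, here is a minimal Python sketch. It assumes you can export your search log as (query, result count) pairs; the function name and the sample rows are hypothetical, and your analytics or search product will have its own export format.

```python
from collections import Counter

def zero_result_terms(log_rows):
    """Count how often each normalized query returned zero results.

    log_rows is assumed to be an iterable of (query, result_count) pairs.
    """
    counts = Counter()
    for query, result_count in log_rows:
        if result_count == 0:
            # Normalize case and whitespace so variants are counted together.
            counts[" ".join(query.lower().split())] += 1
    return counts.most_common()

# Hypothetical sample rows, for illustration only.
sample_log = [
    ("Makeup", 0),
    ("makeup ", 0),
    ("lipstik", 0),
    ("lipstick", 42),
]
report = zero_result_terms(sample_log)
```

Sorting by frequency puts the most painful dead ends at the top, which is where the remedies above should be applied first.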

Next, extract a ‘commonly searched for terms’ report. Typically, a small number of search terms account for a high percentage of site queries. Yes, the classic 80/20 rule, with its long tail, shows itself once again. Making sure that the most popular search terms return optimal results, by fine-tuning them individually, should be a top priority.
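One way to find that small set of head terms is to walk down the frequency list until a chosen share of all queries is covered. The sketch below assumes a plain list of query strings; the 80% default and the sample queries are illustrative, not a recommendation.

```python
from collections import Counter

def head_terms(queries, coverage=0.8):
    """Return the most frequent terms that together cover `coverage` of all queries."""
    counts = Counter(" ".join(q.lower().split()) for q in queries)
    total = sum(counts.values())
    covered = 0
    head = []
    for term, n in counts.most_common():
        if covered >= coverage * total:
            break
        head.append(term)
        covered += n
    return head

# Hypothetical query log: 10 searches, 4 distinct terms.
queries = ["ipod"] * 6 + ["wii"] * 2 + ["lotr dvd", "sku 12345"]
top = head_terms(queries)
```

The terms returned are the ones worth fine-tuning by hand; everything else falls into the long tail.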

Then, extract a report which lists Uncommonly Searched Terms. This report may fill a gap, and is especially crucial for a multi-channel retailer, where a user may search by entering a SKU number which they perhaps noted down from a TV show or in a brick-and-mortar store. These terms represent fewer searches, but in the long run it can pay off to track them over time: the report can show emerging and waning customer interests, which you can then translate into trends and act on with appropriate fixes.
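Tracking those less popular searches over time can be as simple as comparing term counts between two reporting periods. This is a minimal sketch under that assumption; the function name and the sample terms are made up for illustration.

```python
from collections import Counter

def term_trends(previous_period, current_period):
    """Compare two lists of queries and return (emerging, waning) terms."""
    prev, cur = Counter(previous_period), Counter(current_period)
    # A missing term counts as zero, so brand-new terms show up as emerging.
    emerging = sorted(t for t in cur if cur[t] > prev[t])
    waning = sorted(t for t in prev if prev[t] > cur[t])
    return emerging, waning

# Hypothetical data for two periods.
last_month = ["vhs player", "vhs player", "blu-ray"]
this_month = ["blu-ray", "blu-ray", "sku 48213"]
emerging, waning = term_trends(last_month, this_month)
```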

Embarking on search tuning can be a daunting task. However, a carefully planned approach simplifies the process.

Does the search product meet the site’s objectives? Perform a gut check and ask the following questions:

Is it fast enough?
Are the results accurate enough?
Are the displayed results on the page complete enough?
Does it integrate well with the rest of your systems?
Does it fit your business goals and deliver on your brand’s promise?
Are users finding what they need?

If the answers to these questions are not satisfactory, then it makes sense to tune up the search engine.

Configure or Reconfigure your search product

Beyond the issues of server load, simultaneous users and response time, there are some adjustable functions you could look into. Existing configurations may no longer be valid and should be revisited often.

Some common configuration options:

Turning on and tuning predictive search

Many users have come to expect that a search engine will offer them tips and suggestions on the results page. These suggestions could take the form of spelling corrections. If your search engine offers spelling suggestions, you may be able to modify the dictionary that supplies the corrections to add common misspellings found in your search logs. Synonyms are another form of suggestion offered by many search solutions. It is important for the search engine to recognize and interpret the language the user is accustomed to.
It is estimated that 20-30% of all search terms used on the internet contain a misspelling or a typo. If that is the case, and your search engine doesn’t auto-correct them, or the typo is in a made-up word such as a brand or product name, then your customers will get zero results: the dreaded dead end.
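If your search product lets you plug in your own correction dictionary, the idea can be prototyped with Python’s standard difflib module. This is only a sketch: the vocabulary list, the function name and the 0.8 similarity cutoff are assumptions, and a real search engine will have its own spelling mechanism.

```python
import difflib

# Hypothetical vocabulary, e.g. built from product titles and brand names.
CATALOG_TERMS = ["lipstick", "mascara", "eyeliner", "nintendo"]

def suggest_spelling(term, vocabulary=CATALOG_TERMS, cutoff=0.8):
    """Return the closest catalog term for a query, or None if nothing is close."""
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

suggestion = suggest_spelling("lipstik")
```

Building the vocabulary from your own catalog is what lets this catch typos in made-up brand and product names that a general dictionary would miss.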

The solution: find the terms that customers are using that return zero or few results, and set up a synonym pointing to the term that does match products.

Some examples where you would use synonyms include:
– Numbers – equate 2 with ii and two
– Acronyms – to equate product-acronyms with the full product title e.g. LOTR -> Lord of the Rings
– Common misspellings e.g. cart = kart
– Made-up product titles with spaces taken out e.g. (wii) motion plus = motionplus
– UK / US spellings (if not auto-corrected) e.g. color = colour, metre = meter
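The synonym examples above can be sketched as groups of interchangeable terms, with each query expanded into its variants before it reaches the search engine. The group list and function name are hypothetical, and this simple version swaps one synonym at a time rather than every combination.

```python
import re

# Each set holds terms treated as equivalent (taken from the examples above).
SYNONYM_GROUPS = [
    {"2", "ii", "two"},
    {"lotr", "lord of the rings"},
    {"cart", "kart"},
    {"motion plus", "motionplus"},
    {"color", "colour"},
]

def expand_query(query):
    """Return the normalized query plus variants with one synonym swapped in."""
    normalized = " ".join(query.lower().split())
    variants = {normalized}
    for group in SYNONYM_GROUPS:
        for term in group:
            # Whole-word match only, so "ii" does not match inside "wii".
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, normalized):
                for alternative in group - {term}:
                    variants.add(re.sub(pattern, alternative, normalized))
    return variants

variants = expand_query("Mario Kart 2")
```

Most search products keep this kind of mapping in a synonym file or admin screen; the point of the sketch is only to show what the mapping does to a query.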

Tweaking the relevance ranking

Most search engines feature some type of relevance ranking for results. Sometimes the method or algorithm for ranking results is transparent to users, and sometimes it is purposefully obscure. Some solutions allow you to adjust the algorithm to give more weight to certain results. For example, it might make sense to tell the search engine to calculate a higher relevance score for content found in a particular channel of your site (e.g. depending on user needs, product descriptions might rank higher than press releases, and product pages with videos might rank higher than product pages with images).
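Where a search product exposes this kind of weighting, it usually amounts to multiplying the engine’s base score by a per-channel factor. The sketch below assumes each result is a dictionary with a base score and a channel label; the weights and field names are invented for illustration.

```python
# Hypothetical per-channel boosts; anything not listed keeps its base score.
CHANNEL_WEIGHTS = {"product": 1.5, "press_release": 0.8}

def rerank(results):
    """Sort results by base score multiplied by the channel weight, best first."""
    return sorted(
        results,
        key=lambda r: r["score"] * CHANNEL_WEIGHTS.get(r["channel"], 1.0),
        reverse=True,
    )

results = [
    {"url": "/press/launch", "channel": "press_release", "score": 1.0},
    {"url": "/p/123", "channel": "product", "score": 0.7},
]
ranked = rerank(results)
```

Here the product page overtakes the press release even though its base score is lower (0.7 x 1.5 = 1.05 versus 1.0 x 0.8 = 0.8).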

Revisit your Stop Word List

In most circumstances, there’s no point in a search engine searching for words that are so common that they appear in large numbers of product titles or descriptions. Doing so produces too many, often irrelevant, results for the customer and wastes the search engine’s resources in matching them. Typically these words include ‘a’, ‘an’, ‘of’, ‘the’, ‘i’, ‘is’, etc., but the business really ought to work out for itself which words are so common that the search engine should ignore them. This may include some words that are core to the product set being sold, e.g. an entertainment retailer might get the search engine to ignore DVD, CD, etc. Be wary of adopting the standard set provided by the search tool vendor. For example, a vendor’s stop word list may include the word ‘that’, but for a CD retailer this would not be an appropriate stop word if it ranked CDs by ‘Take That’ lower than results that don’t contain the word ‘that’. Try it: you’ll find there’s a band called “Take”.
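To make the ‘Take That’ problem concrete, here is a small sketch that derives a business-specific stop word list from a vendor default. Both lists are illustrative assumptions, not any real vendor’s defaults.

```python
# Hypothetical vendor default list.
VENDOR_STOP_WORDS = {"a", "an", "of", "the", "i", "is", "that"}

# The business removes 'that' (for 'Take That') and adds its own overly
# common words, as an entertainment retailer might.
BUSINESS_STOP_WORDS = (VENDOR_STOP_WORDS - {"that"}) | {"dvd", "cd"}

def remove_stop_words(query, stop_words):
    """Drop stop words from a query; keep the original words if nothing is left."""
    words = query.lower().split()
    kept = [w for w in words if w not in stop_words]
    return kept or words

vendor_terms = remove_stop_words("Take That CD", VENDOR_STOP_WORDS)
business_terms = remove_stop_words("Take That CD", BUSINESS_STOP_WORDS)
```

With the vendor list the query collapses to ‘take’ and ‘cd’, which is exactly how the wrong band ends up on top; with the business list it stays ‘take that’.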

Enforced or Automatic Phrasing

If your customers are getting too many results for certain search phrases, many of them irrelevant, then you should instruct your search engine to only consider results where there is an exact match to the phrase (words and their order), e.g. “32gb ipod touch”, assuming you didn’t want the search engine to return 4/8/16gb ipods or ipod nanos in the result set.
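Exact phrase matching, in its simplest form, means the query words must appear in the title next to each other and in the same order. The sketch below assumes whitespace-separated titles; the product titles are made up.

```python
def contains_phrase(title, phrase):
    """True if all words of `phrase` appear consecutively, in order, in `title`."""
    title_words = title.lower().split()
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    if n == 0:
        return False
    return any(
        title_words[i:i + n] == phrase_words
        for i in range(len(title_words) - n + 1)
    )

# Hypothetical product titles.
titles = [
    "Apple 32GB iPod Touch",
    "Apple 8GB iPod Touch",
    "Apple 32GB iPod Nano",
    "Apple iPod Touch 32GB",
]
matches = [t for t in titles if contains_phrase(t, "32gb ipod touch")]
```

Note that the last title is also excluded because its word order differs, so enforced phrasing works best when product titles follow a consistent naming pattern.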

Search Keyword Redirects

Sometimes it is desirable to redirect a customer to the landing page that makes the best representation of a product or product set that the customer is looking for. This gives the customer a better experience than the presentation of a product list (which may well not show the full range of potential product type matches on page 1). It is a common occurrence for customers to search for types of products e.g. Toys for 5 Year Olds, so a redirect for such types of searches would be a good idea.
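A keyword redirect is essentially a lookup that runs before the search itself. In this sketch the redirect table, the landing page paths and the function name are all hypothetical.

```python
# Hypothetical mapping of normalized search phrases to landing pages.
KEYWORD_REDIRECTS = {
    "toys for 5 year olds": "/toys/age-5",
}

def resolve_redirect(query):
    """Return a landing page path for the query, or None to run a normal search."""
    key = " ".join(query.lower().split())
    return KEYWORD_REDIRECTS.get(key)

target = resolve_redirect("  Toys for 5 Year Olds ")
```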

Best Bet Results

Frequent searches and the most important pages could each have a ‘best bet’ result. You are probably familiar with seeing paid advertisements on public search engines like Google. In these cases, organizations have paid for their listing to be associated with particular keywords. The idea of tying certain results to keywords works well for site search, too, even when paid advertising is not part of the game.
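Unlike a redirect, a best bet keeps the normal result list and pins a hand-picked result above it. A minimal sketch, assuming results are plain URL paths and the best-bet table is maintained by the business (both are illustrative):

```python
# Hypothetical curated results per keyword.
BEST_BETS = {
    "returns": ["/help/returns-policy"],
}

def with_best_bets(query, organic_results):
    """Put curated best-bet results first, followed by the remaining organic results."""
    key = " ".join(query.lower().split())
    pinned = BEST_BETS.get(key, [])
    # Avoid listing a pinned page twice if the engine also returned it.
    rest = [r for r in organic_results if r not in pinned]
    return pinned + rest

page = with_best_bets("Returns", ["/p/return-of-the-king-dvd", "/help/returns-policy"])
```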

The perfect site search is a moving target. The out-of-the-box search engine is a myth. It has to be constantly nurtured and tweaked to cater to the evolving demands of the consumer and the evolving product catalog of your site.

Microformats fast becoming a must for Digital Marketing

Digital marketers have begun to leverage microformats. Microformats, rather than RDF, are becoming the preferred standard for integrated digital marketers, who work with their IT departments to unlock their secrets and harness their benefits.

What are Microformats?

Microformats (μF) convey metadata to search engines in a format which can be crawled, parsed, indexed and semantically stored. This web based approach to semantic markup utilizing HTML/XHTML tags makes it very easy for search engines to categorize and cross reference the data, providing users a richer search experience.

Which formats are being used currently by ecommerce organizations?

The most widely used tag is the hReview tag. It is being used by vendors like Bazaarvoice and PowerReviews. UGC (User Generated Content), which is at the center of the content rave, is being augmented by the hReview tag to present the ‘reviews data’ to search engines. Data from submitted product feeds is then cross-referenced by Google, which applies Rich Snippets to the product link in the SERPs.

How should I implement the hReview tag?

The reviews data (aggregate as well as individual reviews) is widely delivered in an iframe, although my preferred method would be to integrate an API from the UGC vendor to populate the reviews data on the product page itself, which would in turn give the target page the entire SERP juice. Of course, a canonical tag would be in order as well to avoid a duplicate content penalty.
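To give a feel for what rendering reviews on the product page itself could look like, here is a rough Python sketch that wraps one review in hReview class names. The class names (hreview, item, fn, rating, reviewer, dtreviewed, description) follow my reading of the hReview draft on microformats.org; the function name and sample review are hypothetical, and the output should be checked against the current spec and the search engine’s own testing tool before you rely on it.

```python
from html import escape

def render_hreview(product_name, rating, reviewer, date_iso, text):
    """Return an HTML fragment for one review, marked up with hReview class names."""
    return (
        '<div class="hreview">\n'
        f'  <span class="item"><span class="fn">{escape(product_name)}</span></span>\n'
        f'  <span class="rating">{rating}</span> out of 5\n'
        f'  <span class="reviewer vcard"><span class="fn">{escape(reviewer)}</span></span>\n'
        f'  <abbr class="dtreviewed" title="{escape(date_iso)}">{escape(date_iso)}</abbr>\n'
        f'  <div class="description">{escape(text)}</div>\n'
        '</div>'
    )

# Hypothetical review, as it might come back from a UGC vendor's API.
snippet = render_hreview("Widget & Co. Blender", 4, "Jane D.", "2009-11-02", "Quiet and fast.")
```

Because the markup is part of the product page’s own HTML rather than an iframe, the review content is crawled as part of that page.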

What I am really excited about is the hMedia tag and the hProduct tag. These tags are not yet individually accepted by Google but their inclusion may be near.

How should I prepare to market Videos in the digital space?

The hMedia tag would benefit video indexing and referencing in the broad-based search results. Since Google has opened up the left navigation of its search engine results page to include Videos, Blogs, etc., marketers have an opportunity to begin testing the hMedia tag in a sandbox environment. They should begin to plan how this metadata would programmatically populate their Media Asset Management platform for videos. YouTube is not the only game in town. The broadcasting community should be especially prepared to implement this integration as soon as acceptance of the hMedia tag is announced.

The individual hProduct tag presents digital marketers with yet another opportunity to harvest organic search results as part of their marketing mix. As with the hMedia tag, seamless implementation of this tag in the front-end HTML would be an early key to success. Bringing in stakeholders from IT, Digital Marketing and Taxonomy early in the planning process would be advantageous.

What should organizations learn from microformats?

Organizations that are first to implement microformats would emerge as short-term leaders in their respective industries. These infrastructure improvements would easily offset some of the paid and display spend; search engines, of course, would want you to believe otherwise.

What is the future of microformats?

Microformats are at an early stage as a means of semantic communication. The search engines love the concept of cross-referencing this data to provide users with a richer search experience, with semantic links in the form of video, social media, blogs, etc. The evidence is in the new Google UI for SERPs and, as of a few days ago, in Google News. Mashups of hProduct, hMedia, hReview and various other combinations would likely follow.

Massive Google update being rolled out

Google has removed the sandbox for developers, and word is that the Caffeine update will initially roll out to a single datacenter and then populate the other datacenters “after the holidays”.

Anyway, my expectation is that the two updates with the most impact will be ‘real time news events’ and ‘twitter updates’. The ‘News’ and ‘Information’ categories are going to be very prominent after this next update. Twitter updates will definitely be indexed fast, so that they show up in Google Search results.

Google’s vulnerability was evident when it could not display real-time links during the ‘Iran voting crisis’ earlier this summer. As soon as Bing announced its agreement with Twitter, Google rapidly followed. It appears they are licensing Twitter’s data feed instead of connecting via the regular public API. Where does Facebook stand in all this?

Additionally, there may be fewer search results displayed. I believe someone has noticed that most searchers do not go past the first five search results, let alone the first page.

How does Bing compare to Google?

Some initial differences are slowly becoming apparent.
Before you rush to optimize for Bing, keep in mind that the Bing algorithm is still being tweaked. That said, below are some of the early differences I have found.
1. Bing appears to give less priority to keyword density.
2. Bing seems to give less priority to link density as well.
3. For Bing, domain age is major. (This is one of the early algorithms: their game has just begun, and they apparently think that domains which have been around a while are to be trusted. In my opinion, this will change.)
4. Anchor text carries more weight with Bing than with Google.
5. Social site links are weighted less by Bing.
6. Bing gives more weight to the H tag.

I am personally not going to be optimizing specifically for Bing. It is just good to be aware of why the SERPs are the way they are. I am going to watch the battle between these two in the coming months and tweak accordingly. At least for now, the battle is being fought with ad dollars, not with sparkling search results.