Open Source Alternative for GSA



Google Search Appliance (GSA) is Google’s search solution for businesses, covering their private data stored in various formats. Most businesses need the crawl and search capability of Google (or something similar) for quicker access to their private data. Without these features, an organization ends up wasting a lot of time finding the relevant document, or recreating data or documents that already exist.

GSA is not an open source solution and, depending on your needs, it can cost a significant amount of money. This is where we felt the need for an alternative: a solution built from open source components that offers capabilities similar to GSA. Since fast, accurate and controlled search is the key criterion, we decided to use one of the most popular open source search engines, ElasticSearch.

ElasticSearch is an open source search and analytics engine built on top of Apache Lucene. It focuses on document storage and retrieval, and on searching and sorting documents. It was designed for distributed environments, providing flexibility and scalability.

In this article, I list the most popular features of GSA and walk you through implementing GSA’s Spell Checker capability using ElasticSearch. In follow-up articles, I will show the implementation of the other features as well.

Major GSA functionalities

Following are the major GSA features, which businesses use for different reasons.

  • Spell Checker
  • Self-learning scorer
  • Highlight query terms
  • Dynamic navigation
  • Query Suggestions
  • Query Suggestions Blacklist
  • Synonyms
  • Related Queries
  • Collecting metrics
  • Advanced Search
  • Sorting by metadata
  • Autocomplete
  • Wildcard search

In this article, we will focus on Spell Checker!

Problem Statement

We have an e-commerce application where people search for products. People may mistype a word while searching. To handle this, the application should be smart enough to suggest the correct spelling for the search term.

Prerequisites

  • Proficiency in J2SE and J2EE
  • Familiarity with ElasticSearch concepts
  • An understanding of GSA functionality

Spell Checker implementation using ElasticSearch

As part of this feature, ElasticSearch should check the spelling of search queries and offer spelling suggestions to users.

The Spell Checker should base its suggestions on the data in the ElasticSearch documents: suggestions are derived dynamically from the indexed documents, based on the search query.

When the Spell Checker detects a possible correction, a single spelling suggestion is returned along with the query results. Spelling suggestions are not enabled by default; we need to make certain changes to the ElasticSearch index.

Setup

Create the ES index settings with a Spell Checker analyzer. Once the index is in place, we can query the Spell Checker analyzer for spelling suggestions.

PUT ecommerce_parts
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "stemmer": {
            "type": "stemmer",
            "language": "english"
          },
          "stopwords": {
            "type": "stop",
            "stopwords": [
              "_english_"
            ]
          }
        },
        "analyzer": {
          "SpellChecker": {
            "type": "custom",
            "char_filter": [
              "html_strip"
            ],
            "filter": [
              "lowercase"
            ],
            "tokenizer": "standard"
          },
          "default": {
            "type": "custom",
            "char_filter": [
              "html_strip"
            ],
            "filter": [
              "lowercase",
              "stopwords",
              "stemmer"
            ],
            "tokenizer": "standard"
          }
        }
      },
      "number_of_replicas": "1",
      "number_of_shards": "5",
      "refresh_interval": "1000"
    }
  }
}
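
One way to check the analyzer before indexing any data is the _analyze API. The request below is a minimal sketch, not part of the original setup: it assumes an ElasticSearch version that accepts a JSON body on _analyze (older releases take the analyzer and text as query-string parameters instead), and the sample text is made up for illustration. Send it as the body of GET ecommerce_parts/_analyze:

```json
{
  "analyzer": "SpellChecker",
  "text": "<b>Sprayaway</b> Glass Cleaner"
}
```

The response should list the lowercased tokens sprayaway, glass and cleaner, with the HTML tags stripped. Because SpellChecker applies neither the stemmer nor the stopwords filter, a field analyzed with it keeps whole words as they appear in the documents, which is what a spelling suggestion needs to return.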

Create ES Index Mappings

Add one extra field (spell_checker) to the ES index mapping and bind it to the SpellChecker analyzer defined above. Each source field whose values should feed spelling suggestions copies its value into spell_checker through copy_to. Add copy_to only to the fields that are required for spelling suggestions.

PUT ecommerce_parts/_mapping/ecommerce_parts_type
{
  "properties": {
    "BrandName": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": {
          "type": "string"
        }
      },
      "copy_to": [
        "spell_checker"
      ]
    },
    "Cat": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": {
          "type": "string"
        }
      },
      "copy_to": [
        "spell_checker"
      ]
    },
    "Desc": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": {
          "type": "string"
        }
      },
      "copy_to": [
        "spell_checker"
      ]
    },
    "SubCat": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": {
          "type": "string"
        }
      },
      "copy_to": [
        "spell_checker"
      ]
    },
    "Term": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": {
          "type": "string"
        }
      },
      "copy_to": [
        "spell_checker"
      ]
    },
    "spell_checker": {
      "type": "string",
      "analyzer": "SpellChecker"
    }
  }
}
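
To see copy_to at work, a sample document can be indexed. This is an illustrative sketch: the document ID 1 and every field value except the brand name Sprayaway are hypothetical and not taken from the original data set. Send it as the body of PUT ecommerce_parts/ecommerce_parts_type/1:

```json
{
  "BrandName": "Sprayaway",
  "Cat": "Cleaning Supplies",
  "SubCat": "Glass Cleaners",
  "Desc": "Foaming glass cleaner",
  "Term": "glass cleaner"
}
```

The spell_checker field does not appear in the stored _source. The copied values exist only in the index, where they are analyzed with SpellChecker and form the vocabulary that suggestions are drawn from.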

Demonstration

Search the documents by BrandName (‘Sprayaway’) and verify the results.
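
A query along the following lines can be used for this search. It is a sketch that assumes a term query, which fits here because BrandName is mapped as not_analyzed and therefore has to match the stored value exactly, including case. Send it as the body of POST ecommerce_parts/_search:

```json
{
  "query": {
    "term": {
      "BrandName": "Sprayaway"
    }
  }
}
```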

Now query the SpellChecker analyzer for spelling suggestions with the search term ‘Sprayaway’ and verify the result. We expect no spelling suggestion, because a brand named ‘Sprayaway’ exists.
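
A suggestion request for this step could look like the sketch below. It assumes the term suggester, run through the _search endpoint. The suggester name spell_suggestion is an arbitrary label chosen for this example, and the top-level size of 0 only suppresses the document hits. Send it as the body of POST ecommerce_parts/_search:

```json
{
  "size": 0,
  "suggest": {
    "spell_suggestion": {
      "text": "Sprayaway",
      "term": {
        "field": "spell_checker"
      }
    }
  }
}
```

The term suggester’s default suggest_mode is missing, which proposes corrections only for terms that are not found in the index. That is why a correctly spelled brand that exists in spell_checker comes back with an empty options array. For the misspelled case, change text to Sprayway.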

[Screenshot: suggestion query result for ‘Sprayaway’, with an empty options array]

In the query result, the options array carries the spelling suggestions, and it is empty for the search term ‘Sprayaway’. This is the expected behavior.

Now query the SpellChecker analyzer with the misspelled brand name ‘Sprayway’ and verify the spelling suggestions. We expect a spelling suggestion, because no brand named ‘Sprayway’ exists.

[Screenshot: suggestion query result for ‘Sprayway’, with ‘sprayaway’ in the options array]

In the query result, we can see ‘sprayaway’ as the spelling suggestion, because we gave a wrong brand name (‘Sprayway’). With this exercise, we can say that the Spell Checker is working as expected.
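
To return a single suggestion together with the search results, the query and the suggester can be sent in one request. This is a sketch under assumptions: the match query against spell_checker and the suggester name did_you_mean are illustrative choices, not the original implementation. Send it as the body of POST ecommerce_parts/_search:

```json
{
  "query": {
    "match": {
      "spell_checker": "Sprayway"
    }
  },
  "suggest": {
    "did_you_mean": {
      "text": "Sprayway",
      "term": {
        "field": "spell_checker",
        "size": 1
      }
    }
  }
}
```

With size set to 1 inside the term suggester, at most one option is returned per query term, so the application can show it directly as a "Did you mean" hint when the hits come back empty.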

Summary

In this article, I have listed the most popular features of GSA and explained how one specific GSA use case, spell checking, can be implemented using ElasticSearch. In follow-up articles, I will show the implementation of the other features as well. I hope this article helps you make better use of ElasticSearch.

At WalkingTree, we have been using ElasticSearch and its related product suite for a few years, and we would love to help you take advantage of this product.
