
Elasticsearch

allebaria edited this page Feb 4, 2020 · 5 revisions

What is it?

Elasticsearch is an open-source search and analytics engine built on Apache Lucene and written in Java. It is distributed, works in near real time, and supports many kinds of search (full-text, structured, geospatial, and more). It responds quickly because, instead of scanning the text of every document, it looks up terms in an index built in advance (an inverted index). Additionally, it works on JSON documents instead of tables with a fixed schema, and its full-text search operates directly on those documents.
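The idea of searching an index instead of the text can be sketched in plain Ruby. This is a toy inverted index for illustration only (the class name TinyInvertedIndex is hypothetical): it maps every word to the ids of the documents that contain it, so a lookup is a hash access rather than a scan over every document. Lucene's real index also handles text analysis, relevance scoring and on-disk storage.

```ruby
require 'set'

# Toy inverted index: lower-cased term => set of document ids containing it.
class TinyInvertedIndex
  def initialize
    # Missing terms get an empty Set, stored on first write
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  # Split the text into terms and record the document id under each term.
  def add(id, text)
    tokenize(text).each { |term| @postings[term] << id }
  end

  # Ids of documents containing every term of the query (AND semantics).
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?

    terms.map { |term| @postings.fetch(term, Set.new) }
         .reduce(:&)
         .to_a
         .sort
  end

  private

  # Very simple analysis step: lower-case and split on non-word characters.
  def tokenize(text)
    text.downcase.scan(/\w+/)
  end
end

index = TinyInvertedIndex.new
index.add(1, "Bright room near the beach")
index.add(2, "Quiet room in the old town")
index.add(3, "Beach apartment with terrace")
index.search("room beach") # => [1]
```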

Why are we implementing it?

Some of the functionalities of our API require searching the database, and Elasticsearch can make these searches fast and efficient. Its main benefits for our project are:

  • It can analyze billions of records in a few seconds. By using distributed inverted indices, Elasticsearch quickly finds the best matches for full-text searches, even in very large data sets.
  • It is built to scale. Growing from a small cluster to a large one mostly means adding nodes; Elasticsearch redistributes the data across them automatically.
  • Elasticsearch uses JSON as the serialization format for documents. Since our API responses are also JSON, this fits well with the rest of our stack.
  • It supports fuzzy search, which tolerates spelling errors: you can find what you are looking for even if your query contains a typo.
  • We can use Kibana with Elasticsearch to get statistics and metrics about the searches that have been performed. Besides making the data easier to explore, this provides some KPIs.
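Fuzzy search is exposed through the fuzziness option of the match query. The sketch below is illustrative: fuzzy_city_query and auto_fuzziness are hypothetical helper names, and levenshtein is the textbook edit distance. By default Elasticsearch also counts swapping two adjacent characters as a single edit, which plain Levenshtein does not.

```ruby
# Edits allowed per term under fuzziness "AUTO" (same as AUTO:3,6):
# length 0..2 -> exact match, 3..5 -> 1 edit, more than 5 -> 2 edits.
def auto_fuzziness(term)
  case term.length
  when 0..2 then 0
  when 3..5 then 1
  else 2
  end
end

# Hypothetical helper: a typo-tolerant match on the indexed city name,
# shaped so it can be passed directly to Room.search.
def fuzzy_city_query(name)
  {
    query: {
      match: {
        'city.name': { query: name, fuzziness: 'AUTO' }
      }
    }
  }
end

# Classic Levenshtein distance (insertions, deletions, substitutions),
# computed row by row to keep memory proportional to b.length.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1,        # deletion
                  current[j - 1] + 1,     # insertion
                  previous[j - 1] + cost  # substitution
                 ].min
    end
    previous = current
  end
  previous.last
end
```

For example, Room.search(fuzzy_city_query('Barcelna')) should still return rooms in Barcelona: after lower-casing, barcelna is one edit away from barcelona, and AUTO allows two edits for an eight-letter term.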

How are we setting it in our project?

To use Elasticsearch in our Ruby on Rails API we'll need the following gems:

  • 'elasticsearch-rails': Rails-specific extras built on top of elasticsearch-model, such as rake tasks for importing data and instrumentation.
  • 'elasticsearch-model': search integration for Ruby/Rails models such as ActiveRecord::Base (searching, index mappings, importing records).
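A minimal Gemfile excerpt for the two gems. No versions are pinned here; it is usually advisable to use gem versions whose major version matches the Elasticsearch server, and the gems' READMEs describe the compatible combinations.

```ruby
# Gemfile (excerpt)

# Search integration for ActiveModel/ActiveRecord models:
# Model.search, Model.import, index mappings and callbacks
gem 'elasticsearch-model'

# Rails-specific extras on top of elasticsearch-model
# (for example rake import tasks and instrumentation)
gem 'elasticsearch-rails'
```

After editing the Gemfile, run bundle install.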

Elasticsearch runs as a separate server. By default, it listens on localhost:9200. However, since we had problems running it on Windows and on the Linux machine that hosts our app on AWS, we run it on Bonsai, a hosted service that provides Elasticsearch without us having to set up or manage servers. By pointing Rails to our Bonsai URL instead of localhost:9200, we can use Elasticsearch from any machine, regardless of which server it runs on.

Using Bonsai requires the environment variable 'BONSAI_URL', set to the URL provided with our account. It must be configured on our AWS machine for deployment, and in .env.local and .env.test for our development and test environments.
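One way to make every searchable model talk to Bonsai is an initializer that builds the client from BONSAI_URL. This is a sketch under assumptions: the file name config/initializers/elasticsearch.rb is our choice, and the localhost fallback only helps if a local Elasticsearch node is actually running.

```ruby
# config/initializers/elasticsearch.rb (assumed file name)
# Shared client used by all models that include Elasticsearch::Model.
# BONSAI_URL comes from the environment (.env.local, .env.test, AWS);
# without it we fall back to a local node on the default port 9200.
Elasticsearch::Model.client = Elasticsearch::Client.new(
  url: ENV.fetch('BONSAI_URL', 'http://localhost:9200')
)
```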

Implementation in our project

Models

Thanks to elasticsearch-model, we can easily tell Elasticsearch that we will search a specific model and its attributes (including associated models). To do so, we require the library and include the modules shown in the example below:

require 'elasticsearch/model'

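class Room < ApplicationRecord
  # Elasticsearch integration: adds Room.search, Room.import, mappings, etc.
  include Elasticsearch::Model
  # Updates the index automatically when a room is created, updated or destroyed
  include Elasticsearch::Model::Callbacks
end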

As mentioned before, one of the reasons Elasticsearch is so fast is that it searches an index of the fields we want to query instead of the raw data. Thus, we must index the model attributes we want to search by. Again, elasticsearch-model makes it easy to index a model's attributes, as in the example below:

class Room < ApplicationRecord
  # Elasticsearch integration and automatic index updates
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  settings do
    mappings dynamic: false do
      indexes :location, type: :geo_point
      indexes :city, type: :object do
        indexes :name
      end
      indexes :room_location_service, type: :object do
        indexes :health, type: :integer
        indexes :leisure, type: :integer
        indexes :transport, type: :integer
        indexes :food, type: :integer
        indexes :tourism, type: :integer
      end
    end
  end

  # Customize the JSON serialization for Elasticsearch
  def as_indexed_json(options={})
    as_json(include: {
      city: { only: :name },
      room_location_service: { only: [:health, :leisure, :transport, :food, :tourism] }
    }).merge location: { lat: latitude, lon: longitude }
  end
end

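Inside the mappings block, we define how Elasticsearch (ES) indexes the data we send. The as_indexed_json method overrides which data is sent to ES for indexing, i.e., it customizes the JSON document stored in ES. Note also how associated models such as city and room_location_service are indexed as object fields, and how latitude and longitude are combined into the location geo_point. Because the mapping uses dynamic: false, any other fields in the document (for example, the remaining Room columns that as_json includes) are stored but not indexed, so they cannot be searched.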
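To see the shape of the document that as_indexed_json produces, here is a plain-Ruby sketch that builds it from hashes instead of ActiveRecord objects, so it runs without Rails or a cluster. The helper indexed_room_document, the MAPPED_FIELDS constant and the sample values are hypothetical; the real method takes its values from the Room record and its associations.

```ruby
require 'json'

# Top-level fields declared in the mapping above. With dynamic: false,
# any other field in the document is stored in _source but not indexed.
MAPPED_FIELDS = %w[location city room_location_service].freeze

# Hypothetical stand-in for Room#as_indexed_json working on plain hashes.
def indexed_room_document(attributes, city, services)
  attributes.merge(
    # include: { city: { only: :name } } keeps only the city name
    'city' => { 'name' => city['name'] },
    # only: [...] keeps just the five service scores
    'room_location_service' => services.slice('health', 'leisure', 'transport', 'food', 'tourism'),
    # a geo_point field accepts an object with "lat" and "lon" keys
    'location' => { 'lat' => attributes['latitude'], 'lon' => attributes['longitude'] }
  )
end

# Hypothetical sample data
doc = indexed_room_document(
  { 'id' => 7, 'latitude' => 41.39, 'longitude' => 2.17 },
  { 'id' => 1, 'name' => 'Barcelona' },
  { 'id' => 3, 'health' => 4, 'leisure' => 5, 'transport' => 3, 'food' => 4, 'tourism' => 5 }
)

puts JSON.pretty_generate(doc)
searchable = doc.keys & MAPPED_FIELDS # => ["city", "room_location_service", "location"]
```

Here id, latitude and longitude travel with the document but, because of dynamic: false, only the three mapped fields can be queried.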

IMPORTANT! Whenever we change how a model is indexed (add, remove or modify fields in its mapping), we must rebuild its index by running the following commands from our project folder:

rails c
Room.import(force: true)
quit

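Note that we use Room here because it is the model from the example; the same must be done for every model whose indexing has changed. The force: true option deletes the existing index and recreates it with the new mapping before importing all the records.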
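As an alternative to the interactive console, the elasticsearch-rails gem ships rake tasks for importing records. A minimal sketch, assuming the file name lib/tasks/elasticsearch.rake (any .rake file under lib/tasks is loaded by Rails):

```ruby
# lib/tasks/elasticsearch.rake (assumed file name)
# Loads the import tasks bundled with elasticsearch-rails,
# such as elasticsearch:import:model.
require 'elasticsearch/rails/tasks/import'
```

With that file in place, bundle exec rake environment elasticsearch:import:model CLASS='Room' FORCE=y should have the same effect as Room.import(force: true), which is handy in deployment scripts. bundle exec rake -D elasticsearch lists the available tasks and their options.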

Performing a search

Now that our Room model can be searched by its indexed attributes (e.g., rooms with a range of values for a specific attribute), we can run searches. To do so, we need to know how ES queries are written. The documentation at https://www.elastic.co/guide/en/elasticsearch/reference/7.5/elasticsearch-intro.html explains how queries are defined clearly and with examples. Once we have defined a query, we execute it on the Room model with Room.search(query). As a more complex example, the following query searches for rooms inside specific bounds and in a specific city, using pagination (from and size):

Room.search(
  {
    query: {
      bool: {
        must: [
          {
            geo_bounding_box: {
              location: {
                top_left: { lat: 41.5, lon: 2.0 },
                bottom_right: { lat: 41.3, lon: 2.3 }
              }
            }
          },
          { match: { 'city.name': "Barcelona" } }
        ]
      }
    },
    sort: [],  # no explicit sort: hits are ordered by relevance score
    from: 10,  # skip the first 10 matching hits
    size: 2    # return at most 2 hits
  }
)

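The query above returns the rooms located inside the bounding box defined by the two latitude/longitude corners and whose city name matches "Barcelona". Note that from is an offset, not a page number: from: 10, size: 2 skips the first 10 matching rooms and returns the next two (the 11th and 12th), which corresponds to page 6 when each page holds two rooms.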
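Because from counts hits rather than pages, a small helper can do the conversion from a page number. A minimal sketch; rooms_in_box_query and its keyword arguments are hypothetical names, not part of elasticsearch-model.

```ruby
# Hypothetical helper: same query structure as above, for any bounding box
# and city, converting a 1-based page number into the "from" offset.
def rooms_in_box_query(top_left:, bottom_right:, city:, page: 1, per_page: 10)
  raise ArgumentError, 'page must be >= 1' if page < 1

  {
    query: {
      bool: {
        must: [
          { geo_bounding_box: { location: { top_left: top_left, bottom_right: bottom_right } } },
          { match: { 'city.name': city } }
        ]
      }
    },
    from: (page - 1) * per_page, # number of hits to skip
    size: per_page               # number of hits to return
  }
end

query = rooms_in_box_query(
  top_left: { lat: 41.5, lon: 2.0 },
  bottom_right: { lat: 41.3, lon: 2.3 },
  city: 'Barcelona',
  page: 6,
  per_page: 2
)
query[:from] # => 10, the same window as from: 10, size: 2
```

Passing the hash to Room.search returns a response object: response.results gives the raw hits (results.total is the total number of matches) and response.records loads the matching Room records from the database. By default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so very deep pages need another approach such as search_after.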
