I'm a programmer specialising in performant and scalable systems using PHP and Ruby and cooking



Getting Results for Nested Objects from Elasticsearch

On a recent project I hit a brick wall retrieving nested objects from my database with pagination.

I could not find a reasonable way of getting paginated results from a PostgreSQL database with a collection of nested objects joined into my result set. A join returns one row per matching nested object, so LIMIT and OFFSET count joined rows rather than parent records.

Enter Elasticsearch!

Elasticsearch is a distributed RESTful search engine built for the cloud. It has a huge number of great features which make it ideal for powering listing pages.

The Problem

Our frontend needs to list a paginated collection of "Document" records. Each Document has n "Tag" records which we also use for filtering.

Let's take a simplified example of our objects:

class Document  
{
    protected $title;
    protected $slug;
    protected $tags;
}

class Tag  
{
    protected $title;
    protected $slug;
}

In each document the "$tags" attribute is populated with an array of Tag objects.

Search Data

So you've converted your search data into a document ready to index with Elasticsearch. Here is mine:

{
  "title": "Test Document 1",
  "slug": "test-document-1",
  "content": "Some content",
  "tags": [
    {
      "title": "Cars",
      "slug": "cars"
    },
    {
      "title": "News",
      "slug": "news"
    }
  ]
}
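As a rough sketch, this is one way the Document and Tag objects could be turned into an array ready for indexing. The constructors and toArray() methods are assumptions added for illustration and are not part of the original classes. The keys follow the class property names, and "content" is left out because the Document class above has no such property.

```php
<?php
// Hypothetical variants of the Document and Tag classes above, with
// constructors and toArray() methods added for illustration only.
class Tag
{
    protected $title;
    protected $slug;

    public function __construct($title, $slug)
    {
        $this->title = $title;
        $this->slug  = $slug;
    }

    public function toArray()
    {
        return array('title' => $this->title, 'slug' => $this->slug);
    }
}

class Document
{
    protected $title;
    protected $slug;
    protected $tags;

    public function __construct($title, $slug, array $tags)
    {
        $this->title = $title;
        $this->slug  = $slug;
        $this->tags  = $tags;
    }

    public function toArray()
    {
        return array(
            'title' => $this->title,
            'slug'  => $this->slug,
            // Each Tag becomes one object in the "tags" array.
            'tags'  => array_map(function (Tag $tag) {
                return $tag->toArray();
            }, $this->tags),
        );
    }
}

$document = new Document('Test Document 1', 'test-document-1', array(
    new Tag('Cars', 'cars'),
    new Tag('News', 'news'),
));

// Print the JSON body that would be sent to Elasticsearch.
echo json_encode($document->toArray(), JSON_PRETTY_PRINT), "\n";
```

Keeping the conversion in toArray() means the indexing code never has to reach into the protected properties.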

Creating the Index

The first thing you need to do is create the index. Elasticsearch will create one for you when you add a new document; however, an automatically created index will not store your nested objects correctly.

We need to tell Elasticsearch that our tags object is nested. We do this using the "nested" type in our mapping settings.

Below are the settings I use to create my index:

{
  "mappings": {
    "documents": {
      "properties": {
        "title": {
          "type": "string",
          "store": "yes"
        },
        "tags": {
          "type": "nested",
          "properties": {
            "title": {
              "type": "string",
              "store": "yes"
            },
            "slug": {
              "type": "string",
              "store": "yes"
            }
          }
        }
      }
    }
  }
}

Some people may not want to store their entire document in the index. However, there is an important caveat: if you don't store the source (disabled by adding "_source": { "enabled": false } to the mapping), then nested objects cannot be retrieved!

So under the tags property of our document we tell Elasticsearch that it is of type "nested", so that it knows to store these records as separate documents in its index.

Paginating Results

Of the query parts covered in this article, pagination is the simplest. It is a matter of adding two items to the query DSL: "size" and "from". Like LIMIT and OFFSET in SQL, "size" tells Elasticsearch how many records to return and "from" tells it which result to start from.
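A minimal sketch of how a page number could be turned into these two values. The paginationParams() helper and its defaults are hypothetical names chosen for this example, and it assumes pages are numbered from 1.

```php
<?php
// Hypothetical helper: converts a 1-based page number into "from" and "size".
function paginationParams($page, $perPage = 10)
{
    // Guard against zero or negative input.
    $page    = max(1, (int) $page);
    $perPage = max(1, (int) $perPage);

    return array(
        'from' => ($page - 1) * $perPage, // like OFFSET in SQL
        'size' => $perPage,               // like LIMIT in SQL
    );
}

// Page 3 with 10 results per page skips the first 20 results.
// new stdClass() is used so that json_encode() emits {} rather than [].
$body = array('query' => array('match_all' => new stdClass())) + paginationParams(3);

echo json_encode($body), "\n";
// prints {"query":{"match_all":{}},"from":20,"size":10}
```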

Filtering Our Results

This is where it gets really tricky!

Working with the query DSL can be pretty daunting and, believe me, it can be complicated to get your head round. For me the best way to learn is to look at examples that fit my requirements, so here is a query I created that limits the results to documents with a required selection of tags:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": { "published": { "gt": 0 } }
            },
            {
              "nested": {
                "path": "tags",
                "query": {
                  "filtered": {
                    "query": {
                      "match_all": {}
                    },
                    "filter": {
                      "and": [
                        {
                          "term": {
                            "tags.slug": "reviews"
                          }
                        }
                      ]
                    }
                  }
                }
              }
            }
          ]
        }
      }
    }
  },
  "from": 0,
  "size": 10
}

At this point the DSL is pretty big and deeply nested; unfortunately this is required. You'll see we limit our results by saying each document must have a tag with the slug "reviews".
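Writing this structure by hand gets tedious, so here is a sketch of how it could be generated from a list of tag slugs. The function names nestedTagFilter() and buildTagSearch() are hypothetical. The sketch also makes one assumption beyond the query above: it emits one "nested" clause per slug. A nested clause is matched against one tag at a time, so putting two slugs into the same "and" would require a single tag to carry both slugs.

```php
<?php
// Hypothetical helper: one nested clause requiring a tag with the given slug.
// Mirrors the inner part of the query shown above.
function nestedTagFilter($slug)
{
    return array(
        'nested' => array(
            'path'  => 'tags',
            'query' => array(
                'filtered' => array(
                    'query'  => array('match_all' => new stdClass()),
                    'filter' => array(
                        'and' => array(
                            array('term' => array('tags.slug' => $slug)),
                        ),
                    ),
                ),
            ),
        ),
    );
}

// Hypothetical helper: builds the full search body for a list of required slugs.
function buildTagSearch(array $slugs, $from = 0, $size = 10)
{
    // Start with the "published" range filter from the query above.
    $must = array(
        array('range' => array('published' => array('gt' => 0))),
    );

    // Add one nested clause per required tag slug.
    foreach ($slugs as $slug) {
        $must[] = nestedTagFilter($slug);
    }

    return array(
        'query' => array(
            'filtered' => array(
                'query'  => array('match_all' => new stdClass()),
                'filter' => array('bool' => array('must' => $must)),
            ),
        ),
        'from' => $from,
        'size' => $size,
    );
}

// With a single slug this has the same shape as the query above.
echo json_encode(buildTagSearch(array('reviews')), JSON_PRETTY_PRINT), "\n";
```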

Because we stored the entire source, we can get at all the data from the "_source" item returned with each hit, or you can specify which fields to return.
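As a small sketch, this is how the "_source" items might be pulled out of a decoded search response. The extractSources() name is hypothetical, and the response below is a hand-written sample in the shape Elasticsearch returns (hits, then hits, then _source), not real output.

```php
<?php
// Hypothetical helper: collects the "_source" of every hit in a decoded response.
function extractSources(array $response)
{
    $sources = array();
    foreach ($response['hits']['hits'] as $hit) {
        $sources[] = $hit['_source'];
    }

    return $sources;
}

// Hand-written sample response for illustration only.
$json = '{"hits":{"hits":[{"_id":"1","_source":{"title":"Test Document 1",'
      . '"tags":[{"title":"Cars","slug":"cars"},{"title":"News","slug":"news"}]}}]}}';
$response = json_decode($json, true);

foreach (extractSources($response) as $doc) {
    // The nested tags come back as part of the source, ready to use.
    echo $doc['title'], ': ', implode(', ', array_column($doc['tags'], 'slug')), "\n";
}
// prints Test Document 1: cars, news
```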

More examples can be seen on my test gist.

Photo by manfrommanila