Elasticsearch for logging – need architectural advice

August 6, 2015

I am trying to come up with an optimized architecture to store event logging messages on Elasticsearch.

Here are my specs/needs:

  • Messages are read-only; once entered, they are only queried for reporting.
  • No free text search. User will use only filters for reporting.
  • Must be able to do timestamp range queries.
  • Mainly need to filter by agent and customer interactions (in addition to other fields).
  • Customers and agents belong to the same location.

So the most frequently executed query will be: get all LogItems for a given agent_id, customer_id, and timestamp range.

Here is what a LogItem looks like:

"_source": {
    "agent_id" : 14,
    "location_id" : 2,
    "customer_id" : 5289,
    "timestamp" : 1320366520000, //Java Long millis since epoch
    "event_type" : 7,
    "screen_id" : 12
}
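Since every field here is an exact value rather than free text, the mapping can stay simple. A minimal sketch of an explicit mapping for this document, using the 1.x-era DSL (the type name `logitem` is an assumption):

```json
{
  "mappings": {
    "logitem": {
      "properties": {
        "agent_id":    { "type": "integer" },
        "location_id": { "type": "integer" },
        "customer_id": { "type": "integer" },
        "timestamp":   { "type": "date" },
        "event_type":  { "type": "integer" },
        "screen_id":   { "type": "integer" }
      }
    }
  }
}
```

Mapping `timestamp` as `date` lets Elasticsearch accept the epoch-millis values shown above while still supporting range queries.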

I need help indexing my data.

I have been reading “What is an Elasticsearch index?” and “Using Elasticsearch to serve events for customers” to get an idea of a good indexing architecture, but I need assistance from the pros.

So here are my questions:

  1. The article suggests creating “one index per day”. How would I do range queries with that architecture? (e.g., is it possible to query across a range of indices?)

  2. Currently I’m using one big index. If I create one index per location_id, how do I use shards for further organization of my records?

  3. Given the specs above, is there a better architecture you can suggest?

  4. Which fields should I filter on vs. query on?

EDIT: Here’s a sample query run from my app:

{
  "query" : {
    "bool" : {
      "must" : [ {
        "term" : {
          "agent_id" : 6
        }
      }, {
        "range" : {
          "timestamp" : {
            "from" : 1380610800000,
            "to" : 1381301940000,
            "include_lower" : true,
            "include_upper" : true
          }
        }
      }, {
        "terms" : {
          "event_type" : [ 4, 7, 11 ]
        }
      } ]
    }
  },
  "filter" : {
    "term" : {
      "customer_id" : 56241
    }
  }
}
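Since there is no free-text search and no need for relevance scoring, every clause in the query above could be expressed as a filter, which Elasticsearch can cache. One way to sketch this in the 1.x-era DSL is a `filtered` query (values copied from the sample above):

```json
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term":  { "agent_id": 6 } },
            { "term":  { "customer_id": 56241 } },
            { "terms": { "event_type": [ 4, 7, 11 ] } },
            { "range": { "timestamp": { "gte": 1380610800000, "lte": 1381301940000 } } }
          ]
        }
      }
    }
  }
}
```

Note that a top-level `filter`, as in the sample above, is applied after the query as a post-filter, which is mainly useful with facets/aggregations; moving the `customer_id` term into the filter section as sketched here filters before the query runs.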

2 Responses to “Elasticsearch for logging – need architectural advice”

  1. Jilles van Gurp

    Take a good look at logstash (and kibana). They are all about solving this problem. If you decide to roll your own architecture for this, you might copy some of their design.

  2. You can definitely search on multiple indices. You can use wildcards or a comma-separated list of indices, for instance, but keep in mind that index names are strings, not dates.
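    For example, with daily indices named by the common logstash convention (the `logs-YYYY.MM.DD` pattern is an assumption; any naming scheme works), a timestamp range spanning several days can be covered by listing or wildcarding the index names in the request path:

    ```
    GET /logs-2015.08.04,logs-2015.08.05,logs-2015.08.06/_search
    GET /logs-2015.08.*/_search
    ```

    The application translates the timestamp range into index names; Elasticsearch itself does not parse dates out of index names.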

    Shards are not for organizing your data but to distribute it and eventually scale out. How you do that is driven by your data and what you do with it. Have a look at this talk: http://vimeo.com/44716955 .

    Regarding your question about filters vs. queries, have a look at this other question.
