0.7.0 • Published 8 years ago

juttle-elastic-adapter v0.7.0

Weekly downloads
2
License
Apache-2.0
Repository
github
Last release
8 years ago

Juttle Elastic Adapter

Build Status

The Juttle Elastic Adapter enables reading and writing documents using Elasticsearch. It works with Elasticsearch version 1.5.2 (including AWS Elasticsearch Service) and above, such as version 2.1.1.

Examples

Read all documents stored in Elasticsearch timestamped with the last hour:

read elastic -from :1 hour ago: -to :now:

Write a document timestamped with the current time, with one field { name: "test" }, which you'll then be able to query using read elastic.

emit -limit 1 | put name="test" | write elastic

Read recent records from Elasticsearch that have field name with value test:

read elastic -last :1 hour: name = 'test'

Read recent records from Elasticsearch that contain the text hello world in any field:

read elastic -last :1 hour: 'hello world'

An end-to-end example is described here and deployed to the demo system demo.juttle.io. The Juttle Tutorial also covers using elastic adapter.

Installation

Like Juttle itself, the adapter is installed as a npm package. Both Juttle and the adapter need to be installed side-by-side:

$ npm install juttle
$ npm install juttle-elastic-adapter

Configuration

The adapter needs to be registered and configured so that it can be used from within Juttle. To do so, add the following to your ~/.juttle/config.json file:

{
    "adapters": {
        "elastic": {
            "address": "localhost",
            "port": 9200
        }
    }
}

To connect to an Elasticsearch instance elsewhere, change the address and port in this configuration.

The value for elastic can also be an array of Elasticsearch host locations. Give each one a unique id field, and read -id and write -id will use the appropriate host.

The Juttle Elastic Adapter can also make requests to Amazon Elasticsearch Service instances, which requires a little more configuration. To connect to Amazon Elasticsearch Service, an entry in the adapter config must have {"aws": true} as well as region, endpoint, access_key, and secret_key fields. access_key and secret_key can also be specified by the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY respectively.

Here's an example Juttle Elastic Adapter configuration that can connect to a local Elasticsearch instance running on port 9200 using read/write elastic -id "local" and an Amazon Elasticsearch Service at search-foo-bar.us-west-2.es.amazonaws.com using read/write elastic -id "amazon":

{
    "adapters": {
        "elastic": [
            {
                "id": "local",
                "address": "localhost",
                "port": 9200
            },
            {
                "id": "amazon",
                "aws": true,
                "endpoint": "search-foo-bar.us-west-2.es.amazonaws.com",
                "region": "us-west-2",
                "access_key": "(my access key ID)",
                "secret_key": "(my secret key)"
            }
        ]
    }
}

Schema

To read or write data, the adapter has to know the names of the indices storing that data in Elasticsearch. By default, the adapter writes points to an index called juttle and reads from all indices.

You can choose indices to read and write from with the -index option, or you can specify an index for each configured Elasticsearch instance the adapter is connected to.

For schemas that create indices at regular intervals, the adapter supports an indexInterval option. Valid values for indexInterval are day, week, month, year, and none. With day, the adapter will use indices formatted ${index}${yyyy.mm.dd}. With week, it will use ${index}${yyyy.ww}, where ww ranges from 01 to 53 numbering the weeks in a year. With month, it will use ${index}${yyyy.mm}, and with year, it will use ${index}${yyyy}. With none, the default, it will use just one index entirely specified by index. When using indexInterval, index should be the non-date portion of each index followed by *.

Lastly, the adapter expects all documents in Elasticsearch to have a field containing a timestamp. By default, it expects this to be the @timestamp field. This is configurable with the -timeField option to read and write.

Specifics of using the default Logstash schema are described here, including handling of analyzed vs not_analyzed string fields.

Usage

Read options

In addition to the options below, read elastic supports field comparisons of form field = value, that can be combined into filter expressions using AND/OR/NOT operators, and free text search, following the Juttle filtering syntax.

NameTypeRequiredDescriptionDefault
frommomentnoselect points after this time (inclusive)none, either -from and -to or -last must be specified
tomomentnoselect points before this time (exclusive)none, either -from and -to or -last must be specified
lastdurationnoselect points within this time in the past (exclusive)none, either -from and -to or -last must be specified
idstringnoread from the configured Elasticsearch endpoint with this IDthe first endpoint in config.json
indexstringnoindex(es) to read from*
indexIntervalstringnogranularity of an index. valid options: day, week, month, year, nonenone
typestringnodocument type to read fromall types
timeFieldstringnofield containing timestamps@timestamp
idFieldstringnoif specified, the value of this field in each point emitted by read elastic will be the document ID of the corresponding Elasticsearch documentnone
optimizetrue/falsenooptional flag to disable optimized reads, see Optimizationstrue

Write options

NameTypeRequiredDescriptionDefault
idstringnowrite to the configured Elasticsearch endpoint with this IDthe first endpoint in config.json
indexstringnoindex to write tojuttle
indexIntervalstringnogranularity of an index. valid options: day week, month, year, nonenone
typestringnodocument type to write toevent
timeFieldstringnofield containing timestamps@timestamp
idFieldstringnoif specified, the value of this field on each point will be used as the document ID for the corresponding Elasticsearch document and not storednone
chunkSizenumbernobuffer points until chunkSize have been received or the program ends, then flush1024
concurrencynumbernonumber of concurrent bulk requests to make to Elasticsearch (each inserts <= chunkSize points)10

Optimizations

Whenever the elastic adapter can shape the entire Juttle flowgraph or its portion into an Elasticsearch query, it will do so, sending the execution to ES, so only the matching data will come back into Juttle runtime. The portion of the program expressed in read elastic is always executed as an ES query; the downstream Juttle processors may be optimized as well.

Fully optimized example

read elastic -last :1 hour: -index 'scratch' -type 'tag1' name = 'test'
| reduce count()

This program will form an ES query that asks it do count the documents in scratch index with document type tag1 whose field name is set to the value test, and only a single record (count) will come back from Elasticsearch.

Less optimized example

read elastic -last :1 hour: name = 'test'
| put threshold = 42
| filter value > threshold

In this case, Juttle will issue a query against ES that matches documents whose field name is set to the value test (i.e. Juttle will not read all documents from ES, only the once that match the filter expression in read elastic). However, the rest of the program that filters for values exceeding threshold will be executing in the Juttle runtime, as it isn't possible to hand off this kind of filtering to ES.

List of optimized operations

  • any filter expression or full text search as part of read elastic (note: read elastic | filter ... is not optimized)
  • head or tail
  • reduce count(), sum(), and other built-in reducers
  • reduce by fieldname (other than reduce by document type)
  • reduce -every :interval:
Optimization and nested objects

There are a few fundamental incompatibilities between Elasticsearch's model for nested object and array fields and Juttle's. This can lead to some odd results for optimized programs. For objects, an optimized reduce by some_object_field will return null as the only value for some_object_field. For arrays, an optimized reduce by some_array_field will return a separate value for some_array_field for every element in every array stored in some_array_field. For results conforming to Juttle's reduce behavior, disable optimization with read elastic -optimize false.

In case of unexpected behavior with optimized reads, add -optimize false option to read elastic to disable optimizations, and kindly report the problem as a GitHub issue.

Contributing

Want to contribute? Awesome! Don’t hesitate to file an issue or open a pull request. See the common contributing guidelines for project Juttle.