Elasticsearch – commands

Features of Amazon Elasticsearch Service

Amazon ES provides the following features:

  • Multiple configurations of CPU, memory, and storage capacity, known as instance types
  • Storage volumes for your data using Amazon Elastic Block Store (Amazon EBS), known as Amazon EBS volumes
  • Multiple geographical locations for your resources, known as regions and Availability Zones
  • Cluster node allocation across two Availability Zones in the same region, known as zone awareness
  • Security with AWS Identity and Access Management (IAM) access control
  • Dedicated master nodes to improve cluster stability
  • Domain snapshots to back up and restore Amazon ES domains and replicate domains across Availability Zones
  • Kibana for data visualization
  • Integration with Amazon CloudWatch for monitoring Amazon ES domain metrics
  • Integration with AWS CloudTrail for auditing configuration API calls to Amazon ES domains
  • Integration with Amazon S3, Amazon Kinesis, and Amazon DynamoDB for loading streaming data into Amazon ES

To set up a single-node Elasticsearch cluster (on EC2):

https://www.elastic.co/guide/en/elasticsearch/reference/current/_executing_searches.html

1. Add the YUM repo for package installation

vi  /etc/yum.repos.d/elasticsearch.repo 
[elasticsearch-1.4]
name=Elasticsearch repository for 1.4.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/1.4/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1

 

2. Install the package

sudo yum install elasticsearch

3. Start the service

sudo service elasticsearch start
sudo chkconfig elasticsearch on

 

What is a REST API?
REST stands for Representational State Transfer (sometimes spelled "ReST").
REST is an architectural style for designing networked applications. It relies on a stateless, client-server, cacheable communications protocol, and in virtually all cases that protocol is HTTP.
Example format:
curl -X<REST Verb> <Node>:<Port>/<Index>/<Type>/<ID>
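
As a rough illustration of the format above, the following Python sketch assembles the same <Node>:<Port>/<Index>/<Type>/<ID> pattern into a URL. The function name build_url and its parameters are made up for this example and are not part of any Elasticsearch client.

```python
from urllib.parse import quote

def build_url(node, port, index=None, doc_type=None, doc_id=None, scheme="http"):
    """Assemble scheme://node:port/index/type/id, skipping parts that are not given."""
    # Percent-encode each path segment so unusual index or ID names stay valid.
    parts = [quote(str(p), safe="") for p in (index, doc_type, doc_id) if p is not None]
    return f"{scheme}://{node}:{port}/" + "/".join(parts)

# The document URL used in the examples that follow.
print(build_url("localhost", 9200, "customer", "external", 1))
```

The REST verb (GET, PUT, POST, DELETE) is chosen separately when the request is sent.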

1. Check the Elasticsearch cluster health:

# curl 'localhost:9200/_cat/health?v'
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign 
1469812766 17:19:26  elasticsearch green           1         1      0   0    0    0        0 
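
If you want to check the status from a script rather than by eye, the tabular _cat output can be split into columns. The sketch below is one possible approach in Python; parse_cat_table is a hypothetical helper name, and it assumes no cell contains spaces, which holds for the health output shown above.

```python
def parse_cat_table(text):
    """Turn whitespace-aligned _cat output (with the ?v header row) into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]

# The health output shown above, pasted in as a string.
sample = """epoch      timestamp cluster       status node.total node.data shards pri relo init unassign
1469812766 17:19:26  elasticsearch green           1         1      0   0    0    0        0
"""

rows = parse_cat_table(sample)
print(rows[0]["cluster"], rows[0]["status"])
```

All values come back as strings, so convert counts such as node.total with int() before comparing them.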

2. Get the list of nodes
# curl 'localhost:9200/_cat/nodes?v'
host            ip           heap.percent ram.percent load node.role master name
ip-172-31-14-63 172.31.14.63            3          29 0.00 d         *      Carnivore

3. To list indices 
# curl 'localhost:9200/_cat/indices?v' 

4. Create an index "customer", then list the indices again to confirm it exists
# curl -XPUT 'localhost:9200/customer?pretty'
# curl 'localhost:9200/_cat/indices?v'

5. Let’s index a simple customer document into the customer index, "external" type, with an ID of 1 as follows:
curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
{
  "name": "John Doe"
}'
6. To retrieve the document

curl -XGET 'localhost:9200/customer/external/1?pretty'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":
{
  "name": "John Doe"
}
}


7. To delete the index "customer"
curl -XDELETE 'localhost:9200/customer?pretty'
curl 'localhost:9200/_cat/indices?v'

8. Modifying data

Elasticsearch provides data manipulation and search capabilities in near real time. By default, you can expect a one-second delay (the refresh interval) from the time you index, update, or delete your data until the time that it appears in your search results. This is an important distinction from platforms such as SQL databases, where data is available immediately after a transaction completes.

curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
{
  "name": "xxxx Doe"
}'

When indexing, the ID part is optional. 
If not specified, Elasticsearch will generate a random ID and then use it to index the document. 
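
The difference between indexing with and without an ID can be sketched in Python as follows. This only builds the request object and does not send it. The helper name index_request is invented for this example, and the Content-Type header is an assumption: newer Elasticsearch versions require it, while the version used in these notes does not.

```python
import json
import urllib.request

def index_request(base_url, index, doc_type, document, doc_id=None):
    """Build (but do not send) an indexing request.

    With an explicit ID the document is PUT to .../<index>/<type>/<id>.
    Without one, the request must be a POST to .../<index>/<type> so that
    Elasticsearch can generate the ID itself.
    """
    url = f"{base_url}/{index}/{doc_type}"
    method = "POST"
    if doc_id is not None:
        url += f"/{doc_id}"
        method = "PUT"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
    )

with_id = index_request("http://localhost:9200", "customer", "external", {"name": "Jane Doe"}, doc_id=2)
auto_id = index_request("http://localhost:9200", "customer", "external", {"name": "Jane Doe"})
print(with_id.get_method(), with_id.full_url)
print(auto_id.get_method(), auto_id.full_url)
```

To actually send a request, pass it to urllib.request.urlopen() against a running node.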

To update the existing document (ID 1) by changing the name and adding an age field:
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
{ 
  "doc" :{ "name" : "sri", "age" : 6}
}'
Update with a simple script (note: dynamic scripting is disabled by default as of version 1.4.3):
curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
{
  "script" : "ctx._source.age += 5"
}'
 
10. To delete the document:
 curl -XDELETE 'localhost:9200/customer/external/2?pretty'


11. Batch Processing:
In addition to being able to index, update, and delete individual documents, Elasticsearch also provides the ability to perform any of the above operations in batches using the _bulk API.
This functionality is important because it provides a very efficient mechanism to do multiple operations as fast as possible with as few network round trips as possible.
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
'
To update document 1 and delete document 2 in a single bulk request:
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'
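
The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and the whole body must end with a newline. A minimal Python sketch that generates such a body is shown below; bulk_body and its tuple format are conventions made up for this example.

```python
import json

def bulk_body(actions):
    """Build a _bulk request body.

    actions is a list of (action_name, metadata, source) tuples, where source
    is None for actions that carry no document line (such as delete).
    """
    lines = []
    for name, meta, source in actions:
        lines.append(json.dumps({name: meta}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body([
    ("index", {"_id": "1"}, {"name": "John Doe"}),
    ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```

Generating the body this way avoids hand-editing mistakes such as a missing final newline or a source line accidentally attached to a delete action.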
 
12. Loading some sample dataset:

Sample data:
{
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "gender": "F",
    "address": "244 Columbus Place",
    "employer": "Euron",
    "email": "bradshawmckenzie@euron.com",
    "city": "Hobucken",
    "state": "CO"
}
....
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
curl 'localhost:9200/_cat/indices?v'
13. To create an index template:
curl -XPUT  https://sxxxxxx.us-east-1.es.amazonaws.com/_template/template_1 -d '
{
    "template" : "te*",
    "settings" : {
        "number_of_shards" : 1
    },
    "aliases" : {
        "alias1" : {},
        "alias2" : {
            "filter" : {
                "term" : {"user" : "kimchy" }
            },
            "routing" : "kimchy"
        },
        "{index}-alias" : {} 
    }
}'
14. To Query the Template:

curl -XGET https://sxxxxxxx.us-east-1.es.amazonaws.com/_template/template_1

{"template_1":{"order":0,"template":"te*","settings":{"index":{"number_of_shards":"1"}},"mappings":{},"aliases":{"alias2":{"filter":{"term":{"user":"kimchy"}},"index_routing":"kimchy","search_routing":"kimchy"},"{index}-alias":{},"alias1":{}}}}

15. To Delete the Template:

curl -XDELETE https://sxxxxxxx.us-east-1.es.amazonaws.com/_template/template_1

{"acknowledged":true}

The Search API
Now let’s start with some simple searches. 
There are two basic ways to run searches: 
one is by sending search parameters through the REST request URI and the other by sending them through the REST request body.
The request body method allows you to be more expressive and also to define your searches in a more readable JSON format.

The REST API for search is accessible from the _search endpoint. This example returns all documents in the bank index:
curl 'localhost:9200/bank/_search?q=*&pretty'
We are searching (_search endpoint) in the bank index, and the q=* parameter instructs Elasticsearch to match all documents in the index. The pretty parameter, again, just tells Elasticsearch to return pretty-printed JSON results.

took – time in milliseconds for Elasticsearch to execute the search

timed_out – tells us if the search timed out or not

_shards – tells us how many shards were searched, as well as a count of the successful/failed searched shards

hits – search results

hits.total – total number of documents matching our search criteria

hits.hits – actual array of search results (defaults to first 10 documents)

_score – a numeric value that is a relative measure of how well the document matches the search query we specified. The higher the score, the more relevant the document; the lower the score, the less relevant.

max_score – the highest _score among the matching documents
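
To show how these fields fit together, here is a small Python sketch that pulls the main numbers out of a search response. The sample dictionary is hand-written in the shape described above, with illustrative values, and the code assumes hits.total is a plain number, as it is in the Elasticsearch version these notes use. The function name summarize is made up for this example.

```python
def summarize(response):
    """Collect the most useful top-level facts from a _search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": hits["total"],             # all matching documents
        "returned": len(hits["hits"]),      # documents actually in this response
        "ids": [h["_id"] for h in hits["hits"]],
    }

# Hand-written sample in the documented shape (illustrative values only).
sample = {
    "took": 4,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 1000,
        "max_score": 1.0,
        "hits": [
            {"_index": "bank", "_type": "account", "_id": "4", "_score": 1.0,
             "_source": {"account_number": 4, "balance": 27658}},
        ],
    },
}

print(summarize(sample))
```

Note that total counts every match, while returned is limited by the size parameter (10 by default).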

# curl 'localhost:9200/bank/_search?q=*&pretty'|more
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3547  100  3547    0     0   304k      0 --:--:-- --:--:-- --:--:--  314k
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "account",
      "_id" : "4",
      "_score" : 1.0,
      "_source":{"account_number":4,"balance":27658,"firstname":"Rodriquez","lastname":"Flores","age":31,"gender":"F","address":"986 Wyckoff Avenue","employer":"Tourmania","email":"rodriquezflores@tourmania.com","city":"Eastvale","state":"HI"}


or
 curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query" : {"match_all" : {}}
}'

The difference here is that instead of passing q=* in the URI, we POST a JSON-style query request body to the _search API.

Query Language: (query DSL)

Example:
{
  "query": { "match_all": {} }
}

The match_all query simply searches for all documents in the specified index. The following example runs a match_all and returns only the first document:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "size": 1
}'

Note that if size is not specified, it defaults to 10.

This example does a match_all and returns documents 11 through 20:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}'
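
The from parameter is zero-based, so page N of a result set starts at (N - 1) * size. A small Python sketch of that arithmetic follows; page_query is a hypothetical helper name.

```python
import json

def page_query(page, page_size=10):
    """Return a match_all request body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # zero-based offset of the first hit
        "size": page_size,
    }

# Page 2 with 10 hits per page covers documents 11 through 20.
print(json.dumps(page_query(2)))
```

This works well for the first few pages. For walking through very large result sets, deep from values become expensive.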
This example does a match_all and sorts the results by account balance in descending order and returns the top 10 (default size) documents.
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}'
Executing searches (returning only two fields, email and city, from each document):

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "_source": ["email", "city"]
}'

match query: a search done against a specific field or set of fields. This example returns the account numbered 20:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "account_number": 20 } }
}'

This example returns all accounts containing the term "mill" in the address:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "address": "mill" } }
}'

bool(ean) query:
The bool query allows us to compose smaller queries into bigger queries using boolean logic.
This example composes two match queries and returns all accounts containing "mill" and "lane" in the address:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'
 
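In contrast, this example composes two match queries and returns all accounts containing "mill" or "lane" in the address:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '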
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'
 
This example returns all accounts that contain neither "mill" nor "lane" in the address:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'

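The must, should, and must_not clauses can be combined in one bool query. This example returns all accounts of anybody who is 40 years old but does not live in ID (Idaho):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '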
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}'
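
When queries like these are generated by a program, the clause lists can be built from plain (field, value) pairs. The following Python sketch shows one way to do that; bool_query is a name invented for this example and only covers match clauses.

```python
import json

def bool_query(must=(), should=(), must_not=()):
    """Build a bool query from (field, value) pairs, one match clause per pair."""
    clauses = {}
    for name, pairs in (("must", must), ("should", should), ("must_not", must_not)):
        if pairs:  # leave out empty clause lists entirely
            clauses[name] = [{"match": {field: value}} for field, value in pairs]
    return {"query": {"bool": clauses}}

# Same request as the last example above: age 40, but not in Idaho.
body = bool_query(must=[("age", "40")], must_not=[("state", "ID")])
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d, or sent by any HTTP client.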


Executing Filters:

Let's say we want to find accounts with a balance that is greater than or equal to 20000 and less than or equal to 30000:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}'


Executing Aggregations:

Aggregations provide the ability to group and extract statistics from your data.

Example:
This example groups all the accounts by state, and then returns the top 10 (default) states sorted by count descending. In SQL, it is conceptually similar to:
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
The equivalent Elasticsearch request is:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state"
      }
    }
  }
}'
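Note that we set size=0 to not show search hits because we only want to see the aggregation results in the response.
One more example: this groups accounts by age bracket (ages 20-29, 30-39, and 40-49), then by gender, and finally gets the average account balance per age bracket, per gender: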
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}'
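
The repeated from/to pairs in a request like this are easy to get wrong by hand. As a sketch, the same body can be generated in Python from a list of bracket boundaries. In a range aggregation, from is inclusive and to is exclusive, so consecutive brackets can share a boundary without overlapping. The helper names age_ranges and nested_aggs are made up for this example.

```python
import json

def age_ranges(edges):
    """Turn boundaries such as [20, 30, 40, 50] into consecutive from/to brackets."""
    return [{"from": lo, "to": hi} for lo, hi in zip(edges, edges[1:])]

def nested_aggs(edges):
    """Build the age bracket -> gender -> average balance aggregation request."""
    return {
        "size": 0,  # no search hits, aggregation results only
        "aggs": {
            "group_by_age": {
                "range": {"field": "age", "ranges": age_ranges(edges)},
                "aggs": {
                    "group_by_gender": {
                        "terms": {"field": "gender"},
                        "aggs": {"average_balance": {"avg": {"field": "balance"}}},
                    }
                },
            }
        },
    }

print(json.dumps(nested_aggs([20, 30, 40, 50]), indent=2))
```

Changing the brackets then only requires editing the list of boundaries.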

Some more Examples:
Count the documents in all the indices of the ES domain:
curl -XGET 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/_count?pretty' -d ' { "query" : { "match_all" : {} } }'

Count the documents in the specific index "test" of the ES domain:
curl -XGET 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/test/_count?pretty' -d ' { "query" : { "match_all" : {} } }'

Simple search, passing the query in the request URI (q=*):
curl 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/test/_search?q=*&pretty'

Alternative request body method (instead of passing q=* in the URI, we send a JSON query in the body):
curl -XGET 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/test/_search?pretty' -d ' { "query" : { "match_all" :{}}}'

curl -XPOST 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/movie/_search' -d ' {
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}'


curl -XGET 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/movie/_search?pretty' -d ' {
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}'

Search for specific year "2010":
================================
curl -XGET 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/movie/bulk/_search?pretty' -d ' {
  "query" : {"match" : { "year" : 2010}}}'
  
Find movies released from 2010 through 2017:
================================

curl -XGET 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/movie/bulk/_search?pretty' -d ' {
"query" :  { "match_all": {} },
"filter" : {
"range" : {
"year" : {
"gte" : "2010",
"lte" : "2017"
}}}}'

Find movies whose genres match "Action" or "Comedy" (a match query combines terms with OR by default):
================================

 curl -XGET 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/movie/bulk/_search?pretty' -d ' {
  "query": {"match" : { "genres" : "Action Comedy"}}}'
  
Find movies starring Matt Damon :
================================
curl -XGET 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/movie/bulk/_search?pretty' -d ' {
  "query": {"match" : { "actors" : "Matt Damon"}}}'

Find movies with ratings of 7.5 or higher:
========================================
curl -XGET 'http://search-kabali-ayht7vve5jgkx5rzkdtozc42ta.us-east-1.es.amazonaws.com/movie/bulk/_search?pretty' -d ' {
"query" :  { "match_all": {} },
"filter" : {
"range" : {
"rating" : {
"gte" : "7.5"
}}}}'



Types and Mappings:

A type in Elasticsearch represents a class of similar documents.
A type consists of a name—such as user or blogpost—and a mapping.

Mapping:
The mapping, like a database schema, describes the fields or properties that documents of that type may have,
the datatype of each field (such as string, integer, or date), and how those fields should be indexed and stored by Lucene.

Simple way :

Index - Like "Database"
Type  - Like "Table" in the database
Mapping - Like "Schema" of the table.

SCROLL Query:

A scroll query is used to retrieve large numbers of documents from Elasticsearch efficiently, without paying the penalty of deep pagination.

Scrolling allows us to do an initial search and to keep pulling batches of results from Elasticsearch until there are no more results left. It’s a bit like a cursor in a traditional database.

A scrolled search takes a snapshot in time. It doesn’t see any changes that are made to the index after the initial search request has been made. It does this by keeping the old data files around, so that it can preserve its “view” on what the index looked like at the time it started.

https://www.elastic.co/guide/en/elasticsearch/guide/current/scroll.html
GET /old_index/_search?scroll=1m 
{
    "query": { "match_all": {}},
    "sort" : ["_doc"], 
    "size":  1000
}
GET /_search/scroll
{
    "scroll": "1m", 
    "scroll_id" : "cXVlcnlUaGVuRmV0Y2g7NTsxMDk5NDpkUmpiR2FjOFNhNnlCM1ZDMWpWYnRROzEwOTk1OmRSamJHYWM4U2E2eUIzVkMxalZidFE7MTA5OTM6ZFJqYkdhYzhTYTZ5QjNWQzFqVmJ0UTsxMTE5MDpBVUtwN2lxc1FLZV8yRGVjWlI2QUVBOzEwOTk2OmRSamJHYWM4U2E2eUIzVkMxalZidFE7MDs="
}
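
The two requests above form a loop: run the initial search once, then keep sending the most recent _scroll_id until a batch comes back empty. The Python sketch below shows that control flow. To keep it runnable without a cluster, the two HTTP calls are passed in as functions and the demo uses canned responses. The names scroll_all, start_search, and next_page, as well as the fake responses, are invented for this example.

```python
def scroll_all(start_search, next_page):
    """Yield every hit from a scrolled search.

    start_search() returns the first response (the request with ?scroll=1m);
    next_page(scroll_id) returns the following batch (the /_search/scroll request).
    """
    response = start_search()
    while response["hits"]["hits"]:
        for hit in response["hits"]["hits"]:
            yield hit
        # Always pass on the scroll ID from the latest response.
        response = next_page(response["_scroll_id"])

# Canned responses standing in for a real cluster (illustrative only).
fake_pages = iter([
    {"_scroll_id": "a", "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
    {"_scroll_id": "b", "hits": {"hits": [{"_id": "3"}]}},
    {"_scroll_id": "c", "hits": {"hits": []}},
])

ids = [hit["_id"] for hit in scroll_all(lambda: next(fake_pages),
                                        lambda scroll_id: next(fake_pages))]
print(ids)
```

Against a real cluster, the two functions would wrap the HTTP requests shown above, with the scroll timeout (1m here) renewed on each call.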

 
