1.1 Basic concepts
Elasticsearch is also a full-text search library based on Lucene; at its core it stores data, and many of its concepts are similar to those in MySQL.
Comparison relationship:
- Index (indices) —— Database (database)
- Type (type) —— Table (data table)
- Document (document) —— Row (row)
- Field (field) —— Column (column)
Detailed description:
| Concept | Explanation |
| --- | --- |
| Index (indices) | indices is the plural of index; an index is like a database and holds many documents |
| Type | A type simulates the concept of a table in MySQL. One index can contain different types, such as a goods type and an order type, whose data formats differ. However, this tends to make the index chaotic, so the concept will be removed in a future version |
| Document | A piece of original data stored in the index; for example, each product record is a document |
| Field | A property inside a document |
| Mapping configuration (mappings) | Field characteristics such as data type, whether to index, whether to store, and other attributes |
- Index (indices, plural of index): a logically complete index
- Shard: a piece of the data after it has been split
- Replica: a copy of a shard
1.2. Create Index
1.2.1. Syntax
Elasticsearch exposes a REST-style API, so every operation is an HTTP request, and you can use any tool that can issue HTTP requests.
Request format for index creation:
Request method: PUT
Request path: / index name
Request parameters: json format:
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
Settings: Index library settings
number_of_shards: number of shards
number_of_replicas: number of replicas
1.2.2. Test
Let's try it with Kibana:
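For example, the following request can be pasted into the Kibana Dev Tools console (testindex is the index name used throughout this tutorial; the shard and replica values simply reuse the template above):

PUT /testindex
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}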
You can see that the index was created successfully.
1.2.3. Create with Postman
The index can also be created successfully with Postman, but it is not as convenient as using Kibana.
1.3. View index settings
A GET request lets us view index information. Format:
GET /index name
Alternatively, we can use * to query all index library configurations
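For example (a small sketch; testindex is the index created above, and the wildcard form returns the configuration of every index):

GET /testindex

GET /*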
1.4. Delete the index
Delete an index using a DELETE request:
DELETE /index name
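For example, to delete a hypothetical index named mytest (the name is only illustrative):

DELETE /mytest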
1.5. Mapping configuration
Now that we have an index, the next step is to add data. However, the mapping must be defined before adding data.
What is mapping?
Mapping is the process of defining a document: which fields it contains, whether those fields are stored, whether they are indexed, whether they are analyzed (word-segmented), and so on.
Only when this is configured clearly will Elasticsearch build the index library for us as intended.
1.5.1. Create mapping fields
The request method is still PUT:

PUT /indexname/_mapping/typename
{
  "properties": {
    "fieldname": {
      "type": "text",
      "index": true,
      "store": true,
      "analyzer": "ik_max_word"
    }
  }
}
- Type name: the type concept mentioned earlier, similar to different tables in a database.
- Field name: can be anything, and it can take many attributes, for example:
- type: the field type, which can be text, long, short, date, integer, object, etc.
- index: whether to index the field; the default is true
- store: whether to store the field's value separately; the default is false
- analyzer: the tokenizer; here ik_max_word means the IK tokenizer is used
Examples
Make a request:
PUT testindex/_mapping/goods
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "ik_max_word"
    },
    "images": {
      "type": "keyword",
      "index": false
    },
    "price": {
      "type": "float"
    }
  }
}
Response result:
{
  "acknowledged": true
}
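To see how a given analyzer will split text before relying on it in a mapping, you can use the standard _analyze API. A small sketch (it assumes the IK analyzer plugin is installed; the Chinese phrase is just sample text, since IK is a Chinese tokenizer):

GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "小米手机"
}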
1.5.2. View the mapping relationship
GET /testindex/_mapping

response:

{
  "testindex": {
    "mappings": {
      "goods": {
        "properties": {
          "images": {
            "type": "keyword",
            "index": false
          },
          "price": {
            "type": "float"
          },
          "title": {
            "type": "text",
            "analyzer": "ik_max_word"
          }
        }
      }
    }
  }
}
1.5.3. Detailed field attributes
1.5.3.1. type
The data types supported in Elasticsearch are very rich. Let's look at a few key ones:
String: there are two string types:
- text: analyzed into terms; cannot be used in aggregations
- keyword: not analyzed; the data is matched as a complete value and can be used in aggregations
Numeric: numeric types, divided into two categories:
- Basic types: long, integer, short, byte, double, float, half_float
- High-precision scaled floating point: scaled_float. You need to specify a scaling factor, such as 10 or 100. Elasticsearch multiplies the real value by this factor, stores the result, and converts it back when the value is read.
Date: date type
Elasticsearch can store dates as formatted strings, but it is recommended to store them as millisecond values in a long field to save space.
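To tie these types together, here is a minimal mapping sketch (the index name typesdemo, the field names brand and sold_date, and the scaling factor 100 are only illustrative; ik_max_word again assumes the IK plugin is installed):

PUT /typesdemo
{
  "mappings": {
    "goods": {
      "properties": {
        "title":     { "type": "text", "analyzer": "ik_max_word" },
        "brand":     { "type": "keyword" },
        "price":     { "type": "scaled_float", "scaling_factor": 100 },
        "sold_date": { "type": "date" }
      }
    }
  }
}

With scaling_factor set to 100, a price of 99.99 is stored internally as 9999 and converted back when it is read.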
1.5.3.2. index
The index attribute controls whether a field is indexed.
- true: The field will be indexed and can be used to search. The default value is true
- false: the field will not be indexed and cannot be used for searching
The default value of index is true, which means that if you do not configure anything, all fields will be indexed.
But there are some fields that we do not want to be indexed, such as the picture information of the product, we need to manually set the index to false.
1.5.3.3. store
Whether to store the field's value separately.
When learning Lucene and Solr, we saw that if a field's store is set to false, its value is not stored, so it cannot be displayed in the user's search results.
In Elasticsearch, however, even if store is set to false, the value can still be shown in search results.
The reason is that when Elasticsearch indexes a document, it keeps a backup of the original document in a property called _source. We can filter _source to choose which fields to display and which to hide.
If you set store to true, an extra copy of the data is stored outside of _source, which is redundant, so we usually leave store as false. In fact, the default value of store is false.
1.6. New data
1.6.1. Randomly generated id
Through POST requests, you can add data to an existing index library.
Examples:
POST /testindex/goods/
{
  "title": "iphoneX",
  "images": "1,jpg",
  "price": 111.00
}
response:

{
  "_index": "testindex",
  "_type": "goods",
  "_id": "AWsS5Neq-k3yg4WVTNnG",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
View data through kibana:
GET _search
{
  "query": {
    "match_all": {}
  }
}

{
  "_index": "testindex",
  "_type": "goods",
  "_id": "AWsS5Neq-k3yg4WVTNnG",
  "_version": 1,
  "_score": 1,
  "_source": {
    "title": "iphoneX",
    "images": "1.jpg",
    "price": 111
  }
}
- _source: Source document information, all data are in it.
- _id: the unique identifier of this document; it is not tied to any id field inside the document itself
1.6.2. Custom id
If we want to specify the id ourselves when adding a document, we can do this:
Examples:
POST /testindex/goods/2
{
  "title": "IphoneX",
  "images": "2.jpg",
  "price": 222
}
The data obtained:
{
  "_index": "testindex",
  "_type": "goods",
  "_id": "2",
  "_score": 1,
  "_source": {
    "title": "IphoneX",
    "images": "2,jpg",
    "price": 222
  }
}
1.6.3. Intelligent judgment
When learning Solr, we found that new data could only use fields whose mapping attributes had been configured in advance; otherwise an error was reported.
Elasticsearch has no such requirement.
In fact, Elasticsearch is quite smart: even if you do not define any mapping for the index, it can infer the type from the data you insert and dynamically add the field mapping.
Let's test it:
POST /testindex/goods/3
{
  "title": "IphoneX",
  "images": "3.jpg",
  "price": 333,
  "stock": 200
}
We have added an additional stock field.
Look at the results:
{
  "_index": "testindex",
  "_type": "goods",
  "_id": "3",
  "_version": 1,
  "_score": 1,
  "_source": {
    "title": "IphoneX",
    "images": "3.jpg",
    "price": 333,
    "stock": 200
  }
}
Look at the mapping relationship of the index library:
{
  "testindex": {
    "mappings": {
      "goods": {
        "properties": {
          "images": {
            "type": "keyword",
            "index": false
          },
          "price": {
            "type": "float"
          },
          "stock": {
            "type": "long"
          },
          "title": {
            "type": "text",
            "analyzer": "ik_max_word"
          }
        }
      }
    }
  }
}
The stock field we added was successfully mapped (as a long).
1.7. Modify data
If we change the request method from POST to PUT, it becomes a modification. However, the modification must specify an id:
- id corresponding document exists, modify
- id corresponding document does not exist, then add
For example, we modify the data with id 3:
PUT /testindex/goods/3
{
  "title": "IphoneX",
  "images": "3.jpg",
  "price": 333,
  "stock": 100
}
Querying the document again shows the updated stock value:

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 9,
    "successful": 9,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "testindex",
        "_type": "goods",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "IphoneX",
          "images": "3.jpg",
          "price": 333,
          "stock": 100
        }
      }
    ]
  }
}
1.8. Delete data
Use a DELETE request to delete; similarly, you need to delete by id:
DELETE /indexname/type/id
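For example, to delete the document with id 3 that we added earlier (a sketch, not a captured run):

DELETE /testindex/goods/3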
3. Queries
We will look at queries in five parts:
- Basic queries
- _source filtering of results
- Advanced queries
- Filters
- Sorting
3.1. Basic query:
The query key represents a query object, which can contain different query attributes:
- Query type: for example match_all, match, term, range, etc.
- Query conditions: the conditions differ depending on the query type, and so does the way they are written.
3.1.1 Query all (match_all)
Examples:
GET /testindex/_search
{
  "query": {
    "match_all": {}
  }
}
- query: Represents the query object
- match_all: matches all documents
result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "testindex",
        "_type": "goods",
        "_id": "AWsS5Neq-k3yg4WVTNnG",
        "_score": 1,
        "_source": {
          "title": "iphoneX",
          "images": "1,jpg",
          "price": 111
        }
      }
    ]
  }
}
- took: The query took time, in milliseconds
- timed_out: whether the query timed out
- _shards: shard information
- hits: search results overview object
- total: the total number of matching documents
- max_score: the highest score of all results
- hits: an array of document objects in the search results, each element is a piece of searched document information
- _index: index library
- _type: document type
- _id: document id
- _score: document score
- _source: the source data of the document
3.1.2 Match query (match)
- or relationship
In a match query, the query string is analyzed into terms first and then searched; the relationship between the resulting terms is or:
GET /testindex/_search
{
  "query": {
    "match": {
      "title": "iphoneX"
    }
  }
}

result:

{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.51623213,
    "hits": [
      {
        "_index": "testindex",
        "_type": "goods",
        "_id": "AWsS5Neq-k3yg4WVTNnG",
        "_score": 0.51623213,
        "_source": {
          "title": "iphoneX",
          "images": "1,jpg",
          "price": 111
        }
      },
      {
        "_index": "testindex",
        "_type": "goods",
        "_id": "3",
        "_score": 0.25811607,
        "_source": {
          "title": "iMac",
          "images": "4.jp",
          "price": 444
        }
      }
    ]
  }
}
In a case like this, the query does not only return exact matches: the query string is analyzed into terms, and the relationship between the terms is or. (For example, a query such as "Xiaomi mobile phone" is split into the terms "Xiaomi" and "mobile phone"; because of the or relationship, any document containing either keyword is returned.)
- and relationship
In some cases, we need a more precise match and want the relationship between the terms to be and. We can do this:
GET /testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "iphoneX",
        "operator": "and"
      }
    }
  }
}

result:

{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.51623213,
    "hits": [
      {
        "_index": "testindex",
        "_type": "goods",
        "_id": "AWsS5Neq-k3yg4WVTNnG",
        "_score": 0.51623213,
        "_source": {
          "title": "iphoneX",
          "images": "1,jpg",
          "price": 111
        }
      }
    ]
  }
}
In this example, only documents whose title contains all of the analyzed terms will be returned.
- Between or and and
Choosing between or and and is a bit too black and white. If the user's query is analyzed into 5 terms and we want to find documents that contain only 4 of them, what should we do? Setting the operator parameter to and would simply exclude such documents.
Sometimes this is what we expect, but in most application scenarios of full-text search, we want to include those documents that may be relevant, while excluding those that are less relevant. In other words, we want to be somewhere in the middle.
The match query supports the minimum_should_match parameter, which lets us specify how many terms must match for a document to be considered relevant. We can set it to a specific number, but it is more common to set it to a percentage, because we cannot control how many words the user will type:
GET /testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "iWatch",
        "minimum_should_match": "75%"
      }
    }
  }
}
In this example, suppose the search string is analyzed into 3 terms. With the and relationship, all 3 terms would have to match. Here we use a minimum match of 75%: 3 * 75% is roughly 2, so a document matches as long as it contains at least 2 of the terms.
3.1.3 Multi-field query (multi_match)
multi_match is similar to match, except that it can query multiple fields:
GET /testindex/_search
{
  "query": {
    "multi_match": {
      "query": "IPhone",
      "fields": ["title", "image"]
    }
  }
}
The query will be run against both the title and image fields.
3.1.4 term matching (term)
The term query is used for exact-value matching; these exact values may be numbers, dates, booleans, or non-analyzed strings:
GET /testindex/_search
{
  "query": {
    "term": {
      "price": 111
    }
  }
}

result:

{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "testindex",
        "_type": "goods",
        "_id": "AWsS5Neq-k3yg4WVTNnG",
        "_score": 1,
        "_source": {
          "title": "iphoneX",
          "images": "1,jpg",
          "price": 111
        }
      }
    ]
  }
}
3.1.5 Multi-term exact matching (terms)
The terms query works like the term query, but it lets you specify multiple values to match. If the field contains any one of the specified values, the document matches:
GET /testindex/_search
{
  "query": {
    "terms": {
      "price": [111, 222]
    }
  }
}
3.2. Results filtering
By default, Elasticsearch returns all the fields stored in the document's _source in the search results.
If we only want some of the fields, we can add _source filtering.
3.2.1. Directly specify fields
Examples:
GET /testindex/_search
{
  "_source": ["title", "price"],
  "query": {
    "term": {
      "price": 111
    }
  }
}

Results returned:

{
  "took": 28,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "testindex",
        "_type": "goods",
        "_id": "AWsS5Neq-k3yg4WVTNnG",
        "_score": 1,
        "_source": {
          "price": 111,
          "title": "iphoneX"
        }
      }
    ]
  }
}
In this way, the _source of each hit contains only the title and price fields.
3.2.2. Specify includes and excludes
We can also pass:
- includes: to specify the fields you want to display
- excludes: to specify fields that you do not want to display
Both are optional.
Examples:
GET /testindex/_search
{
  "_source": {
    "includes": ["title", "price"]
  },
  "query": {
    "term": {
      "price": 111
    }
  }
}
The result will be the same as that of the following request:
GET /testindex/_search
{
  "_source": {
    "excludes": ["images"]
  },
  "query": {
    "term": {
      "price": 111
    }
  }
}
3.3 Advanced query
3.3.1 Boolean combination (bool)
bool combines other queries using must (AND), must_not (NOT), and should (OR):
GET /testindex/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "IPhone" } },
      "must_not": { "match": { "title": "TV" } },
      "should": { "match": { "title": "Phone" } }
    }
  }
}

result:

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.51623213,
    "hits": [
      {
        "_index": "testindex",
        "_type": "goods",
        "_id": "AWsS5Neq-k3yg4WVTNnG",
        "_score": 0.51623213,
        "_source": {
          "title": "iphoneX",
          "images": "1,jpg",
          "price": 111
        }
      }
    ]
  }
}
3.3.2 Range query (range)
The range query accepts the following operators:

| Operator | Explanation |
| --- | --- |
| gt | greater than |
| gte | greater than or equal to |
| lt | less than |
| lte | less than or equal to |
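For example, a sketch of a range query against the testindex data used above (the price boundaries are arbitrary):

GET /testindex/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 100,
        "lt": 300
      }
    }
  }
}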
3.3.3 Fuzzy query (fuzzy)
We add a new product:
POST /testindex/goods/4
{
  "title": "applePhone",
  "images": "apple.jpg",
  "price": 6899.00
}
A fuzzy query is the fuzzy equivalent of the term query. It allows the spelling of the search term to deviate from the actual term, but the edit distance of the deviation must not exceed 2:
GET /testindex/_search
{
  "query": {
    "fuzzy": {
      "title": "appla"
    }
  }
}
The query above can still find the applePhone product.
We can specify the allowed edit distance with fuzziness:
GET /testindex/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "appla",
        "fuzziness": 1
      }
    }
  }
}
3.4 filter
Filtering within a conditional query
All queries affect the document's score and ranking. If we need to filter the query results without letting the filter conditions affect the score, we should not put those conditions in the query; instead, use filter:
GET /testindex/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "iphoneX" } },
      "filter": {
        "range": { "price": { "gt": 2000.00, "lt": 3800.00 } }
      }
    }
  }
}
Note: inside filter you can again use bool to combine multiple filter conditions.
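For example, a sketch that nests a bool inside filter to combine two conditions (the boundary values are arbitrary):

GET /testindex/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "iphoneX" } },
      "filter": {
        "bool": {
          "must": [
            { "range": { "price": { "gte": 100 } } },
            { "range": { "stock": { "gte": 1 } } }
          ]
        }
      }
    }
  }
}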
No query conditions, direct filtering
If a query has only filters and no query conditions (and therefore no scoring), we can use a constant_score query in place of a bool query that contains only a filter clause. The performance is identical, but it makes the query simpler and clearer.
GET /heima/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": { "price": { "gt": 2000.00, "lt": 3000.00 } }
      }
    }
  }
}
3.5 Sort
3.5.1 Single field sorting
sort lets us sort by different fields, and order specifies the sort direction:
GET /testindex/_search
{
  "query": {
    "match": {
      "title": "iphoneX"
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}
3.5.2 Sorting by multiple fields
Suppose we want to combine price and _score (relevance) in the query, sorting the matching results first by price and then by relevance score:
GET /goods/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "iphoneX" } },
      "filter": {
        "range": { "price": { "gt": 200000, "lt": 300000 } }
      }
    }
  },
  "sort": [
    { "price": { "order": "desc" } },
    { "_score": { "order": "desc" } }
  ]
}
4. Aggregation
Aggregations let us perform statistics and analysis on data very conveniently. For example:
- What brand of mobile phone is the most popular?
- The average price, the highest price, the lowest price of these phones?
- How about the monthly sales of these phones?
Implementing these statistics with aggregations is more convenient than with database SQL, and the queries are very fast, giving near-real-time results.
4.1 Basic concepts
Aggregation in Elasticsearch contains multiple types, the two most commonly used, one called bucket and one called metrics:
Bucket
A bucket groups the data by some criterion; each group of data is called a bucket in ES. For example, if we divide people by nationality, we get a China bucket, a USA bucket, a Japan bucket, and so on; or we can divide people by age range: 0-10, 10-20, 20-30, 30-40, and so on.
There are many ways to divide buckets provided in Elasticsearch:
- Date Histogram Aggregation: grouped according to the date ladder, for example, if the given ladder is a week, it will be automatically divided into a group every week
- Histogram Aggregation: grouped according to numerical ladder, similar to date
- Terms Aggregation: grouped according to the content of the entry
- Range Aggregation: Range grouping of numeric values and dates, specify start and end, and then group by segments
- …
In summary, bucket aggregations are only responsible for grouping the data; they do not perform calculations. Therefore, another kind of aggregation is often nested inside a bucket: metrics aggregations.
Metrics
After the grouping is done, we generally perform aggregate calculations on the data in each group, such as average, maximum, minimum, sum, and so on; in ES these are called metrics.
Some commonly used measures aggregation methods:
- Avg Aggregation: average
- Max Aggregation: find the maximum value
- Min Aggregation: Find the minimum value
- Percentiles Aggregation: computes percentiles
- Stats Aggregation: return avg, max, min, sum, count, etc. at the same time
- Sum Aggregation: Sum
- Top hits Aggregation: returns the top matching documents
- Value Count Aggregation: counts the number of values
- …
To test the aggregation, we first import some data in bulk
Create an index:
PUT /cars
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "transactions": {
      "properties": {
        "color": {
          "type": "keyword"
        },
        "make": {
          "type": "keyword"
        }
      }
    }
  }
}
Note: in ES, fields used for aggregation, sorting, and filtering are handled in a special way and therefore cannot be analyzed. Here we set the two text fields color and make to the keyword type; this type is not analyzed, so these fields can participate in aggregations later.
Import Data
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
4.2 Aggregation into buckets
First, we divide the cars into buckets by the color field:
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      }
    }
  }
}
- size: the number of documents to return; it is set to 0 here because we only care about the aggregation results, not the matched documents, which improves efficiency
- aggs: declares that this is an aggregation query; it is short for aggregations
- popular_colors: a name for this aggregation; it can be anything
- terms: the bucketing method; here we bucket by terms
- field: the field to bucket on
result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "popular_colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "red", "doc_count": 4 },
        { "key": "blue", "doc_count": 2 },
        { "key": "green", "doc_count": 2 }
      ]
    }
  }
}
- hits: The query result is empty because we set the size to 0
- aggregations: aggregation results
- popular_colors: the aggregate name we defined
- buckets: buckets found, each different color field value will form a bucket
- key: the value of the color field corresponding to this bucket
- doc_count: the number of documents in this bucket
Through the results of aggregation, we found that the red car is currently selling well!
4.3 In-bucket metrics
The previous example tells us the number of documents in each bucket, which is very useful. But usually, our application needs to provide more complex document metrics. For example, what is the average price of each color car?
Therefore, we need to tell Elasticsearch which field to use and which metric to compute, and this information is nested inside the bucket; the metric is then calculated over the documents within each bucket.
Now, we add a measure that averages the price to the aggregated result just now:
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
- aggs: we add a new aggs inside the previous aggregation (popular_colors). This shows that a metric is itself an aggregation; the metric is simply an aggregation nested inside the bucket
- avg_price: the name of the aggregate
- avg: the type of measurement, here is the average
- field: the field of measurement operation
result:
…
"aggregations": {
  "popular_colors": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "red",
        "doc_count": 4,
        "avg_price": { "value": 32500 }
      },
      {
        "key": "blue",
        "doc_count": 2,
        "avg_price": { "value": 20000 }
      },
      {
        "key": "green",
        "doc_count": 2,
        "avg_price": { "value": 21000 }
      }
    ]
  }
}
…
You can see that each bucket now has its own avg_price field, which is the result of the metric aggregation.
4.4 Nested buckets in the bucket
In the case just now, we nested measurement operations inside the bucket. In fact, buckets can not only nest operations, but also nest other buckets. That is to say, in each group, there are more groups.
For example, we want to count the manufacturers for each car color, dividing the buckets by the make field:
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "maker": {
          "terms": {
            "field": "make"
          }
        }
      }
    }
  }
}
- The original color bucket and avg calculations are unchanged
- maker: Add a new bucket under the nested aggs, called maker
- terms: The division type of the bucket is still a term
- field: divided according to the make field
Partial results:
…
{
  "aggregations": {
    "popular_colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "red",
          "doc_count": 4,
          "maker": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "honda", "doc_count": 3 },
              { "key": "bmw", "doc_count": 1 }
            ]
          },
          "avg_price": { "value": 32500 }
        },
        {
          "key": "blue",
          "doc_count": 2,
          "maker": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "ford", "doc_count": 1 },
              { "key": "toyota", "doc_count": 1 }
            ]
          },
          "avg_price": { "value": 20000 }
        },
        {
          "key": "green",
          "doc_count": 2,
          "maker": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "ford", "doc_count": 1 },
              { "key": "toyota", "doc_count": 1 }
            ]
          },
          "avg_price": { "value": 21000 }
        }
      ]
    }
  }
}
- We can see that the new maker aggregation is nested inside each of the original color buckets.
- Under each color, the documents are further grouped by the make field.
- Information we can read:
- There are 4 red cars
- The average selling price of a red car is $32,500.
- Three of them are made by Honda and one is made by BMW.
4.5. Other ways of dividing buckets
As mentioned earlier, there are many ways to divide the bucket, for example:
- Date Histogram Aggregation: grouped according to the date ladder, for example, if the given ladder is a week, it will be automatically divided into a group every week
- Histogram Aggregation: grouped according to numerical ladder, similar to date
- Terms Aggregation: grouped according to the content of the entry
- Range Aggregation: Range grouping of numeric values and dates, specify start and end, and then group by segments
In the case just now, we used Terms Aggregation, which divides buckets according to terms.
Next, we learn a few more practical ones:
4.5.1. Histogram (stepped buckets)
principle:
A histogram groups a numeric field into buckets of a fixed step size. You need to specify an interval value that defines the step size.
Examples:
For example, if you have a price field and set the interval to 200, the steps will look like this:
0, 200, 400, 600, …
The values listed above are the keys of the steps, which are also the starting points of each interval.
If the price of a product is 450, which step range will it fall into? Calculated as follows:
bucket_key = Math.floor((value - offset) / interval) * interval + offset
value: the value of the current data, in this case 450
offset: starting offset, default is 0
interval: step interval, such as 200
So the key you get = Math.floor ((450-0) / 200) * 200 + 0 = 400
Operate:
For example, we group the prices of cars and specify the interval to 5000:
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 5000
      }
    }
  }
}

result:

{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "price": {
      "buckets": [
        { "key": 10000, "doc_count": 2 },
        { "key": 15000, "doc_count": 1 },
        { "key": 20000, "doc_count": 2 },
        { "key": 25000, "doc_count": 1 },
        { "key": 30000, "doc_count": 1 },
        { "key": 35000, "doc_count": 0 },
        { "key": 40000, "doc_count": 0 },
        { "key": 45000, "doc_count": 0 },
        { "key": 50000, "doc_count": 0 },
        { "key": 55000, "doc_count": 0 },
        { "key": 60000, "doc_count": 0 },
        { "key": 65000, "doc_count": 0 },
        { "key": 70000, "doc_count": 0 },
        { "key": 75000, "doc_count": 0 },
        { "key": 80000, "doc_count": 1 }
      ]
    }
  }
}

You will notice a large number of buckets with 0 documents in the middle, which looks ugly.
We can add the parameter min_doc_count and set it to 1 to require at least 1 document per bucket; buckets with 0 documents will then be filtered out.
Examples:
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 5000,
        "min_doc_count": 1
      }
    }
  }
}

result:

{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "price": {
      "buckets": [
        { "key": 10000, "doc_count": 2 },
        { "key": 15000, "doc_count": 1 },
        { "key": 20000, "doc_count": 2 },
        { "key": 25000, "doc_count": 1 },
        { "key": 30000, "doc_count": 1 },
        { "key": 80000, "doc_count": 1 }
      ]
    }
  }
}
Perfect!
4.5.2. Range
Range bucketing is similar to histogram bucketing in that numeric values are grouped into segments, but with range you must specify the start and end of each group yourself.
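A sketch against the cars data imported earlier (the aggregation name price_ranges and the boundaries are arbitrary):

GET /cars/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 20000 },
          { "from": 20000, "to": 40000 },
          { "from": 40000 }
        ]
      }
    }
  }
}

Each entry in ranges produces one bucket; from is inclusive and to is exclusive.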