# Elastic Search
Welcome to Elastic Docs (opens new window) Api conventions (opens new window)
# Key terms
- Documents — Data is stored as JSON documents.
- Indices — Every document is indexed (inverted index) to perform fast searches.
- Node — Every instance of elastic search is a node. Nodes actually handle the client requests. There are different roles a node.
- Aggregation - is a way to group and summarize data. Elasticsearch provides a wide range of aggregation types, such as terms, range, date histogram, and geo distance
- Shard - A shard is a subset of an index that contains a portion of the index's data.
- Cluster — Group of nodes that have the same cluster.name attribute.
- Optimistic concurrency control
# How to Communicate
- Elastic search provides REST apis. E.g creating index, listing all indices, querying a document, etc
- These APIs support structured queries, full text queries, and complex queries that combine the two.
# How documents are ranked in Elasticsearch
# Relevance Score
The relevance score is a measure of how well a document matches the search query. Elasticsearch calculates the relevance score based on a number of factors, including:
- Term frequency: How often the search terms appear in the document.
- Inverse document frequency: How common the search terms are across all documents in the index.
- Field length: Longer fields are given less weight than shorter fields.
- Term proximity: How close the search terms are to each other in the document.
Elasticsearch uses a variant of the TF-IDF algorithm to calculate the relevance score. The relevance score is a decimal value between 0 and 1, with 1 being the most relevant.
# Sorting Criteria
In addition to the relevance score, Elasticsearch also uses sorting criteria to determine the order of the results in the response list. Sorting criteria can be specified when you perform a search query and can include fields such as date, price, or popularity.
By default, Elasticsearch sorts the results by relevance score in descending order. This means that the most relevant results appear at the top of the response list. However, you can specify additional sorting criteria to further refine the order of the results.
For example, if you are searching for products on an e-commerce website, you might want to sort the results by price in ascending order. This would ensure that the cheapest products appear at the top of the response list.
# Searching for Same Meaning Words
we can achieve this by using synonyms
# Define Synonyms
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });
const indexName = 'my_index';
const settings = {
analysis: {
filter: {
my_synonyms: {
type: 'synonym',
// here
synonyms: [
'car, automobile',
'bike, bicycle',
'bus, coach, minibus',
'train, railway',
'plane, airplane, aircraft'
]
}
},
analyzer: {
my_analyzer: {
tokenizer: 'standard',
filter: ['lowercase', 'my_synonyms']
}
}
}
};
client.indices.create({
index: indexName,
body: {
settings,
}
}, (err, resp) => {
if (err) console.trace(err.message);
else console.log(`Index ${indexName} created.`);
});
# Search for Synonyms
const indexName = 'my_index';
const searchParams = {
index: indexName,
body: {
query: {
match: {
my_field: {
query: 'car',
analyzer: 'my_analyzer'
}
}
}
}
};
client.search(searchParams, (err, resp) => {
if (err) console.trace(err.message);
else console.log(resp.hits.hits);
});
In this example, we search for the word car
using the my_analyzer
analyzer. The analyzer will expand the search to include the synonyms automobile
. The search results will include all documents that contain the word car
or automobile
.
# Alternative to Data given
{
// ...
filter: {
your_synonym_filter_name: {
type: 'synonym',
synonyms_path: 'path_to_synonyms_file'
}
}
}
Use synonyms_path
instead synonyms
Each line in the file path_to_synonyms_file
should contain a group of synonyms separated by commas
car, automobile
bike, bicycle
bus, coach, minibus
train, railway
plane, airplane, aircraft
# Searching for Mistyped Words
# 1. Use Fuzzy Query
Fuzzy Query feature allows you to search for words that are similar to the one you are looking for, even if they are not spelled correctly. For example, if you are searching for the word apple
but someone has typed aple
instead, the Fuzzy Query feature will still return the correct result.
const { body } = await client.search({
index: 'my_index',
body: {
query: {
fuzzy: {
'my_field': {
value: 'aple',
fuzziness: 'AUTO', // here
}
}
}
}
});
console.log(body.hits.hits);
# 2. Use Phonetic Analysis
This technique involves converting words into their phonetic representations and then searching for words that sound similar to the one you are looking for. This can be especially useful for words that are commonly misspelled, such as names or technical terms.
const { body } = await client.search({
index: 'my_index',
body: {
query: {
match: {
'my_field': {
query: 'aple',
analyzer: 'my_phonetic_analyzer'
}
}
}
}
});
console.log(body.hits.hits);
# To create my_phonetic_analyzer
- Install plugin
sudo bin/elasticsearch-plugin install analysis-phonetic
- Create a custom phonetic analyzer
client.indices.create({
index: 'my_index',
body: {
settings: {
analysis: {
analyzer: {
my_phonetic_analyzer: {
tokenizer: 'standard',
filter: ['lowercase', 'my_phonetic_filter']
}
},
filter: {
my_phonetic_filter: {
type: 'phonetic',
encoder: 'double_metaphone',
replace: true
}
}
}
}
}
}, (err, resp) => {
if (err) console.trace(err.message);
});
- use the custom analyzer
client.index({
index: 'my_index',
body: {
title: 'The Night Knight',
content: 'A story about a knight who fights at night.'
},
analyzer: 'my_phonetic_analyzer'
}, (err, resp) => {
if (err) console.trace(err.message);
});
# 3. Use N-Grams
N-Grams are a type of tokenization that breaks words down into smaller pieces and indexes them separately. This allows you to search for words even if they are misspelled or incomplete.
const { body } = await client.search({
index: 'my_index',
body: {
query: {
match: {
'my_field': {
query: 'aple',
analyzer: 'my_ngram_analyzer'
}
}
}
}
});
console.log(body.hits.hits);
To create my_ngram_analyzer
const settings = {
analysis: {
analyzer: {
my_ngram_analyzer: {
tokenizer: 'my_ngram_tokenizer'
}
},
tokenizer: {
my_ngram_tokenizer: {
type: 'ngram',
min_gram: 3,
max_gram: 5,
token_chars: ['letter', 'digit']
}
}
}
};
const mappings = {
properties: {
my_field: {
type: 'text',
analyzer: 'my_ngram_analyzer'
}
}
};
await client.indices.create({
index: indexName,
body: {
settings,
mappings,
}
});
# Use Geo-point
// create index
await client.indices.create({
index: 'my_index',
body: {
mappings: {
properties: {
location: { type: 'geo_point' }
}
}
}
});
// index document
await client.index({
index: 'my_index',
body: {
location: {
lat: 40.7128,
lon: -74.0060
}
}
});
// searching
const response = await client.search({
index: 'my_index',
body: {
query: {
geo_distance: {
distance: '10km',
location: {
lat: 40.7128,
lon: -74.0060
}
}
}
}
});
console.log(response.hits.hits);
# Match query vs Term query
- The
match
query is a full-text search query - The
term
query is a query that looks for exact matches of a term or phrase in the index.
// match query
const { body } = await client.search({
index: 'my_index',
body: {
query: {
match: { title: 'Nodejs elasticsearch' }
}
}
});
console.log(body.hits.hits);
// term query
const { body } = await client.search({
index: 'my_index',
body: {
query: {
term: { title: 'Nodejs elasticsearch' }
}
}
});
console.log(body.hits.hits);
# Use Aggregation
await client.index({
index: 'myindex',
body: {
name: 'John Doe',
age: 30,
city: 'New York'
}
});
await client.index({
index: 'myindex',
body: {
name: 'Jane Doe',
age: 25,
city: 'Los Angeles'
}
});
const response = await client.search({
index: 'myindex',
body: {
aggs: {
avg_age: {
avg: {
field: 'age'
}
},
city_count: {
terms: {
field: 'city'
}
}
}
}
});
console.log(response.body.aggregations);
# date_histogram Aggregation
allows you to group data based on a specific time interval
const { body } = await client.search({
index: 'my_index',
body: {
query: {
match_all: {}
},
aggs: {
sales_over_time: {
date_histogram: {
field: 'timestamp',
interval: 'day'
}
}
}
}
});
const salesOverTime = body.aggregations.sales_over_time.buckets;
console.log(salesOverTime);
Others are avg
, sum
, min
, max
, cardinality
, terms
, date_histogram
, and more.
# Sharding data
# 1. Create an Index
await client.indices.create({
index: 'my_index',
body: {
settings: {
number_of_shards: 3,
number_of_replicas: 1
}
}
}, { ignore: [400] }); // status ignore
# 2. Index Data
await client.index({
index: 'my_index',
body: {
title: 'My Document',
content: 'This is my document content.'
},
routing: 'my_routing_value',
});
routing
is a way to ensure that related documents are stored in the same shard.
# 3. Search Data
const { body } = await client.search({
index: 'my_index',
body: {
query: {
match: {
content: 'document'
}
}
}
});
console.log(body.hits.hits);
# Other Samples
# Indexes
Creates a new index customer
PUT customer
Lists all indices
GET /_cat/indices?v
Deletes the index customer, pretty option to pretty print output of the command
DELETE /customer?pretty
# Documents
Document API (opens new window)
Add a new document to the index customer
PUT /customer/_doc/1?pretty
{
"name": "John Doe"
}
// Response of the above command returns the index created
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
Updates existing document in index customer with id->1
POST /customer/_update/1?pretty
{
"doc": { "name": "Jane Doe" }
}
Deletes existing document in index customer with id -> 1
DELETE /customer/_doc/1?pretty
# Other
- TF-IDF = Term Frequency–Inverse Document Frequency
- Running the Elastic Stack on Docker (opens new window)
- Use Cases, Architecture, and 6 Best Practices (opens new window)