by kan01234
Elasticsearch (ES) has become the go-to engine for powering search, analytics, and log processing. From e-commerce platforms handling millions of product searches per day to observability pipelines crunching terabytes of logs, ES consistently delivers sub-second responses. But what makes it so fast—and how exactly are documents stored inside it?
This post explores the internals of Elasticsearch: the inverted index, document storage, segments, and retrieval paths.
Traditional databases rely on row-based indexes such as B-Trees. Elasticsearch, built on Apache Lucene, uses a different model: the inverted index. Instead of mapping primary keys to rows, it maps terms to the documents that contain them.
For example:
Document 1: "Elasticsearch is fast"
Document 2: "Fast search engines use inverted indexes"
The inverted index looks like this (the common word "is" is omitted for brevity):
elasticsearch → [1]
fast → [1, 2]
search → [2]
engines → [2]
use → [2]
inverted → [2]
indexes → [2]
With this structure, finding all documents that contain “fast” is just a matter of looking up the term in the dictionary—no need to scan rows.
✅ Why it’s fast: ES avoids full scans by jumping directly to the matching document IDs.
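To make the term-to-documents mapping concrete, here is a minimal Python sketch that builds an inverted index for the two example documents. It is only an illustration: the `tokenize` and `build_inverted_index` helpers are hypothetical names, the tokenizer is a plain lowercase-and-split stand-in for a real analyzer, and unlike the listing above it keeps the word "is".

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {
    1: "Elasticsearch is fast",
    2: "Fast search engines use inverted indexes",
}
index = build_inverted_index(docs)

print(index["fast"])           # [1, 2]
print(index["elasticsearch"])  # [1]
print(index.get("slow", []))   # []
```

Answering "which documents contain fast?" is a single dictionary lookup; the cost does not depend on how many documents do not contain the term.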
When you index a JSON document into ES:
{
"user": "kan",
"message": "Elasticsearch is fast",
"likes": 3
}
Elasticsearch transforms it into several storage layers:
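_source: the original JSON, stored as-is and returned by GET and search requests. You can disable _source to save space, but you lose the ability to retrieve the raw document.
Inverted index: the tokens of analyzed text fields, mapped to document IDs (elasticsearch → [doc1], fast → [doc1]).
Doc values: columnar, per-field storage used for sorting and aggregations (likes, user).
Metadata: _id, routing information, and field-level storage.
So after indexing, ES effectively has: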
DocID: 1
--------------------------
_source: {user: "kan", message: "Elasticsearch is fast", likes: 3}
Inverted Index:
"elasticsearch" → [1]
"fast" → [1]
Doc Values (columnar):
user: ["kan", …]
likes: [3, …]
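The same layout can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Lucene encodes data on disk: the `ToyStore` class is hypothetical, only `message` is treated as analyzed text, every other field goes to the columnar store only, and the second document (user "lee") is invented so the columns hold more than one value.

```python
import json

TEXT_FIELDS = {"message"}  # assumption: only "message" is analyzed as full text

class ToyStore:
    def __init__(self):
        self.source = {}      # doc_id -> original JSON string (_source)
        self.inverted = {}    # term -> list of doc_ids
        self.doc_values = {}  # field -> {doc_id: value}, one column per field

    def index(self, doc_id, doc):
        # _source: keep the original document verbatim for later retrieval.
        self.source[doc_id] = json.dumps(doc)
        for field, value in doc.items():
            if field in TEXT_FIELDS:
                # Inverted index: each distinct token points back to the doc.
                for term in set(value.lower().split()):
                    self.inverted.setdefault(term, []).append(doc_id)
            else:
                # Doc values: store the value in that field's column.
                self.doc_values.setdefault(field, {})[doc_id] = value

store = ToyStore()
store.index(1, {"user": "kan", "message": "Elasticsearch is fast", "likes": 3})
store.index(2, {"user": "lee", "message": "Fast search", "likes": 7})  # hypothetical doc

print(store.inverted["fast"])                   # [1, 2]
print(sum(store.doc_values["likes"].values()))  # 10
print(json.loads(store.source[1])["user"])      # kan
```

The sum over `likes` reads a single column and never touches `_source`, which is the reason a columnar layout suits sorting and aggregations.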
Internally, Lucene (which ES is built on) writes documents into segments. Each segment is like a mini-index containing its own inverted index, stored fields (including _source), and doc values.
Segments are immutable. New documents create new segments. Deletes are handled by a delete marker bitmap until a background merge compacts segments.
✅ Why it’s fast: Immutable segments allow lock-free, concurrent reads even while indexing is happening.
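The segment lifecycle described above (append-only writes, deletes as markers, background merges) can be sketched as follows. The `Segment` and `ToyIndex` classes are hypothetical, a Python set stands in for the delete-marker bitmap, and merging is triggered by hand rather than by a background policy.

```python
class Segment:
    """An immutable batch of documents plus a set of deleted doc IDs."""
    def __init__(self, docs):
        self.docs = dict(docs)  # written once, never modified afterwards
        self.deleted = set()    # stand-in for the delete-marker bitmap

    def live_docs(self):
        return {i: d for i, d in self.docs.items() if i not in self.deleted}

class ToyIndex:
    def __init__(self):
        self.segments = []

    def flush(self, docs):
        """Write a batch of new documents as a brand-new segment."""
        self.segments.append(Segment(docs))

    def delete(self, doc_id):
        """Mark the document as deleted; the segment data stays untouched."""
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)

    def merge(self):
        """Compact all segments into one, dropping deleted documents."""
        merged = {}
        for seg in self.segments:
            merged.update(seg.live_docs())
        self.segments = [Segment(merged)]

    def get(self, doc_id):
        for seg in self.segments:
            if doc_id in seg.docs and doc_id not in seg.deleted:
                return seg.docs[doc_id]
        return None

idx = ToyIndex()
idx.flush({1: "Elasticsearch is fast"})
idx.flush({2: "Fast search engines use inverted indexes"})
idx.delete(1)
print(len(idx.segments), idx.get(1))                  # 2 None
idx.merge()
print(len(idx.segments), list(idx.segments[0].docs))  # 1 [2]
```

Before the merge, document 1 still occupies space in its segment and is only hidden by the marker; the merge is what reclaims it.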
When you do:
GET my_index/_doc/1
Elasticsearch looks up the doc ID in the stored fields, retrieves the _source JSON (if enabled), and returns it.
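When you run a search query, Elasticsearch resolves the query terms against the inverted index to collect and score the matching document IDs, then fetches _source only for the top hits (not for every candidate doc).
Elasticsearch is fast not because of a single trick, but because of a carefully layered design:
Inverted index: term lookups jump straight to the matching document IDs instead of scanning rows.
Doc values: columnar storage serves sorting and aggregations.
Immutable segments: reads stay lock-free and concurrent while indexing continues.
Selective fetch: _source is loaded only for the top hits.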
The result: a system that can handle both search and analytics at scale with sub-second performance.
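As a closing illustration of the search path, here is a small Python sketch of the "match on the inverted index first, fetch _source last" idea. It is a simplification under stated assumptions: the score is just the number of matching query terms (real relevance scoring is more involved), everything lives in one process rather than across shards, the `search` and `build_inverted` helpers are hypothetical, and document 3 is invented for the example.

```python
import heapq
from collections import Counter

# Toy data: docs 1 and 2 come from the earlier example; doc 3 is hypothetical.
SOURCE = {
    1: {"message": "Elasticsearch is fast"},
    2: {"message": "Fast search engines use inverted indexes"},
    3: {"message": "Search is everywhere"},
}

def build_inverted(source):
    """Map each lowercase token to the doc IDs whose message contains it."""
    inverted = {}
    for doc_id, doc in source.items():
        for term in set(doc["message"].lower().split()):
            inverted.setdefault(term, []).append(doc_id)
    return inverted

def search(query, inverted, source, size, fetch_log):
    # Match and score: only the inverted index is consulted here.
    scores = Counter()
    for term in query.lower().split():
        for doc_id in inverted.get(term, []):
            scores[doc_id] += 1
    # Keep the best `size` candidates; ties go to the lower doc ID.
    top = heapq.nsmallest(size, scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Fetch: load _source for the top hits only, recording each load.
    hits = []
    for doc_id, score in top:
        fetch_log.append(doc_id)
        hits.append({"_id": doc_id, "_score": score, "_source": source[doc_id]})
    return hits, len(scores)

inverted = build_inverted(SOURCE)
fetch_log = []
hits, candidates = search("fast search", inverted, SOURCE, size=1, fetch_log=fetch_log)

print(candidates)  # 3
print(fetch_log)   # [2]
print(hits[0]["_source"]["message"])
```

Three documents match at least one query term, but only one _source is ever loaded, which is why the size of the candidate set has little effect on the cost of returning results.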
tags: es, performance, data-structures