Why is Elasticsearch designed to be near real time?
The official documentation explains quite well why it is designed this way: www.elastic.co/guide/en/elasticsearch/reference/8.6/near-real-time.html
Sitting between Elasticsearch and the disk is the filesystem cache. Documents in the in-memory indexing buffer (Figure 1) are written to a new segment (Figure 2). The new segment is written to the filesystem cache first (which is cheap) and only later is it flushed to disk (which is expensive). However, after a file is in the cache, it can be opened and read just like any other file.
So, between the moment a document to be indexed arrives in the Elasticsearch in-memory indexing buffer (i.e. inside the JVM heap) and the moment it is written as part of a segment to the physical disk, it passes through the filesystem cache (i.e. the physical RAM not allocated to the JVM heap, roughly the remaining 50% under the usual sizing recommendation), where it is already searchable.
The move from the indexing buffer to the filesystem cache is carried out by the refresh operation, which by default happens every second, hence “near real time”. Moving the data from the filesystem cache to the disk then requires a Lucene commit, which is a much more expensive operation and is therefore performed less frequently.
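To make the three stages concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described above, not Elasticsearch's or Lucene's actual implementation: the class ToyShard and its attribute and method names are made up for illustration, and real segments are inverted-index files rather than tuples of strings.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Hypothetical toy model of one shard (illustration only)."""
    # Stage 1: in-memory indexing buffer (JVM heap). Not searchable.
    indexing_buffer: list = field(default_factory=list)
    # Stage 2: segments that live in the filesystem cache. Searchable.
    cached_segments: list = field(default_factory=list)
    # Stage 3: segments committed to physical disk. Searchable.
    committed_segments: list = field(default_factory=list)

    def index(self, doc: str) -> None:
        # A newly indexed document only lands in the buffer.
        self.indexing_buffer.append(doc)

    def refresh(self) -> None:
        # Cheap step: write the buffer out as a new segment in the cache.
        if self.indexing_buffer:
            self.cached_segments.append(tuple(self.indexing_buffer))
            self.indexing_buffer.clear()

    def flush(self) -> None:
        # Expensive step, standing in for a Lucene commit.
        # Simplification: the toy refreshes first so nothing stays buffered.
        self.refresh()
        self.committed_segments.extend(self.cached_segments)
        self.cached_segments.clear()

    def search(self, term: str) -> list:
        # Only segments are searched; the indexing buffer is invisible.
        segments = self.committed_segments + self.cached_segments
        return [doc for seg in segments for doc in seg if term in doc]


shard = ToyShard()
shard.index("near real time search")

before_refresh = shard.search("real")  # document is still in the buffer
shard.refresh()
after_refresh = shard.search("real")   # segment is in the cache, searchable
shard.flush()
after_flush = shard.search("real")     # segment is on disk, still searchable

print(before_refresh, after_refresh, after_flush)
```

The point of the model is the gap between index() and refresh(): a search issued in that window does not see the document, which is exactly the delay that makes search "near" real time rather than real time. The flush() step changes durability, not visibility.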