Problem Statement
I have configured error logs for my Elasticsearch cluster, and I frequently see the following error in them:
org.elasticsearch.ElasticsearchException$1: Result window is too large, from + size must be less than or equal to: [10000] but was [15020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.
Explanation
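My understanding is that Elasticsearch limits how deep a single search can page into its results: by default, from + size may not exceed 10,000. The limit exists to keep memory usage bounded and searches fast.
This is not a cap on how many documents a query can match. A query may match 15,000 documents without any problem; the error is raised only when a request asks for a window of results that ends beyond 10,000. The log above reports from + size = 15,020, which would be produced by, for example, "from": 15000 with "size": 20. Elasticsearch does not quietly truncate such a request to 10,000 hits. It rejects the request with this exception, which is why it shows up in the error logs.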
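The rule behind the error is simple arithmetic on from and size. Below is a minimal Python sketch of that check, written by hand for illustration; `check_result_window` is a hypothetical helper, not part of Elasticsearch or any client library, and the 15,000/20 split is only one assumed way to reach the 15,020 from the log.

```python
# Hypothetical helper that mimics the check Elasticsearch applies to from + size.
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def check_result_window(from_: int, size: int,
                        max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> int:
    """Return from + size, or raise if it is larger than the allowed window."""
    window = from_ + size
    if window > max_result_window:
        raise ValueError(
            f"Result window is too large, from + size must be less than or "
            f"equal to: [{max_result_window}] but was [{window}]"
        )
    return window


if __name__ == "__main__":
    print(check_result_window(9_980, 20))  # 10000: exactly at the limit, allowed
    try:
        check_result_window(15_000, 20)    # assumed split that sums to 15020
    except ValueError as err:
        print(err)  # mirrors the message seen in the error log
```

A request that lands exactly on 10,000 is still accepted, because the limit is "less than or equal to".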
The index setting that controls this behavior is index.max_result_window. It defines the maximum value of from + size for searches against the index, and it defaults to 10000. Search requests take heap memory and time proportional to from + size, and this setting bounds that memory. A search request in Elasticsearch generally spans multiple shards. Each shard produces its own sorted results, which must then be merged and sorted centrally so that the overall order is correct.
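To see why the cost grows with from + size, here is a rough Python count of how many sorted entries a single search touches. It is a simplified model, not Elasticsearch code: `sorted_entries` is a hypothetical helper, it assumes six distinct shards take part in the search (as in Figure 1), and it ignores optimizations such as incremental (batched) reduction on the coordinating node.

```python
# Hypothetical helper: rough count of sorted entries for one query-then-fetch search.
def sorted_entries(num_shards: int, from_: int, size: int):
    """Return (entries each shard keeps, entries the coordinating node merges)."""
    per_shard = from_ + size                 # each shard keeps its top from + size hits
    on_coordinator = num_shards * per_shard  # the coordinating node receives all of them
    return per_shard, on_coordinator


if __name__ == "__main__":
    # Assumes six shards are searched, as in Figure 1.
    print(sorted_entries(6, 9_990, 10))   # (10000, 60000): at the default limit
    print(sorted_entries(6, 15_000, 20))  # (15020, 90120): the rejected request
```

The coordinating node's share scales with both the number of shards and from + size, which is the memory that index.max_result_window is meant to cap.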
Search in Elasticsearch consists of two phases: the query phase and the fetch phase.
Figure 1: An Elasticsearch three-node cluster with the index divided into six shards
During the query phase, the search request is sent to every shard of the index (one copy of each shard, either the primary or a replica). Each shard executes the search locally and builds a priority queue of size from + size holding its best-matching documents. It then returns the document IDs and their scores (sort values) to the coordinating node, which merges them into a single, globally sorted list.
During the fetch phase, the coordinating node first decides which documents actually need to be retrieved. For example, if the query specifies { "from": 50, "size": 10 }, the first 50 results are discarded, and only the next 10 documents are fetched from the shards that hold them.
To summarize, the query phase identifies the documents that satisfy the search request, and the fetch phase retrieves the documents themselves and returns them to the coordinating node, which sends them back to the client.
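The two phases can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not how Elasticsearch is implemented: shards are plain lists of made-up (score, doc_id) pairs, `query_phase` and `fetch_phase` are hypothetical names, and the fetch step reads from one dictionary instead of sending requests to the shards that own each document.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)

# Made-up sample data: three shards, each holding a few scored hits.
SAMPLE_SHARDS: List[List[Hit]] = [
    [(0.9, "a"), (0.4, "b"), (0.1, "c")],
    [(0.8, "d"), (0.7, "e")],
    [(0.95, "f"), (0.2, "g")],
]
# Stand-in for the stored documents that the fetch phase would load.
SAMPLE_DOCS: Dict[str, dict] = {
    doc_id: {"id": doc_id} for shard in SAMPLE_SHARDS for _score, doc_id in shard
}


def query_phase(shards: List[List[Hit]], from_: int, size: int) -> List[Hit]:
    """Each shard keeps only its top (from + size) hits; the coordinating node
    merges those lists and keeps the global top (from + size)."""
    k = from_ + size
    per_shard = [heapq.nlargest(k, shard) for shard in shards]  # per-shard priority queues
    merged = [hit for top in per_shard for hit in top]           # sent to the coordinator
    return heapq.nlargest(k, merged)                             # globally sorted


def fetch_phase(ranked: List[Hit], docs: Dict[str, dict],
                from_: int, size: int) -> List[dict]:
    """Discard the first `from_` hits and load full documents for the next `size`."""
    return [docs[doc_id] for _score, doc_id in ranked[from_:from_ + size]]


if __name__ == "__main__":
    ranked = query_phase(SAMPLE_SHARDS, from_=2, size=2)
    print(ranked)  # [(0.95, 'f'), (0.9, 'a'), (0.8, 'd'), (0.7, 'e')]
    print(fetch_phase(ranked, SAMPLE_DOCS, from_=2, size=2))  # [{'id': 'd'}, {'id': 'e'}]
```

Even though only two documents are returned, every shard had to rank up to from + size = 4 hits, which is why deep pages get expensive.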
Example:
A search request with { "from": 1000, "size": 100 } discards the first 1,000 results and retrieves only the next 100. Note that from + size (here 1,100) cannot exceed the index.max_result_window index setting, which defaults to 10,000.
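When from is derived from a page number, the window limit translates directly into a deepest reachable page. The Python sketch below assumes zero-based pages and a `match_all` query purely for illustration; `page_body` and `deepest_page` are hypothetical helpers, not client-library functions.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default index.max_result_window


def page_body(page: int, page_size: int,
              max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> dict:
    """Hypothetical helper: search body for a zero-based page number."""
    from_ = page * page_size
    if from_ + page_size > max_result_window:
        raise ValueError(
            f"page {page} needs from + size = {from_ + page_size}, "
            f"above index.max_result_window ({max_result_window})"
        )
    return {"from": from_, "size": page_size, "query": {"match_all": {}}}


def deepest_page(page_size: int,
                 max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> int:
    """Highest zero-based page whose from + size still fits in the window."""
    return max_result_window // page_size - 1


if __name__ == "__main__":
    print(page_body(10, 100))         # {'from': 1000, 'size': 100, 'query': {'match_all': {}}}
    print(deepest_page(100))          # 99, i.e. from = 9900
    print(deepest_page(100, 20_000))  # 199 after raising the limit to 20000
```

With 100 results per page, page 10 reproduces the from = 1000, size = 100 example, and anything past page 99 would trigger the "Result window is too large" error under the default setting.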
How do you set ‘index.max_result_window’ to a higher value?
I ran an update request against the index settings in Kibana (a PUT to the index's _settings endpoint) to increase the max result window to 20000, and got an "acknowledged": true response.
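The same settings update can also be sent from a script. The Python sketch below uses only the standard library and assumes an unsecured cluster at `http://localhost:9200` and an index named `my-index`; both are placeholders, and `build_settings_update` is a hypothetical helper. The body is the documented update-index-settings payload, equivalent to running `PUT my-index/_settings` with `{"index": {"max_result_window": 20000}}` in Kibana.

```python
import json
import urllib.request


def build_settings_update(base_url: str, index: str,
                          max_result_window: int) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request."""
    body = {"index": {"max_result_window": max_result_window}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Assumed: unsecured local cluster and an index called "my-index".
    req = build_settings_update("http://localhost:9200", "my-index", 20_000)
    with urllib.request.urlopen(req) as resp:
        # A successful update should report acknowledged: true.
        print(json.load(resp))
```

Keeping request construction separate from sending makes the request easy to inspect before it touches a real cluster; a secured cluster would additionally need HTTPS and credentials.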
If you are interested in learning more about Elasticsearch performance, you can read my article about the five critical Elasticsearch metrics to monitor here:
https://iamondemand.com/blog/top-5-elasticsearch-metrics-to-monitor/