Elasticsearch Queries – Part 3: Bool Queries and Pagination
Introduction
If you’ve been following this series, you already know:
- In Part 1: Queries vs Filters we explored the mental model behind Elasticsearch queries and why mapping is critical.
- In Part 2: Practical Query Types we walked through the most common query types, like match, term, and range.
Now it’s time for the real workhorse: the bool query.
Why? Because no real-world search problem is solved by just one condition. Users expect relevance and restrictions:
- A keyword search (relevance).
- Filters on permissions, time ranges, categories, stock availability, or price (restrictions).
The bool query is how you glue all of these conditions together.
By the way, I should also mention that I primarily wrote this series to solidify my own learning and to serve as a future reference for myself. So, if you feel like some parts are a bit too brief or summarized, please let me know and I can elaborate!
Why Bool Is Everywhere
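Elasticsearch builds boolean logic behind the scenes more often than you might expect: a multi-word match query, for example, is executed as a boolean combination of term queries. Why? Because real-world search = layers of logic.
Think of bool as the engine that mixes rules and ranking: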
- must: core meaning of the query.
- filter: constraints (no scoring, and frequently used filters can be cached).
- should: nice-to-have boosts.
- must_not: exclusions.
Analogy: If search were a hiring process:
- must = the required skills.
- filter = eligibility criteria (work permit, location).
- should = bonus points (speaks German, open-source contributor).
- must_not = deal-breakers (fake CVs, banned candidates).
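The four clause types map directly onto keys of the bool object. As a minimal sketch in Python (the helper name build_bool_query is hypothetical, and the result is just a plain dict that could be sent as a search request body with whichever client you use), a query can be assembled like this:

```python
from typing import Any, Dict, List, Optional

Clause = Dict[str, Any]


def build_bool_query(
    must: Optional[List[Clause]] = None,
    filters: Optional[List[Clause]] = None,
    should: Optional[List[Clause]] = None,
    must_not: Optional[List[Clause]] = None,
) -> Dict[str, Any]:
    """Assemble a bool query body, leaving out clause types that are empty."""
    clauses = {
        "must": must,
        "filter": filters,
        "should": should,
        "must_not": must_not,
    }
    bool_body = {name: items for name, items in clauses.items() if items}
    return {"query": {"bool": bool_body}}


# Required skill plus an eligibility rule, in hiring-process terms.
body = build_bool_query(
    must=[{"match": {"title": "python"}}],
    filters=[{"term": {"status": "published"}}],
)
```

Only the clause types you pass end up in the body, so the same helper covers everything from a single must to all four pillars at once.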
The Four Pillars of Bool Queries
1. must: Match & Affect Score
Use when the query should both filter results and influence ranking.
Example: Find blog posts with “python” in the title
{ "query": { "bool": { "must": [ { "match": { "title": "python" } } ] } } }
Notes:
- Documents without “python” in the title won’t appear.
- Documents with stronger matches rank higher.
Real-world use case:
- Job portal: must match “data engineer” in job title.
- Product search: must match “laptop” in product name.
2. filter: Restrict Without Scoring
Filter clauses narrow the result set but do not affect the relevance score.
Because no score is computed, they are cheap to run, and Elasticsearch can cache frequently used filters.
Example: Only show published articles
{ "query": { "bool": { "must": [ { "match": { "content": "python" } } ], "filter": [ { "term": { "status": "published" } } ] } } }
Notes:
- Use for dates, categories, permissions.
- Think of filters as the non-negotiables.
Real-world use case:
- E-commerce: filter products with in_stock = true.
- News site: filter articles with published_at >= now-30d.
3. should: Boost Preferred Matches
Should clauses are optional — but they boost relevance when they match.
Example: Prefer tutorials and recent content
{ "query": { "bool": { "must": [ { "match": { "content": "python" } } ], "should": [ { "match": { "tags": "tutorial" } }, { "range": { "published_at": { "gte": "2024-01-01" } } } ], "minimum_should_match": 0 } } }
Notes:
- Without should matches, docs still show up.
- With should matches, docs get ranked higher.
- Use minimum_should_match to force at least N should clauses.
Real-world use case:
- Search “python” but boost documents tagged “beginner-friendly.”
- Search “laptop” but boost those with “2024 model.”
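To make the effect of minimum_should_match concrete, here is a toy model in Python. It does not call Elasticsearch and ignores scoring entirely: each should clause is represented by a hypothetical predicate function, and the helper only answers whether a document would still be eligible given how many should clauses it satisfies.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Predicate = Callable[[Doc], bool]


def passes_should(doc: Doc, should: List[Predicate], minimum_should_match: int) -> bool:
    """Toy check: does the doc satisfy at least N of the should predicates?"""
    matched = sum(1 for predicate in should if predicate(doc))
    return matched >= minimum_should_match


# Hypothetical stand-ins for the two should clauses in the example above.
should_clauses: List[Predicate] = [
    lambda d: "tutorial" in d.get("tags", []),
    # ISO-8601 date strings compare correctly as plain strings.
    lambda d: d.get("published_at", "") >= "2024-01-01",
]

old_reference = {"tags": ["reference"], "published_at": "2021-05-02"}
new_tutorial = {"tags": ["tutorial"], "published_at": "2024-03-15"}
```

With minimum_should_match set to 0, old_reference stays in the results. With 1 it drops out, while new_tutorial passes even at 2.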
4. must_not: Exclude Hard Rules
Excludes documents that match any of its clauses. Like filter, it runs without scoring.
Example: Exclude drafts
{ "query": { "bool": { "must": [ { "match": { "content": "python" } } ], "must_not": [ { "term": { "status": "draft" } } ] } } }
Notes:
- Great for spam removal, blocked users, hidden products.
Real-world use case:
- Marketplace: must_not show products flagged as “banned.”
- Internal search: must_not show documents marked “confidential.”
Must vs Filter: The Golden Rule
Mixing these two up is one of the most common beginner mistakes:
- Use must when the condition should influence relevance ranking.
- Use filter when the condition is just a constraint.
Example:
- Laptop under $1,000 → filter.
- Laptop with “gaming” in the title → must.
This choice affects both performance and user satisfaction.
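The laptop example could be written roughly as follows. This is a sketch as a Python dict; the field names title and price are assumptions about the index mapping, not something defined earlier in this series.

```python
import json

# Assumed field names: "title" (text) and "price" (numeric).
laptop_query = {
    "query": {
        "bool": {
            # Relevance: how well "gaming" matches the title affects the score.
            "must": [{"match": {"title": "gaming"}}],
            # Constraint: the price cap only includes or excludes, no scoring.
            "filter": [{"range": {"price": {"lt": 1000}}}],
        }
    }
}

print(json.dumps(laptop_query, indent=2))
```

Moving the price range into must would still return the same documents, but it would add scoring work for a condition that says nothing about relevance.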
Highlighting: Show Why a Result Matched
Search is useless if users don’t see why something matched.
Example: Highlight matched keywords in content
{ "query": { "match": { "content": "python tutorial" } }, "highlight": { "fields": { "content": {} } } }
Sample response:
"highlight": { "content": [ "... learn <em>python</em> step by step in this <em>tutorial</em> ..." ] }
Why it matters:
- Improves click-through rates.
- Builds trust in search results.
- Helps users skim faster.
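On the client side, the snippets arrive per hit under a highlight key. Here is a small sketch of pulling them out of a search response that has already been parsed into a Python dict. The function name and the sample response are made up for illustration; the highlight layout follows the sample response above.

```python
from typing import Any, Dict, List


def extract_snippets(response: Dict[str, Any], field: str) -> Dict[str, List[str]]:
    """Map each hit's _id to its highlighted fragments for one field."""
    snippets: Dict[str, List[str]] = {}
    for hit in response.get("hits", {}).get("hits", []):
        # Hits without highlighted fragments for this field are skipped.
        fragments = hit.get("highlight", {}).get(field, [])
        if fragments:
            snippets[hit["_id"]] = fragments
    return snippets


# Illustrative response, trimmed to the keys this helper reads.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"content": ["learn <em>python</em> step by step"]}},
            {"_id": "2"},
        ]
    }
}

snippets = extract_snippets(sample_response, "content")
```

The fragments already contain the <em> tags, so the UI layer only has to render them (after making sure the rest of the text is safely escaped).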
Pagination & Sorting
1. Basic Pagination (“from” + “size”)
{ "from": 0, "size": 10, "query": { "match": { "content": "python" } } }
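Good for small offsets, but costly at deep pages: every shard has to collect and sort from + size hits before the first from are thrown away. By default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting).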
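Translating a page number into from and size is simple arithmetic. A sketch in Python, where the helper name is hypothetical and the 10,000 cap mirrors the default value of the index.max_result_window setting:

```python
from typing import Dict

MAX_RESULT_WINDOW = 10_000  # default for index.max_result_window


def page_params(page: int, size: int = 10) -> Dict[str, int]:
    """Return from/size for a 1-based page number, rejecting windows that are too deep."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window; use search_after instead")
    return {"from": offset, "size": size}
```

For example, page_params(3, size=20) yields a from of 40, while page 1001 at 10 per page is refused before the request is ever sent.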
2. Deep Pagination with “search_after”
Efficient for infinite scroll or “Load More” buttons.
{ "size": 10, "query": { "match": { "content": "python" } }, "sort": [{ "published_at": "desc" }], "search_after": ["2024-07-10T10:15:00"] }
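Remember that when you use search_after, you provide the values from the sort array of the last document of the previous page. Now you may ask, “What if you have multiple sort fields?”
The answer is straightforward: you have to provide the full tuple of sort values in search_after, and the order must exactly match the order of the fields in sort. Including a field that is unique per document as the last sort key (id in the example below) also keeps documents with identical timestamps from being skipped or repeated between pages: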
{ "size": 10, "sort": [ { "timestamp": "asc" }, { "id": "desc" } ], "search_after": ["2025-09-26T12:00:00Z", 1234] }
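In practice you rarely type the search_after values by hand; you copy them from the sort array of the last hit on the previous page. A sketch of that step in Python, operating on plain dicts (the function name is hypothetical, and the sample hit is invented for illustration):

```python
import copy
from typing import Any, Dict, Optional


def next_page_body(body: Dict[str, Any], response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Build the request for the next page, or return None when there are no more hits."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    next_body = copy.deepcopy(body)
    next_body.pop("from", None)  # search_after replaces offset-based paging
    # Each hit carries a "sort" array with one value per entry in the request's sort.
    next_body["search_after"] = hits[-1]["sort"]
    return next_body


first_page = {
    "size": 10,
    "sort": [{"timestamp": "asc"}, {"id": "desc"}],
}
# Invented response: only the last hit matters for paging.
response = {"hits": {"hits": [{"_id": "a", "sort": ["2025-09-26T12:00:00Z", 1234]}]}}

second_page = next_page_body(first_page, response)
```

Whatever values came back in sort are passed along unchanged, so the tuple order always matches the sort definition without any extra bookkeeping.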
3. Sorting by Multiple Fields
"sort": [ { "_score": "desc" }, { "published_at": "desc" } ]
Ensures stable sorting (important when many docs have equal scores).
Putting It All Together: A Real-World Example
Scenario: A blog search engine
- Must have “python” in title.
- Only published articles.
- Exclude drafts.
- Prefer tutorials and recent content.
- Show highlighted snippets.
- Sort by score, then publish date.
- Paginate 10 per page.
{ "query": { "bool": { "must": [ { "match": { "title": "python" } } ], "filter": [ { "term": { "status": "published" } } ], "must_not": [ { "term": { "category": "draft" } } ], "should": [ { "match": { "tags": "tutorial" } }, { "range": { "published_at": { "gte": "2024-01-01" } } } ], "minimum_should_match": 0 } }, "highlight": { "require_field_match": false, "fields": { "content": {} } }, "sort": [ { "_score": "desc" }, { "published_at": "desc" } ], "from": 0, "size": 10 }
Note that the query matches on title while the snippets come from content. By default only fields that contain a query match are highlighted, so require_field_match is set to false here.
Wrap-Up
The bool query is the Swiss Army knife of Elasticsearch. Mastering it lets you:
- Glue multiple conditions seamlessly.
- Balance relevance (must, should) with constraints (filter, must_not).
- Improve UX with highlighting.
- Keep results fast and scalable with proper pagination & sorting.
Coming Next in the Series:
Part 4: Aggregations — Turning Search into Analytics