Elasticsearch Queries – Part 3: Bool Queries and Pagination

Introduction

If you’ve been following this series, you already know:

In Part 1: Queries vs Filters we explored the mental model behind Elasticsearch queries and why mapping is critical.
In Part 2: Practical Query Types we walked through the most common query types, like match, term, and range.

Now it’s time for the real workhorse: the bool query.
Why? Because real-world search problems are rarely solved by a single condition. Users expect both relevance and restrictions:

A keyword search (relevance).
But also filter by permissions, time ranges, categories, stock availability, or price.

The bool query is how you glue all of these conditions together.

By the way, I should also mention that I primarily wrote this series to solidify my own learning and to serve as a future reference for myself. So, if you feel like some parts are a bit too brief or summarized, please let me know and I can elaborate!

Why Bool Is Everywhere

You'll find bool almost everywhere. Even a simple multi-word match query is executed under the hood as a Lucene boolean query over its individual terms. Why does this matter? Because real-world search = layers of logic.

Think of bool as the engine that mixes rules and ranking:

must: core meaning of the query.
filter: constraints (no scoring, so faster; frequently used filters can be cached).
should: nice-to-have boosts.
must_not: exclusions.

Analogy: If search were a hiring process:

must = the required skills.
filter = eligibility criteria (work permit, location).
should = bonus points (speaks German, open-source contributor).
must_not = deal-breakers (fake CVs, banned candidates).
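To make the four roles concrete, here is a deliberately simplified Python model of how a bool query decides whether a document matches. It is a sketch, not how Elasticsearch is implemented: each clause is modeled as a plain yes/no predicate, and the "score" is just a count of matching must and should clauses instead of real BM25 relevance. The minimum_should_match default follows the documented rule (1 when there are should clauses but no must or filter clause, otherwise 0). The function name `evaluate_bool` is hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Doc = dict
Clause = Callable[[Doc], bool]  # a clause is modeled as a yes/no predicate


def evaluate_bool(
    doc: Doc,
    must: Sequence[Clause] = (),
    filter_: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    must_not: Sequence[Clause] = (),
    minimum_should_match: Optional[int] = None,
) -> Tuple[bool, int]:
    """Toy model of bool semantics. Returns (matched, score).

    The score is just the number of matching must + should clauses;
    real Elasticsearch scoring (BM25) is far richer.
    """
    # must and filter: every clause is required
    if not all(c(doc) for c in must) or not all(c(doc) for c in filter_):
        return False, 0
    # must_not: any matching clause excludes the document
    if any(c(doc) for c in must_not):
        return False, 0
    should_hits = sum(1 for c in should if c(doc))
    # Documented default: 1 when there are should clauses but no must/filter, else 0
    if minimum_should_match is None:
        minimum_should_match = 1 if should and not must and not filter_ else 0
    if should_hits < minimum_should_match:
        return False, 0
    # filter and must_not never add to the score; must and should do
    return True, len(must) + should_hits


doc = {"title": "Python tutorial", "status": "published", "tags": ["tutorial"]}
print(evaluate_bool(
    doc,
    must=[lambda d: "python" in d["title"].lower()],
    filter_=[lambda d: d["status"] == "published"],
    should=[lambda d: "tutorial" in d["tags"]],
    must_not=[lambda d: d["status"] == "draft"],
))  # (True, 2)
```

Note how the filter clause decides membership but contributes nothing to the score, which mirrors the "eligibility criteria" role in the analogy.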

The Four Pillars of Bool Queries

1. must: Match & Affect Score

Use when the query should both filter results and influence ranking.

Example: Find blog posts with “python” in the title

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "python" } }
      ]
    }
  }
}

Notes:

Documents without “python” in the title won’t appear.
Documents with stronger matches rank higher.

Real-world use case:

Job portal: must match “data engineer” in job title.
Product search: must match “laptop” in product name.
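One detail matters for the job-portal case: a match query combines its analyzed terms with OR by default, so must matching "data engineer" would also accept a title like "Data Analyst". Setting the match query's operator to "and" is one way to require every term. The sketch below builds that request body as a Python dict; the helper name `title_must_query` is hypothetical, and the "title" field is taken from the examples above.

```python
import json


def title_must_query(phrase: str, require_all_terms: bool = True) -> dict:
    """bool/must query on the "title" field (field name taken from the examples above)."""
    match_body = {"query": phrase}
    if require_all_terms:
        # match combines the analyzed terms with OR by default;
        # "operator": "and" requires every term to be present
        match_body["operator"] = "and"
    return {"query": {"bool": {"must": [{"match": {"title": match_body}}]}}}


print(json.dumps(title_must_query("data engineer"), indent=2))
```

If word order matters too (exactly "data engineer", not "engineer of data"), a match_phrase query is the stricter alternative.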

2. filter: Restrict Without Scoring

Filter clauses narrow the result set but do not affect relevance score.
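Because they skip scoring, they're usually cheaper than must clauses, and Elasticsearch can cache frequently used filters, which makes repeated queries even faster.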

Example: Only show published articles

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "python" } }
      ],
      "filter": [
        { "term": { "status": "published" } }
      ]
    }
  }
}

Notes:

Use for dates, categories, permissions.
Think of filters as the non-negotiables.

Real-world use case:

E-commerce: filter products with in_stock = true.
News site: filter articles published_at >= now-30d.
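The news-site case combines a scored text match with two filters. Below is a minimal sketch that builds such a body in Python, reusing the status and published_at fields from the examples above; the function name is hypothetical. It uses Elasticsearch date math with rounding (now-30d/d, i.e. 30 days ago rounded down to the start of that day). Elastic's search-tuning guidance suggests rounded dates because a bare now changes on every request and makes poor use of the query cache.

```python
import json


def recent_published_query(text: str, days: int = 30) -> dict:
    """Scored text match plus two non-scoring filters (field names from the examples above)."""
    return {
        "query": {
            "bool": {
                # Scored: how well the content matches the search text
                "must": [{"match": {"content": text}}],
                # Not scored: hard constraints every hit has to satisfy
                "filter": [
                    {"term": {"status": "published"}},
                    # Date math: now minus N days, rounded down to the start of the day
                    {"range": {"published_at": {"gte": f"now-{days}d/d"}}},
                ],
            }
        }
    }


print(json.dumps(recent_published_query("python"), indent=2))
```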

3. should: Boost Preferred Matches

Should clauses are optional when the query also has must or filter clauses, but they boost relevance when they match.

Example: Prefer tutorials and recent content

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "python" } }
      ],
      "should": [
        { "match": { "tags": "tutorial" } },
        { "range": { "published_at": { "gte": "2024-01-01" } } }
      ],
      "minimum_should_match": 0
    }
  }
}

Notes:

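Because this query has a must clause, documents that match no should clause still show up.
With should matches, docs get ranked higher.
Use minimum_should_match to require that at least N should clauses match. If a bool query has should clauses but no must or filter clause, the default is 1, so at least one should clause has to match; otherwise the default is 0 (which makes the explicit 0 above redundant, but harmless).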

Real-world use case:

Search “python” but boost documents tagged “beginner-friendly.”
Search “laptop” but boost those with “2024 model.”
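To boost one preference more than another, individual clauses accept a boost parameter that multiplies that clause's score contribution. The sketch below assumes tags is a keyword field (so an exact term query fits) and uses a boost of 2.0 purely as an illustrative value; the function name is hypothetical. The require_one switch shows how minimum_should_match turns the should list from "nice to have" into "at least one required".

```python
import json


def python_search_with_boost(boost: float = 2.0, require_one: bool = False) -> dict:
    """must on content, plus should boosts (tag value and boost are examples)."""
    should = [
        # term + boost: this clause's score contribution is multiplied by `boost`
        {"term": {"tags": {"value": "beginner-friendly", "boost": boost}}},
        {"range": {"published_at": {"gte": "2024-01-01"}}},
    ]
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": "python"}}],
                "should": should,
                # 0 keeps should purely optional; 1 requires at least one should clause
                "minimum_should_match": 1 if require_one else 0,
            }
        }
    }


print(json.dumps(python_search_with_boost(), indent=2))
```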

4. must_not: Exclude Hard Rules

Excludes unwanted documents.

Example: Exclude drafts

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "python" } }
      ],
      "must_not": [
        { "term": { "status": "draft" } }
      ]
    }
  }
}

Notes:

must_not clauses run in filter context: they don't affect scoring and are eligible for caching.
Great for spam removal, blocked users, hidden products.

Real-world use case:

Marketplace: must_not show products flagged as “banned.”
Internal search: must_not show documents marked “confidential.”
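When there are several values to exclude, a single terms clause inside must_not is enough, because terms matches documents that contain any of the listed exact values. A minimal sketch, assuming a product name field called "name" and a keyword field called "flags" (both hypothetical):

```python
import json
from typing import List


def marketplace_query(text: str, excluded_flags: List[str]) -> dict:
    """Scored search on "name", excluding documents carrying any listed flag."""
    bool_body = {"must": [{"match": {"name": text}}]}
    if excluded_flags:
        # terms matches documents with ANY of the values,
        # so inside must_not it excludes all of them at once
        bool_body["must_not"] = [{"terms": {"flags": excluded_flags}}]
    return {"query": {"bool": bool_body}}


print(json.dumps(marketplace_query("laptop", ["banned", "confidential"]), indent=2))
```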

Must vs Filter: The Golden Rule

Mixing up must and filter is one of the most common beginner mistakes:

Use must when the condition should influence relevance ranking.
Use filter when the condition is just a constraint.

Example:

Laptop under $1,000 → filter.
Laptop with “gaming” in the title → must.

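This choice affects both performance and result quality: filter clauses skip scoring and can be cached, while must clauses spend work on scoring. Putting “gaming” in filter would throw away the relevance signal that ranks stronger title matches higher.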
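Applied to the laptop example, the rule puts "gaming" in must and the price cap in filter. This sketch assumes hypothetical fields "title", "category" (keyword) and "price" (numeric), and reads "under $1,000" as strictly less than 1000.

```python
import json


def laptop_query(max_price: float = 1000) -> dict:
    """Relevance on the title, hard constraints in filter (field names assumed)."""
    return {
        "query": {
            "bool": {
                # "gaming" should rank stronger title matches higher, so it goes in must
                "must": [{"match": {"title": "gaming"}}],
                "filter": [
                    # Category and price are yes/no constraints: no scoring needed
                    {"term": {"category": "laptop"}},
                    {"range": {"price": {"lt": max_price}}},
                ],
            }
        }
    }


print(json.dumps(laptop_query(), indent=2))
```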

Highlighting: Show Why a Result Matched

Search results are hard to trust if users can't see why something matched.

Example: Highlight matched keywords in content

{
  "query": {
    "match": { "content": "python tutorial" }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

Sample response:

"highlight": {
  "content": [
    "... learn <em>python</em> step by step in this <em>tutorial</em> ..."
  ]
}

Why it matters:

Can improve click-through rates.
Builds trust in search results.
Helps users skim faster.
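On the application side, you usually want to swap the default <em> tags for your own markup and pull the fragments out of each hit. The highlight request accepts pre_tags and post_tags for the first part; the sketch below builds such a request and reads fragments from the hits.hits[].highlight structure of a response. The helper names are hypothetical, and the sample response is hand-written for illustration, not real output.

```python
def highlight_request(text: str) -> dict:
    """Match on content and ask for highlighted fragments wrapped in <mark> tags."""
    return {
        "query": {"match": {"content": text}},
        "highlight": {
            "pre_tags": ["<mark>"],  # default is <em>
            "post_tags": ["</mark>"],
            "fields": {"content": {}},
        },
    }


def collect_snippets(response: dict, field: str = "content") -> list:
    """Return (doc id, fragments) pairs from a search response."""
    pairs = []
    for hit in response.get("hits", {}).get("hits", []):
        # A hit may carry no highlight for this field; fall back to an empty list
        pairs.append((hit["_id"], hit.get("highlight", {}).get(field, [])))
    return pairs


sample = {"hits": {"hits": [
    {"_id": "1", "highlight": {"content": ["learn <mark>python</mark> step by step"]}},
    {"_id": "2"},
]}}
print(collect_snippets(sample))
# [('1', ['learn <mark>python</mark> step by step']), ('2', [])]
```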

Pagination & Sorting

1. Basic Pagination (“from” + “size”)

{
  "from": 0,
  "size": 10,
  "query": { "match": { "content": "python" } }
}

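Good for small offsets, but costly for deep pages: each shard has to collect from + size hits, and the coordinating node then merges and sorts all of them. By default, from + size cannot exceed 10,000 (the index.max_result_window setting).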
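A small helper makes the limit explicit when turning a page number into from/size. This sketch hard-codes the default window of 10,000; if your index overrides index.max_result_window, the constant would need to match. The function name is hypothetical.

```python
MAX_RESULT_WINDOW = 10_000  # default value of the index.max_result_window setting


def page_request(page: int, page_size: int = 10, text: str = "python") -> dict:
    """Build a from/size request for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    # Elasticsearch rejects requests whose from + size exceeds the window
    if start + page_size > MAX_RESULT_WINDOW:
        raise ValueError("page too deep for from/size; use search_after instead")
    return {"from": start, "size": page_size, "query": {"match": {"content": text}}}


print(page_request(3))  # from = 20, size = 10
```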

2. Deep Pagination with “search_after”

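Efficient for infinite scroll or “Load More” buttons: instead of skipping a growing number of hits, each request continues from the sort values of the last hit you already have. If the index may change between requests, Elastic recommends pairing search_after with a point in time (PIT) so every page sees the same snapshot.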

{
  "size": 10,
  "query": { "match": { "content": "python" } },
  "sort": [{ "published_at": "desc" }],
  "search_after": ["2024-07-10T10:15:00"]
}

Remember: with search_after, you pass the values from the sort array of the last document on the previous page, copied exactly as Elasticsearch returned them in that hit. Now you may ask, “What if I sort on multiple fields?”

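The answer is straightforward: provide the full tuple of sort values in search_after, in exactly the same order as the fields in sort. Ideally the last sort field is unique per document (like id here), so that documents sharing the same timestamp are neither skipped nor repeated between pages: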

{
  "size": 10,
  "sort": [
    { "timestamp": "asc" },
    { "id": "desc" }
  ],
  "search_after": ["2025-09-26T12:00:00Z", 1234]
}
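The page-by-page loop can be sketched in Python. The `search` callable is a hypothetical stand-in for whatever client call you use (it takes a request body and returns the hits list); to keep the sketch runnable, it is backed by an in-memory fake that is hard-wired to the sort [timestamp asc, id desc] from the example above. The loop stops when a page comes back shorter than the page size.

```python
from typing import Callable, Dict, Iterator, List

SearchFn = Callable[[Dict], List[Dict]]  # takes a request body, returns the hits list


def iterate_search_after(search: SearchFn, body: Dict, page_size: int = 10) -> Iterator[Dict]:
    """Yield every hit, page by page, feeding the last hit's sort values back in."""
    request = dict(body, size=page_size)
    while True:
        hits = search(request)
        yield from hits
        if len(hits) < page_size:
            return  # a short (or empty) page means there is nothing left
        # The next page starts strictly after the full sort tuple of the last hit
        request = dict(request, search_after=hits[-1]["sort"])


# In-memory stand-in for Elasticsearch, hard-wired to the sort
# [{"timestamp": "asc"}, {"id": "desc"}] from the example above
DOCS = [{"id": i, "timestamp": f"2025-09-26T12:0{i % 3}:00Z"} for i in range(1, 8)]


def fake_search(body: Dict) -> List[Dict]:
    def key(d):
        return (d["timestamp"], -d["id"])  # timestamp asc, id desc

    ordered = sorted(DOCS, key=key)
    if "search_after" in body:
        ts, doc_id = body["search_after"]
        ordered = [d for d in ordered if key(d) > (ts, -doc_id)]
    return [{"_source": d, "sort": [d["timestamp"], d["id"]]} for d in ordered[: body["size"]]]


body = {"sort": [{"timestamp": "asc"}, {"id": "desc"}]}
print([h["_source"]["id"] for h in iterate_search_after(fake_search, body, page_size=3)])
# [6, 3, 7, 4, 1, 5, 2]
```

Several documents share a timestamp here, and the id tiebreaker is what keeps every document appearing exactly once across pages.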

3. Sorting by Multiple Fields

"sort": [
  { "_score": "desc" },
  { "published_at": "desc" }
]

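The secondary sort breaks ties between documents with equal scores. For a fully deterministic order (which search_after pagination depends on), end the sort with a field that is unique per document.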
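To see how a sort array resolves ties, the Python model below applies the same three keys to a few hand-made hits: score first, then publish date, then a unique id as the final tiebreaker. The "id" field is an assumption; the point is that adding it makes the order independent of how the hits arrived. Python's sorts are stable, so running them from least to most significant key reproduces a multi-key sort.

```python
hits = [
    {"id": "b", "score": 1.2, "published_at": "2024-05-01"},
    {"id": "a", "score": 1.2, "published_at": "2024-05-01"},
    {"id": "c", "score": 1.2, "published_at": "2024-06-01"},
    {"id": "d", "score": 2.5, "published_at": "2023-01-01"},
]

# The matching Elasticsearch sort, with a unique (assumed) "id" field as the last tiebreaker
SORT = [{"_score": "desc"}, {"published_at": "desc"}, {"id": "asc"}]


def order_like_sort(hits):
    """Apply the three sort keys; stable sorts run from least to most significant."""
    ordered = sorted(hits, key=lambda h: h["id"])  # id asc
    ordered = sorted(ordered, key=lambda h: h["published_at"], reverse=True)  # published_at desc
    return sorted(ordered, key=lambda h: h["score"], reverse=True)  # _score desc


print([h["id"] for h in order_like_sort(hits)])  # ['d', 'c', 'a', 'b']
```

Without the id key, "a" and "b" tie on both score and date, so their relative order would depend on input order.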

Putting It All Together: A Real-World Example

Scenario: A blog search engine

Must have “python” in title.
Only published articles.
Exclude drafts.
Prefer tutorials and recent content.
Show highlighted snippets.
Sort by score, then publish date.
Paginate 10 per page.

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "python" } }
      ],
      "filter": [
        { "term": { "status": "published" } }
      ],
      "must_not": [
        { "term": { "status": "draft" } }
      ],
      "should": [
        { "match": { "tags": "tutorial" } },
        { "range": { "published_at": { "gte": "2024-01-01" } } }
      ],
      "minimum_should_match": 0
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  },
  "sort": [
    { "_score": "desc" },
    { "published_at": "desc" }
  ],
  "from": 0,
  "size": 10
}
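In an application you would typically generate this body from the user's input and the current page number. Here is a minimal sketch of that idea in Python, with a hypothetical function name. Because the filter already keeps only documents whose status is "published", this version leaves out the must_not clause, and it also drops the explicit minimum_should_match, since 0 is already the default when a must clause is present.

```python
import json


def blog_search(text: str, page: int = 1, page_size: int = 10) -> dict:
    """The blog search above as a function of the search text and page number."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                # status "published" already rules out drafts, so must_not is omitted here
                "filter": [{"term": {"status": "published"}}],
                "should": [
                    {"match": {"tags": "tutorial"}},
                    {"range": {"published_at": {"gte": "2024-01-01"}}},
                ],
            }
        },
        "highlight": {"fields": {"content": {}}},
        "sort": [{"_score": "desc"}, {"published_at": "desc"}],
        "from": (page - 1) * page_size,  # page 1 -> 0, page 2 -> 10, ...
        "size": page_size,
    }


print(json.dumps(blog_search("python", page=2), indent=2))
```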

Wrap-Up

The bool query is the Swiss Army knife of Elasticsearch. Mastering it lets you:

Glue multiple conditions seamlessly.
Balance relevance (must, should) with constraints (filter, must_not).
Improve UX with highlighting.
Keep results fast and scalable with proper pagination & sorting.

Coming Next in the Series:

Part 4: Aggregations — Turning Search into Analytics
