
Elasticsearch Queries – Part 3: Bool Queries and Pagination

Introduction

If you’ve been following this series, you already have the groundwork from the earlier parts.

Now it’s time for the real workhorse: the bool query.
Why? Because no real-world search problem is solved by just one condition. Users expect relevance and restrictions:

  • A keyword search (relevance).
  • But also filter by permissions, time ranges, categories, stock availability, or price.

The bool query is how you glue all of these conditions together.

By the way, I should also mention that I primarily wrote this series to solidify my own learning and to serve as a future reference for myself. So, if you feel like some parts are a bit too brief or summarized, please let me know and I can elaborate!


Why Bool Is Everywhere

Boolean logic shows up even when you don’t write it yourself: a multi-term match query, for example, is executed internally as a boolean combination of term queries. Why? Because real-world search is layers of logic.

Think of bool as the engine that mixes rules and ranking:

  • must: core meaning of the query.
  • filter: constraints (no scoring, cacheable, usually faster).
  • should: nice-to-have boosts.
  • must_not: exclusions.

Analogy: If search were a hiring process:

  • must = the required skills.
  • filter = eligibility criteria (work permit, location).
  • should = bonus points (speaks German, open-source contributor).
  • must_not = deal-breakers (fake CVs, banned candidates).
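
As a rough sketch, the four clause types can be assembled into a request body with a small Python helper. The name bool_query and its keyword arguments are illustrative choices, not part of any Elasticsearch client; the helper only builds a plain dictionary.

```python
def bool_query(must=None, filters=None, should=None, must_not=None):
    """Assemble a bool query body, skipping clause types that are empty."""
    clauses = {
        "must": must,
        "filter": filters,
        "should": should,
        "must_not": must_not,
    }
    # Keep only the clause types that were actually supplied.
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


body = bool_query(
    must=[{"match": {"title": "python"}}],
    filters=[{"term": {"status": "published"}}],
)
print(body)
```

The resulting dictionary can be serialized to JSON and sent as the search request body.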

The Four Pillars of Bool Queries

1. must: Match & Affect Score

Use must when a clause should both restrict the results and influence ranking.

Example: Find blog posts with “python” in the title

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "python" } }
      ]
    }
  }
}

Notes:

  • Documents without “python” in the title won’t appear.
  • Documents with stronger matches rank higher.

Real-world use case:

  • Job portal: must match “data engineer” in job title.
  • Product search: must match “laptop” in product name.

2. filter: Restrict Without Scoring

Filter clauses narrow the result set but do not affect the relevance score.
Because no score is computed, they are cheap to run, and Elasticsearch can cache frequently used filters automatically.

Example: Only show published articles

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "python" } }
      ],
      "filter": [
        { "term": { "status": "published" } }
      ]
    }
  }
}

Notes:

  • Use for dates, categories, permissions.
  • Think of filters as the non-negotiables.

Real-world use case:

  • E-commerce: filter products with in_stock = true.
  • News site: filter articles published_at >= now-30d.
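
The two use cases above could be built like this in Python. The fields in_stock and published_at come from the bullets; the name field and the helper filtered_search are assumptions for illustration. The value now-30d is Elasticsearch date math and is resolved on the server, not in Python.

```python
import json


def filtered_search(text_field, text, filters):
    """Score on one full-text field, then restrict with non-scoring filters."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {text_field: text}}],
                "filter": filters,
            }
        }
    }


# E-commerce: only products that are in stock.
in_stock_laptops = filtered_search("name", "laptop", [{"term": {"in_stock": True}}])

# News site: only articles from the last 30 days.
recent_news = filtered_search(
    "content", "python", [{"range": {"published_at": {"gte": "now-30d"}}}]
)

print(json.dumps(recent_news, indent=2))
```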

3. should: Boost Preferred Matches

Should clauses are optional — but they boost relevance when they match.

Example: Prefer tutorials and recent content

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "python" } }
      ],
      "should": [
        { "match": { "tags": "tutorial" } },
        { "range": { "published_at": { "gte": "2024-01-01" } } }
      ],
      "minimum_should_match": 0
    }
  }
}

Notes:

  • Without should matches, docs still show up.
  • With should matches, docs get ranked higher.
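  • Use minimum_should_match to require at least N should clauses to match. It defaults to 0 when the bool query also has a must or filter clause, and to 1 when it has only should clauses.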

Real-world use case:

  • Search “python” but boost documents tagged “beginner-friendly.”
  • Search “laptop” but boost those with “2024 model.”
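
To make the matching rule concrete, here is a toy model in plain Python. It is not how Elasticsearch is implemented: clauses are modelled as simple predicates over a dictionary, and matches is a hypothetical helper. It only encodes the documented default for minimum_should_match (0 when a must or filter clause is present, 1 when the query has only should clauses).

```python
def matches(doc, must=(), filters=(), should=(), must_not=(), minimum_should_match=None):
    """Toy model of bool matching; each clause is a predicate taking a doc."""
    if minimum_should_match is None:
        # Default: 1 if there are only should clauses, otherwise 0.
        only_should = bool(should) and not must and not filters
        minimum_should_match = 1 if only_should else 0
    # Every must and filter clause has to match.
    if not all(clause(doc) for clause in list(must) + list(filters)):
        return False
    # No must_not clause may match.
    if any(clause(doc) for clause in must_not):
        return False
    should_hits = sum(1 for clause in should if clause(doc))
    return should_hits >= minimum_should_match


def has_python(doc):
    return "python" in doc["content"]


def is_tutorial(doc):
    return "tutorial" in doc["tags"]


doc = {"content": "python basics", "tags": ["news"]}

print(matches(doc, must=[has_python], should=[is_tutorial]))  # True
print(matches(doc, must=[has_python], should=[is_tutorial], minimum_should_match=1))  # False
print(matches(doc, should=[is_tutorial]))  # False
```

The last call shows the case that surprises people: with no must or filter clause, at least one should clause has to match.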

4. must_not: Exclude Hard Rules

Excludes documents that match any of its clauses. Like filter, must_not runs without scoring.

Example: Exclude drafts

{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "python" } }
      ],
      "must_not": [
        { "term": { "status": "draft" } }
      ]
    }
  }
}

Notes:

  • Great for spam removal, blocked users, hidden products.

Real-world use case:

  • Marketplace: must_not show products flagged as “banned.”
  • Internal search: must_not show documents marked “confidential.”

Must vs Filter: The Golden Rule

Mixing these two up is one of the most common beginner mistakes:

  • Use must when the condition should influence relevance ranking.
  • Use filter when the condition is just a constraint.

Example:

  • Laptop under $1,000 → filter.
  • Laptop with “gaming” in the title → must.

This choice affects both performance and user satisfaction.
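
A small sketch of the laptop example, assuming hypothetical field names title and price: the keyword goes into must so it contributes to the score, while the price cap goes into filter as a yes/no constraint.

```python
def laptop_search(keyword, max_price):
    """Keyword is scored via must; the price cap is a non-scoring filter."""
    return {
        "query": {
            "bool": {
                # Scored: how well the title matches affects ranking.
                "must": [{"match": {"title": keyword}}],
                # Not scored: a document is either under the cap or it is not.
                "filter": [{"range": {"price": {"lt": max_price}}}],
            }
        }
    }


query = laptop_search("gaming", 1000)
print(query["query"]["bool"]["filter"])
```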


Highlighting: Show Why a Result Matched

Search is useless if users don’t see why something matched.

Example: Highlight matched keywords in content

{
  "query": {
    "match": { "content": "python tutorial" }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

Sample response:

"highlight": {
  "content": [
    "... learn <em>python</em> step by step in this <em>tutorial</em> ..."
  ]
}

Why it matters:

  • Improves click-through rates.
  • Builds trust in search results.
  • Helps users skim faster.
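
On the client side, highlight fragments arrive per hit, next to _source. Below is a minimal sketch of reading them, assuming the hit shape shown in the sample response. The helper name snippets and the fallback length are illustrative choices.

```python
def snippets(hit, field, fallback_chars=80):
    """Return highlight fragments for a field, or a plain excerpt if none exist."""
    fragments = hit.get("highlight", {}).get(field)
    if fragments:
        return fragments
    # No highlight for this field: fall back to the start of the stored text.
    text = hit.get("_source", {}).get(field, "")
    return [text[:fallback_chars]] if text else []


hit = {
    "_source": {"content": "Learn python step by step in this tutorial."},
    "highlight": {
        "content": ["... learn <em>python</em> step by step in this <em>tutorial</em> ..."]
    },
}
print(snippets(hit, "content")[0])
```

The fallback keeps result cards from rendering empty when a hit matched on a field that was not highlighted.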

Pagination & Sorting

1. Basic Pagination (“from” + “size”)

{
  "from": 0,
  "size": 10,
  "query": { "match": { "content": "python" } }
}

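Good for small offsets, but costly at deep pages: every shard has to collect and sort from + size hits before the top ones are merged. By default, Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting).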
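
Translating a page number into from and size is simple arithmetic. The sketch below also guards against the default result window of 10,000 (index.max_result_window). The helper page_params is hypothetical, and the constant would need adjusting if that setting has been changed on your index.

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def page_params(page, size=10):
    """Convert a 1-based page number into from/size parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("too deep for from/size; use search_after instead")
    return {"from": start, "size": size}


print(page_params(1))  # {'from': 0, 'size': 10}
print(page_params(3))  # {'from': 20, 'size': 10}
```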


2. Deep Pagination with “search_after”

Efficient for infinite scroll or “Load More” buttons.

{
  "size": 10,
  "query": { "match": { "content": "python" } },
  "sort": [{ "published_at": "desc" }],
  "search_after": ["2024-07-10T10:15:00"]
}

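Remember: when you use search_after, you pass the values from the sort array of the last document on the previous page. Because several documents can share the same published_at, a sort on that field alone can skip or repeat documents at page boundaries, so in practice you add a unique tiebreaker field. Which raises the question: what if you have multiple sort fields?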

The answer is straightforward. You have to provide the full tuple of sort values in search_after and the order must exactly match the order of the fields in sort:

{
  "size": 10,
  "sort": [
    { "timestamp": "asc" },
    { "id": "desc" }
  ],
  "search_after": ["2025-09-26T12:00:00Z", 1234]
}
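
The paging loop itself can be sketched like this. Here search is a placeholder for whatever function sends the request body and returns the parsed response; it is not a real client API. The loop relies only on each hit carrying a sort array, which Elasticsearch returns when the request specifies a sort. The in-memory fake_search below exists purely so the example runs without a cluster.

```python
def iterate_hits(search, body, page_size=10):
    """Yield every hit by following search_after from page to page."""
    request = dict(body, size=page_size)
    while True:
        hits = search(request)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit become the cursor for the next page.
        request = dict(request, search_after=hits[-1]["sort"])


# In-memory stand-in for an index, ordered by (timestamp asc, id desc).
DOCS = sorted(
    [{"timestamp": t, "id": i} for t, i in [(1, 7), (1, 3), (2, 9), (3, 1), (3, 5)]],
    key=lambda d: (d["timestamp"], -d["id"]),
)


def fake_search(request):
    """Mimic the response shape for the sort order above."""
    after = request.get("search_after")
    rows = DOCS
    if after is not None:
        cursor = (after[0], -after[1])
        rows = [d for d in DOCS if (d["timestamp"], -d["id"]) > cursor]
    page = rows[: request["size"]]
    return {
        "hits": {
            "hits": [{"_source": d, "sort": [d["timestamp"], d["id"]]} for d in page]
        }
    }


body = {"sort": [{"timestamp": "asc"}, {"id": "desc"}]}
ids = [hit["_source"]["id"] for hit in iterate_hits(fake_search, body, page_size=2)]
print(ids)  # [7, 3, 9, 5, 1]
```

Note that the cursor is the full tuple of sort values, in the same order as the sort definition.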


3. Sorting by Multiple Fields

"sort": [
  { "_score": "desc" },
  { "published_at": "desc" }
]

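Gives a predictable order when many docs have equal scores: ties on _score are broken by publish date. For a fully deterministic order (needed for search_after), end the sort with a unique field.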


Putting It All Together: A Real-World Example

Scenario: A blog search engine

  • Must have “python” in title.
  • Only published articles.
  • Exclude anything in the “draft” category.
  • Prefer tutorials and recent content.
  • Show highlighted snippets.
  • Sort by score, then publish date.
  • Paginate 10 per page.

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "python" } }
      ],
      "filter": [
        { "term": { "status": "published" } }
      ],
      "must_not": [
        { "term": { "category": "draft" } }
      ],
      "should": [
        { "match": { "tags": "tutorial" } },
        { "range": { "published_at": { "gte": "2024-01-01" } } }
      ],
      "minimum_should_match": 0
    }
  },
  "highlight": {
    "fields": {
      "content": { "require_field_match": false }
    }
  },
  "sort": [
    { "_score": "desc" },
    { "published_at": "desc" }
  ],
  "from": 0,
  "size": 10
}
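
In application code you would rarely hard-code this body. Below is a hedged sketch of a builder that takes the keyword and page number as parameters; the function name blog_search_body and its arguments are assumptions for illustration. Because the query targets title and tags rather than content, the highlighter is told not to require a match on the highlighted field.

```python
def blog_search_body(keyword, page=1, page_size=10, recent_since="2024-01-01"):
    """Build the blog search request for a 1-based page number."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": keyword}}],
                "filter": [{"term": {"status": "published"}}],
                "must_not": [{"term": {"category": "draft"}}],
                "should": [
                    {"match": {"tags": "tutorial"}},
                    {"range": {"published_at": {"gte": recent_since}}},
                ],
                "minimum_should_match": 0,
            }
        },
        # By default only fields the query matched on are highlighted,
        # so allow highlighting query terms found in content as well.
        "highlight": {"fields": {"content": {"require_field_match": False}}},
        "sort": [{"_score": "desc"}, {"published_at": "desc"}],
        "from": (page - 1) * page_size,
        "size": page_size,
    }


request = blog_search_body("python", page=3)
print(request["from"], request["size"])  # 20 10
```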

Wrap-Up

The bool query is the Swiss Army knife of Elasticsearch. Mastering it lets you:

  • Glue multiple conditions seamlessly.
  • Balance relevance (must, should) with constraints (filter, must_not).
  • Improve UX with highlighting.
  • Keep results fast and scalable with proper pagination & sorting.

Coming Next in the Series:

Part 4: Aggregations — Turning Search into Analytics
