Offset pagination optimization

In many REST APIs endpoints, we use offset-based pagination which uses the page URL parameter to paginate through the results. Offset pagination scales linearly, the higher the page number, the slower the database query gets. This means that for large page numbers, the database query can time out. This usually happens when third-party integrations and scripts interact with the system as users are unlikely to deliberately visit a high page number.

The ideal way of dealing with scalability problems related to offset pagination is switching to keyset pagination however, this is means a breaking API change. To provide a temporary, stop-gap measure, you can use the Gitlab::Pagination::Offset::PaginationWithIndexOnlyScan class. The optimization can help in certain cases to improve the performance of offset-paginated queries when high OFFSET value is present. The performance improvement means that the queries will continue to scale linearly with improved query timings, which means that reaching database timeout will happen at a much higher page number if it happens at all.

Requirements for using the optimization

The optimization avoids calling SELECT * when determining the records based on the ORDER BY, OFFSET, and LIMIT clauses and attempts to use an index only scan to reduce database I/O. To use the optimization, the same requirements must be met as for keyset pagination:

ORDER BY clause is present.
The ORDER BY clause uniquely identifies one database column.
- Good, uses the primary key: ORDER BY id
- Bad, created_at not unique: ORDER BY created_at
- Good, there is a tie-breaker: ORDER BY created_at, id
The query is well-covered with a database index.

How to use the optimizator class

The optimizator class can be used with ActiveRecord::Relation objects, as a result, it will return an optimized, kaminari-paginated ActiveRecord::Relation object. In case the optimization cannot be applied, the original ActiveRecord::Relation object will be used for the pagination.

Basic usage:

scope = Issue.where(project_id: 1).order(:id)
records = Gitlab::Pagination::Offset::PaginationWithIndexOnlyScan.new(scope: scope, page: 5, per_page: 100).paginate_with_kaminari
puts records.to_a

Optimizations should be always rolled out with feature flags, you can also target the usage of the optimization when certain conditions are met.

# - Only apply optimization for large page number lookups
# - When label_names filter parameter is given, the optimziation will not have effect (complex JOIN).
if params[:page] > 100 && params[:label_names].blank? && Feature.enabled?(:my_optimized_offet_query)
  Gitlab::Pagination::Offset::PaginationWithIndexOnlyScan.new(scope: scope, page: params[:page], per_page: params[:per_page]).paginate_with_kaminari
else
  scope.page(params[:page]).per(params[:per_page])
end

How does the optimization work

The optimization takes the passed ActiveRecord::Relation object and moves it into a CTE. Within the CTE, the original query is altered to only select the ORDER BY columns. This will make it possible for the database to use index only scan.

When the query is executed, the query within the CTE is evaluated first, the CTE will contain LIMIT number of rows with the selected columns. Using the ORDER BY values, a LATERAL query will locate the full rows one by one. LATERAL query is used here in order to force out a nested loop: for each row in the CTE, look up a full row in the table.

Original query:

Reads OFFSET + LIMIT number of entries from the index.
Reads OFFSET + LIMIT number of rows from the table.

Optimized query:

Reads OFFSET + LIMIT number of entries from the index.
Reads LIMIT number of rows from the table.

Determine if the optimization helps

By running EXPLAIN (buffers, analyze) on the database query with a high (100_000) OFFSET value, we can clearly see if the optimization helps.

Look for the following:

The optimized query plan must have an index only scan node.
Comparing the cached buffer count and timing should be lower.
- This can be done when executing the same query 2 or 3 times.

Consider the following query:

SELECT issues.*
FROM issues
ORDER BY id
OFFSET 100000
LIMIT 100

It produces an execution plan which uses an index scan:

 Limit  (cost=27800.96..27828.77 rows=100 width=1491) (actual time=138.305..138.470 rows=100 loops=1)
   Buffers: shared hit=73212
   I/O Timings: read=0.000 write=0.000
   ->  Index Scan using issues_pkey on public.issues  (cost=0.57..26077453.90 rows=93802448 width=1491) (actual time=0.063..133.688 rows=100100 loops=1)
         Buffers: shared hit=73212
         I/O Timings: read=0.000 write=0.000

 
Time: 143.779 ms
  - planning: 5.222 ms
  - execution: 138.557 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 73212 (~572.00 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

The optimized query:

WITH index_only_scan_pagination_cte AS MATERIALIZED
  (SELECT id
   FROM issues
   ORDER BY id ASC
   LIMIT 100
   OFFSET 100000)
SELECT issues.*
FROM
  (SELECT id
   FROM index_only_scan_pagination_cte) index_only_scan_subquery,
     LATERAL
  (SELECT issues.*
   FROM issues
   WHERE issues.id = index_only_scan_subquery.id
   LIMIT 1) issues

Execution plan:

 Nested Loop  (cost=2453.51..2815.44 rows=100 width=1491) (actual time=23.614..23.973 rows=100 loops=1)
   Buffers: shared hit=56167
   I/O Timings: read=0.000 write=0.000
   CTE index_only_scan_pagination_cte
     ->  Limit  (cost=2450.49..2452.94 rows=100 width=4) (actual time=23.590..23.621 rows=100 loops=1)
           Buffers: shared hit=55667
           I/O Timings: read=0.000 write=0.000
           ->  Index Only Scan using issues_pkey on public.issues issues_1  (cost=0.57..2298090.72 rows=93802448 width=4) (actual time=0.070..20.412 rows=100100 loops=1)
                 Heap Fetches: 1063
                 Buffers: shared hit=55667
                 I/O Timings: read=0.000 write=0.000
   ->  CTE Scan on index_only_scan_pagination_cte  (cost=0.00..2.00 rows=100 width=4) (actual time=23.593..23.641 rows=100 loops=1)
         Buffers: shared hit=55667
         I/O Timings: read=0.000 write=0.000
   ->  Limit  (cost=0.57..3.58 rows=1 width=1491) (actual time=0.003..0.003 rows=1 loops=100)
         Buffers: shared hit=500
         I/O Timings: read=0.000 write=0.000
         ->  Index Scan using issues_pkey on public.issues  (cost=0.57..3.58 rows=1 width=1491) (actual time=0.003..0.003 rows=1 loops=100)
               Index Cond: (issues.id = index_only_scan_pagination_cte.id)
               Buffers: shared hit=500
               I/O Timings: read=0.000 write=0.000


Time: 29.562 ms
  - planning: 5.506 ms
  - execution: 24.056 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 56167 (~438.80 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0