Offset pagination optimization
In many REST APIs endpoints, we use offset-based pagination which uses the page
URL parameter to paginate through the results. Offset pagination scales linearly, the higher the page number, the slower the database query gets. This means that for large page numbers, the database query can time out. This usually happens when third-party integrations and scripts interact with the system as users are unlikely to deliberately visit a high page number.
The ideal way of dealing with scalability problems related to offset pagination is switching to keyset pagination however, this is means a breaking API change. To provide a temporary, stop-gap measure, you can use the Gitlab::Pagination::Offset::PaginationWithIndexOnlyScan
class. The optimization can help in certain cases to improve the performance of offset-paginated queries when high OFFSET
value is present. The performance improvement means that the queries will continue to scale linearly with improved query timings, which means that reaching database timeout will happen at a much higher page
number if it happens at all.
Requirements for using the optimization
The optimization avoids calling SELECT *
when determining the records based on the ORDER BY
, OFFSET
, and LIMIT
clauses and attempts to use an index only scan to reduce database I/O. To use the optimization, the same requirements must be met as for keyset pagination:
-
ORDER BY
clause is present. - The
ORDER BY
clause uniquely identifies one database column.- Good, uses the primary key:
ORDER BY id
- Bad,
created_at
not unique:ORDER BY created_at
- Good, there is a tie-breaker:
ORDER BY created_at, id
- Good, uses the primary key:
- The query is well-covered with a database index.
How to use the optimizator class
The optimizator class can be used with ActiveRecord::Relation
objects, as a result, it will return an optimized, kaminari-paginated ActiveRecord::Relation
object. In case the optimization cannot be applied, the original ActiveRecord::Relation
object will be used for the pagination.
Basic usage:
scope = Issue.where(project_id: 1).order(:id)
records = Gitlab::Pagination::Offset::PaginationWithIndexOnlyScan.new(scope: scope, page: 5, per_page: 100).paginate_with_kaminari
puts records.to_a
Optimizations should be always rolled out with feature flags, you can also target the usage of the optimization when certain conditions are met.
# - Only apply optimization for large page number lookups
# - When label_names filter parameter is given, the optimziation will not have effect (complex JOIN).
if params[:page] > 100 && params[:label_names].blank? && Feature.enabled?(:my_optimized_offet_query)
Gitlab::Pagination::Offset::PaginationWithIndexOnlyScan.new(scope: scope, page: params[:page], per_page: params[:per_page]).paginate_with_kaminari
else
scope.page(params[:page]).per(params[:per_page])
end
How does the optimization work
The optimization takes the passed ActiveRecord::Relation
object and moves it into a CTE. Within the CTE, the original query is altered to only
select the ORDER BY
columns. This will make it possible for the database to use index only scan.
When the query is executed, the query within the CTE is evaluated first, the CTE will contain LIMIT
number of rows with the selected columns.
Using the ORDER BY
values, a LATERAL
query will locate the full rows one by one. LATERAL
query is used here in order to force out
a nested loop: for each row in the CTE, look up a full row in the table.
Original query:
- Reads
OFFSET + LIMIT
number of entries from the index. - Reads
OFFSET + LIMIT
number of rows from the table.
Optimized query:
- Reads
OFFSET + LIMIT
number of entries from the index. - Reads
LIMIT
number of rows from the table.
Determine if the optimization helps
By running EXPLAIN (buffers, analyze)
on the database query with a high (100_000) OFFSET
value, we can clearly see if the optimization helps.
Look for the following:
- The optimized query plan must have an index only scan node.
- Comparing the cached buffer count and timing should be lower.
- This can be done when executing the same query 2 or 3 times.
Consider the following query:
SELECT issues.*
FROM issues
ORDER BY id
OFFSET 100000
LIMIT 100
It produces an execution plan which uses an index scan:
Limit (cost=27800.96..27828.77 rows=100 width=1491) (actual time=138.305..138.470 rows=100 loops=1)
Buffers: shared hit=73212
I/O Timings: read=0.000 write=0.000
-> Index Scan using issues_pkey on public.issues (cost=0.57..26077453.90 rows=93802448 width=1491) (actual time=0.063..133.688 rows=100100 loops=1)
Buffers: shared hit=73212
I/O Timings: read=0.000 write=0.000
Time: 143.779 ms
- planning: 5.222 ms
- execution: 138.557 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 73212 (~572.00 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
The optimized query:
WITH index_only_scan_pagination_cte AS MATERIALIZED
(SELECT id
FROM issues
ORDER BY id ASC
LIMIT 100
OFFSET 100000)
SELECT issues.*
FROM
(SELECT id
FROM index_only_scan_pagination_cte) index_only_scan_subquery,
LATERAL
(SELECT issues.*
FROM issues
WHERE issues.id = index_only_scan_subquery.id
LIMIT 1) issues
Execution plan:
Nested Loop (cost=2453.51..2815.44 rows=100 width=1491) (actual time=23.614..23.973 rows=100 loops=1)
Buffers: shared hit=56167
I/O Timings: read=0.000 write=0.000
CTE index_only_scan_pagination_cte
-> Limit (cost=2450.49..2452.94 rows=100 width=4) (actual time=23.590..23.621 rows=100 loops=1)
Buffers: shared hit=55667
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using issues_pkey on public.issues issues_1 (cost=0.57..2298090.72 rows=93802448 width=4) (actual time=0.070..20.412 rows=100100 loops=1)
Heap Fetches: 1063
Buffers: shared hit=55667
I/O Timings: read=0.000 write=0.000
-> CTE Scan on index_only_scan_pagination_cte (cost=0.00..2.00 rows=100 width=4) (actual time=23.593..23.641 rows=100 loops=1)
Buffers: shared hit=55667
I/O Timings: read=0.000 write=0.000
-> Limit (cost=0.57..3.58 rows=1 width=1491) (actual time=0.003..0.003 rows=1 loops=100)
Buffers: shared hit=500
I/O Timings: read=0.000 write=0.000
-> Index Scan using issues_pkey on public.issues (cost=0.57..3.58 rows=1 width=1491) (actual time=0.003..0.003 rows=1 loops=100)
Index Cond: (issues.id = index_only_scan_pagination_cte.id)
Buffers: shared hit=500
I/O Timings: read=0.000 write=0.000
Time: 29.562 ms
- planning: 5.506 ms
- execution: 24.056 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 56167 (~438.80 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0