Mastering PostgreSQL 13 by Hans-Jürgen Schönig

Author: Hans-Jürgen Schönig
Language: eng
Format: epub
Publisher: Packt Publishing
Published: 2020-11-12T11:24:20+00:00


Speeding up sorting

The work_mem variable doesn't only speed up grouping. It also has a very nice impact on something as simple as sorting, an essential mechanism that has been mastered by every database system in the world.

The following query shows a simple operation using the default setting of 4 MB:

test=# SET work_mem TO default;
SET
test=# EXPLAIN ANALYZE SELECT * FROM t_test ORDER BY name, id;
                              QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=24111.14..24611.14 rows=200000 width=9)
       (actual time=60.338..89.218 rows=200000 loops=1)
   Sort Key: name, id
   Sort Method: external merge  Disk: 6912kB
   ->  Seq Scan on t_test  (cost=0.00..3082.00 rows=200000 width=9)
                           (actual time=0.006..17.818 rows=200000 loops=1)
 Planning Time: 0.074 ms
 Execution Time: 162.544 ms
(6 rows)

PostgreSQL needs 17.8 milliseconds to read the data and over 70 milliseconds to sort it. Because so little memory is available, sorting has to be performed using temporary files. The external merge Disk method requires only small amounts of RAM, but it has to write intermediate data to a comparatively slow storage device, which, of course, leads to poor throughput.
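The idea behind an external merge sort can be sketched in Python: sort fixed-size chunks in memory, spill each sorted run to a temporary file, and then merge the runs while keeping only one value per run in memory. This is a simplified illustration with hypothetical helper names (`external_merge_sort`, `_spill`), not PostgreSQL's actual C implementation:

```python
import heapq
import tempfile

def _spill(sorted_chunk):
    """Write one sorted run to a temporary file and return its path."""
    f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    f.write("\n".join(sorted_chunk))
    f.close()
    return f.name

def external_merge_sort(values, chunk_size):
    """Sort strings that (by assumption) do not all fit in memory:
    sort chunk_size-sized pieces, spill them to disk, then merge."""
    runs = []
    buf = []
    for v in values:
        buf.append(v)
        if len(buf) >= chunk_size:
            runs.append(_spill(sorted(buf)))
            buf = []
    if buf:
        runs.append(_spill(sorted(buf)))
    # heapq.merge lazily merges the pre-sorted runs, holding only one
    # value per run in memory -- analogous to the final merge phase.
    files = [open(r) for r in runs]
    try:
        yield from heapq.merge(*(map(str.strip, f) for f in files))
    finally:
        for f in files:
            f.close()
```

Only `chunk_size` values are ever held in memory at once, which is exactly why the method is cheap on RAM but pays for it in disk traffic.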

Increasing the work_mem variable setting will make PostgreSQL use more memory for sorting:

test=# SET work_mem TO '1 GB';
SET
test=# EXPLAIN ANALYZE SELECT * FROM t_test ORDER BY name, id;
                         QUERY PLAN
-----------------------------------------------------------------
 Sort  (cost=20691.64..21191.64 rows=200000 width=9)
       (actual time=36.481..47.899 rows=200000 loops=1)
   Sort Key: name, id
   Sort Method: quicksort  Memory: 15520kB
   ->  Seq Scan on t_test  (cost=0.00..3082.00 rows=200000 width=9)
                           (actual time=0.010..14.232 rows=200000 loops=1)
 Planning Time: 0.037 ms
 Execution Time: 55.520 ms
(6 rows)

Since there is enough memory now, the database will do all the sorting in memory and therefore speed up the process dramatically. The sort itself now takes around 33 milliseconds (47.9 minus 14.2), less than half of what it was before, and the total execution time drops from 162 to 55 milliseconds, roughly a threefold improvement. More memory will lead to faster sorting and will speed up the system.

So far, you have seen two mechanisms that can be used to sort data: external merge Disk and quicksort Memory. In addition to these two mechanisms, there is a third algorithm, top-N heapsort Memory, which is used when only the top N rows are needed:

test=# EXPLAIN ANALYZE SELECT * FROM t_test ORDER BY name, id LIMIT 10;
                         QUERY PLAN
-----------------------------------------------------------------
 Limit  (cost=7403.93..7403.95 rows=10 width=9)
        (actual time=31.837..31.838 rows=10 loops=1)
   ->  Sort  (cost=7403.93..7903.93 rows=200000 width=9)
             (actual time=31.836..31.837 rows=10 loops=1)
         Sort Key: name, id
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on t_test  (cost=0.00..3082.00 rows=200000 width=9)
                                 (actual time=0.011..13.645 rows=200000 loops=1)
 Planning Time: 0.053 ms
 Execution Time: 31.856 ms
(7 rows)

The algorithm is lightning fast, and the entire query will be done in just over 30 milliseconds. The sorting part is now only 18 milliseconds and is therefore almost as fast as reading the data in the first place.
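The reason top-N heapsort needs only 25 kB in the plan above is that it never sorts the whole input: it maintains a bounded heap of at most N candidates while scanning. Python's `heapq.nsmallest` works on the same principle, which makes for a quick sketch (the sample rows are invented for illustration):

```python
import heapq

# Sample (name, id) rows; imagine these streaming out of a table scan.
rows = [("mango", 7), ("apple", 3), ("pear", 5), ("apple", 1), ("fig", 9)]

# Rough equivalent of ORDER BY name, id LIMIT 2: a heap of at most
# two candidates is kept while scanning -- the full set is never sorted.
top2 = heapq.nsmallest(2, rows, key=lambda r: (r[0], r[1]))
print(top2)  # [('apple', 1), ('apple', 3)]
```

Memory usage depends on N, not on the number of input rows, which is why the method shines for small LIMIT clauses over large tables.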

In PostgreSQL 13, a new algorithm has been added:

test=# CREATE INDEX idx_id ON t_test (id);
CREATE INDEX
test=# EXPLAIN ANALYZE SELECT * FROM t_test ORDER BY id, name;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Incremental Sort  (cost=0.46..15289.42 rows=200000 width=9)
                   (actual time=0.047..71.622 rows=200000 loops=1)
   Sort Key: id, name
   Presorted Key: id
   Full-sort Groups: 6250  Sort Method: quicksort
   Average Memory: 26kB  Peak Memory: 26kB
   ->  Index Scan using idx_id on t_test  (cost=0.42..6289.42 rows=200000 width=9)
                                          (actual time=0.032..37.965 rows=200000 loops=1)
 Planning Time: 0.165 ms
 Execution Time: 83.681 ms
(7 rows)

Incremental sort is used when the data is already sorted by a leading subset of the requested sort keys. In this case, the index scan on idx_id returns the rows already sorted by id, so all PostgreSQL has to do is sort each group of rows sharing the same id value by name.
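The group-by-group sorting can be sketched in Python with `itertools.groupby` (the rows and the `incremental_sort` helper are illustrative assumptions, not PostgreSQL internals):

```python
from itertools import groupby
from operator import itemgetter

# (id, name) rows arriving already sorted by id, the presorted key --
# for example, streamed from an index scan on id.
rows = [(1, "cherry"), (1, "apple"), (2, "banana"), (3, "pear"), (3, "fig")]

def incremental_sort(presorted_rows):
    # Sort each run of equal id values by name. The runs are small,
    # so memory stays low and the first rows can stream out early,
    # without waiting for the whole input to be sorted.
    for _, group in groupby(presorted_rows, key=itemgetter(0)):
        yield from sorted(group, key=itemgetter(1))

print(list(incremental_sort(rows)))
# [(1, 'apple'), (1, 'cherry'), (2, 'banana'), (3, 'fig'), (3, 'pear')]
```

Because only one id group is ever held in memory, peak memory stays tiny (26 kB in the plan above) even for very large inputs.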


