Mastering PostgreSQL 13 by Hans-Jürgen Schönig
Author:Hans-Jürgen Schönig [Hans-Jürgen Schönig]
Language: eng
Format: epub
Tags: COM021000 - COMPUTERS / Databases / General, COM021040 - COMPUTERS / Databases / Data Warehousing, COM018000 - COMPUTERS / Data Processing
Publisher: Packt Publishing
Published: 2020-11-12T11:24:20+00:00
Speeding up sorting
The work_mem variable doesn't only speed up grouping. It can also have a very nice impact on simple things such as sorting, which is an essential mechanism that's been mastered by every database system in the world.
The following query shows a simple operation using the default setting of 4 MB:
test=# SET work_mem TO default;
SET
test=# EXPLAIN ANALYZE SELECT * FROM t_test ORDER BY name, id;
QUERY PLAN
--------------------------------------------------------------------------
Sort (cost=24111.14..24611.14 rows=200000 width=9)
(actual time=60.338..89.218 rows=200000 loops=1)
Sort Key: name, id
Sort Method: external merge Disk: 6912kB
-> Seq Scan on t_test (cost=0.00..3082.00 rows=200000 width=9)
(actual time=0.006..17.818 rows=200000 loops=1)
Planning Time: 0.074 ms
Execution Time: 162.544 ms
(6 rows)
PostgreSQL needs 17.8 milliseconds to read the data and over 70 milliseconds to sort it. Due to the low amount of memory available, sorting has to be performed using temporary files. The external merge Disk method only requires small amounts of RAM, but has to send intermediate data to a comparatively slow storage device, which, of course, leads to poor throughput.
Increasing the work_mem variable setting will make PostgreSQL use more memory for sorting:
test=# SET work_mem TO '1 GB';
SET test=# EXPLAIN ANALYZE SELECT * FROM t_test ORDER BY name, id;
QUERY PLAN
-----------------------------------------------------------------
Sort (cost=20691.64..21191.64 rows=200000 width=9)
(actual time=36.481..47.899 rows=200000 loops=1)
Sort Key: name, id
Sort Method: quicksort Memory: 15520kB
-> Seq Scan on t_test
(cost=0.00..3082.00 rows=200000 width=9)
(actual time=0.010..14.232 rows=200000 loops=1)
Planning time: 0.037 ms
Execution time: 55.520 ms
(6 rows)
Since there is enough memory now, the database will do all the sorting in memory and therefore speed up the process dramatically. The sort takes just 33 milliseconds now, which is a seven-times improvement compared to the query we had previously. More memory will lead to faster sorting and will speed up the system.
So far, you have seen two mechanisms that can be used to sort data: external merge Disk and quicksort Memory. In addition to these two mechanisms, there is also a third algorithm, that is, top-N heapsort Memory. It can be used to provide you with only the top-N rows:
test=# EXPLAIN ANALYZE SELECT * FROM t_test ORDER BY name, id LIMIT 10; QUERY PLAN
-----------------------------------------------------------------
Limit (cost=7403.93..7403.95 rows=10 width=9)
(actual time=31.837..31.838 rows=10 loops=1)
-> Sort (cost=7403.93..7903.93 rows=200000 width=9)
(actual time=31.836..31.837 rows=10 loops=1)
Sort Key: name, id
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on t_test
(cost=0.00..3082.00 rows=200000 width=9)
(actual time=0.011..13.645 rows=200000 loops=1)
Planning time: 0.053 ms
Execution time: 31.856 ms
(7 rows)
The algorithm is lightning fast, and the entire query will be done in just over 30 milliseconds. The sorting part is now only 18 milliseconds and is therefore almost as fast as reading the data in the first place.
In PostgreSQL 13, a new algorithm has been added:
test=# CREATE INDEX idx_id ON t_test (id);
CREATE INDEX
test=# explain analyze SELECT * FROM t_test ORDER BY id, name;
QUERY PLAN
------------------------------------------------------------------------------------
Incremental Sort (cost=0.46..15289.42 rows=200000 width=9)
(actual time=0.047..71.622 rows=200000 loops=1)
Sort Key: id, name
Presorted Key: id
Full-sort Groups: 6250 Sort Method: quicksort
Average Memory: 26kB Peak Memory: 26kB
-> Index Scan using idx_id on t_test (cost=0.42..6289.42 rows=200000 width=9)
(actual time=0.032..37.965 rows=200000 loops=1)
Planning Time: 0.165 ms
Execution Time: 83.681 ms
(7 rows)
Incremental sort is used if data is already sorted by some variables. In this case, idx_id will return data sorted by id. All we have to do is to sort the already sorted data by name.
Download
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.
Sass and Compass in Action by Wynn Netherland Nathan Weizenbaum Chris Eppstein Brandon Mathis(7808)
Grails in Action by Glen Smith Peter Ledbrook(7719)
Azure Containers Explained by Wesley Haakman & Richard Hooper(6818)
Configuring Windows Server Hybrid Advanced Services Exam Ref AZ-801 by Chris Gill(6817)
Running Windows Containers on AWS by Marcio Morales(6336)
Kotlin in Action by Dmitry Jemerov(5090)
Microsoft 365 Identity and Services Exam Guide MS-100 by Aaron Guilmette(5057)
Combating Crime on the Dark Web by Nearchos Nearchou(4631)
Microsoft Cybersecurity Architect Exam Ref SC-100 by Dwayne Natwick(4588)
Management Strategies for the Cloud Revolution: How Cloud Computing Is Transforming Business and Why You Can't Afford to Be Left Behind by Charles Babcock(4437)
The Ruby Workshop by Akshat Paul Peter Philips Dániel Szabó and Cheyne Wallace(4321)
The Age of Surveillance Capitalism by Shoshana Zuboff(3979)
Python for Security and Networking - Third Edition by José Manuel Ortega(3883)
The Ultimate Docker Container Book by Schenker Gabriel N.;(3541)
Learn Windows PowerShell in a Month of Lunches by Don Jones(3528)
Learn Wireshark by Lisa Bock(3505)
Mastering Python for Networking and Security by José Manuel Ortega(3376)
Mastering Azure Security by Mustafa Toroman and Tom Janetscheck(3355)
Blockchain Basics by Daniel Drescher(3324)
