Delta Lake by Bennie Haelen

Delta Lake by Bennie Haelen

Author:Bennie Haelen
Language: eng
Format: epub
Publisher: O'Reilly Media
Published: 2023-10-23T00:00:00+00:00


Time Travel Under the Hood

Version history can be kept on a Delta table because the transaction log keeps track of which files should or should not be read when performing operations on a table. When the DESCRIBE HISTORY command is executed, it will also return the operationMetrics, which tells you the number of files added and removed during an operation. When performing an UPDATE, DELETE, or MERGE on a table, that data is not physically removed from the underlying storage. Rather, these operations update the transaction log to indicate which files should or should not be read. Similarly, when you restore a table to a previous version, it does not physically add or remove data; it only updates the metadata in the transaction log to tell it which files to read.

In Chapter 2 you learned about JSON files within the _delta_log directory and checkpoint files. Checkpoint files save the state of the entire table at a point in time, and are automatically generated to maintain read performance by combining JSON commits into Parquet files. The checkpoint file and subsequent commits can then be read to get the current state, and previous states in the case of time travel, of the table, avoiding the need to list and reprocess all of the commits.

The transaction log commits checkpoint files, and the fact that data files are only logically removed as opposed to being physically removed is the foundation for how Delta Lake easily enables time travel on your Delta table. Figure 6-3 shows the transaction log entries for each of the operations on the taxidb.tripData table throughout the different transactions and versions.



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.