Google BigQuery by Valliappa Lakshmanan


Author: Valliappa Lakshmanan
Language: eng
Format: epub
Publisher: O'Reilly Media
Published: 2019-10-29T16:00:00+00:00

JOIN versus denormalization

What if we were to store the distance traveled in each trip in a denormalized table?

CREATE OR REPLACE TABLE ch07eu.cycle_hire AS
SELECT
  start_station_name
  , end_station_name
  , ST_DISTANCE(ST_GeogPoint(s1.longitude, s1.latitude),
                ST_GeogPoint(s2.longitude, s2.latitude)) AS distance
  , duration
FROM
  `bigquery-public-data`.london_bicycles.cycle_hire AS h
JOIN
  `bigquery-public-data`.london_bicycles.cycle_stations AS s1
ON h.start_station_id = s1.id
JOIN
  `bigquery-public-data`.london_bicycles.cycle_stations AS s2
ON h.end_station_id = s2.id

Querying this table returns results in 8.7 seconds and processes 1.6 GB. In other words, it is 60% slower and about three times more expensive than the previous query. In this instance, therefore, joining with a smaller table turns out to be more efficient than querying a larger, denormalized table. However, this is the sort of thing that you need to measure for your particular use case. You will see later how you can efficiently store data at differing levels of granularity in a single denormalized table with nested and repeated fields.
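
As a rough sketch of how such a measurement might be made, the statements below first run an aggregation against the denormalized table and then look up the elapsed time and bytes processed for recent jobs. The aggregation is an illustrative assumption, not necessarily the query that was timed above, and the `region-eu` qualifier assumes that the ch07eu dataset lives in the EU multi-region. For a fair comparison, turn off cached results in the query settings before running each variant.

```sql
-- Illustrative query over the denormalized table (an assumption; adapt it
-- to whatever you actually want to compare against the JOIN version).
SELECT
  start_station_name
  , end_station_name
  , MIN(distance) AS distance
  , AVG(duration) AS avg_duration
  , COUNT(*) AS num_rides
FROM ch07eu.cycle_hire
GROUP BY start_station_name, end_station_name
ORDER BY num_rides DESC
LIMIT 5;

-- Compare elapsed time and bytes processed for your recent query jobs.
-- The region qualifier is an assumption; use the region of your dataset.
SELECT
  job_id
  , TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS elapsed_ms
  , total_bytes_processed
  , query
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE job_type = 'QUERY'
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ORDER BY creation_time DESC;
```

Running the JOIN-based query and the denormalized query back to back and then comparing their rows in the second result gives a like-for-like view of speed and cost for your own data.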

