Azure Storage, Streaming, and Batch Analytics: A guide for data engineers by Richard L. Nuckolls


Author: Richard L. Nuckolls
Language: eng
Format: epub
Publisher: Manning Publications Co.
Published: 2020-10-06T22:00:00+00:00


Writing to a U-SQL database table

What happens if you load data into a partitioned table but haven’t created partitions for all the key values? The ADLA job fails. Partitions for the key values must be created manually, and a table created for partitioning needs at least one partition before you can load any data into it.

For rows that have no matching partition, U-SQL provides an option on the INSERT statement: you can drop the rows, or send them to a catch-all partition. Add ON INTEGRITY VIOLATION IGNORE after the column list to drop the rows, or ON INTEGRITY VIOLATION MOVE TO PARTITION ([partition]) to write them to the selected partition. Include a partition for unmatched values when you create the initial partitions. You must assign this partition a value, but the choice doesn’t matter as long as the value isn’t in the partition key value set. The following listing adds an unmatched-key partition to the SensorData table.

Listing 8.13 Add unmatched partition to U-SQL table

USE DATABASE Players;

DECLARE @partitionx string = "playerx";

ALTER TABLE SensorData
    ADD IF NOT EXISTS PARTITION (@partitionx);

With this extra partition in place, adding ON INTEGRITY VIOLATION MOVE TO PARTITION ("playerx") to the INSERT statement loads any row whose Player value doesn’t match an existing partition into the playerx partition.

Listing 8.14 Using INTEGRITY VIOLATION MOVE in a U-SQL table

USE DATABASE Players;

// Read the staged sensor files, skipping the header row of each file.
@sensors =
    EXTRACT Id Guid,
            Player string,
            Node int,
            NodeType string,
            NodeValue decimal,
            EventTime DateTime,
            PartitionId int,
            EventEnqueuedUtcTime DateTime,
            EventProcessedUtcTime DateTime
    FROM "/Staging/Sensor/v2/sensor_{*}.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// Rows with no matching Player partition go to the playerx partition.
INSERT INTO SensorData
    (Id, Player, Node, NodeType, NodeValue, EventTime,
     PartitionId, EventEnqueuedUtcTime, EventProcessedUtcTime)
ON INTEGRITY VIOLATION MOVE TO PARTITION ("playerx")
SELECT * FROM @sensors;
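For comparison, the other option described earlier, ON INTEGRITY VIOLATION IGNORE, might look like the following minimal sketch. It reuses the EXTRACT and the SensorData table from listing 8.14 unchanged and swaps only the violation clause. It assumes that silently dropping rows with an unmatched Player value is acceptable for the load, and it has not been run against an ADLA account.

```usql
USE DATABASE Players;

// Same EXTRACT as in listing 8.14.
@sensors =
    EXTRACT Id Guid,
            Player string,
            Node int,
            NodeType string,
            NodeValue decimal,
            EventTime DateTime,
            PartitionId int,
            EventEnqueuedUtcTime DateTime,
            EventProcessedUtcTime DateTime
    FROM "/Staging/Sensor/v2/sensor_{*}.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// Rows whose Player value has no matching partition are dropped
// instead of being moved to a catch-all partition.
INSERT INTO SensorData
    (Id, Player, Node, NodeType, NodeValue, EventTime,
     PartitionId, EventEnqueuedUtcTime, EventProcessedUtcTime)
ON INTEGRITY VIOLATION IGNORE
SELECT * FROM @sensors;
```

With IGNORE the unmatched rows are discarded and the job completes without reporting them. With MOVE TO PARTITION they are kept and can be inspected later.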

Inserting data into U-SQL tables updates the clustered index.


