Data Science on the Google Cloud Platform by Valliappa Lakshmanan

Author: Valliappa Lakshmanan
Language: eng
Format: epub
Publisher: O'Reilly Media
Published: 2017-12-29T05:00:00+00:00


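from pyspark.sql import SparkSession

# getOrCreate() reuses an existing session if one is already running (e.g. in a notebook)
spark = SparkSession.builder.getOrCreate()

# header: the first row holds column names; inferSchema: scan the data to choose column types
traindays = spark.read \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .csv('gs://cloud-training-demos-ml/flights/trainday.csv')

# Register the DataFrame so Spark SQL queries can refer to it by name
traindays.createOrReplaceTempView('traindays')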

A quick check confirms that traindays has been read and that the column names and types are correct:

results = spark.sql('SELECT * FROM traindays')
results.head(5)

This yields the following:

[Row(FL_DATE=datetime.datetime(2015, 1, 1, 0, 0), is_train_day=True),
 Row(FL_DATE=datetime.datetime(2015, 1, 2, 0, 0), is_train_day=False),
 Row(FL_DATE=datetime.datetime(2015, 1, 3, 0, 0), is_train_day=False),
 Row(FL_DATE=datetime.datetime(2015, 1, 4, 0, 0), is_train_day=True),
 Row(FL_DATE=datetime.datetime(2015, 1, 5, 0, 0), is_train_day=True)]

To restrict the flights dataframe to contain only training days, we can do a SQL join:
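One way such a join might look is sketched here, assuming a `flights` temporary view that has FL_DATE, ORIGIN, and DEP_DELAY columns (those column choices are assumptions, not taken from this excerpt). Because running Spark SQL needs a Spark session, the sketch executes the same SQL string against an in-memory SQLite database on a few made-up toy rows; SQLite is only a stand-in to show the join's semantics. With Spark, the string would instead be passed to spark.sql() once both temp views are registered.

```python
import sqlite3

# Keep only flights whose date is marked as a training day.
# Assumed columns: flights(FL_DATE, ORIGIN, DEP_DELAY), traindays(FL_DATE, is_train_day).
TRAIN_FLIGHTS_SQL = """
SELECT f.FL_DATE, f.ORIGIN, f.DEP_DELAY
FROM flights f
JOIN traindays t
  ON f.FL_DATE = t.FL_DATE
WHERE t.is_train_day
ORDER BY f.FL_DATE, f.ORIGIN
"""


def make_demo_db():
    """Build an in-memory SQLite DB with hypothetical toy rows standing in for the Spark views."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE traindays (FL_DATE TEXT, is_train_day INTEGER)")
    conn.executemany(
        "INSERT INTO traindays VALUES (?, ?)",
        [("2015-01-01", 1), ("2015-01-02", 0), ("2015-01-03", 0)],
    )
    conn.execute("CREATE TABLE flights (FL_DATE TEXT, ORIGIN TEXT, DEP_DELAY REAL)")
    conn.executemany(
        "INSERT INTO flights VALUES (?, ?, ?)",
        [
            ("2015-01-01", "ATL", 5.0),
            ("2015-01-01", "BOS", -3.0),
            ("2015-01-02", "ATL", 12.0),
            ("2015-01-03", "SFO", 0.0),
        ],
    )
    return conn


def train_day_flights(conn):
    """Run the training-day join on an open DB-API connection and return all rows."""
    return conn.execute(TRAIN_FLIGHTS_SQL).fetchall()


if __name__ == "__main__":
    # Prints ('2015-01-01', 'ATL', 5.0) and ('2015-01-01', 'BOS', -3.0):
    # only 2015-01-01 is a training day in the toy traindays table.
    for row in train_day_flights(make_demo_db()):
        print(row)
```

An inner join is a natural fit here because a flight whose date has no matching traindays row is dropped as well, so only dates explicitly marked as training days survive the WHERE filter.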
