Azure Data Factory Cookbook by Dmitry Anoshin, Dmitry Foshin, Roman Storchak, and Xenia Ireton

Author: Dmitry Anoshin, Dmitry Foshin, Roman Storchak and Xenia Ireton
Language: eng
Format: epub
Publisher: Packt Publishing Ltd.
Published: 2020-12-23T00:00:00+00:00


How it works…

ADF can create new Databricks clusters or use existing ones. Through a linked service, ADF connects to Databricks and programmatically triggers the execution of Databricks notebooks, JAR files, and Python files.
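To make the linked-service mechanism concrete, here is a minimal sketch of what a pipeline definition with a Databricks Notebook activity might look like, built as a Python dictionary and printed as JSON. The activity, linked service, and pipeline names, the notebook path, and the parameter are all hypothetical placeholders, not values from the recipe.

```python
import json


def notebook_activity(name, linked_service, notebook_path, parameters=None):
    """Build the JSON body of an ADF Databricks Notebook activity."""
    return {
        "name": name,
        "type": "DatabricksNotebook",
        # Reference to the Databricks linked service defined in the factory.
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            # Values passed to the notebook as parameters.
            "baseParameters": parameters or {},
        },
    }


# All names and paths below are hypothetical placeholders.
activity = notebook_activity(
    name="RunPreprocessing",
    linked_service="AzureDatabricksLinkedService",
    notebook_path="/Shared/preprocess",
    parameters={"input_path": "/mnt/data/raw"},
)
pipeline = {
    "name": "DatabricksPipeline",
    "properties": {"activities": [activity]},
}
print(json.dumps(pipeline, indent=2))
```

Whether the cluster is created per run or reused is decided in the linked service definition, so the activity itself only needs to name that linked service.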

You can build complex, multi-step pipelines by combining ADF and Databricks.

Building a machine learning app with Databricks and Azure Data Lake Storage

In addition to ETL/ELT jobs, data engineers often help data scientists to productionize machine learning applications. Using Databricks is an excellent way to simplify the work of the data scientist as well as create data preprocessing pipelines.

As we saw in the previous recipe, ADF can trigger the execution of notebooks, JAR files, and Python files, so parts of the application logic have to be implemented in them.

A Databricks cluster uses its own filesystem (DBFS). So, we need to mount Azure Data Lake Storage to DBFS to access input data and the resulting files.
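As an illustration of the mounting step, the sketch below assembles the arguments for dbutils.fs.mount. It assumes Azure Data Lake Storage Gen2 accessed with a service principal (OAuth), which this excerpt does not confirm. The helper adls_mount_args and every account, container, and credential value are hypothetical placeholders.

```python
def adls_mount_args(account, container, tenant_id, client_id, client_secret, mount_point):
    """Assemble keyword arguments for dbutils.fs.mount (ADLS Gen2 with OAuth assumed)."""
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    return {
        "source": f"abfss://{container}@{account}.dfs.core.windows.net/",
        "mount_point": mount_point,
        "extra_configs": configs,
    }


# Placeholder values; in practice the secret would come from a secret scope.
args = adls_mount_args(
    account="mystorageaccount",
    container="data",
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
    mount_point="/mnt/data",
)

# Inside a Databricks notebook, where `dbutils` is predefined:
# dbutils.fs.mount(**args)
```

Once mounted, files in the container would be reachable from notebooks under the chosen mount point, for example /mnt/data.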

In this recipe, we will connect Azure Data Lake Storage to Databricks, ingest the MovieLens dataset, train a basic model for a recommender system, and store the model in Azure Data Lake Storage.
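The recipe's actual training code is not part of this excerpt. To give a feel for what "a basic model for a recommender system" can mean, here is a small stand-in: matrix factorization trained with stochastic gradient descent in plain Python. This is a swapped-in technique chosen so the example runs anywhere; on a Databricks cluster a distributed implementation would be the more likely choice. The ratings are made-up toy data in the MovieLens (userId, movieId, rating) layout, and all function names and hyperparameters are assumptions.

```python
import random


def predict(user_vec, item_vec):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


def rmse(ratings, user_f, item_f):
    """Root mean squared error of the model on the given ratings."""
    total = sum((r - predict(user_f[u], item_f[i])) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


def train(ratings, rank=2, epochs=300, lr=0.02, reg=0.02, seed=0):
    """Fit user and item factors with SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    user_f = {u: [rng.uniform(0, 0.1) for _ in range(rank)] for u in users}
    item_f = {i: [rng.uniform(0, 0.1) for _ in range(rank)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = user_f[u], item_f[i]
            err = r - predict(pu, qi)
            for k in range(rank):
                pu_k = pu[k]  # keep the old value for the item update
                pu[k] += lr * (err * qi[k] - reg * pu_k)
                qi[k] += lr * (err * pu_k - reg * qi[k])
    return user_f, item_f


# Made-up toy ratings in MovieLens layout: (userId, movieId, rating).
ratings = [
    (1, 10, 5.0), (1, 20, 3.0),
    (2, 10, 4.0), (2, 30, 1.0),
    (3, 20, 2.0), (3, 30, 5.0),
]

before = rmse(ratings, *train(ratings, epochs=0))  # untrained factors
user_f, item_f = train(ratings)
after = rmse(ratings, user_f, item_f)
print(f"RMSE before training: {before:.3f}, after: {after:.3f}")
```

The learned factors are plain dictionaries, so persisting the model amounts to serializing them and writing the result under the mounted storage path.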


