Automated Machine Learning with Microsoft Azure by Dennis Michael Sawyers

Author: Dennis Michael Sawyers
Language: eng
Format: epub
Publisher: Packt Publishing Pvt Ltd
Published: 2021-03-26T00:00:00+00:00


You now have the OJ Sales data prepped for the accelerator. To bring your own data into the accelerator, there are a few important requirements you need to follow. Most importantly, the OJ Sales data comes pre-split by store and orange juice brand. You will need to mimic this structure using your own data in a new Jupyter notebook.
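One way to picture the pre-split structure is a single table broken into one file per store and brand combination. The following is a minimal sketch, assuming a pandas dataframe with hypothetical columns named Store and Brand. The column names, sample values, and output file names are illustrative assumptions and are not taken from the OJ Sales dataset or the accelerator's code:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Brand": ["A", "B", "A", "A"],
    "Date": pd.to_datetime(
        ["2021-01-07", "2021-01-07", "2021-01-07", "2021-01-14"]
    ),
    "Quantity": [10, 12, 7, 9],
})

# Write to a temporary folder so the sketch leaves your project untouched.
output_dir = tempfile.mkdtemp()

written_files = []
# Each (Store, Brand) combination becomes its own CSV file.
for (store, brand), group in df.groupby(["Store", "Brand"]):
    file_name = f"Store{store}_{brand}.csv"  # hypothetical naming scheme
    group.to_csv(os.path.join(output_dir, file_name), index=False)
    written_files.append(file_name)

print(sorted(written_files))
```

Grouping on both columns at once keeps every time series in its own file, which is the shape the pre-split OJ Sales data already has.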

Prepping a pandas dataframe

How to bring your own data into the MMSA is not immediately obvious. OJ Sales, after all, is a file dataset consisting of 11,793 files. You are much more likely to use data that consists of a single file or comes from a single table within a database. Moreover, you are most likely to read it in via pandas, one of the most widely used Python packages for working with data. To learn how to use pandas dataframes with the MMSA, perform the following steps:

Download the ManyModelsSampleData.csv file from the Automated-Machine-Learning-on-Microsoft-Azure GitHub repository.

Navigate to your Jupyter environment.

Open the solution-accelerator-many-models folder.

Click the Upload button in the top-left corner of your screen. Upload the ManyModelsSampleData.csv file to your Jupyter environment.

Create a new Jupyter notebook and open it. Rename it 01_Data_PreparationMy-Data.ipynb.

To load all of the libraries you need, use the following code:

import pandas as pd
import numpy as np
import os
import datetime as dt
from azureml.core import Workspace, Dataset, Datastore
from scripts.helper import split_data

You should recognize pandas, numpy, Workspace, Dataset, and Datastore from Chapter 4, Building an AutoML Regression Solution. You've also used os in Chapter 6, Building an AutoML Forecasting Solution.

New to this script is split_data, a helper function. Helper functions are reusable functions written to reduce a program's complexity. The MMSA has a few helper functions, and split_data is used to divide data into training and inference data based on a date you pass in.
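To make the idea of a date-based split concrete, here is a minimal sketch of a function that divides a dataframe into training and inference rows around a cutoff date. This is not the MMSA's split_data function, whose signature is not shown in this section. The function name split_by_date, its parameters, and the sample data are all hypothetical:

```python
import datetime as dt

import pandas as pd


def split_by_date(df, time_column, split_date):
    """Hypothetical helper: rows dated before split_date are training data,
    and all remaining rows are inference data."""
    before = df[time_column] < split_date
    return df[before], df[~before]


# Illustrative sample data only.
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-07", "2021-01-14", "2021-01-21"]),
    "Quantity": [10, 12, 7],
})

train_df, inference_df = split_by_date(sales, "Date", dt.datetime(2021, 1, 15))
print(len(train_df), len(inference_df))
```

Wrapping the comparison in a small function keeps the cutoff logic in one place, which is the kind of complexity reduction helper functions are meant to provide.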

Another new package is datetime, which lets you convert string objects into proper Python datetime objects. This is a requirement since split_data requires datetime objects to function properly.
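As a brief illustration of that conversion, the sketch below parses a date stored as text into a datetime object using datetime.strptime from the standard library. The date value and the variable names are arbitrary examples chosen for illustration:

```python
import datetime as dt

# A date stored as text, as it might appear in a CSV file.
date_string = "2021-03-26"

# strptime parses the string according to the given format codes:
# %Y is the four-digit year, %m the month, and %d the day.
split_date = dt.datetime.strptime(date_string, "%Y-%m-%d")

print(type(split_date).__name__)
print(split_date.year, split_date.month, split_date.day)
```

Unlike the original string, the resulting object can be compared directly against other dates, which is what a date-based split relies on.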


