Data Science by Daniel Vaughan
Author: Daniel Vaughan
Language: eng
Format: epub
Publisher: O'Reilly Media
Published: 2023-11-14T00:00:00+00:00
Implementing the Windowing Methodology
Once you have defined the observation and prediction windows, you can enforce them in your code with something like the following snippet:
import datetime
from dateutil.relativedelta import relativedelta

def query_data(len_obs: int, len_pre: int):
    """
    Function to query the data enforcing the chosen time windows.
    Requires a connection to the company's database

    Args:
        len_obs (int): Length in months for observation window (O).
        len_pre (int): Length in months for prediction window (P).

    Returns:
        df: Pandas DataFrame with data for training the model.
    """
    # set the time variables
    today = datetime.datetime.today()
    base_time = today - relativedelta(months = len_pre)  # t_p - P
    init_time = base_time - relativedelta(months = len_obs)
    end_time = base_time + relativedelta(months = len_pre)
    init_str = init_time.strftime('%Y-%m-%d')
    base_str = base_time.strftime('%Y-%m-%d')
    end_str = end_time.strftime('%Y-%m-%d')
    # print to check that things make sense
    print(f'Observation window (O={len_obs}): [{init_str}, {base_str})')
    print(f'Prediction window (P={len_pre}): [{base_str}, {end_str}]')
    # create query
    my_query = f"""
    SELECT
        SUM(CASE WHEN date >= '{init_str}' AND date < '{base_str}'
            THEN x_metric ELSE 0 END) AS my_feature,
        SUM(CASE WHEN date >= '{base_str}' AND date <= '{end_str}'
            THEN y_metric ELSE 0 END) AS my_outcome
    FROM my_table
    """
    print(my_query)
    # connect to database and bring in the data
    # will throw an error since the method doesn't exist
    df = connect_to_database(my_query, conn_parameters)
    return df
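Because the snippet above depends on a database connection that is not available here, the window logic can be tried out on toy data instead. The following is only a sketch under stated assumptions: it uses an in-memory SQLite table in place of the company database, a fixed reference date in place of today's date, and a small hypothetical helper, shift_months, as a standard-library stand-in for relativedelta. The function names window_bounds and query_toy_data and the sample rows are likewise made up for illustration.

```python
import calendar
import datetime
import sqlite3


def shift_months(day: datetime.date, months: int) -> datetime.date:
    """Shift a date by whole months, clamping to the last day of the month.

    Hypothetical stdlib stand-in for dateutil's relativedelta(months=...).
    """
    month_index = day.year * 12 + (day.month - 1) + months
    year, month = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month + 1)[1]
    return day.replace(year=year, month=month + 1, day=min(day.day, last_day))


def window_bounds(today: datetime.date, len_obs: int, len_pre: int):
    """Return (init, base, end) dates for the two windows."""
    base = shift_months(today, -len_pre)   # start of prediction window
    init = shift_months(base, -len_obs)    # start of observation window
    end = shift_months(base, len_pre)      # end of prediction window
    return init, base, end


def query_toy_data(today, len_obs, len_pre, rows):
    """Aggregate feature and outcome over the windows using in-memory SQLite."""
    init, base, end = window_bounds(today, len_obs, len_pre)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (date TEXT, x_metric REAL, y_metric REAL)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    # Same CASE WHEN structure as the snippet above, with bound parameters
    query = """
        SELECT
            SUM(CASE WHEN date >= ? AND date < ?
                THEN x_metric ELSE 0 END) AS my_feature,
            SUM(CASE WHEN date >= ? AND date <= ?
                THEN y_metric ELSE 0 END) AS my_outcome
        FROM my_table
    """
    params = (
        init.isoformat(), base.isoformat(),
        base.isoformat(), end.isoformat(),
    )
    result = conn.execute(query, params).fetchone()
    conn.close()
    return result


# Made-up rows: (date, x_metric, y_metric)
toy_rows = [
    ("2023-01-10", 5.0, 1.0),  # before the observation window
    ("2023-04-20", 2.0, 1.0),  # inside the observation window
    ("2023-06-14", 3.0, 1.0),  # last day of the observation window
    ("2023-06-15", 7.0, 4.0),  # first day of the prediction window
    ("2023-07-15", 9.0, 6.0),  # last day of the prediction window
    ("2023-08-01", 1.0, 8.0),  # after the prediction window
]

reference_date = datetime.date(2023, 7, 15)  # fixed stand-in for "today"
feature, outcome = query_toy_data(reference_date, 3, 1, toy_rows)
print(feature, outcome)
```

Fixing the reference date makes the result reproducible, and binding the dates as query parameters avoids building the SQL string by hand. With these rows, the feature is built only from dates strictly before the base date, while the outcome covers the base date through the end date.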
Summing up, the windowing methodology helps you enforce the minimal requirement that you use only the past to predict the future. Other sources of data leakage may still be present.