The Kaggle Workbook by Konrad Banachewicz & Luca Massaron
Author: Konrad Banachewicz & Luca Massaron
Language: eng
Format: epub
Publisher: Packt
Published: 2023-12-15T00:00:00+00:00
In addition, the decimal part of the price is processed as a feature. It reveals when an item is sold at a psychological pricing threshold (e.g., $19.99 or £2.98; see this discussion: https://www.kaggle.com/competitions/m5-forecasting-accuracy/discussion/145011).
The function math.modf (https://docs.python.org/3.8/library/math.html#math.modf) helps here because it splits any floating-point number into its fractional and integer parts, returned as a two-item tuple.
Finally, the price features are merged into the grid, and the resulting table is saved to disk.
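The following minimal sketch illustrates the cents feature using only the standard library. The helper names `price_cents` and `is_psychological_price` and the set of "charm" endings (.99, .98, .95) are assumptions for illustration, not the book's definition. Because `math.modf` operates on binary floats, the fractional part of 19.99 is not exactly 0.99. Rounding to the nearest cent before comparing avoids spurious mismatches.

```python
import math

# Hypothetical set of "charm" cent endings; an assumption for illustration only.
CHARM_CENTS = {99, 98, 95}


def price_cents(price: float) -> int:
    """Return the cents part of a price using math.modf (fractional part comes first)."""
    fractional, _integer = math.modf(price)
    # Binary floats cannot represent most decimals exactly (19.99 -> 0.98999...),
    # so round to the nearest cent before converting to an integer.
    return int(round(fractional * 100))


def is_psychological_price(price: float) -> bool:
    """Hypothetical flag: True when the price ends in one of CHARM_CENTS."""
    return price_cents(price) in CHARM_CENTS


for p in [19.99, 2.98, 5.00, 3.47]:
    print(p, math.modf(p), price_cents(p), is_psychological_price(p))
```

The book's features keep the raw fractional part and let the model learn from it. An explicit flag like this one is just one way to inspect what that part encodes.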
Here is the function doing all the feature engineering on prices:
import gc
import math

import pandas as pd

# reduce_mem_usage() is a helper defined elsewhere in the book (downcasts column dtypes).


def generate_grid_price(prices_df, calendar_df, end_train_day_x, predict_horizon):
    grid_df = pd.read_feather(
        f"grid_df_{end_train_day_x}_to_{end_train_day_x + predict_horizon}.feather")

    # Price statistics computed per store/item pair
    prices_df['price_max'] = prices_df.groupby(['store_id', 'item_id'])['sell_price'].transform('max')
    prices_df['price_min'] = prices_df.groupby(['store_id', 'item_id'])['sell_price'].transform('min')
    prices_df['price_std'] = prices_df.groupby(['store_id', 'item_id'])['sell_price'].transform('std')
    prices_df['price_mean'] = prices_df.groupby(['store_id', 'item_id'])['sell_price'].transform('mean')
    prices_df['price_norm'] = prices_df['sell_price'] / prices_df['price_max']
    prices_df['price_nunique'] = prices_df.groupby(['store_id', 'item_id'])['sell_price'].transform('nunique')
    prices_df['item_nunique'] = prices_df.groupby(['store_id', 'sell_price'])['item_id'].transform('nunique')

    # Attach month and year to each week, needed for the momentum features
    calendar_prices = calendar_df[['wm_yr_wk', 'month', 'year']]
    calendar_prices = calendar_prices.drop_duplicates(subset=['wm_yr_wk'])
    prices_df = prices_df.merge(calendar_prices[['wm_yr_wk', 'month', 'year']],
                                on=['wm_yr_wk'], how='left')
    del calendar_prices
    gc.collect()

    # Price relative to the previous week (assumes rows are ordered by week
    # within each store/item) and to the monthly and yearly means
    prices_df['price_momentum'] = prices_df['sell_price'] / prices_df.groupby(['store_id', 'item_id'])[
        'sell_price'].transform(lambda x: x.shift(1))
    prices_df['price_momentum_m'] = prices_df['sell_price'] / prices_df.groupby(['store_id', 'item_id', 'month'])[
        'sell_price'].transform('mean')
    prices_df['price_momentum_y'] = prices_df['sell_price'] / prices_df.groupby(['store_id', 'item_id', 'year'])[
        'sell_price'].transform('mean')

    # Fractional (cents) part of the prices, to capture psychological pricing
    prices_df['sell_price_cent'] = [math.modf(p)[0] for p in prices_df['sell_price']]
    prices_df['price_max_cent'] = [math.modf(p)[0] for p in prices_df['price_max']]
    prices_df['price_min_cent'] = [math.modf(p)[0] for p in prices_df['price_min']]

    del prices_df['month'], prices_df['year']
    prices_df = reduce_mem_usage(prices_df, verbose=False)
    gc.collect()

    # Merge into the grid and keep only the newly created columns plus the keys
    original_columns = list(grid_df)
    grid_df = grid_df.merge(prices_df, on=['store_id', 'item_id', 'wm_yr_wk'], how='left')
    del prices_df
    gc.collect()
    keep_columns = [col for col in list(grid_df) if col not in original_columns]
    grid_df = grid_df[['id', 'd'] + keep_columns]
    grid_df = reduce_mem_usage(grid_df, verbose=False)
    grid_df.to_feather(
        f"grid_price_{end_train_day_x}_to_{end_train_day_x + predict_horizon}.feather")
    del grid_df
    gc.collect()
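The sketch below shows what the `transform`-based features produce row by row. It uses a made-up price table (all values are invented for illustration) and reproduces three of the features above: `price_max`, `price_norm`, and `price_momentum`. `transform` returns a result aligned with the original rows, so each week keeps its own row. `shift(1)` leaves NaN in the first week of every store/item group. The momentum ratio depends on row order, so the sketch sorts by week explicitly to make that assumption visible.

```python
import pandas as pd

# Made-up toy data: two items in one store over three weeks.
prices = pd.DataFrame({
    "store_id": ["CA_1"] * 6,
    "item_id": ["A", "A", "A", "B", "B", "B"],
    "wm_yr_wk": [11101, 11102, 11103, 11101, 11102, 11103],
    "sell_price": [2.00, 2.00, 1.50, 9.99, 8.99, 9.99],
})

# shift(1) relies on row order, so sort by week within each store/item.
prices = prices.sort_values(["store_id", "item_id", "wm_yr_wk"]).reset_index(drop=True)

group = prices.groupby(["store_id", "item_id"])["sell_price"]
prices["price_max"] = group.transform("max")                 # broadcast back to every row
prices["price_norm"] = prices["sell_price"] / prices["price_max"]
# Ratio to the previous week's price; NaN for the first week of each item.
prices["price_momentum"] = prices["sell_price"] / group.transform(lambda x: x.shift(1))

print(prices)
```

`group.shift(1)` returns the same previous-week prices without a Python lambda and is usually faster on large tables.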
The next function computes the moon phase and returns one of eight phases (from new moon to waning crescent). Moon phases shouldn't directly influence sales. Weather conditions do, but the data contains no weather information. Even so, moon phases follow a periodic cycle of about 29.5 days, which may align with periodic shopping behaviors.
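The arithmetic behind the cycle in `get_moon_phase` can be written out explicitly. A date is first converted into $d$, the number of days since 2001-01-01, and then into lunations $L$. The multiplier 0.03386319269 is approximately the reciprocal of the mean synodic month (about 29.53 days). The offset 0.20439731 is the fraction of a lunation that the formula treats as already elapsed at the reference date. The phase index keeps only the fractional part of $L$ and rounds it to one of eight bins. The final `& 7` maps a result of 8 back to 0, so the last half-bin wraps around to new moon.

$$
L = 0.20439731 + 0.03386319269\, d \approx 0.20439731 + \frac{d}{29.53}
$$

$$
\text{phase} = \left\lfloor 8\,\bigl(L - \lfloor L \rfloor\bigr) + \tfrac{1}{2} \right\rfloor \bmod 8,
\qquad \frac{29.53}{8} \approx 3.69 \text{ days per phase}
$$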
There is an interesting discussion, with different hypotheses about why moon phases may work as a predictor, in this competition post: https://www.kaggle.com/competitions/m5-forecasting-accuracy/discussion/154776. Here is the function:
import datetime
import math
from decimal import Decimal as dec


def get_moon_phase(d):  # 0=new, 4=full; roughly 3.7 days per phase
    diff = datetime.datetime.strptime(d, '%Y-%m-%d') - datetime.datetime(2001, 1, 1)
    days = dec(diff.days) + (dec(diff.seconds) / dec(86400))
    lunations = dec("0.20439731") + (days * dec("0.03386319269"))
    phase_index = math.floor((lunations % dec(1) * dec(8)) + dec('0.5'))
    return int(phase_index) & 7
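The following quick sanity check of `get_moon_phase` repeats the function so that the block runs on its own. The `dec` alias for `decimal.Decimal` is an assumption about how the book imports it. Within the M5 period, the function returns 0 (new) for 2016-01-10 and 4 (full) for 2016-01-24. Over any 30 consecutive days, it passes through all eight phase indices, because the lunation fraction advances by about 0.034 per day while each bin is 0.125 wide.

```python
import datetime
import math
from decimal import Decimal as dec  # assumption: how the book aliases Decimal


def get_moon_phase(d):  # 0=new, 4=full
    diff = datetime.datetime.strptime(d, '%Y-%m-%d') - datetime.datetime(2001, 1, 1)
    days = dec(diff.days) + (dec(diff.seconds) / dec(86400))
    lunations = dec("0.20439731") + (days * dec("0.03386319269"))
    phase_index = math.floor((lunations % dec(1) * dec(8)) + dec('0.5'))
    return int(phase_index) & 7


# One lunation expressed in days, recovered from the multiplier (about 29.53).
print(1 / 0.03386319269)

# Phase index for 30 consecutive days starting on 2016-01-01.
start = datetime.date(2016, 1, 1)
phases = [get_moon_phase((start + datetime.timedelta(days=i)).isoformat()) for i in range(30)]
print(phases)
print(get_moon_phase('2016-01-10'), get_moon_phase('2016-01-24'))
```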
The moon phase function is part of a more general function that creates time-based features. This function takes information from the calendar dataset and merges it into the features. That information includes events and their types, as well as indicators of the SNAP (Supplemental Nutrition Assistance Program) periods, which could drive additional sales of basic goods. The function also generates numeric features such as the day, the month, the year, the day of the week, the week in the month, and whether the day falls on a weekend. Here is the code:
def generate_grid_calendar(calendar_df, end_train_day_x, predict_horizon):
    grid_df = pd.read_feather(
        f"grid_df_{end_train_day_x}_to_{end_train_day_x + predict_horizon}.feather")
    grid_df = grid_df[['id', 'd']]
    gc.collect()
    calendar_df['moon'] = calendar_df.date.apply(get_moon_phase)
    # Merge calendar partly
    icols = ['date', 'd', 'event_name_1', 'event_type_1', 'event_name_2',
             'event_type_2', 'snap_CA', 'snap_TX', 'snap_WI', 'moon', ]
    grid_df = grid_df.merge(calendar_df[icols], on=['d'], how='left')
    icols = ['event_name_1', 'event_type_1', 'event_name_2', 'event_type_2',
             'snap_CA', 'snap_TX', 'snap_WI']
    for col in icols:
        grid_df[col] = grid_df[col].astype('category')
    grid_df['date'] = pd.to_datetime(grid_df['date'])
    grid_df['tm_d'] = grid_df['date'].dt.day.astype(np.int8)
    grid_df['tm_w'] = grid_df['date'].dt.isocalendar().week.astype(np.int8)
    grid_df['tm_m'] = grid_df['date'].dt.month.astype(np.int8)
    grid_df['tm_y'] = grid_df['date'].dt.year
    grid_df['tm_y'] = (grid_df['tm_y'] - grid_df['tm_y'].
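The numeric calendar features described above can be sketched on a handful of dates. The day, ISO week, month, and year come from pandas' `.dt` accessor, as in the function above. The day of the week uses `.dt.dayofweek` (Monday=0). The listing is cut off before the remaining features are computed. For that reason, the week-of-month definition (`(day - 1) // 7 + 1`, i.e., days 1 to 7 are week 1), the weekend flag (Saturday or Sunday), and the column names `tm_dw`, `tm_wm`, and `tm_w_end` are hypothetical choices for illustration.

```python
import pandas as pd

# Made-up dates around a week-of-month boundary and a weekend; illustration only.
cal = pd.DataFrame({"date": pd.to_datetime(["2016-01-07", "2016-01-08", "2016-01-23", "2016-01-24"])})

cal["tm_d"] = cal["date"].dt.day
cal["tm_w"] = cal["date"].dt.isocalendar().week    # ISO week number
cal["tm_m"] = cal["date"].dt.month
cal["tm_y"] = cal["date"].dt.year
cal["tm_dw"] = cal["date"].dt.dayofweek            # Monday=0 ... Sunday=6
# Hypothetical definitions: week of the month (1-5) and a weekend flag.
cal["tm_wm"] = (cal["tm_d"] - 1) // 7 + 1
cal["tm_w_end"] = (cal["tm_dw"] >= 5).astype(int)

print(cal)
```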