The Kaggle Workbook by Konrad Banachewicz & Luca Massaron
Author:Konrad Banachewicz & Luca Massaron
Language: eng
Format: epub
Publisher: Packt
Published: 2023-12-15T00:00:00+00:00
In addition, the decimal part of the price is processed as a feature, in order to reveal when an item is sold at a psychological pricing threshold (e.g., $19.99 or £2.98; see this discussion: https://www.kaggle.com/competitions/m5-forecasting-accuracy/discussion/145011).
The function math.modf (https://docs.python.org/3.8/library/math.html#math.modf) helps here because it splits any floating-point number into its fractional and integer parts, returned as a two-item tuple in that order.
Finally, the resulting table is saved onto disk.
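As a minimal sketch of the cents extraction described above, the snippet below applies math.modf to a single price. The helper name price_cents and the rounding to two decimals are assumptions added for illustration; the book's function keeps the raw fractional part without rounding.

```python
import math


def price_cents(price: float) -> float:
    """Return the fractional part of a price, rounded to whole cents.

    Hypothetical helper: the rounding step is not in the original code and
    only hides binary floating-point noise.
    """
    fractional, _integer = math.modf(price)
    return round(fractional, 2)


# math.modf returns (fractional part, integer part), both as floats.
fractional, integer = math.modf(19.99)
print(fractional, integer)

# Prices just below a round number show up with cents close to 1.0.
print(price_cents(19.99), price_cents(2.98), price_cents(5.0))
```

Because 19.99 has no exact binary representation, the unrounded fractional part is only approximately 0.99, which is harmless for a tree-based model but worth knowing when inspecting the feature.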
Here is the function doing all the feature engineering on prices:
def generate_grid_price(prices_df, calendar_df, end_train_day_x, predict_horizon):
    grid_df = pd.read_feather(f"grid_df_{end_train_day_x}_to_{end_train_day_x + predict_horizon}.feather")
    prices_df['price_max'] = prices_df.groupby(['store_id', 'item_id'])['sell_price'].transform('max')
    prices_df['price_min'] = prices_df.groupby(['store_id', 'item_id'])['sell_price'].transform('min')
    prices_df['price_std'] = prices_df.groupby(['store_id', 'item_id'])['sell_price'].transform('std')
    prices_df['price_mean'] = prices_df.groupby(['store_id', 'item_id'])['sell_price'].transform('mean')
    prices_df['price_norm'] = prices_df['sell_price'] / prices_df['price_max']
    prices_df['price_nunique'] = prices_df.groupby(['store_id', 'item_id'])['sell_price'].transform('nunique')
    prices_df['item_nunique'] = prices_df.groupby(['store_id', 'sell_price'])['item_id'].transform('nunique')
    calendar_prices = calendar_df[['wm_yr_wk', 'month', 'year']]
    calendar_prices = calendar_prices.drop_duplicates(subset=['wm_yr_wk'])
    prices_df = prices_df.merge(calendar_prices[['wm_yr_wk', 'month', 'year']], on=['wm_yr_wk'], how='left')
    del calendar_prices
    gc.collect()
    prices_df['price_momentum'] = prices_df['sell_price'] / prices_df.groupby(['store_id', 'item_id'])[
        'sell_price'].transform(lambda x: x.shift(1))
    prices_df['price_momentum_m'] = prices_df['sell_price'] / prices_df.groupby(['store_id', 'item_id', 'month'])[
        'sell_price'].transform('mean')
    prices_df['price_momentum_y'] = prices_df['sell_price'] / prices_df.groupby(['store_id', 'item_id', 'year'])[
        'sell_price'].transform('mean')
    prices_df['sell_price_cent'] = [math.modf(p)[0] for p in prices_df['sell_price']]
    prices_df['price_max_cent'] = [math.modf(p)[0] for p in prices_df['price_max']]
    prices_df['price_min_cent'] = [math.modf(p)[0] for p in prices_df['price_min']]
    del prices_df['month'], prices_df['year']
    prices_df = reduce_mem_usage(prices_df, verbose=False)
    gc.collect()
    original_columns = list(grid_df)
    grid_df = grid_df.merge(prices_df, on=['store_id', 'item_id', 'wm_yr_wk'], how='left')
    del(prices_df)
    gc.collect()
    keep_columns = [col for col in list(grid_df) if col not in original_columns]
    grid_df = grid_df[['id', 'd'] + keep_columns]
    grid_df = reduce_mem_usage(grid_df, verbose=False)
    grid_df.to_feather(f"grid_price_{end_train_day_x}_to_{end_train_day_x + predict_horizon}.feather")
    del(grid_df)
    gc.collect()
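To make the group-wise price features easier to follow, here is a small sketch on an invented four-row table. The data values are made up for illustration, and the rows are assumed to be sorted by week within each item, which the shift-based momentum relies on.

```python
import pandas as pd

# Toy price table (values are invented for illustration only).
prices = pd.DataFrame({
    "store_id": ["CA_1", "CA_1", "CA_1", "CA_1"],
    "item_id": ["A", "A", "A", "B"],
    "wm_yr_wk": [11101, 11102, 11103, 11101],
    "sell_price": [2.0, 2.5, 2.0, 4.0],
})

keys = ["store_id", "item_id"]

# transform() broadcasts each group's statistic back onto every row.
prices["price_max"] = prices.groupby(keys)["sell_price"].transform("max")

# Price relative to the highest price ever seen for that item in that store.
prices["price_norm"] = prices["sell_price"] / prices["price_max"]

# Ratio to the previous week's price; the first week of each item is NaN.
previous_price = prices.groupby(keys)["sell_price"].transform(lambda x: x.shift(1))
prices["price_momentum"] = prices["sell_price"] / previous_price

print(prices)
```

A price_norm below 1 flags a week in which the item sells under its historical maximum, and a price_momentum away from 1 marks a week-over-week price change.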
The next function computes the moon phase, returning one of its eight phases (from new moon to waning crescent). Although moon phases shouldn't directly influence sales (weather conditions do, but we have no weather information in the data), they represent a periodic cycle of 29 and a half days, which may align well with periodic shopping behaviors.
There is an interesting discussion, with different hypotheses regarding why moon phases may work as a predictor, in this competition post: https://www.kaggle.com/competitions/m5-forecasting-accuracy/discussion/154776:
def get_moon_phase(d):  # 0=new, 4=full; 4 days/phase
    diff = datetime.datetime.strptime(d, '%Y-%m-%d') - datetime.datetime(2001, 1, 1)
    days = dec(diff.days) + (dec(diff.seconds) / dec(86400))
    lunations = dec("0.20439731") + (days * dec("0.03386319269"))
    phase_index = math.floor((lunations % dec(1) * dec(8)) + dec('0.5'))
    return int(phase_index) & 7
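The excerpt does not show where dec comes from. Assuming it is an alias for decimal.Decimal, the function can be exercised on its own as in the sketch below; the PHASE_NAMES list is a hypothetical addition that only labels the eight indices using the conventional phase names.

```python
import datetime
import math
from decimal import Decimal as dec  # assumption: `dec` aliases decimal.Decimal


def get_moon_phase(d):  # 0=new, 4=full; roughly 4 days per phase
    diff = datetime.datetime.strptime(d, '%Y-%m-%d') - datetime.datetime(2001, 1, 1)
    days = dec(diff.days) + (dec(diff.seconds) / dec(86400))
    # Fraction of the lunar cycle elapsed, using a mean period of about 29.53 days.
    lunations = dec("0.20439731") + (days * dec("0.03386319269"))
    # Map the fractional part of the cycle onto eight bins, rounding to the nearest.
    phase_index = math.floor((lunations % dec(1) * dec(8)) + dec('0.5'))
    return int(phase_index) & 7  # index 8 wraps back to 0 (new moon)


# Hypothetical labels for the eight indices (not part of the original code).
PHASE_NAMES = [
    "new moon", "waxing crescent", "first quarter", "waxing gibbous",
    "full moon", "waning gibbous", "last quarter", "waning crescent",
]

for day in ("2001-01-01", "2001-01-09"):
    print(day, PHASE_NAMES[get_moon_phase(day)])
```

The final `& 7` matters because rounding can produce an index of 8 at the very end of a cycle, which should count as the next new moon.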
The moon phase function is part of a general function for creating time-based features. The function takes the calendar dataset information and places it among the features. This information contains events and their types, as well as an indication of the SNAP periods, which could further drive sales of basic goods. The function also generates numeric features such as the day, the month, the year, the day of the week, the week in the month, and whether it is the end of the week. Here is the code:
def generate_grid_calendar(calendar_df, end_train_day_x, predict_horizon):
    grid_df = pd.read_feather(
        f"grid_df_{end_train_day_x}_to_{end_train_day_x + predict_horizon}.feather")
    grid_df = grid_df[['id', 'd']]
    gc.collect()
    calendar_df['moon'] = calendar_df.date.apply(get_moon_phase)
    # Merge calendar partly
    icols = ['date', 'd', 'event_name_1', 'event_type_1', 'event_name_2', 'event_type_2',
             'snap_CA', 'snap_TX', 'snap_WI', 'moon', ]
    grid_df = grid_df.merge(calendar_df[icols], on=['d'], how='left')
    icols = ['event_name_1', 'event_type_1', 'event_name_2', 'event_type_2',
             'snap_CA', 'snap_TX', 'snap_WI']
    for col in icols:
        grid_df[col] = grid_df[col].astype('category')
    grid_df['date'] = pd.to_datetime(grid_df['date'])
    grid_df['tm_d'] = grid_df['date'].dt.day.astype(np.int8)
    grid_df['tm_w'] = grid_df['date'].dt.isocalendar().week.astype(np.int8)
    grid_df['tm_m'] = grid_df['date'].dt.month.astype(np.int8)
    grid_df['tm_y'] = grid_df['date'].dt.year
    grid_df['tm_y'] = (grid_df['tm_y'] - grid_df['tm_y'].
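The listing above is cut off before the remaining date features are built. As a rough sketch only, the day of the week, week in the month, and end-of-week flag mentioned in the text could be derived as follows. The column names tm_dw, tm_wm, and tm_w_end, the week-in-month formula, and the weekend definition (Saturday and Sunday) are all assumptions, not taken from the truncated code.

```python
import pandas as pd

# Toy calendar with a handful of dates (invented for illustration).
calendar = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-11"]),
})

calendar["tm_d"] = calendar["date"].dt.day
calendar["tm_m"] = calendar["date"].dt.month

# Day of the week, with Monday=0 and Sunday=6 (pandas convention).
calendar["tm_dw"] = calendar["date"].dt.dayofweek

# Week in the month, assumed here to mean days 1-7 -> 1, 8-14 -> 2, and so on.
calendar["tm_wm"] = (calendar["tm_d"] - 1) // 7 + 1

# End-of-week flag, assumed here to mean Saturday or Sunday.
calendar["tm_w_end"] = (calendar["tm_dw"] >= 5).astype("int8")

print(calendar)
```

Small integer encodings like these keep memory use low and let a tree-based model split directly on calendar position.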