Scaling Python with Dask by Holden Karau & Mika Kimmins

Author: Holden Karau & Mika Kimmins
Language: eng
Format: epub
Publisher: O'Reilly Media, Inc.
Published: 2023-07-25


Example 4-9. Dask custom aggregate

# Write a custom weighted mean, we get either a DataFrameGroupBy
# with multiple columns or SeriesGroupBy for each chunk
def process_chunk(chunk):
    def weighted_func(df):
        return (df["EmployerSize"] * df["DiffMeanHourlyPercent"]).sum()
    return (chunk.apply(weighted_func), chunk.sum()["EmployerSize"])

def agg(total, weights):
    return (total.sum(), weights.sum())

def finalize(total, weights):
    return total / weights

weighted_mean = dd.Aggregation(
    name='weighted_mean',
    chunk=process_chunk,
    agg=agg,
    finalize=finalize)

aggregated = (df_diff_with_emp_size.groupby("PostCode")
              ["EmployerSize", "DiffMeanHourlyPercent"].agg(weighted_mean))
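The chunk/agg/finalize split in Example 4-9 can be hard to picture, so here is a minimal sketch of the same three stages in plain Python, with no Dask involved. It is an illustration of the idea only, not how Dask implements `dd.Aggregation` internally. The partition contents, the postcode keys, and the tuple layout `(post_code, employer_size, diff_percent)` are all hypothetical.

```python
from collections import defaultdict

def process_chunk(rows):
    """Per-partition step: for each key, accumulate
    [sum of size * value, sum of size]."""
    partial = defaultdict(lambda: [0.0, 0.0])
    for key, size, value in rows:
        partial[key][0] += size * value
        partial[key][1] += size
    return partial

def agg(partials):
    """Combine step: add up the per-partition sums for each key."""
    combined = defaultdict(lambda: [0.0, 0.0])
    for partial in partials:
        for key, (total, weight) in partial.items():
            combined[key][0] += total
            combined[key][1] += weight
    return combined

def finalize(combined):
    """Final step: divide the weighted total by the total weight."""
    return {key: total / weight for key, (total, weight) in combined.items()}

# Hypothetical data: two partitions of (post_code, employer_size, diff_percent).
partition_1 = [("AB1", 100, 10.0), ("CD2", 50, 4.0)]
partition_2 = [("AB1", 300, 2.0)]

weighted_means = finalize(
    agg([process_chunk(partition_1), process_chunk(partition_2)]))
print(weighted_means)
```

The point of the split is that each partition only has to emit two running sums per key. Those sums can be added together in any order, and the division happens once at the end, which is why a weighted mean (unlike a plain mean of means) stays correct when the data is spread across partitions.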


