Effective Data Science Infrastructure by Ville Tuulos
Author: Ville Tuulos [Tuulos, Ville]
Language: eng
Format: epub, mobi, pdf
Publisher: Manning Publications Co.
Published: 2022-07-08T22:00:00+00:00
Figure 5.9 Execution time vs. the number of CPU cores in the multithreaded case
Figure 5.9 shows that running the algorithm with num_cpu=1 takes about 100 seconds for a version of the full matrix. For this dataset, the sweet spot seems to be at num_cpu=4, which improves performance by about 40%. Beyond this, the overhead of creating and aggregating per-thread output matrices overtakes the benefits of handling increasingly small input shards in each thread.
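The tradeoff described above can be sketched in a few lines. The excerpt does not show the book's implementation, so everything here is an assumption made for illustration: the function names (`make_shards`, `process_shard`, `aggregate`, `sharded_count`), the pairwise-counting workload, and the use of pure Python with `concurrent.futures`. The sketch shows only the structure: each thread fills its own output matrix, and a final step sums them. Pure Python threads will not show a real speedup because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shards(rows, num_cpu):
    """Split rows into at most num_cpu contiguous shards of similar size."""
    size = max(1, -(-len(rows) // num_cpu))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def process_shard(shard, dim):
    """Fill a private dim x dim output matrix for one input shard.

    Hypothetical workload: count how often two distinct ids share a row.
    """
    out = [[0] * dim for _ in range(dim)]
    for row in shard:
        for a in row:
            for b in row:
                if a != b:
                    out[a][b] += 1
    return out


def aggregate(partials, dim):
    """Sum the per-thread matrices; this cost grows with the thread count."""
    total = [[0] * dim for _ in range(dim)]
    for part in partials:
        for i in range(dim):
            for j in range(dim):
                total[i][j] += part[i][j]
    return total


def sharded_count(rows, dim, num_cpu):
    """Process shards in a thread pool, then aggregate the partial results."""
    shards = make_shards(rows, num_cpu)
    with ThreadPoolExecutor(max_workers=num_cpu) as pool:
        partials = list(pool.map(lambda s: process_shard(s, dim), shards))
    return aggregate(partials, dim)
```

In this sketch, the per-shard work shrinks as `num_cpu` grows, while allocating and summing the output matrices costs roughly `num_cpu * dim * dim` operations. Once the shards are small enough, that fixed per-thread cost dominates, which is the kind of effect the figure describes past its sweet spot.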
Summarizing the variants
This section illustrated a realistic journey of optimizing the performance of a numerically intensive algorithm, as follows:
First, we started with a simple version of the algorithm.
