Principles of Data Mining by Max Bramer
Author: Max Bramer
Language: eng
Format: epub, pdf
Publisher: Springer London, London
13.4 Evaluating the Effectiveness of a Distributed System: PMCRI
A distributed data mining system such as PMCRI can be evaluated in terms of three kinds of performance: its scale-up, its size-up and its speed-up. We will consider each of these in turn.
In what follows we will assume that all the processors in the distributed system are identical. We will use the term runtime to refer to the elapsed time taken by the entire system to complete a specified data mining task, excluding the time taken to load the data (Layer 1), which is a fixed overhead on any system of this kind.
We will use the term workload of a processor to mean the number of instances held in its associated memory. Note, however, that a workload of, say, 10,000 may mean 10,000 instances with all their attributes, or 20,000 instances with half of the attributes each, or 100,000 instances with one tenth of the attributes each, and so on. We will assume that the workload is the same for each processor in use in the network.
Finally, we will use the term total workload of the system to mean the sum of the workloads of all the processors in use in the network, again measured as a number of instances.
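The workload definitions above can be made concrete with a small Python sketch. The function names and the use of an attribute fraction per processor are illustrative assumptions, not part of PMCRI itself.

```python
from fractions import Fraction


def workload(instances_held, attribute_fraction=Fraction(1)):
    """Workload of one processor, measured in full-instance equivalents.

    attribute_fraction is the share of the attributes carried by each
    instance held on the processor (assumed equal for all its instances).
    """
    return instances_held * attribute_fraction


def total_workload(workload_per_processor, num_processors):
    """Sum of the workloads, assuming every processor has the same workload."""
    return workload_per_processor * num_processors


# The three equivalent ways of reaching a workload of 10,000 given in the text.
print(workload(10_000))                    # all attributes
print(workload(20_000, Fraction(1, 2)))    # half of the attributes each
print(workload(100_000, Fraction(1, 10)))  # one tenth of the attributes each

# Total workload for a hypothetical network of 4 processors.
print(total_workload(workload(10_000), 4))
```

Exact fractions are used here only to avoid rounding when the attribute share is something like one tenth.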
Scale-Up
Scale-up experiments evaluate the performance of the system with respect to the number of processors for a fixed workload per processor. We keep the workload per processor constant and measure the runtime as additional processors are added. Ideally the runtime measured this way would remain constant. Doubling the number of processors, for example, would double the amount of data to be processed by the system as a whole, but there would be twice as many processors to do it. A constant runtime would be indicated by a horizontal line on a graph of runtime against the number of processors.
Figure 13.5 is one of several showing results obtained for PMCRI. The runtime is plotted against the number of processors, increasing from 2 to 10, for three values of the workload per processor: 130K, 300K and 850K instances. We can see that rather than remaining horizontal, each plot rises as the number of processors increases. This is caused by an additional communications overhead in the network as more processors need to communicate information via the blackboard. Unsurprisingly, the runtime even for just two processors is greater when the workload per processor is larger.
It is easier to see what is happening if we plot on the vertical axis not runtime but relative runtime, i.e. (for each of the three plots) the runtime divided by the runtime for just two processors. This gives us Figure 13.6. Now each plot starts with a relative runtime of one (for two processors), and we have added the 'ideal' situation, a horizontal line of height one, to the graph.
Figure 13.5 Scale-up of PMCRI
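The relative runtime used for the second plot can be computed as in the following Python sketch. The function name and the runtime values are hypothetical and chosen only for illustration; they are not the measurements reported for PMCRI.

```python
def relative_runtimes(runtimes_by_processors):
    """Divide each runtime by the runtime for the smallest processor count.

    runtimes_by_processors maps a number of processors to a measured runtime
    for one fixed workload per processor.
    """
    baseline_count = min(runtimes_by_processors)
    baseline = runtimes_by_processors[baseline_count]
    return {n: t / baseline for n, t in sorted(runtimes_by_processors.items())}


# Hypothetical runtimes in seconds for one workload per processor
# (made-up numbers, NOT taken from the book's figures).
example = {2: 100.0, 4: 110.0, 6: 125.0, 8: 140.0, 10: 160.0}

rel = relative_runtimes(example)
for n, r in rel.items():
    # An ideal scale-up would give a relative runtime of 1.00 throughout.
    print(f"{n:2d} processors: relative runtime {r:.2f} (ideal 1.00)")
```

Normalising each series by its own two-processor runtime puts plots for different workloads on a common scale, so the growth caused by communication overhead can be compared directly against the horizontal ideal.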