Big Data For Dummies by Kaufman Marcia Kirsch Dan Hurwitz Judith Halper Fern
Author:Kaufman, Marcia, Kirsch, Dan, Hurwitz, Judith, Halper, Fern
Language: eng
Format: epub
Publisher: Wiley
Published: 2013-03-25T16:00:00+00:00
* * *
What is data mining?
Data mining involves exploring and analyzing large amounts of data to find patterns in that data. The techniques came out of the fields of statistics and artificial intelligence (AI), with a bit of database management thrown into the mix. Generally, the goal of the data mining is either classification or prediction. In classification, the idea is to sort data into groups. For example, a marketer might be interested in the characteristics of those who responded versus who didn’t respond to a promotion. These are two classes. In prediction, the idea is to predict the value of a continuous (that is, nondiscrete) variable. For example, a marketer might be interested in predicting those who will respond to a promotion.
Typical algorithms used in data mining include the following:
Classification trees: A popular data-mining technique that is used to classify a dependent categorical variable based on measurements of one or more predictor variables. The result is a tree with nodes and links between the nodes that can be read to form if-then rules.
Logistic regression: A statistical technique that is a variant of standard regression but extends the concept to deal with classification. It produces a formula that predicts the probability of the occurrence as a function of the independent variables.
Neural networks: A software algorithm that is modeled after the parallel architecture of animal brains. The network consists of input nodes, hidden layers, and output nodes. Each of the units is assigned a weight. Data is given to the input node, and by a system of trial and error, the algorithm adjusts the weights until it meets a certain stopping criteria. Some people have likened this to a black–box (you don’t necessarily know what is going on inside) approach.
Clustering techniques like K-nearest neighbors: A technique that identifies groups of similar records. The K-nearest neighbor technique calculates the distances between the record and points in the historical (training) data. It then assigns this record to the class of its nearest neighbor in a data set.
Here’s a classification tree example. Consider the situation where a telephone company wants to determine which residential customers are likely to disconnect their service. The telephone company has information consisting of the following attributes: how long the person has had the service, how much he spends on the service, whether he has had problems with the service, whether he has the best calling plan for his needs, where he lives, how old he is, whether he has other services bundled together with his calling plan, competitive information concerning other carriers plans, and whether he still has the service or has disconnected the service. Of course, you can find many more attributes than this. The last attribute is the outcome variable; this is what the software will use to classify the customers into one of the two groups — perhaps called stayers and flight risks.
The data set is broken into training data and a test data set. The training data consists of observations (called
Download
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.
Personalized inhaled bacteriophage therapy for treatment of multidrug-resistant Pseudomonas aeruginosa in cystic fibrosis by unknow(175732)
CONSORT 2025 statement: updated guideline for reporting randomized trials by unknow(84167)
Critical evaluation of the ProfiLER-02 study design and outcomes by Vivek Subbiah & Razelle Kurzrock(83815)
Cardiac gene therapy makes a comeback by Oliver J. Müller & Susanne Hille & Anca Kliesow Remes(83595)
Whisky: Malt Whiskies of Scotland (Collins Little Books) by dominic roskrow(74438)
Unveiling the design rules for tunable emission in graphene quantum dots: A high-throughput TDDFT and machine learning perspective by Şener Özönder & Mustafa Coşkun Özdemir & Caner Ünlü(50893)
A yeast-based oral therapeutic delivers immune checkpoint inhibitors to reduce intestinal tumor burden by unknow(40260)
Covalent hitchhikers guide proteins to the nucleus by Alexander F. Russell & Madeline F. Currie & Champak Chatterjee(40216)
Meet the Authors: Christopher R. Mansfield and Emily R. Derbyshire by Christopher R. Mansfield & Emily R. Derbyshire(40096)
Alkaline-earth metals promote propane dehydrogenation with carbon dioxide through geometric effects: Altering the reaction pathway by unknow(32731)
Induced iron vacancies boosting FeOOH loaded on sustainable Fenton-like collagen fiber membrane for efficient removal of emerging contaminants by unknow(32508)
Efficient electric-field-assisted photochemical conversion of methane to n-propanol exclusively over penetrated TiO2Ti hollow fibers by Guanghui Feng(32453)
Bi2SiO5 nanosheets as piezo-photocatalyst for efficient degradation of 2,4-Dichlorophenol by Hangyu Shi & Yifu Li & Lishan Zhang & Guoguan Liu & Qian Zhang & Xuan Ru & Shan Zhong(32387)
A novel NDIPTA organic heterojunction photocatalyst with built-in electric field for efficient hydrogen production by Jiahui Yang & Baojun Ma & Yongfa Zhu(32361)
Enhanced conversion of methane to liquid-phase oxygenates via hollow ferrite nanotube@horseradish peroxidase based photoenzymatic catalysis by Jun Duan & Shiying Fan & Xinyong Li & Shaomin Liu(32333)
Ordered macroporous superstructure of defective carbon adorned with tiny cobalt sulfide for selective electrocatalytic hydrogenation of cinnamaldehyde by Xiao-Shi Yuan & Sheng-Hua Zhou & San-Mei Wang & Wenbo Wei & Xiaofang Li & Xin-Tao Wu & Qi-Long Zhu(32257)
What's Done in Darkness by Kayla Perrin(27150)
Topological analysis of non-conjugated ethylene oxide cored dendrimers decorated with tetraphenylethylene: Insights from degree-based descriptors using the polynomial approach by A Theertha Nair & D Antony Xavier & Annmaria Baby & S Akhila(26523)
Investigation of mechanical and self-healing properties of hydroxyl-terminated polybutadiene functionalized with 2-ureido-4-pyrimidinone by Mohsen Kazazi & Mehran Hayaty & Ali Mousaviazar(26458)