Machine Learning in a Nutshell (Executive Leadership Series) by Lifehacker Books
Author:Lifehacker Books
Language: eng
Format: azw3
Publisher: UNKNOWN
Published: 2017-07-31T07:00:00+00:00
Jocelyn decided to begin her data exploration work by focusing on the target feature. The structure of the data available from the Galaxy Zoo project is shown below in the table. The category of each galaxy is voted on by multiple Galaxy Zoo participants, and the data includes the fraction of these votes for each of the categories.
The raw data did not contain a single column that could be used as a target feature, so Jocelyn had to design one from the data sources that were present. She generated two possible target features from the data provided. In both cases, the target feature level was set to the galaxy category that received the majority of the votes. In the first target feature, just three levels were used: elliptical (P_EL majority), spiral (P_CW, P_ACW, or P_EDGE majority), and other (P_MG or P_DK majority). The second target feature split the spiral level into three, giving five levels in total: spiral cw (P_CW majority), spiral acw (P_ACW majority), and spiral edge (P_EDGE majority), alongside elliptical and other.
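The mapping from vote fractions to target levels described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the case study: the column names are written with underscores (P_EL, P_CW, and so on) as an assumed spelling, "majority" is read as "the category with the largest vote fraction", and the example row contains made-up values.

```python
# Vote-fraction columns; the underscore spelling is an assumption.
VOTE_COLUMNS = ["P_EL", "P_CW", "P_ACW", "P_EDGE", "P_MG", "P_DK"]

# Level assigned when a given column has the largest vote fraction.
# The level names (spiral_cw, ...) are illustrative spellings.
FIVE_LEVEL = {
    "P_EL": "elliptical",
    "P_CW": "spiral_cw",
    "P_ACW": "spiral_acw",
    "P_EDGE": "spiral_edge",
    "P_MG": "other",
    "P_DK": "other",
}


def five_level_target(row):
    """Return the 5-level target for one galaxy (a dict of vote fractions)."""
    # On a tie, max() keeps the first column in VOTE_COLUMNS order.
    top_column = max(VOTE_COLUMNS, key=lambda column: row[column])
    return FIVE_LEVEL[top_column]


def three_level_target(row):
    """Collapse the three spiral levels into a single 'spiral' level."""
    level = five_level_target(row)
    return "spiral" if level.startswith("spiral") else level


# Made-up vote fractions for a single galaxy.
example = {"P_EL": 0.10, "P_CW": 0.60, "P_ACW": 0.10,
           "P_EDGE": 0.10, "P_MG": 0.05, "P_DK": 0.05}
print(three_level_target(example), five_level_target(example))
```

Deriving the 3-level feature from the 5-level one keeps the two targets consistent with each other, which is one reasonable design choice among several.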
The main observation that Jocelyn made from these two target features was that galaxies in the dataset were not evenly distributed across the different morphology types. Instead, the elliptical level was much more heavily represented than the others in both cases. Using the 3-level target feature as her initial focus, Jocelyn began to look at the different descriptive features in the data downloaded from the SDSS repository that might be useful in building a model to predict galaxy morphology.
The SDSS download that Jocelyn had access to was a big dataset, with over 600,000 rows. Although modern predictive analytics and machine learning tools can handle data of this size, a large dataset can be cumbersome during data exploration: calculating summary statistics, generating visualizations, and performing correlation tests can simply take too long.
For this reason, Jocelyn extracted a small sample of 10,000 rows from the full dataset for exploratory analysis using stratified sampling.
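Stratified sampling draws from each level of a chosen feature in proportion to that level's share of the full dataset, so the sample keeps roughly the same class balance. The sketch below is one minimal way to do this with the Python standard library. The function name, the `key` argument, and the choice to stratify on the target level are assumptions made for illustration, since the text does not say which feature was used for stratification.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, n, seed=0):
    """Draw about n rows, keeping each stratum's share of the full data.

    rows: list of records; key: function returning a record's stratum.
    """
    rng = random.Random(seed)  # fixed seed so the sample is repeatable

    # Group the rows by stratum.
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    total = len(rows)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding means the final size
        # can differ from n by a few rows.
        k = min(round(n * len(members) / total), len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up data: 1,000 rows with an uneven class balance.
rows = ([{"target": "elliptical"}] * 600
        + [{"target": "spiral"}] * 300
        + [{"target": "other"}] * 100)
subset = stratified_sample(rows, key=lambda r: r["target"], n=100)
print(len(subset))
```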
Given that (1) the SDSS data that Jocelyn downloaded was already in a single table; (2) the data was already at the right prediction subject level (one row per galaxy); and (3) many of the columns in this dataset would most likely be used directly as features in the ABT that she was building, Jocelyn decided to produce a data quality report on this dataset. The following table shows an extract from this data quality report.
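A data quality report of this kind typically lists, for each feature, the number of rows, the percentage of missing values, the cardinality, and basic summary statistics. The following is a hedged sketch for continuous features using only the standard library. The feature name `x` and the sample rows are hypothetical, and the set of statistics shown is a common convention rather than the exact contents of the report in the case study.

```python
import statistics


def quality_report(rows, features):
    """Summarise continuous features; None marks a missing value."""
    report = {}
    for name in features:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        count = len(values)
        entry = {
            "count": count,
            "pct_missing": 100.0 * (count - len(present)) / count if count else 0.0,
            "cardinality": len(set(present)),
        }
        if present:
            entry.update(
                min=min(present),
                mean=statistics.mean(present),
                median=statistics.median(present),
                max=max(present),
                # Sample standard deviation needs at least two values.
                std_dev=statistics.stdev(present) if len(present) > 1 else 0.0,
            )
        report[name] = entry
    return report


# Made-up rows with one hypothetical feature and one missing value.
rows = [{"x": 1.0}, {"x": 2.0}, {"x": None}, {"x": 3.0}]
for feature, stats in quality_report(rows, ["x"]).items():
    print(feature, stats)
```

A high percentage of missing values or a cardinality of 1 in such a report is usually the first thing to investigate before a column is carried into the ABT.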