Python Programming: The Ultimate Intermediate Guide to Learn Python Step by Step by Ryan Turner
Author: Ryan Turner [Turner, Ryan]
Language: eng
Format: azw3
Published: 2018-10-27T16:00:00+00:00
import pandas as pd
patient_data = pd.read_csv("D:/Datasets/patients.csv")
The script above loads the patients.csv file from the Datasets folder (here, D:/Datasets; adjust the path to wherever you saved the file). If you are using a Jupyter notebook, inspecting the data is even easier: run the following script to display the first five rows:
patient_data.head()
If you are working in Spyder, open the Variable Explorer and double-click the patient_data variable in the list of variables. This opens a viewer that shows all the details of the dataset.
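To try the loading and inspection steps without the original file, here is a minimal sketch that reads an in-memory CSV through io.StringIO instead of D:/Datasets/patients.csv. The Gender, BMI, and Age columns follow the text; the Outcome column and every value are hypothetical, with the BMI field of the record at index 4 left blank to mirror the dataset described later.

```python
import io

import pandas as pd

# Hypothetical stand-in for D:/Datasets/patients.csv. The Gender, BMI and Age
# columns follow the text; the Outcome column and all values are invented.
csv_text = """Gender,BMI,Age,Outcome
Male,24.1,35,No
Female,31.5,52,Yes
Female,22.8,29,No
Male,28.3,61,Yes
Male,,44,No
Female,26.7,38,No
Male,33.2,57,Yes
"""

# read_csv accepts a path or any file-like object such as StringIO
patient_data = pd.read_csv(io.StringIO(csv_text))

# head() shows the first five rows by default; pass n for a different count
print(patient_data.head())
print(patient_data.head(2))

# The default row index is zero-based: 0, 1, 2, ...
print(patient_data.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]

# The empty BMI field in the fifth row was read as NaN
print(patient_data.loc[4, "BMI"])  # nan
```

Because read_csv takes either a path or a file-like object, replacing the StringIO object with the real path is the only change needed to run this on your own file.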
At this point, you should see a pandas data frame that looks like a matrix with a zero-based index. Once the dataset is loaded, the next step is to divide it into a matrix of features and a vector of labels (the dependent variable). The feature matrix contains all of the independent variables; for the patients.csv dataset, that means the Gender, BMI, and Age of each patient. The feature matrix has one row per record and one column per independent variable, so with twelve records and three independent variables, our matrix is 12 by 3.
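As a quick check of the 12-by-3 shape, the sketch below builds a hypothetical twelve-record frame (the Outcome column and all values are invented) and selects the three independent variables by column name, an alternative to selecting them by position. Either way, NumPy reports the shape as (rows, columns), that is, (records, independent variables).

```python
import pandas as pd

# Hypothetical twelve-record data set: the column names Gender, BMI and Age
# follow the text; the Outcome column and all values are invented.
patient_data = pd.DataFrame({
    "Gender": ["Male", "Female"] * 6,
    "BMI": [24.1, 31.5, 22.8, 28.3, 26.0, 26.7,
            33.2, 21.9, 29.4, 25.5, 30.1, 23.6],
    "Age": [35, 52, 29, 61, 44, 38, 57, 23, 48, 33, 66, 41],
    "Outcome": ["No", "Yes"] * 6,
})

# Feature matrix: every record, independent variables only
features = patient_data[["Gender", "BMI", "Age"]].to_numpy()

# NumPy reports (rows, columns), i.e. (records, independent variables)
n_records, n_variables = features.shape
print(n_records, n_variables)  # 12 3
```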
Let’s start by creating the feature matrix. You can give it any name you like, but by convention it is denoted by a capital X. To make the code easier to read, we will name it features and use the following script:
features = patient_data.iloc[:, 0:3].values
In the script above, the data frame's iloc indexer selects all the rows and the first three columns of patient_data. It takes two arguments inside the square brackets: the first is the range of rows to select and the second is the range of columns. Both are integer positions, and a slice such as 0:3 excludes its end point, so it selects the columns at positions 0, 1, and 2. Finally, .values converts the selection into a NumPy array.
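The following sketch, on a small hypothetical frame (the Outcome column and all values are invented), shows the iloc slicing rules described above and contrasts them with label-based loc, whose slices include their end point.

```python
import pandas as pd

# Hypothetical three-record frame; the Outcome column and all values are invented
patient_data = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "BMI": [24.1, 31.5, 22.8],
    "Age": [35, 52, 29],
    "Outcome": ["No", "Yes", "No"],
})

# iloc[rows, columns] uses integer positions; 0:3 stops before position 3
subset = patient_data.iloc[:, 0:3]
print(subset.columns.tolist())  # ['Gender', 'BMI', 'Age']

# The row part can be sliced the same way: only the first two records here
print(patient_data.iloc[0:2, 0:3])

# loc uses labels instead, and its slices INCLUDE the end label
print(patient_data.loc[:, "Gender":"Age"].columns.tolist())  # ['Gender', 'BMI', 'Age']

# .values drops the row and column labels and returns a NumPy array
features = subset.values
print(type(features).__name__)  # ndarray
```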
To create the label vector, use the following script, which selects all the rows of the fourth column (position 3):
labels = patient_data.iloc[:, 3].values
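Here is a short sketch of label extraction on a hypothetical five-record frame, where Outcome is an invented stand-in for the fourth column of patients.csv. It also shows why iloc[:3], with only a row argument, does not produce a label vector.

```python
import pandas as pd

# Hypothetical frame with invented values; "Outcome" stands in for whatever
# the fourth (dependent-variable) column of patients.csv is actually called.
patient_data = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Male"],
    "BMI": [24.1, 31.5, 22.8, 28.3, 26.0],
    "Age": [35, 52, 29, 61, 44],
    "Outcome": ["No", "Yes", "No", "Yes", "No"],
})

# A single integer in the column position selects one column as a 1-D array
labels = patient_data.iloc[:, 3].values
print(labels.shape)  # (5,)

# Pitfall: with only one argument, iloc slices ROWS, so this returns the
# first three records with all four columns, not a label vector.
first_three_rows = patient_data.iloc[:3].values
print(first_three_rows.shape)  # (3, 4)
```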
How to Handle Any Missing Values
If you look at the patient_data object, you will notice that the record at index 4 is missing a value in the BMI column. The easiest way to handle missing values is to remove any record that contains one. However, that record may hold other important information, so we won’t remove it here.
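A minimal sketch of finding and dropping incomplete records with pandas isna() and dropna(), on hypothetical data whose BMI value is missing at index 4 as in the text (the Outcome column and all values are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical data with BMI missing at index 4; Outcome and all values invented
patient_data = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "BMI": [24.1, 31.5, 22.8, 28.3, np.nan, 26.7],
    "Age": [35, 52, 29, 61, 44, 38],
    "Outcome": ["No", "Yes", "No", "Yes", "No", "No"],
})

# Count missing values per column to locate the gap
print(patient_data.isna().sum())

# Index labels of rows that contain at least one missing value
print(patient_data[patient_data.isna().any(axis=1)].index.tolist())  # [4]

# Dropping is the simplest fix, but the whole record (Gender, Age, Outcome)
# is lost along with the missing BMI value.
dropped = patient_data.dropna()
print(len(patient_data), "->", len(dropped))  # 6 -> 5
```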
Another approach is to replace the missing value with an estimate, a technique known as imputation. A common choice is the mean or the median of the other values in the same column.
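Mean or median imputation can be sketched with pandas fillna(), again on an invented BMI column with a gap at index 4. This is one way to do it, not necessarily the book's; scikit-learn's SimpleImputer is a common alternative inside modeling pipelines.

```python
import numpy as np
import pandas as pd

# Hypothetical BMI column with one missing value at index 4; values invented
patient_data = pd.DataFrame({"BMI": [24.1, 31.5, 22.8, 28.3, np.nan, 26.7]})

# mean() and median() skip NaN by default, so only observed values count
bmi_mean = patient_data["BMI"].mean()      # (24.1 + 31.5 + 22.8 + 28.3 + 26.7) / 5
bmi_median = patient_data["BMI"].median()  # middle of the five sorted values

# fillna returns a new Series; the original column is left unchanged
mean_filled = patient_data["BMI"].fillna(bmi_mean)
median_filled = patient_data["BMI"].fillna(bmi_median)
print(mean_filled[4], median_filled[4])

# To keep one choice, assign the filled Series back to the column
patient_data["BMI"] = median_filled
```

The median is less sensitive to extreme values than the mean, which makes it a safer default when a column contains outliers.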
