Machine Learning: A Comprehensive Beginner’s Guide by Akshay B R & Raj Pulari Sini & Murugesh T S & Vasudevan Shriram K

Author: Akshay, B R & Raj Pulari, Sini & Murugesh, T S & Vasudevan, Shriram K
Language: eng
Format: epub
Publisher: Taylor & Francis Group
Published: 2024-05-09T13:07:16+00:00


7.3 Data pre-processing

The lines of code in Figure 7.1 import the libraries and modules that will be used for data manipulation, model evaluation, and machine learning. Data manipulation is handled by the pandas library, while numerical operations are handled by the numpy library. The train_test_split function, imported from sklearn.model_selection, is used to divide the data into a training set and a testing set. The StandardScaler class, imported from sklearn.preprocessing, is used to scale the features. To provide a variety of classification strategies, the LogisticRegression, RandomForestClassifier, and SVC classes are imported from their respective modules within sklearn. For model evaluation, the metrics accuracy_score, precision_score, recall_score, f1_score, and roc_auc_score are imported from sklearn.metrics. Together, these imports supply the tools required for the data analysis and machine learning tasks that follow.

Figure 7.1 Code snippet to import libraries.
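The figure itself is not reproduced in this text, so the following is a minimal sketch of imports that match the description above, not necessarily the authors' exact code. The np and pd aliases are conventional assumptions; the module paths for the three classifiers (sklearn.linear_model, sklearn.ensemble, sklearn.svm) are where scikit-learn defines those classes.

```python
# Data manipulation and numerical operations
import numpy as np   # alias "np" is a convention, assumed here
import pandas as pd  # alias "pd" is a convention, assumed here

# Train/test splitting and feature scaling
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Classifiers named in the text
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Evaluation metrics named in the text
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
)
```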

The line of code in Figure 7.2 reads the dataset from a CSV file using the read_csv function of the pandas library. The path of the CSV file is “/content/diabetes.csv”. The read_csv function reads the file and returns a DataFrame, which is then assigned to a variable named data.

Figure 7.2 Code snippet to load the diabetes dataset.

When this code is executed, the Pima Indians Diabetes dataset is loaded into memory as a pandas DataFrame and assigned to the variable data for further exploration and manipulation.
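As a rough illustration of this loading step, the sketch below calls read_csv on a small in-memory stand-in so that it runs without the actual file. The three rows and the Glucose, BMI, and Age column names are hypothetical placeholders; only the “Outcome” column and the “/content/diabetes.csv” path come from the text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; the values and the feature
# column names are illustrative assumptions, not the real dataset.
sample_csv = io.StringIO(
    "Glucose,BMI,Age,Outcome\n"
    "148,33.6,50,1\n"
    "85,26.6,31,0\n"
    "183,23.3,32,1\n"
)

# With the real file, the call described in the text would be:
#     data = pd.read_csv("/content/diabetes.csv")
data = pd.read_csv(sample_csv)

# A quick look at the result: dimensions and the first rows.
print(data.shape)
print(data.head())
```

Inspecting data.shape and data.head() right after loading is a cheap way to confirm that the file was parsed with the expected columns.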

The lines of code in Figure 7.3 separate the dataset into features and a target variable, and then split the data into training and testing sets. The X DataFrame, which holds the features, is created by dropping the “Outcome” column from the original data. The y Series is created by selecting the “Outcome” column, which is the target variable. The train_test_split function is then called to divide the features (X) and the target (y) into separate sets for training and testing. The test_size=0.2 argument reserves 20% of the data for testing and leaves the remaining 80% for training. The random_state=42 argument fixes the seed used to shuffle the data, so the same split is produced every time the code is run.

Figure 7.3 Code snippet to split the dataset.
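The splitting step described above might look like the following sketch. It uses a hypothetical ten-row DataFrame in place of the real dataset, so the feature names and values are assumptions; the “Outcome” column, test_size=0.2, and random_state=42 follow the text.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical ten-row stand-in for the diabetes data.
data = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116, 78, 115, 197, 125],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3, 30.5, 27.4],
    "Outcome": [1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
})

# Features: every column except the target.
X = data.drop("Outcome", axis=1)
# Target: the "Outcome" column on its own.
y = data["Outcome"]

# Hold out 20% of the rows for testing; the fixed seed makes the
# shuffle, and therefore the split, repeatable across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

With ten rows, an 80/20 split leaves eight rows for training and two for testing. Rerunning the call with the same random_state selects the same rows each time.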

Feature scaling is performed by the lines of code in Figure 7.4, using the StandardScaler class from the sklearn.preprocessing package. A StandardScaler object named scaler is first constructed; it is used to calculate the mean and standard deviation of each feature from the training data. This feature scaling step puts all features on a comparable scale, which helps the machine learning models that use them perform better and makes their results easier to interpret.
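The text here stops before showing how the scaler is applied, so the sketch below assumes the common pattern: fit the scaler on the training set only, then reuse the learned mean and standard deviation to transform the test set. The small arrays and the X_train_scaled and X_test_scaled names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test features (two columns each).
X_train = np.array([
    [100.0, 20.0],
    [120.0, 30.0],
    [140.0, 40.0],
    [160.0, 50.0],
])
X_test = np.array([[130.0, 35.0]])

scaler = StandardScaler()

# fit_transform learns each feature's mean and standard deviation
# from the training data and then standardizes that data.
X_train_scaled = scaler.fit_transform(X_train)

# transform reuses the training statistics, so no information from
# the test set influences the scaling.
X_test_scaled = scaler.transform(X_test)

print(scaler.mean_)   # per-feature means learned from X_train
print(scaler.scale_)  # per-feature standard deviations
```

After scaling, each training feature has mean 0 and standard deviation 1. Fitting on the training data alone is a design choice that keeps the test set unseen until evaluation.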


