Modern Deep Learning for Tabular Data by Andre Ye & Zian Wang
Author: Andre Ye & Zian Wang
Language: eng
Format: mobi, epub
ISBN: 9781484286920
Publisher: Apress
Published: 2022-12-30T01:48:07.819000+00:00
In practice, DeepInsight should be a contributing member in an ensemble of other decision-making models. Combining the locality-specific nature of DeepInsight with the more global approach of other modeling methods will likely yield a more informed predictive ensemble.
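One simple way to combine such models is soft voting, in which the class probabilities predicted by each ensemble member are averaged. The following is a minimal sketch under assumptions: the function name soft_vote and the probability arrays are hypothetical and stand in for the outputs of a DeepInsight-based CNN and some other model.

```python
import numpy as np

def soft_vote(probabilities, weights=None):
    """Average class-probability arrays, each of shape (n_samples, n_classes)."""
    stacked = np.stack(probabilities)  # (n_models, n_samples, n_classes)
    averaged = np.average(stacked, axis=0, weights=weights)
    return averaged.argmax(axis=1), averaged

# Hypothetical predicted probabilities for 3 samples and 2 classes.
cnn_probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]])   # e.g., a DeepInsight CNN
tree_probs = np.array([[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]])  # e.g., a tree-based model

labels, averaged = soft_vote([cnn_probs, tree_probs])
```

Passing weights (one value per model) lets a stronger member contribute more to the average; equal weighting is only a starting point.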
Sharma et al. have provided prepackaged code to use DeepInsight in Python, which can be installed from the GitHub repository (Listing 4-60).
!python3 -m pip -q install git+https://github.com/alok-ai-lab/pyDeepInsight.git#egg=pyDeepInsight
Listing 4-60 Installing code provided by Sharma et al. for DeepInsight. At the time of this book's writing, the authors of pyDeepInsight are making active changes that may cause this installation command to fail. If you encounter errors, check the GitHub repository for the most up-to-date installation instructions.
The dataset we will use is the Mice Protein Expression dataset from the well-known University of California, Irvine Machine Learning Repository. It is a classification dataset with 1080 instances and 80 features modeling the expression of 77 proteins in the cerebral cortex of mice exposed to contextual fear conditioning. A cleaned version of the dataset is available for download in the source code for this book.
Assuming that the data has been loaded as a Pandas DataFrame in the variable data, the first step is to separate it into training and testing datasets, a standard procedure in machine learning (Listing 4-61). We'll also need to convert the labels, which are originally integers that each correspond to a class, to one-hot format. This can be accomplished easily using the to_categorical function in keras.utils.
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow import keras  # assumes the Keras bundled with TensorFlow

# download csv from online source files
data = pd.read_csv('mouse-protein-expression.csv')

# hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(data.drop('class', axis=1),
                                                    data['class'],
                                                    train_size=0.8)

# convert integer class labels to one-hot vectors
y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)
Listing 4-61 Selecting a subset of data and converting to one-hot form as necessary
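To make the one-hot conversion concrete, the sketch below reproduces its effect with NumPy alone. It assumes the labels are integers running from 0 to one less than the number of classes; the helper name one_hot is hypothetical and is not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Map integer labels to rows of an identity matrix (one 1 per row)."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Infer the number of classes from the largest label seen.
        num_classes = int(labels.max()) + 1
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot([0, 2, 1, 2])
# Each row has a single 1 in the column given by the original label.
```

Because the number of classes is inferred from the largest label present, the training and testing label arrays can end up with different widths if the highest-numbered class happens to be absent from one split. Passing the class count explicitly (keras.utils.to_categorical also accepts a num_classes argument) avoids this.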
We will need to use the LogScaler object from the DeepInsight library to scale the data between 0 and 1 (Listing 4-62). The scaler applies the log-based normalization that the DeepInsight paper labels norm-2, which is distinct from the L2 vector norm. We transform both the training dataset and the testing dataset, fitting the scaler on the training dataset only. All new data used for prediction by the DeepInsight model should pass through this scaler first.
from pyDeepInsight import LogScaler
ln = LogScaler()
X_train_norm = ln.fit_transform(X_train)
X_test_norm = ln.transform(X_test)
Listing 4-62 Scaling data
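The fit-on-training-only pattern can be illustrated with a simplified scaler. This is a sketch under assumptions, not the pyDeepInsight implementation: the class SimpleLogScaler is hypothetical, and it assumes a log transform after shifting each feature by its training minimum, followed by division by a single global maximum. The library's actual procedure may differ in its details.

```python
import numpy as np

class SimpleLogScaler:
    """Hypothetical stand-in for a log-based scaler; not the library's code."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min_ = X.min(axis=0)  # per-feature minimum from the training data
        self.max_ = np.log(X - self.min_ + 1).max()  # one global maximum
        if self.max_ == 0:
            self.max_ = 1.0  # constant data: avoid dividing by zero
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Values below the training minimum are clipped to 0 before the log.
        shifted = np.clip(X - self.min_, 0, None)
        scaled = np.log(shifted + 1) / self.max_
        # Values above the training maximum are clipped to 1.
        return np.clip(scaled, 0, 1)

    def fit_transform(self, X):
        return self.fit(X).transform(X)

# Synthetic data standing in for the training and testing features.
rng = np.random.default_rng(0)
X_train_demo = rng.normal(size=(100, 5))
X_test_demo = rng.normal(size=(20, 5))

scaler = SimpleLogScaler()
train_scaled = scaler.fit_transform(X_train_demo)  # statistics learned here
test_scaled = scaler.transform(X_test_demo)        # reused, never refitted
```

The statistics stored in fit come from the training data alone, so the testing data cannot leak into the scaling, and any later prediction input is scaled in exactly the same way.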
The ImageTransformer object performs the image transformation by first generating the "template" matrix via a dimensionality reduction method passed into feature_extractor, which accepts 'tsne', 'pca', or 'kpca'. This method determines a mapping from the features in the input vector to pixel positions in a square image whose side length is set by pixels. We can instantiate an ImageTransformer with the kernel-PCA dimensionality reduction method to generate 32-by-32 images (feature_extractor='kpca', pixels=32) (Listing 4-63).
from pyDeepInsight import ImageTransformer
it = ImageTransformer(feature_extractor='kpca',
                      pixels=32)
tf_train_x = it.fit_transform(X_train_norm)
tf_test_x = it.transform(X_test_norm)
Listing 4-63 Training and transforming with the ImageTransformer
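The feature-to-pixel mapping can be sketched in a few lines of NumPy. This is an illustration under assumptions rather than the library's implementation: the functions fit_pixel_map and to_images are hypothetical, plain PCA computed via SVD is swapped in for kernel PCA to keep the example dependency-free, and any further refinements that pyDeepInsight applies to the layout are omitted.

```python
import numpy as np

def fit_pixel_map(X, pixels):
    """Assign each feature (column of X) a (row, col) cell on a pixels x pixels grid."""
    # Treat each feature as a point described by its values across all samples.
    F = X.T - X.T.mean(axis=0)
    # Plain PCA via SVD (a stand-in for kernel PCA) reduces each feature to 2D.
    U, S, _ = np.linalg.svd(F, full_matrices=False)
    coords = U[:, :2] * S[:2]  # (n_features, 2)
    # Rescale the coordinates to [0, pixels - 1] and round to pixel indices.
    span = coords.max(axis=0) - coords.min(axis=0)
    span[span == 0] = 1.0
    unit = (coords - coords.min(axis=0)) / span
    return np.rint(unit * (pixels - 1)).astype(int)

def to_images(X, pixel_map, pixels):
    """Render each row of X as an image; features sharing a pixel are averaged."""
    sums = np.zeros((X.shape[0], pixels, pixels))
    counts = np.zeros((pixels, pixels))
    np.add.at(counts, (pixel_map[:, 0], pixel_map[:, 1]), 1)
    for j, (r, c) in enumerate(pixel_map):
        sums[:, r, c] += X[:, j]
    return sums / np.maximum(counts, 1)  # unoccupied pixels stay at zero

# Synthetic data: 50 samples with 10 features, rendered as 8-by-8 images.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 10))
pixel_map = fit_pixel_map(X_demo, pixels=8)  # learned once, like fit
images = to_images(X_demo, pixel_map, pixels=8)
```

The pixel map is learned once and then reused for every sample, which mirrors the fit_transform and transform calls in Listing 4-63. In this sketch, pixels that receive no feature remain zero and features that land on the same pixel are averaged, so the choice of pixels trades sparsity against collisions.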
Kernel-PCA is used rather than t-SNE because of the relatively low dimensionality and quantity of the data. PCA is not employed because its linearity limits the nuance it captures. An image length of 32 pixels is chosen as a balance: too large an image length makes the generated images too sparse, while too small an image length leaves too little room to meaningfully and accurately represent spatial relationships between features. As image size decreases,
