
Modern Deep Learning for Tabular Data by Andre Ye & Zian Wang

Author: Andre Ye & Zian Wang, Date: July 14, 2023

Language: eng
Format: mobi, epub
ISBN: 9781484286920
Publisher: Apress
Published: 2022-12-30

In practice, DeepInsight should be one contributing member of an ensemble of decision-making models. Combining the locality-specific nature of DeepInsight with the more global approach of other modeling methods will likely yield a better-informed predictive ensemble.
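One common way to build such an ensemble is soft voting, meaning a (weighted) average of each model's predicted class probabilities. The minimal NumPy sketch below assumes every model returns a probability matrix for the same rows in the same class order (for example, a CNN trained on DeepInsight images and a tree-based model's predict_proba output). The soft_vote helper and its weights are hypothetical. Stacking, where a meta-model learns how to combine the base models, is an equally valid alternative.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Hypothetical helper: weighted average of per-model class probabilities."""
    # Stack into shape (n_models, n_samples, n_classes)
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_list])
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the result stays a distribution
    # Contract the model axis: result has shape (n_samples, n_classes)
    avg = np.tensordot(weights, probs, axes=1)
    return avg, avg.argmax(axis=1)

# Usage with made-up probabilities for two models, two samples, two classes
p_deepinsight = [[0.7, 0.3], [0.2, 0.8]]
p_other = [[0.4, 0.6], [0.6, 0.4]]
avg, labels = soft_vote([p_deepinsight, p_other])
print(avg, labels)
```

Giving DeepInsight a larger weight is a tuning choice; it is usually picked on a validation split rather than fixed in advance.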

Sharma et al. have provided prepackaged code to use DeepInsight in Python, which can be installed from the GitHub repository (Listing 4-60).

!python3 -m pip -q install git+https://github.com/alok-ai-lab/pyDeepInsight.git#egg=pyDeepInsight

Listing 4-60 Installing code provided by Sharma et al. for DeepInsight. At the time of this book's writing, the authors of pyDeepInsight are making active changes that may cause this installation command to fail. If you encounter errors, check the GitHub repository for the most up-to-date installation instructions

The dataset we will use is the Mice Protein Expression dataset from the well-known University of California Irvine Machine Learning Repository. It is a classification dataset with 1080 instances and 80 features modeling the expression of 77 proteins in the cerebral cortex of mice exposed to contextual fear conditioning. A cleaned version of the dataset is available for download with this book's source code.

Assuming that the data has been loaded as a Pandas DataFrame in the variable data, the first step is to split it into training and testing datasets, a standard procedure in machine learning (Listing 4-61). We'll also need to convert the labels, which are originally stored as integers corresponding to classes, to one-hot format. This can be accomplished easily with the to_categorical function from keras.utils.

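import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow import keras

# download csv from online source files
data = pd.read_csv('mouse-protein-expression.csv')

X_train, X_test, y_train, y_test = train_test_split(data.drop('class', axis=1),
                                                    data['class'],
                                                    train_size=0.8)

# Labels are integers 0..K-1; passing num_classes keeps both one-hot
# matrices the same width even if a class is absent from one split
num_classes = int(data['class'].max()) + 1
y_train = keras.utils.to_categorical(y_train, num_classes=num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes=num_classes)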

Listing 4-61 Selecting a subset of data and converting to one-hot form as necessary
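To see what the one-hot conversion does, and why fixing the number of classes matters, here is a minimal NumPy equivalent. The one_hot function is hypothetical and only illustrates the idea. It is not the Keras implementation. When the width is inferred from the largest label, a split that happens to lack the highest class id produces a narrower matrix than the other split.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Hypothetical equivalent of one-hot encoding integer class labels."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Inferring the width from the data is what makes splits disagree
        num_classes = int(labels.max()) + 1
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

print(one_hot([0, 2, 1]))
print(one_hot([0, 1]).shape, one_hot([0, 1], num_classes=3).shape)
```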

We will need to use the LogScaler object from the DeepInsight library to log-normalize the data and scale it to between 0 and 1, following the normalization procedure the DeepInsight authors refer to as "norm-2" (Listing 4-62). The scaler is fit on the training dataset only and is then used to transform both the training and testing datasets. All new data used for prediction by the DeepInsight model should pass through this fitted scaler first.

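from pyDeepInsight import LogScaler

ln = LogScaler()
# Learn the scaling statistics from the training data only
X_train_norm = ln.fit_transform(X_train)
# Reuse the training statistics on the test data (no refitting)
X_test_norm = ln.transform(X_test)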

Listing 4-62 Scaling data
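The idea behind this kind of scaler can be sketched in NumPy. The sketch assumes our reading of the "norm-2" procedure: shift each feature by the absolute value of its training minimum, take log(x + shift + 1), and divide by one global maximum computed on the training set. It is not pyDeepInsight's source code, and the class name LogMaxScaler is hypothetical. The important point is that the statistics come from the training data alone, so test values outside the training range must be clipped back into [0, 1].

```python
import numpy as np

class LogMaxScaler:
    """Hypothetical log-then-global-max scaler (an assumption, not pyDeepInsight's code)."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Per-feature shift so every training value becomes >= 0
        self.offset_ = np.abs(X.min(axis=0))
        # log(>= 1) >= 0 for all training values
        logged = np.log(X + self.offset_ + 1.0)
        # One global maximum across all features, guarded against zero
        self.max_ = logged.max() if logged.max() > 0 else 1.0
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Unseen values below the training minimum are floored at log(1) = 0
        logged = np.log(np.maximum(X + self.offset_ + 1.0, 1.0))
        # Values above the training maximum are clipped to 1
        return np.clip(logged / self.max_, 0.0, 1.0)

    def fit_transform(self, X):
        return self.fit(X).transform(X)

X_tr = np.array([[0.0, 10.0], [1.0, 20.0], [3.0, 5.0]])
scaler = LogMaxScaler()
Z_tr = scaler.fit_transform(X_tr)
Z_te = scaler.transform(np.array([[-100.0, 1000.0]]))
print(Z_tr, Z_te)
```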

The ImageTransformer object performs the image transformation by first generating the "template" matrix via a dimensionality reduction method passed into feature_extractor, which accepts 'tsne', 'pca', or 'kpca'. This method determines a mapping from the features in the input vector to positions in an image of pixels-by-pixels dimensions. We can instantiate an ImageTransformer with the kernel-PCA dimensionality reduction method to generate 32-by-32 images (feature_extractor='kpca', pixels=32) (Listing 4-63).

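from pyDeepInsight import ImageTransformer

# Kernel PCA determines where each feature is placed on a 32-by-32 grid
it = ImageTransformer(feature_extractor='kpca',
                      pixels=32)

# Fit the feature-to-pixel mapping on the training data only, then reuse it
tf_train_x = it.fit_transform(X_train_norm)
tf_test_x = it.transform(X_test_norm)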

Listing 4-63 Training and transforming with the ImageTransformer
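The core of the transformation can be approximated in a few lines. The simplified sketch below assumes that the features (the columns of the training matrix) are embedded in 2-D with an RBF kernel PCA from scikit-learn, min-max scaled onto the grid, and snapped to the nearest pixel. Each sample's feature values are then written at those pixels, and features that collide on one pixel are averaged. The function names are hypothetical, the kernel choice is an assumption, and the sketch skips the convex-hull-based rotation described in the DeepInsight paper, so it will not reproduce pyDeepInsight's images exactly.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def feature_pixel_coords(X_train, pixels=32):
    """Hypothetical: map each feature (column) to a (row, col) pixel."""
    X_train = np.asarray(X_train, dtype=float)
    # Transpose so that features, not samples, are the points being embedded
    coords = KernelPCA(n_components=2, kernel="rbf").fit_transform(X_train.T)
    # Rescale each axis to [0, pixels - 1] and snap to the nearest pixel
    mins = coords.min(axis=0)
    spans = coords.max(axis=0) - mins
    spans[spans == 0] = 1.0
    return np.rint((coords - mins) / spans * (pixels - 1)).astype(int)

def to_images(X, coords, pixels=32):
    """Hypothetical: place each sample's values at its features' pixels."""
    X = np.asarray(X, dtype=float)
    rows, cols = coords[:, 0], coords[:, 1]
    images = np.zeros((X.shape[0], pixels, pixels))
    counts = np.zeros((pixels, pixels))
    np.add.at(counts, (rows, cols), 1)  # how many features share each pixel
    for i in range(X.shape[0]):
        np.add.at(images[i], (rows, cols), X[i])  # sum colliding features
    # Average the features that landed on the same pixel; empty pixels stay 0
    return images / np.maximum(counts, 1)

rng = np.random.default_rng(0)
X_demo = rng.random((50, 12))
coords = feature_pixel_coords(X_demo, pixels=8)
imgs = to_images(X_demo, coords, pixels=8)
print(coords.shape, imgs.shape)
```

With the chapter's variables, this would be called as feature_pixel_coords(X_train_norm) followed by to_images on the training and test sets, reusing the same coordinates for both.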

Kernel PCA is used rather than t-SNE because the data has relatively few dimensions and relatively few samples. Standard PCA is not employed because its linearity limits the nuance it can capture. An image side length of 32 pixels is chosen as a balance between generated images that are too sparse (side length too large) and images that are too small (side length too small) to meaningfully and accurately represent the spatial relationships between features. As image size decreases,
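The image-size trade-off can be made concrete by fixing a 2-D feature embedding and measuring, at each grid size, what fraction of pixels is occupied and how many features share a pixel with another feature. The grid_stats helper below is hypothetical. It assumes features are placed by min-max scaling their 2-D coordinates onto the grid and rounding. The random coordinates in the usage example are synthetic and are not derived from the mice dataset.

```python
import numpy as np

def grid_stats(coords, pixels):
    """Hypothetical: (fill ratio, number of colliding features) on a pixels x pixels grid."""
    coords = np.asarray(coords, dtype=float)
    mins = coords.min(axis=0)
    spans = coords.max(axis=0) - mins
    spans[spans == 0] = 1.0
    idx = np.rint((coords - mins) / spans * (pixels - 1)).astype(int)
    # Count how many features land on each occupied pixel
    _, counts = np.unique(idx, axis=0, return_counts=True)
    fill_ratio = len(counts) / pixels**2  # small value: sparse image
    n_colliding = int(counts[counts > 1].sum())  # large value: features merged
    return fill_ratio, n_colliding

# Synthetic embedding of 80 features, roughly the size of the mice dataset
rng = np.random.default_rng(0)
demo_coords = rng.normal(size=(80, 2))
for p in (8, 16, 32, 64):
    print(p, grid_stats(demo_coords, p))
```

Small grids merge many features into shared pixels, while large grids leave most pixels empty; the chosen side length sits between the two.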
