Data Processing with Optimus by Dr. Argenis Leon & Luis Aguirre

Author: Dr. Argenis Leon & Luis Aguirre, Date: September 7, 2021

Language: eng
Format: epub
Publisher: Packt Publishing
Published: 2021-09-02T16:00:00+00:00

For a more general insight into the data, you can ask for a complete profile of the dataset. Let's check that out.

Data profiling

There is a handy function in Optimus called profile that returns useful stats about our dataset. Let's see how to use it:

df.profile(bins=5)

This code will return a dictionary:

{'columns': {'id': {'stats': {'match': 504,
        'missing': 0,
        'mismatch': 0,
        'profiler_dtype': {'dtype': 'int', 'categorical': True},
        'frequency': [{'value': 1, 'count': 1},
            {'value': 332, 'count': 1},
            {'value': 345, 'count': 1},
            {'value': 344, 'count': 1},
            {'value': 343, 'count': 1}],
        'count_uniques': 504},
    'dtype': 'int64'},
  'name': {'stats': {'match': 504,
        'missing': 0,
        'mismatch': 0,
        'profiler_dtype': {'dtype': 'str', 'categorical': True},
        'frequency': [{'value': 'pants', 'count': 254},
            {'value': 'shoes', 'count': 134},
            {'value': 'shirt', 'count': 116}],
        'count_uniques': 3},
    'dtype': 'object'},
  'code': {'stats': {'match': 504,
        'missing': 0,
        'mismatch': 0,
        'profiler_dtype': {'dtype': 'str', 'categorical': True},
        'frequency': [{'value': 'JG15', 'count': 60},
            {'value': 'JG10', 'count': 43},
            {'value': 'SK', 'count': 37},
            {'value': 'L15', 'count': 33},
            {'value': 'J15', 'count': 32}],
        'count_uniques': 39},
    'dtype': 'object'},
  'price': {'stats': {'match': 504,
        'missing': 0,
        'mismatch': 0,
        'profiler_dtype': {'dtype': 'decimal', 'categorical': False},
        'hist': [{'lower': 5.0, 'upper': 103.3675, 'count': 250},
            {'lower': 103.3675, 'upper': 201.735, 'count': 179},
            {'lower': 201.735, 'upper': 300.1025, 'count': 39},
            {'lower': 300.1025, 'upper': 398.47, 'count': 36}]},
    'dtype': 'float64'},
  'discount': {'stats': {'match': 294,
        'missing': 0,
        'mismatch': 210,
        'profiler_dtype': {'dtype': 'int', 'categorical': True},
        'frequency': [{'value': '0', 'count': 294},
            {'value': '5%', 'count': 65},
            {'value': '20%', 'count': 63},
            {'value': '15%', 'count': 54},
            {'value': '50%', 'count': 16}],
        'count_uniques': 6},
    'dtype': 'object'}},
 'name': 'store.csv',
 'file_name': ['store.csv'],
 'summary': {'cols_count': 5,
    'rows_count': 504,
    'dtypes_list': ['float64', 'int64', 'object'],
    'total_count_dtypes': 3,
    'missing_count': 0,
    'p_missing': 0.0}
}

With this Python dictionary, you can get info about specific columns and stats about the whole dataframe.
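As a minimal sketch of reading column-level information out of that dictionary, the snippet below uses a trimmed copy of the output shown above, pasted in as a plain Python literal so that it runs without Optimus installed. In a real session the dictionary would come from the profile call instead. The helper names columns_with_mismatches and hist_total are hypothetical and are not part of Optimus.

```python
# Trimmed copy of the profile output shown above (only the keys used here).
profile = {
    "columns": {
        "id": {"stats": {"match": 504, "missing": 0, "mismatch": 0}},
        "name": {"stats": {"match": 504, "missing": 0, "mismatch": 0}},
        "code": {"stats": {"match": 504, "missing": 0, "mismatch": 0}},
        "price": {
            "stats": {
                "match": 504,
                "missing": 0,
                "mismatch": 0,
                "hist": [
                    {"lower": 5.0, "upper": 103.3675, "count": 250},
                    {"lower": 103.3675, "upper": 201.735, "count": 179},
                    {"lower": 201.735, "upper": 300.1025, "count": 39},
                    {"lower": 300.1025, "upper": 398.47, "count": 36},
                ],
            }
        },
        "discount": {"stats": {"match": 294, "missing": 0, "mismatch": 210}},
    }
}


def columns_with_mismatches(profile):
    """Hypothetical helper: map each column with mismatches to its mismatch count."""
    return {
        name: col["stats"]["mismatch"]
        for name, col in profile["columns"].items()
        if col["stats"]["mismatch"] > 0
    }


def hist_total(profile, column):
    """Hypothetical helper: sum the counts over all histogram bins of a column."""
    return sum(bin_["count"] for bin_ in profile["columns"][column]["stats"]["hist"])


problems = columns_with_mismatches(profile)
price_rows = hist_total(profile, "price")

print(problems)    # {'discount': 210}
print(price_rows)  # 504
```

Here only discount is flagged: 210 of its 504 values (entries such as '5%' and '20%') do not match the inferred int type. The price histogram counts add up to 504, the number of rows reported in the summary.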

For dataframe stats, you can use profile.summary() to get the following:

cols_count: Number of columns in the dataframe

rows_count: Number of rows in the dataframe

dtypes_list: List of dtypes in the dataframe

total_count_dtypes: Count of data types in the dataframe

missing_count: Number of missing values in the dataframe

p_missing: Percentage of missing values in the dataframe
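The fields listed above can be read like any other dictionary entries. The following sketch assumes the summary has the same keys and values as the 'summary' entry in the output shown earlier, copied here as a literal so that it runs on its own. The function name summary_report is hypothetical and is not part of Optimus.

```python
# Copy of the 'summary' entry from the profile output shown earlier.
summary = {
    "cols_count": 5,
    "rows_count": 504,
    "dtypes_list": ["float64", "int64", "object"],
    "total_count_dtypes": 3,
    "missing_count": 0,
    "p_missing": 0.0,
}


def summary_report(summary):
    """Hypothetical helper: format the dataframe-level stats as one line."""
    dtypes = ", ".join(summary["dtypes_list"])
    return (
        f"{summary['rows_count']} rows x {summary['cols_count']} columns, "
        f"{summary['total_count_dtypes']} dtypes ({dtypes}), "
        f"{summary['missing_count']} missing values ({summary['p_missing']}%)"
    )


report = summary_report(summary)
print(report)
# 504 rows x 5 columns, 3 dtypes (float64, int64, object), 0 missing values (0.0%)
```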
