Data Analytics with Python by Frank Millstein

Author: Frank Millstein
Language: eng
Format: epub
Tags: Data analytics applications, Data analytics process, Data Analytics With Python, Data Analytics, Python, Python data structures, Python libraries, Data exploration using Pandas, Pandas series, Data munging, Data manipulation
Publisher: Frank Millstein
Published: 2019-10-18T00:00:00+00:00


CARRYING OUT BINARY OPERATIONS

A DataFrame provides the methods add, sub, mul and div, together with reversed counterparts such as radd and rsub, for carrying out binary operations. For broadcasting behavior, Series input is of primary interest. With these methods, you can choose whether to match on the columns or on the index by using the axis keyword, as illustrated below.

import numpy as np
import pandas as pd

df = pd.DataFrame({'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

df

Output:

        one     three       two
a -1.101558       NaN  1.124472
b -0.177289 -0.634293  2.487104
c  0.462215  1.931194 -0.486066
d       NaN -1.222918 -0.456288

row = df.iloc[1]
column = df['two']

df.sub(row, axis='columns')

Output:

        one     three       two
a -0.924269       NaN -1.362632
b  0.000000  0.000000  0.000000
c  0.639504  2.565487 -2.973170
d       NaN -0.588625 -2.943392

df.sub(row, axis=1)

Output:

        one     three       two
a -0.924269       NaN -1.362632
b  0.000000  0.000000  0.000000
c  0.639504  2.565487 -2.973170
d       NaN -0.588625 -2.943392

df.sub(column, axis='index')

Output:

        one     three  two
a -2.226031       NaN  0.0
b -2.664393 -3.121397  0.0
c  0.948280  2.417260  0.0
d       NaN -0.766631  0.0

df.sub(column, axis=0)

Output:

        one     three  two
a -2.226031       NaN  0.0
b -2.664393 -3.121397  0.0
c  0.948280  2.417260  0.0
d       NaN -0.766631  0.0
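Because the outputs above come from random numbers, they are hard to verify by hand. The following is a minimal sketch using small fixed integers instead. The frame `small` and its values are illustrative assumptions, not data from the book. It also shows a reversed method, rsub, which swaps the two operands.

```python
import pandas as pd

# Hypothetical toy frame with fixed values so results can be checked by hand.
small = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30]},
                     index=['a', 'b', 'c'])

row = small.iloc[1]      # Series indexed by column labels: one=2, two=20
column = small['two']    # Series indexed by row labels: a=10, b=20, c=30

# Match the Series on the column labels: the row is subtracted from every row.
by_columns = small.sub(row, axis='columns')

# Match the Series on the index: the column is subtracted from every column.
by_index = small.sub(column, axis='index')

# rsub reverses the operands: it computes 100 - small, not small - 100.
reversed_sub = small.rsub(100)

print(by_columns)
print(by_index)
print(reversed_sub)
```

In this sketch, axis='columns' is equivalent to axis=1 and axis='index' is equivalent to axis=0, matching the pairs of calls shown earlier.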

You can also align a level of a multi-indexed DataFrame with a Series, as follows.

dfmi = df.copy()
dfmi.index = pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a')],
                                       names=['first', 'second'])

dfmi.sub(column, axis=0, level='second')

Output:

                   one     three      two
first second
1     a      -2.226031       NaN  0.00000
      b      -2.664393 -3.121397  0.00000
      c       0.948280  2.417260  0.00000
2     a            NaN -2.347391 -1.58076
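The level alignment can also be sketched with fixed values, which makes it easier to see that every row sharing a label on the chosen level receives the same Series entry. The names `frame` and `offsets` and their values are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical toy data; the level names mirror 'first' and 'second' above.
index = pd.MultiIndex.from_tuples(
    [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a')],
    names=['first', 'second'])
frame = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [10, 20, 30, 40]},
                     index=index)

# A Series keyed only by the labels of the 'second' level.
offsets = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Align offsets with the 'second' level: both rows labelled 'a'
# have 1 subtracted, regardless of their 'first' label.
result = frame.sub(offsets, axis=0, level='second')
print(result)
```

Here the rows (1, 'a') and (2, 'a') both have 1 subtracted from every column, because alignment uses only the 'second' level.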

The arithmetic methods of Series and DataFrame accept a fill_value argument: a value to substitute when at most one of the values at a location is missing. For instance, when you add two DataFrame objects, you may want to treat NaN as 0 unless both DataFrames are missing that value. In that case the result is still NaN, and you can later replace it with another value using fillna if you want.
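As a minimal sketch of this behavior, the example below adds two small frames with and without fill_value. The frames `left` and `right` and their values are hypothetical, chosen so that one row is missing on one side only and another row is missing on both sides. The final step uses fillna, the usual pandas method for replacing whatever NaN values remain.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames; NaN marks a missing value.
left = pd.DataFrame({'x': [1.0, np.nan, np.nan]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'x': [10.0, 20.0, np.nan]}, index=['a', 'b', 'c'])

# Plain addition: a NaN on either side produces NaN.
plain = left.add(right)

# With fill_value=0, a value missing on one side only is treated as 0.
# Row 'c' is missing on both sides, so it stays NaN.
filled = left.add(right, fill_value=0)

# fillna replaces the NaN values that are still left afterwards.
cleaned = filled.fillna(-1.0)

print(plain)
print(filled)
print(cleaned)
```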

df

Output:

        one     three       two
a -1.101558       NaN  1.124472
b -0.177289 -0.634293  2.487104
c  0.462215  1.931194 -0.486066
d       NaN -1.222918 -0.456288

df2

Output :

one three two

a - 1.


