Taming Big Data Analytics by A Tanveer

Author: A, Tanveer
Language: eng
Format: epub
Published: 2020-12-22T16:00:00+00:00


Output :

#columns identified as features are as below:

#['Cruise_line','Age','Tonnage','passengers','length','cabins','passenger_density']

#to work on the features, Spark MLlib expects every value to be in numeric form

#feature 'Cruise_line' is of string datatype

#StringIndexer encodes each distinct string value as a numeric label index

#import StringIndexer from pyspark.ml.feature to perform this encoding

from pyspark.ml.feature import StringIndexer

indexer = StringIndexer(inputCol='Cruise_line', outputCol='cruise_cat')

indexed = indexer.fit(df).transform(df)

#the code above fits the indexer on df and returns a new dataframe

#the new dataframe keeps the original columns and adds a numeric column 'cruise_cat'

#cruise_cat holds a numeric index (not yet a vector) and can be combined with the other features before being fed to a model

#print the first five rows, with a blank line after each row
for item in indexed.head(5):
    print(item)
    print('\n')
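To make the encoding step more concrete, the following is a minimal pure-Python sketch of the kind of mapping StringIndexer computes with its default ordering: the most frequent label receives index 0.0, and labels with equal frequency are ordered alphabetically. It is an illustration only and does not use Spark. The helper names `fit_string_index` and `transform_labels` and the sample cruise line values are assumptions made for this example, not part of the original code or dataset.

```python
from collections import Counter


def fit_string_index(values):
    """Build a label -> index mapping, most frequent label first.

    Hypothetical helper that mimics a frequency-descending ordering;
    it is not the Spark implementation.
    """
    counts = Counter(values)
    # Sort by descending frequency, then alphabetically to break ties.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform_labels(values, mapping):
    """Replace every string with its numeric index."""
    return [mapping[v] for v in values]


# Assumed sample values, standing in for the 'Cruise_line' column.
cruise_lines = ["Carnival", "Azamara", "Carnival",
                "Celebrity", "Azamara", "Carnival"]

mapping = fit_string_index(cruise_lines)
cruise_cat = transform_labels(cruise_lines, mapping)

print(mapping)     # label -> numeric index
print(cruise_cat)  # numeric column, analogous to 'cruise_cat'
```

The resulting numbers are category identifiers rather than measurements, so their order only reflects how often each label occurs.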


