Practical Data Science by Mario Rojas

Author: Mario Rojas
Language: eng
Format: epub, pdf
Publisher: UNKNOWN
Published: 2021-01-14T00:00:00+00:00


Kingdom Animalia
Class Mammalia
Order Carnivora
Family Canidae
Genus Canis
Species lupus familiaris

He is classified as Canis lupus familiaris.

You must look at an object as revealed by recently discovered knowledge. Now, I will guide you through a Python code session to convert the flat file into a graph with knowledge. The only information you have is in the file Animals.csv in the directory ../VKHCG/03-Hillman/00-RawData.

The format is

ItemLevel  ParentID  ItemID  ItemName
0          0         50      Bacteria
0          0         202422  Plantae
1          50        956096  Negibacteria
1          50        956097  Posibacteria

The fields have the following meanings:

• ItemLevel is how far the specific item is from the top node in the classification.
• ParentID is the ItemID of the parent of the item listed.
• ItemID is the unique identifier for the item.
• ItemName is the full name of the item.

The data fits together as a considerable tree of classifications. You must create a graph that gives you the following:

Bacteria -> Negibacteria and Bacteria -> Posibacteria
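To make the target concrete, here is a minimal, library-free sketch of how ParentID and ItemID link rows into parent-to-child edges. It assumes the four item names in the format listing pair with the four sample rows in order; the variable names are illustrative and are not part of the book's code.

```python
# Sample rows from the format listing: (ItemLevel, ParentID, ItemID, ItemName).
# Assumption: the four names pair with the four rows in the order shown.
rows = [
    (0, 0, 50, "Bacteria"),
    (0, 0, 202422, "Plantae"),
    (1, 50, 956096, "Negibacteria"),
    (1, 50, 956097, "Posibacteria"),
]

# Index every item by its unique ItemID so a ParentID can be resolved to a name.
name_by_id = {item_id: name for _, _, item_id, name in rows}

# An edge exists wherever a row's ParentID matches another row's ItemID.
# Top-level rows carry ParentID 0, which matches no ItemID, so they yield no edge.
edges = [
    (name_by_id[parent_id], name)
    for _, parent_id, _, name in rows
    if parent_id in name_by_id
]

for parent, child in edges:
    print(f"{parent} -> {child}")
```

The two printed lines are exactly the Bacteria edges named above; Plantae stays unconnected in this sample because none of the four rows lists it as a parent.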

Following is the code to transform it. You will perform a few sections of data preparation and data storage for the Retrieve and Assess supersteps, and then we will complete the Process step into the data vault. You start with the standard framework, so please transfer the code to your Python editor. First, let’s set up the data:

################################################################
# -*- coding: utf-8 -*-
################################################################
import sys
import os
import pandas as pd
import networkx as nx
import sqlite3 as sq
import numpy as np
################################################################
if sys.platform == 'linux':
    Base=os.path.expanduser('~') + '/VKHCG'
else:
    Base='C:/VKHCG'
print('################################')
print('Working Base :',Base, ' using ', sys.platform)
print('################################')
################################################################
ReaderCode='SuperDataScientist'

Please replace the 'Practical Data Scientist' in the next line with your name.

ReaderName='Practical Data Scientist'
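Note that the platform check in the setup code sends every non-Linux system to C:/VKHCG, and that includes macOS, where sys.platform is 'darwin'. If that matters for your machine, one possible variation is sketched here. The helper name working_base is hypothetical and does not appear in the book's code.

```python
import os
import sys


def working_base(platform: str = sys.platform) -> str:
    """Hypothetical helper: pick the VKHCG root for a given platform string."""
    # Only Windows ('win32') uses the C: drive root.
    if platform.startswith('win'):
        return 'C:/VKHCG'
    # Linux, macOS ('darwin') and anything else use the home directory.
    return os.path.join(os.path.expanduser('~'), 'VKHCG')


print('Working Base :', working_base(), ' using ', sys.platform)
```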

You now set up the locations of all the deliverables of the code.

################################################################
Company='03-Hillman'
InputRawFileName='Animals.csv'
EDSRetrieveDir='01-Retrieve/01-EDS'
InputRetrieveDir=EDSRetrieveDir + '/02-Python'
InputRetrieveFileName='Retrieve_All_Animals.csv'
EDSAssessDir='02-Assess/01-EDS'
InputAssessDir=EDSAssessDir + '/02-Python'
InputAssessFileName='Assess_All_Animals.csv'
InputAssessGraphName='Assess_All_Animals.gml'

You now create the locations of all the deliverables of the code.

################################################################
sFileRetrieveDir=Base + '/' + Company + '/' + InputRetrieveDir
if not os.path.exists(sFileRetrieveDir):
    os.makedirs(sFileRetrieveDir)
################################################################
sFileAssessDir=Base + '/' + Company + '/' + InputAssessDir
if not os.path.exists(sFileAssessDir):
    os.makedirs(sFileAssessDir)
################################################################
sDataBaseDir=Base + '/' + Company + '/03-Process/SQLite'
if not os.path.exists(sDataBaseDir):
    os.makedirs(sDataBaseDir)
################################################################
sDatabaseName=sDataBaseDir + '/Hillman.db'
conn = sq.connect(sDatabaseName)
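If you want to try the directory and database setup without touching your real VKHCG tree, the sketch below runs the same steps inside a throwaway temporary directory. The temporary Base and the Probe table are assumptions made for this illustration only. It also uses exist_ok=True, which lets os.makedirs be called repeatedly without the explicit existence check.

```python
import os
import sqlite3 as sq
import tempfile

# Assumption: a temporary directory stands in for the real VKHCG base.
Base = tempfile.mkdtemp()
Company = '03-Hillman'

sDataBaseDir = Base + '/' + Company + '/03-Process/SQLite'
# exist_ok=True makes the call safe to repeat, so no os.path.exists guard is needed.
os.makedirs(sDataBaseDir, exist_ok=True)
os.makedirs(sDataBaseDir, exist_ok=True)  # second call changes nothing

sDatabaseName = sDataBaseDir + '/Hillman.db'
conn = sq.connect(sDatabaseName)
# Illustrative table only, written so the database file certainly has content.
conn.execute('CREATE TABLE IF NOT EXISTS Probe (id INTEGER)')
conn.commit()
conn.close()

print('Database file exists :', os.path.isfile(sDatabaseName))
```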

################################################################
# Raw to Retrieve
################################################################

You load the CSV file with the flat structure.

sFileName=Base + '/' + Company + '/00-RawData/' + InputRawFileName
print('###########')
print('Loading :',sFileName)
AnimalRaw=pd.read_csv(sFileName,header=0,low_memory=False, encoding = "ISO-8859-1")
AnimalRetrieve=AnimalRaw.copy()
print(AnimalRetrieve.shape)
################################################################

You store the Retrieve steps data now.

sFileName=sFileRetrieveDir + '/' + InputRetrieveFileName
print('###########')
print('Storing Retrieve :',sFileName)
AnimalRetrieve.to_csv(sFileName, index = False)

You now clean the Retrieve data and store it as the Assess step's data.

################################################################
# Retrieve to Assess
################################################################
# Fill missing values with the string '0', then drop rows whose ItemName was missing.
AnimalGood1 = AnimalRetrieve.fillna('0', inplace=False)
# Compare against the string '0' (not the integer 0) and copy the slice before modifying it.
AnimalGood2=AnimalGood1[AnimalGood1.ItemName!='0'].copy()
AnimalGood2[['ItemID','ParentID']]=AnimalGood2[['ItemID','ParentID']].astype(np.int32)
AnimalAssess=AnimalGood2
print(AnimalAssess.shape)
################################################################
sFileName=sFileAssessDir + '/' + InputAssessFileName
print('###########')
print('Storing Assess :',sFileName)
AnimalAssess.to_csv(sFileName, index = False)
################################################################
print('################')
sTable='All_Animals'
print('Storing :',sDatabaseName,' Table:',sTable)
AnimalAssess.to_sql(sTable, conn, if_exists="replace")
print('################')
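Because fillna('0') inserts the string '0', any filter on missing names has to compare against that string; the integer 0 never equals it. The following library-free sketch walks through the same cleaning logic on a few rows. The data is made up for illustration, including the row with ItemID 999999 and the assumption that the file is comma-separated.

```python
import csv
import io

# Illustrative data only: the last row has no ItemName and should be dropped.
raw = io.StringIO(
    "ItemLevel,ParentID,ItemID,ItemName\n"
    "0,0,50,Bacteria\n"
    "1,50,956096,Negibacteria\n"
    "1,50,999999,\n"
)

clean = []
for row in csv.DictReader(raw):
    # Mirror fillna('0'): replace empty cells with the string '0'.
    row = {key: (value if value != '' else '0') for key, value in row.items()}
    # The placeholder is the string '0', so the test must use '0', not 0.
    if row['ItemName'] == '0':
        continue
    # Mirror astype(np.int32): store the identifiers as integers.
    row['ItemID'] = int(row['ItemID'])
    row['ParentID'] = int(row['ParentID'])
    clean.append(row)

print('Rows kept :', len(clean))
print("'0' == 0 is", '0' == 0)
```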

You start with the Process steps, to process the flat data into a graph. You can now extract the nodes, as follows:

################################################################
print('################')
sTable='All_Animals'
print('Loading Nodes :',sDatabaseName,' Table:',sTable)
sSQL=" SELECT DISTINCT"
sSQL=sSQL+ " CAST(ItemName AS VARCHAR(200)) AS NodeName,"
sSQL=sSQL+ " CAST(ItemLevel AS INT) AS NodeLevel"
sSQL=sSQL+ " FROM"
sSQL=sSQL+ " " + sTable + ";"
AnimalNodeData=pd.read_sql_query(sSQL, conn)
print(AnimalNodeData.shape)
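You can check what the node query returns without the full data set. The sketch below runs an equivalent query against an in-memory SQLite table holding the four sample rows plus one deliberate duplicate. The in-memory table, the duplicate row, and the ORDER BY clause (added only to make the output order predictable) are assumptions of this illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE All_Animals "
    "(ItemLevel INTEGER, ParentID INTEGER, ItemID INTEGER, ItemName TEXT)"
)
conn.executemany(
    "INSERT INTO All_Animals VALUES (?, ?, ?, ?)",
    [
        (0, 0, 50, 'Bacteria'),
        (0, 0, 202422, 'Plantae'),
        (1, 50, 956096, 'Negibacteria'),
        (1, 50, 956097, 'Posibacteria'),
        (1, 50, 956097, 'Posibacteria'),  # deliberate duplicate row
    ],
)

# Same SELECT DISTINCT as in the text, written as one multi-line string.
sSQL = """
    SELECT DISTINCT
        CAST(ItemName AS VARCHAR(200)) AS NodeName,
        CAST(ItemLevel AS INT) AS NodeLevel
    FROM All_Animals
    ORDER BY NodeLevel, NodeName;
"""
nodes = conn.execute(sSQL).fetchall()
conn.close()

for node_name, node_level in nodes:
    print(node_level, node_name)
```

DISTINCT collapses the duplicated Posibacteria row, so five stored rows produce four nodes.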

You have now successfully extracted the nodes. Well done. You can now extract the edges. You will start with the Process step, to convert the data into an appropriate graph structure.

################################################################
print('################')
sTable='All_Animals'
print('Loading Edges :',sDatabaseName,' Table:',sTable)
sSQL=" SELECT DISTINCT"
sSQL=sSQL+ " CAST(A1.ItemName AS VARCHAR(200)) AS Node1,"
sSQL=sSQL+ " CAST(A2.ItemName AS VARCHAR(200)) AS Node2"
sSQL=sSQL+ " FROM"
sSQL=sSQL+ " " + sTable + " AS A1"
sSQL=sSQL+ " JOIN"
sSQL=sSQL+ " " + sTable + " AS A2"
sSQL=sSQL+ " ON"
sSQL=sSQL+ " A1.ItemID=A2.ParentID;"
AnimalEdgeData=pd.read_sql_query(sSQL, conn)
print(AnimalEdgeData.shape)
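The edge query is a self-join: the table is joined to itself so that each parent row (A1) meets every row that names it as ParentID (A2). As a small sketch, here is the same join on an in-memory table with the four sample rows. The in-memory table and the ORDER BY clause are assumptions added for this illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE All_Animals "
    "(ItemLevel INTEGER, ParentID INTEGER, ItemID INTEGER, ItemName TEXT)"
)
conn.executemany(
    "INSERT INTO All_Animals VALUES (?, ?, ?, ?)",
    [
        (0, 0, 50, 'Bacteria'),
        (0, 0, 202422, 'Plantae'),
        (1, 50, 956096, 'Negibacteria'),
        (1, 50, 956097, 'Posibacteria'),
    ],
)

# A1 plays the parent, A2 the child: the join keeps pairs where the
# child's ParentID equals the parent's ItemID.
sSQL = """
    SELECT DISTINCT
        CAST(A1.ItemName AS VARCHAR(200)) AS Node1,
        CAST(A2.ItemName AS VARCHAR(200)) AS Node2
    FROM All_Animals AS A1
    JOIN All_Animals AS A2
      ON A1.ItemID = A2.ParentID
    ORDER BY Node1, Node2;
"""
edges = conn.execute(sSQL).fetchall()
conn.close()

for node1, node2 in edges:
    print(node1, '->', node2)
```

Rows with ParentID 0 find no partner, because no item has ItemID 0. The top-level items therefore appear in the result only as parents, never as children.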

You have now extracted the edges. So, let’s build a graph.

################################################################
G=nx.Graph()
t=0
G.add_node('world', NodeName='World')
################################################################

You add the nodes first.

GraphData=AnimalNodeData
print(GraphData)
################################################################
m=GraphData.shape[0]
for i in range(m):
    t+=1
    sNode0Name=str(GraphData['NodeName'][i]).strip()
    print('Node :',t,' of ',m,sNode0Name)
    sNode0=sNode0Name.
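The loop above is incomplete in this excerpt. As a minimal sketch of how node and edge tables like these might be loaded into a networkx graph, consider the following. It is not the book's continuation. The lower-cased node keys, the NodeLevel attribute, and the link from level-0 items to the 'world' root are all assumptions of this illustration, and the node and edge lists are the four-row sample rather than the real query results.

```python
import networkx as nx

# Sample node and edge lists standing in for AnimalNodeData and AnimalEdgeData.
nodes = [('Bacteria', 0), ('Plantae', 0), ('Negibacteria', 1), ('Posibacteria', 1)]
edges = [('Bacteria', 'Negibacteria'), ('Bacteria', 'Posibacteria')]

G = nx.Graph()
G.add_node('world', NodeName='World')

for name, level in nodes:
    key = name.strip().lower()  # assumption: lower-cased name as the node key
    G.add_node(key, NodeName=name, NodeLevel=level)
    if level == 0:
        # Assumption: top-level items hang directly off the 'world' root.
        G.add_edge('world', key)

for parent, child in edges:
    G.add_edge(parent.strip().lower(), child.strip().lower())

print('Nodes :', G.number_of_nodes())
print('Edges :', G.number_of_edges())
```

nx.Graph is undirected, as in the text. If the parent-to-child direction needs to be preserved, nx.DiGraph accepts the same add_node and add_edge calls.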


