Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits by Tarek Amr

Author: Tarek Amr
Language: eng
Format: epub
Tags: COM037000 - COMPUTERS / Machine Theory, COM051360 - COMPUTERS / Programming Languages / Python, COM062000 - COMPUTERS / Data Modeling and Design
Publisher: Packt Publishing
Published: 2020-07-24T04:35:40+00:00


pip install spacy

python -m spacy download en_core_web_lg

Then, we can assign the downloaded vectors to our five words as follows:

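import spacy

# Load the large English model, which ships with pretrained word vectors
nlp = spacy.load('en_core_web_lg')

terms = ['I', 'like', 'apples', 'oranges', 'pears']

# Look up the vector for each term and convert it to a plain Python list
vectors = [
    nlp(term).vector.tolist() for term in terms
]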

Here is the representation for apples:

# pd.Series(vectors[terms.index('apples')]).rename('apples')

0     -0.633400
1      0.189810
2     -0.535440
3     -0.526580
         ...
296   -0.238810
297   -1.178400
298    0.255040
299    0.611710
Name: apples, Length: 300, dtype: float64

I promised you that the representations for apples, oranges, and pears would not be orthogonal, as they are with CountVectorizer. With 300 dimensions, however, it is hard for me to show that visually. Luckily, we have already learned how to calculate the cosine of the angle between two vectors. Orthogonal vectors have a 90° angle between them, and the cosine of 90° is 0. Two vectors pointing in exactly the same direction have a 0° angle between them, and the cosine of 0° is 1.
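As a quick check of those two reference values, here is a minimal sketch in plain Python. The `cosine` helper and the toy 2-dimensional vectors are illustrative assumptions, not part of the original listing:

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Perpendicular vectors: the dot product is 0, so the cosine is 0
orthogonal = cosine([1, 0], [0, 1])

# Same direction, different lengths: the cosine is 1 (up to floating-point rounding)
same_direction = cosine([1, 2], [2, 4])

print(orthogonal, same_direction)
```

Because the cosine depends only on direction, scaling a vector does not change the result. That is why `[1, 2]` and `[2, 4]` count as pointing the same way.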

Here, we calculate the cosine similarity between every pair of the five vectors we got from spaCy. I used some pandas and seaborn styling to make the numbers clearer:

import seaborn as sns

from sklearn.metrics.pairwise import cosine_similarity
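The listing breaks off after the imports. The sketch below shows one way a pairwise cosine table could be built with `cosine_similarity` and a pandas DataFrame. It is not the author's code: the 3-dimensional vectors are made-up stand-ins so that it runs without the spaCy model, and the names `toy_terms`, `toy_vectors`, and `similarity` are assumptions. The seaborn styling is left out.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Made-up 3-dimensional stand-ins for the 300-dimensional spaCy vectors
toy_terms = ['like', 'apples', 'oranges']
toy_vectors = [
    [0.1, 0.9, 0.0],  # 'like'
    [0.9, 0.1, 0.3],  # 'apples'
    [0.8, 0.2, 0.4],  # 'oranges'
]

# Called with a single matrix, cosine_similarity compares every row with every other row
similarity = pd.DataFrame(
    cosine_similarity(toy_vectors),
    index=toy_terms,
    columns=toy_terms,
)

print(similarity.round(2))
```

Each diagonal entry is 1, since every vector points in the same direction as itself. With real embeddings, the off-diagonal entries are the numbers to read: related terms should score closer to 1 than unrelated ones.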


