A collection of Data Science Interview Questions Solved in Python and Spark: BigData and Machine Learning in Python and Spark (A Collection of Programming Interview Questions Book 6) by Antonio Gulli


Author: Antonio Gulli
Language: eng
Format: epub
Published: 2015-10-16T07:00:00+00:00


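from pyspark.mllib.feature import HashingTF, IDF

# sc: the SparkContext (predefined in the pyspark shell)

# Load documents (one per line); each document becomes a list of terms.
documents = sc.textFile("...").map(lambda line: line.split(" "))

# Hash the terms and compute the term frequencies (TF) of each document.
hashingTF = HashingTF()
tf = hashingTF.transform(documents)

# Cache the TF vectors: fitting the IDF and then transforming both pass over tf.
tf.cache()

# Compute the global IDF; terms appearing in fewer than 2 documents get IDF 0.
idf = IDF(minDocFreq=2).fit(tf)

# Compute TF x IDF in a distributed fashion.
tfidf = idf.transform(tf)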
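To make the arithmetic behind the pipeline above concrete, here is a plain-Python sketch that runs without a cluster. It rests on assumptions: terms are hashed into `NUM_FEATURES` buckets with `zlib.crc32` (chosen only for reproducibility; Spark uses its own hash function), and the IDF follows the smoothed formula documented for Spark MLlib, idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) the number of documents containing t. Buckets seen in fewer than `min_doc_freq` documents get IDF 0. The names `hashed_tf`, `fit_idf`, `tfidf` and `NUM_FEATURES` are hypothetical.

```python
import math
import zlib
from collections import Counter

NUM_FEATURES = 1 << 20  # hypothetical vector size (number of hash buckets)


def hashed_tf(terms, num_features=NUM_FEATURES):
    """Map a list of terms to a sparse term-frequency vector {index: count}."""
    return dict(Counter(zlib.crc32(t.encode("utf-8")) % num_features for t in terms))


def fit_idf(tf_vectors, min_doc_freq=0):
    """Compute sparse IDF weights {index: idf} from a list of TF vectors."""
    m = len(tf_vectors)
    doc_freq = Counter()
    for vec in tf_vectors:
        doc_freq.update(vec.keys())  # each bucket counts once per document
    return {
        idx: (math.log((m + 1) / (df + 1)) if df >= min_doc_freq else 0.0)
        for idx, df in doc_freq.items()
    }


def tfidf(tf_vectors, idf):
    """Scale each TF entry by the IDF weight of its bucket."""
    return [{idx: count * idf[idx] for idx, count in vec.items()} for vec in tf_vectors]


docs = [line.split(" ") for line in ["a b b c", "a c d", "e f"]]
tf = [hashed_tf(d) for d in docs]
idf = fit_idf(tf, min_doc_freq=2)
weights = tfidf(tf, idf)
```

Because no term dictionary is built, each document can be vectorized independently; only the document frequencies must be aggregated globally, which is why the Spark version needs one extra pass for `fit`.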

30. What is “feature hashing”, and why is it useful for BigData?

Solution
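One common way to illustrate feature hashing, sketched here under stated assumptions rather than taken from the book: arbitrary `name=value` string features are mapped into a fixed-length vector of size `n` via index = h(feature) mod n. No feature dictionary has to be built, stored or shared across a cluster, memory is bounded by `n`, and features never seen before still receive an index. The sketch also uses the top bit of the hash as a ±1 sign, a common variant that makes colliding features cancel in expectation instead of always adding up. The stable hash built on `hashlib.md5` and the names `_stable_hash` and `hash_features` are assumptions.

```python
import hashlib


def _stable_hash(s):
    """Deterministic 64-bit hash of a string (Python's hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "little")


def hash_features(features, n=16):
    """Signed feature hashing: map {name: value} into a dense list of length n."""
    vec = [0.0] * n
    for name, value in features.items():
        h = _stable_hash(name)
        index = h % n                     # bucket chosen by the low bits of the hash
        sign = -1.0 if h >> 63 else 1.0   # top bit of the hash picks +1 or -1
        vec[index] += sign * value
    return vec


# Categorical and text features share one fixed-size space; no vocabulary is needed.
x = hash_features({"country=IT": 1.0, "word=spark": 2.0, "word=hadoop": 1.0})
```

Choosing `n` trades memory against collisions: when `n` is much larger than the number of distinct active features, collisions become rare.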


