Advanced Analytics with Spark: Patterns for Learning from Data at Scale by Sandy Ryza & Uri Laserson & Sean Owen & Josh Wills

Author: Sandy Ryza, Uri Laserson, Sean Owen & Josh Wills
Publisher: O'Reilly Media
Published: 2017-06-12


Finding Important Concepts

So the SVD outputs a bunch of numbers. How can we inspect them to verify that they actually relate to anything useful? The V matrix represents concepts through the terms that are important to them. As discussed earlier, V contains a column for every concept and a row for every term, and the value at each position can be interpreted as the relevance of that term to that concept. This means that the most relevant terms for each of the top concepts can be found with something like this:

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

def topTermsInTopConcepts(
    svd: SingularValueDecomposition[RowMatrix, Matrix],
    numConcepts: Int,
    numTerms: Int,
    termIds: Array[String]): Seq[Seq[(String, Double)]] = {
  val v = svd.V
  val topTerms = new ArrayBuffer[Seq[(String, Double)]]()
  val arr = v.toArray
  for (i <- 0 until numConcepts) {
    // The local Matrix is stored in column-major order, so the term
    // weights for concept i occupy one contiguous slice of the array.
    val offs = i * v.numRows
    val termWeights = arr.slice(offs, offs + v.numRows).zipWithIndex
    // Sort descending by weight, keep the top terms, and map each row
    // index back to its term string.
    val sorted = termWeights.sortBy(-_._1)
    topTerms += sorted.take(numTerms).map {
      case (score, id) => (termIds(id), score)
    }
  }
  topTerms
}
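For context, a minimal usage sketch follows. It assumes a TF-IDF term-document RowMatrix named termDocMatrix and a termIds array mapping matrix column indices back to term strings; both names are placeholders standing in for values built earlier in the chapter, not anything fixed by the function above.

// Hypothetical setup: `termDocMatrix` and `termIds` are assumed to exist.
val svd = termDocMatrix.computeSVD(100, computeU = true)

// Print the ten highest-weighted terms for each of the top four concepts.
val topConceptTerms = topTermsInTopConcepts(svd, 4, 10, termIds)
for ((terms, conceptNum) <- topConceptTerms.zipWithIndex) {
  println("Concept " + conceptNum + ": " +
    terms.map { case (term, weight) => f"$term ($weight%.3f)" }.mkString(", "))
}

Skimming these lists is a quick sanity check: if each concept's top terms hang together thematically, the decomposition is capturing something real about the corpus.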


