Hackers & Painters by Paul Graham
Author:Paul Graham
Language: eng
Format: mobi, epub, pdf
Publisher: O'Reilly Media
Published: 2008-07-13T16:00:00+00:00
where w is the token whose probability we're calculating, good and bad are the hash tables I created in the first step, and G and B are the number of nonspam and spam messages respectively.
I want to bias the probabilities slightly to avoid false positives, and by trial and error I've found that a good way to do it is to double all the numbers in good. This helps to distinguish between words that occasionally do occur in legitimate email and words that almost never do. I only consider words that occur more than five times in total (actually, because of the doubling, occurring three times in nonspam mail would be enough). And then there is the question of what probability to assign to words that occur in one corpus but not the other. Again by trial and error I chose .01 and .99. There may be room for tuning here, but as the corpus grows such tuning will happen automatically anyway.
The especially observant will notice that while I consider each corpus to be a single long stream of text for purposes of counting occurrences, I use the number of emails in each, rather than their combined length, as the divisor in calculating spam probabilities. This adds another slight bias to protect against false positives.
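The per-token calculation described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function name and the parameter names n_good and n_bad (standing for G and B) are hypothetical, good and bad are assumed to be plain dictionaries mapping tokens to occurrence counts, and capping each frequency at 1 is an added assumption, needed because occurrence counts are divided by message counts and can exceed them.

```python
def token_spam_probability(word, good, bad, n_good, n_bad):
    """Return the spam probability of `word`, or None if the word is too rare.

    good, bad     : dicts mapping token -> occurrence count in each corpus
    n_good, n_bad : number of nonspam and spam messages (G and B in the text);
                    both are assumed to be greater than zero
    """
    g = 2 * good.get(word, 0)  # double the nonspam count to bias against false positives
    b = bad.get(word, 0)

    # Only consider words that occur more than five times in total
    # (after doubling, three nonspam occurrences are enough).
    if g + b <= 5:
        return None

    # Divide by the number of messages, not by corpus length.
    # Capping at 1.0 is an assumption of this sketch.
    good_freq = min(1.0, g / n_good)
    bad_freq = min(1.0, b / n_bad)

    p = bad_freq / (good_freq + bad_freq)

    # Words seen in only one corpus end up at .01 or .99.
    return max(0.01, min(0.99, p))
```

With this shape, a word that appears only in the spam corpus gets .99 and a word that appears only in the nonspam corpus gets .01, matching the values chosen in the text.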
When new mail arrives, it is scanned into tokens, and the most interesting fifteen tokens, where interesting is measured by how far their spam probability is from a neutral .5, are used to calculate the probability that the mail is spam. If w1, . . . , w15 are the fifteen most interesting tokens, you calculate the combined probability thus:
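A minimal sketch of this step, assuming the standard naive Bayes combination in which the combined probability is p1 p2 ... p15 / (p1 p2 ... p15 + (1 - p1)(1 - p2) ... (1 - p15)), where pi is the spam probability of token wi. The function name and its parameters are hypothetical, and the inputs are assumed to lie strictly between 0 and 1 (as they do when clamped to .01 and .99).

```python
import math


def combined_spam_probability(token_probs, n=15):
    """Combine the n most interesting token probabilities into one spam score.

    token_probs : iterable of per-token spam probabilities, each strictly
                  between 0 and 1
    n           : how many of the most interesting tokens to use
    """
    # "Interesting" means far from a neutral .5, in either direction.
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]

    spam = math.prod(interesting)                 # product of p_i
    ham = math.prod(1.0 - p for p in interesting)  # product of (1 - p_i)

    return spam / (spam + ham)
```

Under these assumptions, a message whose tokens are all neutral scores .5, and a single strongly spammy token among otherwise neutral ones pulls the score toward that token's probability.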
