Black Hat Python by Justin Seitz & Tim Arnold

Author: Justin Seitz & Tim Arnold [Seitz, Justin & Arnold, Tim], Date: December 25, 2021

Language: eng
Format: epub, mobi
ISBN: 9781718501133
Published: 2021-02-24T00:00:00+00:00

HTMLParser 101

In the example in this section, we used the requests and lxml packages to make HTTP requests and parse the resulting content. But what if you are unable to install the packages and therefore must rely on the standard library? As we noted in the beginning of this chapter, you can use urllib for making your requests, but you'll need to set up your own parser with the standard library html.parser.HTMLParser.
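As a rough illustration of the standard-library route described above, the following sketch fetches a page with urllib and feeds the body to a small HTMLParser subclass that records the page title. The names TitleParser and fetch_title are hypothetical and do not come from the book; the UTF-8 fallback and the 10-second timeout are likewise assumptions.

```python
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text found inside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside <title>.
        if self.in_title:
            self.title += data


def fetch_title(url):
    """Fetch url with urllib and return the page title ('' if none)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
    parser = TitleParser()
    parser.feed(body)
    parser.close()
    return parser.title

# Usage (requires network access and a target you are allowed to test):
#   print(fetch_title("http://example.com/"))
```

Decoding with errors="replace" trades accuracy for robustness, which may or may not suit a given target; a stricter version could raise on undecodable bytes instead.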

There are three primary methods you can implement when using the HTMLParser class: handle_starttag, handle_endtag, and handle_data. The handle_starttag method is called whenever an opening HTML tag is encountered, and handle_endtag is called whenever a closing HTML tag is encountered. The handle_data method is called when there is raw text between tags. The prototypes for the three methods differ slightly, as follows:

handle_starttag(self, tag, attributes)
handle_endtag(self, tag)
handle_data(self, data)

Here's a quick example to highlight this:

<title>Python rocks!</title>
handle_starttag => tag variable would be "title"
handle_data => data variable would be "Python rocks!"
handle_endtag => tag variable would be "title"
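The call sequence above can be checked with a small sketch that records each callback as it fires. TraceParser is a hypothetical name introduced only for this illustration.

```python
from html.parser import HTMLParser


class TraceParser(HTMLParser):
    """Hypothetical helper: records every callback in the order it fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; unused here.
        self.events.append(("handle_starttag", tag))

    def handle_data(self, data):
        self.events.append(("handle_data", data))

    def handle_endtag(self, tag):
        self.events.append(("handle_endtag", tag))


parser = TraceParser()
parser.feed("<title>Python rocks!</title>")
parser.close()

for name, value in parser.events:
    print(f"{name} => {value!r}")
```

Running it should show the opening tag, then the text, then the closing tag, matching the order listed above.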

With this very basic understanding of the HTMLParser class, you can do things like parse forms, find links for spidering, extract all of the pure text for data-mining purposes, or find all of the images in a page.
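A minimal sketch of those uses, assuming a hypothetical PageSummary class and a made-up sample page: it gathers link targets, image sources, and visible text in a single pass, skipping the contents of script and style elements.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Hypothetical helper: collects links, image sources, and visible text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []
        self.text = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) tuples
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        stripped = data.strip()
        if stripped and not self._skip_depth:
            self.text.append(stripped)


# Made-up sample page, used only for illustration.
sample = (
    "<html><body><h1>Hello</h1>"
    '<a href="/about">About us</a>'
    '<img src="/logo.png">'
    "<script>var x = 1;</script>"
    "</body></html>"
)

summary = PageSummary()
summary.feed(sample)
summary.close()
print(summary.links, summary.images, summary.text)
```

Form parsing could follow the same pattern by watching for form and input tags in handle_starttag; that part is left out here to keep the sketch short.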
