Black Hat Python, 2nd Edition by Justin Seitz & Tim Arnold

Black Hat Python, 2nd Edition by Justin Seitz & Tim Arnold

Author:Justin Seitz & Tim Arnold [Justin Seitz]
Language: eng
Format: epub
Publisher: No Starch Press
Published: 2021-04-12T16:00:00+00:00


HTMLParser 101

In the example in this section, we used the requests and lxml packages to make HTTP requests and parse the resulting content. But what if you are unable to install the packages and therefore must rely on the standard library? As we noted in the beginning of this chapter, you can use urllib for making your requests, but you’ll need to set up your own parser with the standard library html.parser.HTMLParser.

There are three primary methods you can implement when using the HTMLParser class: handle_starttag, handle_endtag, and handle_data. The handle_starttag function will be called anytime an opening HTML tag is encountered, and the opposite is true for the handle_endtag function, which gets called each time a closing HTML tag is encountered. The handle_data function gets called when there is raw text between tags. The function prototypes for each function are slightly different, as follows:

handle_starttag(self, tag, attributes) handle_endttag(self, tag) handle_data(self, data)

Here’s a quick example to highlight this:

<title>Python rocks!</title> handle_starttag => tag variable would be "title" handle_data => data variable would be "Python rocks!" handle_endtag => tag variable would be "title"

With this very basic understanding of the HTMLParser class, you can do things like parse forms, find links for spidering, extract all of the pure text for data-mining purposes, or find all of the images in a page.



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.
Popular ebooks
Personalized inhaled bacteriophage therapy for treatment of multidrug-resistant Pseudomonas aeruginosa in cystic fibrosis by unknow(156734)
Eco-friendly approach of bio-indigo synthesis and developing purification methods towards isolation of indigo from indirubin and bacterial fragments by Ramalingam Manivannan & Kaliyan Prabakaran & Young-A Son(155608)
Whisky: Malt Whiskies of Scotland (Collins Little Books) by dominic roskrow(74275)
CONSORT 2025 statement: updated guideline for reporting randomized trials by unknow(66079)
Critical evaluation of the ProfiLER-02 study design and outcomes by Vivek Subbiah & Razelle Kurzrock(65827)
Cardiac gene therapy makes a comeback by Oliver J. Müller & Susanne Hille & Anca Kliesow Remes(65265)
Unveiling the design rules for tunable emission in graphene quantum dots: A high-throughput TDDFT and machine learning perspective by Şener Özönder & Mustafa Coşkun Özdemir & Caner Ünlü(50858)
A yeast-based oral therapeutic delivers immune checkpoint inhibitors to reduce intestinal tumor burden by unknow(39140)
Covalent hitchhikers guide proteins to the nucleus by Alexander F. Russell & Madeline F. Currie & Champak Chatterjee(39132)
Meet the Authors: Christopher R. Mansfield and Emily R. Derbyshire by Christopher R. Mansfield & Emily R. Derbyshire(39002)
What's Done in Darkness by Kayla Perrin(27103)
Topological analysis of non-conjugated ethylene oxide cored dendrimers decorated with tetraphenylethylene: Insights from degree-based descriptors using the polynomial approach by A Theertha Nair & D Antony Xavier & Annmaria Baby & S Akhila(26484)
Investigation of mechanical and self-healing properties of hydroxyl-terminated polybutadiene functionalized with 2-ureido-4-pyrimidinone by Mohsen Kazazi & Mehran Hayaty & Ali Mousaviazar(26435)
The Ultimate Python Exercise Book: 700 Practical Exercises for Beginners with Quiz Questions by Copy(21017)
De Souza H. Master the Age of Artificial Intelligences. The Basic Guide...2024 by Unknown(20775)
D:\Jan\FTP\HOL\Work\Alien Breed - Tower Assault CD32 Alien Breed II - The Horror Continues Manual 1.jpg by PDFCreator(20648)
The Fifty Shades Trilogy & Grey by E L James(19605)
Shot Through the Heart: DI Grace Fisher 2 by Isabelle Grey(19487)
Shot Through the Heart by Mercy Celeste(19349)
Python GUI Applications using PyQt5 : The hands-on guide to build apps with Python by Verdugo Leire(17491)