Black Hat Python, 2nd Edition by Justin Seitz & Tim Arnold
Author: Justin Seitz & Tim Arnold
Language: eng
Format: epub
Publisher: No Starch Press
Published: 2021-04-12T16:00:00+00:00
HTMLParser 101
In the example in this section, we used the requests and lxml packages to make HTTP requests and parse the resulting content. But what if you are unable to install the packages and therefore must rely on the standard library? As we noted at the beginning of this chapter, you can use urllib to make your requests, but you'll need to set up your own parser with the standard library's html.parser.HTMLParser.
There are three primary methods you can implement when using the HTMLParser class: handle_starttag, handle_endtag, and handle_data. The handle_starttag method is called any time an opening HTML tag is encountered; likewise, handle_endtag is called each time a closing HTML tag is encountered. The handle_data method is called when there is raw text between tags. The prototypes for these methods differ slightly, as follows:
handle_starttag(self, tag, attributes)
handle_endtag(self, tag)
handle_data(self, data)
Here's a quick example to highlight this:
<title>Python rocks!</title>
handle_starttag => tag variable would be "title"
handle_data => data variable would be "Python rocks!"
handle_endtag => tag variable would be "title"
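The sequence above can be reproduced with a small subclass that records each callback as it fires. This is only a minimal sketch: the class name TraceParser and its events list are made up for illustration. The standard library names the second parameter of handle_starttag attrs and passes it as a list of (name, value) tuples.

```python
from html.parser import HTMLParser


class TraceParser(HTMLParser):
    """Hypothetical helper: records each callback in the order it fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; unused in this sketch.
        self.events.append(("handle_starttag", tag))

    def handle_data(self, data):
        self.events.append(("handle_data", data))

    def handle_endtag(self, tag):
        self.events.append(("handle_endtag", tag))


parser = TraceParser()
parser.feed("<title>Python rocks!</title>")
parser.close()  # flush any text the parser is still buffering

for name, value in parser.events:
    print(f"{name} => {value!r}")
```

Running it should print the three callbacks in the same order as the walkthrough: the opening tag, the text, then the closing tag.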
With this very basic understanding of the HTMLParser class, you can do things like parse forms, find links for spidering, extract all of the pure text for data-mining purposes, or find all of the images in a page.
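As a rough illustration of those uses, the sketch below collects link targets, image sources, and visible text from a hard-coded page. The class name LinkImageParser, the SAMPLE_HTML string, and the attribute names on the instance are all assumptions made for this example. In practice the HTML would come from a urllib response rather than a literal.

```python
from html.parser import HTMLParser


class LinkImageParser(HTMLParser):
    """Hypothetical collector for links, images, and plain text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Convert the list of (name, value) tuples into a dict for lookup.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])

    def handle_data(self, data):
        # Keep only non-empty text; whitespace between tags is skipped.
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


# Sample page used only for this sketch.
SAMPLE_HTML = """
<html><body>
<h1>Welcome</h1>
<a href="/about">About</a>
<img src="/logo.png" alt="logo">
<a href="https://example.com/">Example</a>
</body></html>
"""

collector = LinkImageParser()
collector.feed(SAMPLE_HTML)
collector.close()

print("links: ", collector.links)
print("images:", collector.images)
print("text:  ", collector.text)
```

The same pattern extends to forms: checking for the form and input tags in handle_starttag and reading their name and value attributes would be one way to approach it.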
