Website Scraping with Python: Using BeautifulSoup and Scrapy by Gábor László Hajba

Author: Gábor László Hajba
Language: English
Format: epub, pdf
Publisher: Apress
Published: 2018-09-24T16:00:00+00:00


Extension

Extensions are singleton classes that are instantiated once, at startup, and contain custom code. You can use them to add custom functionality that is not tied to downloading or scraping the way a middleware is. Typical uses are logging and monitoring memory consumption; Scrapy already ships with built-in extensions for both.

Extensions can be loaded in settings.py the same way as middlewares and pipelines:

EXTENSIONS = {
    'scrapy.extensions.corestats.CoreStats': 500,
}
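The same dictionary can also switch off an extension that Scrapy enables by default: setting its order value to None disables it. The sketch below is a hypothetical settings.py fragment; the myproject.extensions.MyExtension path is made up, and the MEMUSAGE_* values are arbitrary examples for the built-in memory-usage extension (which relies on Python's resource module, so it does not work on Windows).

```python
# settings.py (sketch)
EXTENSIONS = {
    # Enable a custom extension; this module path is a hypothetical example
    "myproject.extensions.MyExtension": 500,
    # Setting a built-in extension's order to None disables it
    "scrapy.extensions.telnet.TelnetConsole": None,
}

# The built-in memory usage extension is configured through its own settings
MEMUSAGE_ENABLED = True
MEMUSAGE_WARNING_MB = 512  # log a warning once memory use passes this value
MEMUSAGE_LIMIT_MB = 1024   # shut down Scrapy once memory use passes this value
```

The numbers only define loading order; since extensions rarely depend on each other, the exact value usually does not matter.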

Selectors

This is the most important term you will encounter while using Scrapy. Selectors are the parts of your code that select certain elements of the HTML. They work similarly to Beautiful Soup and lxml, but they are Scrapy's own version, and you can use either XPath or CSS expressions with them. I prefer XPath expressions because I worked with XML and XML transformations for years, so I know XPath well. You are free to use either approach, but I will stick to XPath.

Selectors are objects in Scrapy, so they can be constructed from text:

from scrapy.selector import Selector
