Data Visualization with Python and JavaScript, 2nd Edition by Unknown




            if prop.get('link'):
                link_html = '/a'
            # select the div with a property-code id
            code_block = response.xpath('//*[@id="%s"]' % prop['code'])
            # continue if the code_block exists
            if code_block:
                # We can use the css selector, which has superior class
                # selection
                values = code_block.css('.wikibase-snakview-value')
                # the first value corresponds to the code property
                # (e.g. '10 August 1879')
                value = values[0]
                prop_sel = value.xpath('.%s/text()' % link_html)
                if prop_sel:
                    item[prop['name']] = prop_sel[0].extract()

        yield item
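To make the effect of the link flag concrete, here is a minimal sketch that builds the two XPath strings the loop above would use for a given property. The helper name build_selectors and the two-entry property list are illustrative assumptions, not the book's code. The codes shown are Wikidata's standard property IDs for date of birth (P569) and place of birth (P19). The sketch only formats strings, so it needs neither Scrapy nor a live response.

```python
def build_selectors(prop):
    """Return (block_xpath, value_xpath) for one property dict.

    Hypothetical helper: it mirrors the string formatting in the loop,
    but does not query any page.
    """
    # Linked values sit inside an <a> tag, so descend one level further.
    link_html = '/a' if prop.get('link') else ''
    # Select the element whose id is the property code.
    block_xpath = '//*[@id="%s"]' % prop['code']
    # Relative path from a value node to its text.
    value_xpath = '.%s/text()' % link_html
    return block_xpath, value_xpath


# Assumed sample entries in the same shape as the spider's property list.
property_codes = [
    {'name': 'date_of_birth', 'code': 'P569'},
    {'name': 'place_of_birth', 'code': 'P19', 'link': True},
]

for prop in property_codes:
    print(prop['name'], *build_selectors(prop))
```

A plain-text value such as a date is read with ./text(), while a linked value such as a place name is read with ./a/text().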

Extracts the link to Wikidata identified in Figure 4-5.

Extracts the wiki_code from the URL, e.g., http://wikidata.org/wiki/Q155525 → Q155525.

Uses the Wikidata link to generate a request with our spider’s parse_wikidata as a callback to deal with the response.

These are the property codes we found earlier (see Figure 4-6), with names corresponding to fields in our Scrapy item, NWinnerItem. Those with a link attribute set to True have their values contained in <a> tags.

Finally, we yield the item, which at this point should have all the target data available from Wikipedia.
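The wiki_code extraction mentioned in the notes above can be sketched with the standard library alone. The function name wiki_code_from_url is a hypothetical one chosen for illustration, and the spider itself may take a simpler route such as splitting the URL string directly. This version parses the URL first so that a trailing slash or query string does not end up in the code.

```python
from urllib.parse import urlparse


def wiki_code_from_url(url):
    """Return the last path segment of a Wikidata URL, e.g. 'Q155525'."""
    # urlparse isolates the path ('/wiki/Q155525') from host and query.
    path = urlparse(url).path
    # Drop any trailing slash, then keep the final segment.
    return path.rstrip('/').split('/')[-1]


print(wiki_code_from_url('http://wikidata.org/wiki/Q155525'))  # Q155525
```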

With our request chain in place, let’s check that the spider is scraping our required data:

$ scrapy crawl nwinners_full
2021-... [scrapy] ... started (bot: nobel_winners)
...
2021-... [nwinners_full] DEBUG: Scraped from <200 https://www.wikidata.org/wiki/Q155525>
{'born_in': '',
 'category': u'Physiology or Medicine',
 'date_of_birth': u'8 October 1927',
 'date_of_death': u'24 March 2002',
 'gender': u'male',
 'link': u'http://en.wikipedia.org/wiki/C%C3%A9sar_Milstein',
 'name': u'César Milstein',
 'country': u'Argentina',
 'place_of_birth': u'Bahía Blanca',


