Web Data Mining with Python: Discover and extract information from the web using Python by Dr. Ranjana Rajnish & Dr. Meenakshi Srivastava

Author: Dr. Ranjana Rajnish & Dr. Meenakshi Srivastava, Date: February 2, 2023

Language: eng
Format: epub
ISBN: 9789355513663
Publisher: BPB Publications
Published: 2023-02-15

Handling images

Many times we may need to scrape images (for example, to prepare a dataset); we can scrape the images from a Web page and store them on our hard drive. We will now see how Python and BeautifulSoup can be used to scrape images with the following code. We use the Web page https://rubikscode.net/ to extract all images.

Example 1:

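1. import requests                    # HTTP library for fetching Web pages
2. from bs4 import BeautifulSoup      # HTML parser
3. import os                          # needed later to save images to disk
4.
5. url = 'https://rubikscode.net/'
6.
7. ur = requests.get(url, timeout=10) # download the page (give up after 10 seconds)
8.
9. soup = BeautifulSoup(ur.text, 'html.parser')
10.
11. images = soup.find_all('img')     # list of every <img> tag on the page
12.
13. for image in images:
14.     print(image.get('src'))       # .get() avoids a KeyError when an <img> has no src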

To do so, we import the requests library, which fetches Web pages, and the BeautifulSoup class from the bs4 package, which parses them. We also import the 'os' module because we need it to store the images on the hard drive; 'os' is used whenever code needs to interact with the underlying operating system, for example to create folders or build file paths.

In Line 5, we store the URL of the website from which images need to be scraped in the variable 'url'. In Line 7, we pass this 'url' to the get() method of 'requests', which connects to the server and retrieves the page at that URL. The response is stored in the variable 'ur'; at this stage, ur.text holds the whole page as a long string of HTML. Line 9 uses 'BeautifulSoup' to create a parse tree from the page source, so that data can be extracted in a hierarchical and more readable manner, and stores it in the 'soup' variable ('soup' is the conventional name, but any name can be used). If you print 'soup' at this point, you will see the entire Web page with its HTML tags.

We want only the images from the page, and each image is specified by an <img> tag whose src attribute holds the image link. So, in Line 11, find_all('img') returns a list of all <img> tags in the soup object. We then iterate over this list, stored in 'images', with a for loop in Line 13 and print the src link of each image in Line 14. Running the code prints the image links found on the page, one per line.
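Example 1 only prints the image links; although 'os' is imported, nothing is saved yet. The following is a minimal sketch of the saving step, not the book's own code. It assumes that relative src values (such as /logo.png) should be resolved against the page URL with urllib.parse.urljoin, that images go into a folder named images, and that files follow a hypothetical naming scheme of index plus the last part of the URL path. The function names extract_image_urls, filename_for and download_images are illustrative.

```python
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def extract_image_urls(html, base_url):
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    # src=True keeps only <img> tags that actually carry a src attribute
    for image in soup.find_all('img', src=True):
        # Resolve relative links like "/logo.png" against the page URL
        urls.append(urljoin(base_url, image['src']))
    return urls


def filename_for(image_url, index):
    """Hypothetical naming scheme: '<index>_<last path segment>'."""
    name = os.path.basename(urlparse(image_url).path)  # query string is ignored
    # The index prefix prevents two images with the same name overwriting each other
    return f'{index}_{name}' if name else f'image_{index}'


def download_images(page_url, folder='images'):
    """Fetch page_url, save each reachable image into folder, return saved paths."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it does not exist
    page = requests.get(page_url, timeout=10)
    page.raise_for_status()  # stop early on 4xx/5xx responses
    saved = []
    for i, image_url in enumerate(extract_image_urls(page.text, page_url)):
        if not image_url.startswith(('http://', 'https://')):
            continue  # skip data: URIs and other non-HTTP sources
        response = requests.get(image_url, timeout=10)
        if response.ok:
            path = os.path.join(folder, filename_for(image_url, i))
            with open(path, 'wb') as f:  # images are binary, so write bytes
                f.write(response.content)
            saved.append(path)
    return saved
```

Calling download_images('https://rubikscode.net/') would fetch the page and write each image it can download into ./images, returning the list of saved file paths. Separating extract_image_urls from the network code keeps the parsing logic easy to check against a small HTML string without contacting any server.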
