Web Data Mining with Python: Discover and extract information from the web using Python by Dr. Ranjana Rajnish & Dr. Meenakshi Srivastava
Author: Dr. Ranjana Rajnish & Dr. Meenakshi Srivastava
Language: eng
Format: epub
ISBN: 9789355513663
Publisher: BPB Publications
Published: 2023-02-15
Handling images
We often need to scrape images, for example to prepare a dataset. Images can be scraped from any Web page and stored on the hard drive. We will now see how Python (BeautifulSoup) can be used to scrape images with the following code, which uses the Web page https://rubikscode.net/ and extracts the links of all its images.
Example 1:
1. import requests
2. from bs4 import BeautifulSoup
3. import os
4.
5. url='https://rubikscode.net/'
6.
7. ur=requests.get(url)
8.
9. soup=BeautifulSoup(ur.text, 'html.parser')
10.
11. images=soup.find_all('img')
12.
13. for image in images:
14.     print(image['src'])
15.
To do so, we import the 'requests' library and import BeautifulSoup from the 'bs4' package. We also import the 'os' module, which is needed when the images are stored on the hard drive; the 'os' module is used whenever the code needs to interact with the underlying operating system. In Line 5, we store the URL of the website from which images are to be scraped in the variable 'url'. In Line 7, we pass 'url' to the get() method of 'requests', which connects to the server and retrieves the page at that URL. The response is stored in the variable 'ur'. At this stage, ur.text holds a large amount of HTML text. Line 9 uses 'BeautifulSoup' to create a parse tree from the page source so that data can be extracted in a hierarchical and more readable manner; the tree is stored in the 'soup' variable (conventionally we use the name 'soup', but any name can be used). If you print the value of 'soup' at this stage, you will see the entire Web page with its HTML tags.
Now, we want only the images from the page, and each image link is specified in the src attribute of an <img> tag. So, using find_all('img') in Line 11, we find all <img> tags in the soup object. We then iterate over the list stored in 'images' using a for loop in Line 13 and print each link in Line 14. The output is the list of image links, one per line.
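Example 1 only prints the links. As a minimal sketch of the next step, storing the images on the hard drive with the 'os' module, the code below resolves each src value to an absolute URL and writes the image bytes to a folder. The function names (resolve_image_urls, local_path_for, download_images) and the folder name 'images' are hypothetical, chosen only for illustration; they are not taken from the book.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve_image_urls(srcs, page_url):
    """Turn raw src values into absolute http(s) URLs."""
    urls = []
    for src in srcs:
        if not src:
            continue  # <img> tag without a usable src attribute
        absolute = urljoin(page_url, src)  # handles relative links
        # Skip inline data: URIs and other schemes requests cannot fetch.
        if urlparse(absolute).scheme in ("http", "https"):
            urls.append(absolute)
    return urls


def local_path_for(image_url, folder):
    """Build a local file path from the last segment of the URL path."""
    name = os.path.basename(urlparse(image_url).path) or "image"
    return os.path.join(folder, name)


def download_images(page_url, folder="images"):
    """Fetch a page, find its <img> tags, and save each image in folder."""
    # Third-party imports are kept local so the two helpers above
    # can be used without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(page_url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # .get('src') returns None instead of raising KeyError
    # when an <img> tag has no src attribute.
    srcs = [img.get("src") for img in soup.find_all("img")]

    os.makedirs(folder, exist_ok=True)
    saved = []
    for image_url in resolve_image_urls(srcs, page_url):
        image = requests.get(image_url, timeout=10)
        if image.ok:
            path = local_path_for(image_url, folder)
            with open(path, "wb") as f:
                f.write(image.content)  # raw bytes of the image
            saved.append(path)
    return saved
```

Calling download_images('https://rubikscode.net/') would then return the list of saved file paths. In this sketch, two images with the same file name overwrite each other; a real script would probably add a counter or a hash to the name.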
