Hands-on web scraping with Python : extract quality data from the web using effective Python techniques /

Bibliographic Details
Author / Creator:	Chapagain, Anish, author.
Edition:	Second edition.
Imprint:	Birmingham, UK : Packt Publishing Ltd., 2023.
Description:	1 online resource (324 pages) : illustrations
Language:	English
Subject:	Data mining. Python (Computer program language)
Format:	E-Resource Book
URL for this record:	http://pi.lib.uchicago.edu/1001/cat/bib/13712774

Hidden Bibliographic Details
ISBN:	9781837636211 9781837638512 1837638519
Notes:	Includes bibliographical references and index.
Summary:	Web scraping is a powerful tool for extracting data from the web, but it can be daunting for those without a technical background. Designed for novices, this book will help you grasp the fundamentals of web scraping and Python programming, even if you have no prior experience. Adopting a practical, hands-on approach, this updated edition of Hands-On Web Scraping with Python uses real-world examples and exercises to explain key concepts. Starting with an introduction to web scraping fundamentals and Python programming, you'll cover a range of scraping techniques, including requests, lxml, pyquery, Scrapy, and Beautiful Soup. You'll also get to grips with advanced topics such as secure web handling, web APIs, Selenium for web scraping, PDF extraction, regex, data analysis, EDA reports, visualization, and machine learning. This book emphasizes the importance of learning by doing. Each chapter integrates examples that demonstrate practical techniques and related skills. By the end of this book, you'll be equipped with the skills to extract data from websites, a solid understanding of web scraping and Python programming, and the confidence to use these skills in your projects for analysis, visualization, and information discovery.
Other form:	Print version: 1837636214 9781837636211

Table of Contents:

Cover
Title page
Copyright and Credits
Contributors
Table of Contents
Preface
Part 1: Python and Web Scraping
Chapter 1: Web Scraping Fundamentals
Technical requirements
What is web scraping?
Understanding the latest web technologies
HTTP
HTML
XML
JavaScript
CSS
Data-finding techniques used in web pages
HTML source page
Developer tools
Summary
Further reading
Chapter 2: Python Programming for Data and Web
Technical requirements
Why Python (for web scraping)?
Accessing the WWW with Python
Setting things up
Creating a virtual environment
Installing libraries
Loading URLs
URL handling and operations
requests - Python library
Implementing HTTP methods
GET
POST
Summary
Further reading
Part 2: Beginning Web Scraping
Chapter 3: Searching and Processing Web Documents
Technical requirements
Introducing XPath and CSS selectors to process markup documents
The Document Object Model (DOM)
XPath
CSS selectors
Using web browser DevTools to access web content
HTML elements and DOM navigation
XPath and CSS selectors using DevTools
Scraping using lxml - a Python library
lxml by example
Web scraping using lxml
Parsing robots.txt and sitemap.xml
The robots.txt file
Sitemaps
Summary
Further reading
Chapter 4: Scraping Using PyQuery, a jQuery-Like Library for Python
Technical requirements
PyQuery overview
Introducing jQuery
Exploring PyQuery
Installing PyQuery
Loading a web URL
Element traversing, attributes, and pseudo-classes
Iterating using PyQuery
Web scraping using PyQuery
Example 1 - scraping book details
Example 2 - sitemap to CSV
Example 3 - scraping quotes with author details
Summary
Further reading
Chapter 5: Scraping the Web with Scrapy and Beautiful Soup
Technical requirements
Web parsing using Python
Introducing Beautiful Soup
Installing Beautiful Soup
Exploring Beautiful Soup
Web scraping using Beautiful Soup
Web scraping using Scrapy
Setting up a project
Creating an item
Implementing the spider
Exporting data
Deploying a web crawler
Summary
Further reading
Part 3: Advanced Scraping Concepts
Chapter 6: Working with the Secure Web
Technical requirements
Exploring secure web content
Form processing
Cookies and sessions
User authentication
HTML processing using Python
User authentication and cookies
Using proxies
Summary
Further reading
Chapter 7: Data Extraction Using Web APIs
Technical requirements
Introduction to web APIs
Types of API
Benefits of web APIs
Data formats and patterns in APIs
Example 1 - sunrise and sunset
Example 2 - GitHub emojis
Example 3 - Open Library
Web scraping using APIs
Example 1 - holidays from the US calendar
Example 2 - Open Library book details
Example 3 - US cities and time zones
