Hands-on web scraping with Python : extract quality data from the web using effective Python techniques /

Bibliographic Details
Author / Creator: Chapagain, Anish, author.
Edition: Second edition.
Imprint: Birmingham, UK : Packt Publishing Ltd., 2023.
Description: 1 online resource (324 pages) : illustrations
Language: English
Format: E-Resource Book
URL for this record: http://pi.lib.uchicago.edu/1001/cat/bib/13712774
ISBN: 9781837636211
9781837638512
1837638519
Notes: Includes bibliographical references and index.
Summary: Web scraping is a powerful tool for extracting data from the web, but it can be daunting for those without a technical background. Designed for novices, this book will help you grasp the fundamentals of web scraping and Python programming, even if you have no prior experience. Adopting a practical, hands-on approach, this updated edition of Hands-On Web Scraping with Python uses real-world examples and exercises to explain key concepts. Starting with an introduction to web scraping fundamentals and Python programming, you'll cover a range of scraping techniques, including requests, lxml, pyquery, Scrapy, and Beautiful Soup. You'll also get to grips with advanced topics such as secure web handling, web APIs, Selenium for web scraping, PDF extraction, regex, data analysis, EDA reports, visualization, and machine learning. This book emphasizes the importance of learning by doing. Each chapter integrates examples that demonstrate practical techniques and related skills. By the end of this book, you'll be equipped with the skills to extract data from websites, a solid understanding of web scraping and Python programming, and the confidence to use these skills in your projects for analysis, visualization, and information discovery.
Other form: Print version: 1837636214 9781837636211
Table of Contents:
  • Cover
  • Title page
  • Copyright and Credits
  • Contributors
  • Table of Contents
  • Preface
  • Part 1: Python and Web Scraping
  • Chapter 1: Web Scraping Fundamentals
  • Technical requirements
  • What is web scraping?
  • Understanding the latest web technologies
  • HTTP
  • HTML
  • XML
  • JavaScript
  • CSS
  • Data-finding techniques used in web pages
  • HTML source page
  • Developer tools
  • Summary
  • Further reading
  • Chapter 2: Python Programming for Data and Web
  • Technical requirements
  • Why Python (for web scraping)?
  • Accessing the WWW with Python
  • Setting things up
  • Creating a virtual environment
  • Installing libraries
  • Loading URLs
  • URL handling and operations
  • requests - Python library
  • Implementing HTTP methods
  • GET
  • POST
  • Summary
  • Further reading
  • Part 2: Beginning Web Scraping
  • Chapter 3: Searching and Processing Web Documents
  • Technical requirements
  • Introducing XPath and CSS selectors to process markup documents
  • The Document Object Model (DOM)
  • XPath
  • CSS selectors
  • Using web browser DevTools to access web content
  • HTML elements and DOM navigation
  • XPath and CSS selectors using DevTools
  • Scraping using lxml - a Python library
  • lxml by example
  • Web scraping using lxml
  • Parsing robots.txt and sitemap.xml
  • The robots.txt file
  • Sitemaps
  • Summary
  • Further reading
  • Chapter 4: Scraping Using PyQuery, a jQuery-Like Library for Python
  • Technical requirements
  • PyQuery overview
  • Introducing jQuery
  • Exploring PyQuery
  • Installing PyQuery
  • Loading a web URL
  • Element traversing, attributes, and pseudo-classes
  • Iterating using PyQuery
  • Web scraping using PyQuery
  • Example 1 - scraping book details
  • Example 2 - sitemap to CSV
  • Example 3 - scraping quotes with author details
  • Summary
  • Further reading
  • Chapter 5: Scraping the Web with Scrapy and Beautiful Soup
  • Technical requirements
  • Web parsing using Python
  • Introducing Beautiful Soup
  • Installing Beautiful Soup
  • Exploring Beautiful Soup
  • Web scraping using Beautiful Soup
  • Web scraping using Scrapy
  • Setting up a project
  • Creating an item
  • Implementing the spider
  • Exporting data
  • Deploying a web crawler
  • Summary
  • Further reading
  • Part 3: Advanced Scraping Concepts
  • Chapter 6: Working with the Secure Web
  • Technical requirements
  • Exploring secure web content
  • Form processing
  • Cookies and sessions
  • User authentication
  • HTML processing using Python
  • User authentication and cookies
  • Using proxies
  • Summary
  • Further reading
  • Chapter 7: Data Extraction Using Web APIs
  • Technical requirements
  • Introduction to web APIs
  • Types of API
  • Benefits of web APIs
  • Data formats and patterns in APIs
  • Example 1 - sunrise and sunset
  • Example 2 - GitHub emojis
  • Example 3 - Open Library
  • Web scraping using APIs
  • Example 1 - holidays from the US calendar
  • Example 2 - Open Library book details
  • Example 3 - US cities and time zones