Together you’ll learn better thanks to my workshop Web Scraping and Crawling with Python. Do you want to know more about this workshop? Curious how I can adapt it to your needs? Something else? Don’t hesitate to contact me.
The internet is not just a collection of webpages; it’s a gigantic resource of interesting data. Being able to extract that data is a valuable skill. It’s certainly challenging, but with the right knowledge and tools, you’ll be able to leverage a wealth of information for your personal and professional projects.
Imagine building a web scraper that legally gathers information about potential houses to buy, a process that automatically fills in that tedious form to download a report, or a crawler that enriches an existing data set with weather information. In this hands-on workshop we’ll teach you how to accomplish just that using Python and a handful of packages.
You’ll learn about the concepts underlying HTML, CSS selectors, and HTTP requests, and how to inspect those using the developer tools of your browser. We’ll show you how to turn messy HTML into structured data sets, how to automate interacting with dynamic websites and forms, and how to set up crawlers that can traverse thousands or millions of websites. Through plenty of exercises you’ll be able to apply this new knowledge to your own projects in no time.
beautifulsoup4, pyquery, scrapy, and selenium
You’re expected to have some experience with programming in Python. Our workshop Introduction to Programming in Python is one option that can help you with that. Roughly speaking, if you’re familiar with the following Python syntax and concepts, then you’ll be fine:
bool, int, float, list, tuple, dict, str, type casting
in operator, indexing, slicing
if, elif, else, for, while
range(), len(), zip()
def, (keyword) arguments, default values
import, import as, from ... import ...
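As a rough self-check, the short script below touches each of the items listed above. It is only a sketch: the house prices, city names, and the summarize function are made up for illustration and are not part of the workshop material. If you can predict what it prints before running it, you’re probably well prepared.

```python
from statistics import mean  # from ... import ...

def summarize(prices, cities, currency="EUR", max_items=3):
    """Return one summary line per (city, price) pair."""
    lines = []
    # Slicing limits the number of items; zip() pairs them up.
    for city, price in zip(cities[:max_items], prices[:max_items]):
        if price > 400_000:
            label = "expensive"
        elif price > 250_000:
            label = "moderate"
        else:
            label = "affordable"
        lines.append(f"{city}: {int(price)} {currency} ({label})")
    return lines

# Hypothetical data, for illustration only.
prices = [float(p) for p in ("310000", "455000", "198000", "270000")]  # type casting
cities = ("Rotterdam", "Amsterdam", "Groningen", "Utrecht")            # tuple
listing = dict(zip(cities, prices))                                    # dict

summary = summarize(prices, cities, max_items=2)  # keyword argument
has_utrecht = "Utrecht" in listing                # in operator, gives a bool
average_price = mean(prices)

count = 0
while count < len(prices):  # while, len()
    count += 1

for i in range(len(summary)):  # range()
    print(i, summary[i])       # indexing
```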
Some experience with HTML and CSS is useful, but not required.
We’re going to use Python together with JupyterLab and the following packages:
beautifulsoup4
mechanize
pyquery
scrapy
selenium
The recommended way to get everything set up is to run
! conda install -y -c conda-forge beautifulsoup4 mechanize pyquery scrapy selenium
in a Jupyter notebook. Alternatively, if you don’t want to use Anaconda, you can install everything using pip. In any case, if running
import bs4, mechanize, pyquery, scrapy, selenium
doesn’t produce any errors, then you know you’ve set up everything correctly.
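If the combined import fails, it only reports the first package that is missing. The sketch below tries each import separately and lists every package that could not be imported. The names REQUIRED and find_missing are made up for this example; the module names are the ones from the import statement above (note that beautifulsoup4 is imported as bs4).

```python
import importlib

# Import names taken from the import statement above.
REQUIRED = ["bs4", "mechanize", "pyquery", "scrapy", "selenium"]

def find_missing(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError.
            missing.append(name)
    return missing

missing = find_missing(REQUIRED)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All packages imported without errors.")
```

Any name it reports can then be installed with conda or pip as described above.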
In addition, you should have a recent version of either Firefox or Chrome because we’re going to use their Developer Tools to inspect HTTP requests and HTML elements.
Do you want to know more about this workshop? Curious how I can adapt it to your needs? Something else? Send an email to jeroen