Workshop in Cape Town: Web Scraping with R

Join Andrew Collier and Hanjo Odendaal for a workshop on using R for Web Scraping.

Who should attend?

This workshop is aimed at beginner and intermediate R users who want to learn more about using R for data acquisition and management, with a specific focus on web scraping.

What will you learn?

You will learn:

  • data manipulation with dplyr, tidyr and purrr;
  • tools for accessing the DOM;
  • scraping static sites with rvest;
  • scraping dynamic sites with RSelenium; and
  • setting up an automated scraper in the cloud.

See the programme below for further details.

Where: Rise, Floor 5, Woodstock Exchange, 66 Albert Road, Woodstock, Cape Town
When: 14-15 June 2018
Who: Andrew Collier and Hanjo Odendaal

There are just 20 seats available. A 10% discount is available for groups of 4 or more people from a single organisation attending both days.

Email [email protected] if you have any questions about the workshop.

Register

Programme

Day 1

  • Motivating Example
  • R and the tidyverse
    • Vectors, Lists and Data Frames
    • Loading data from a file
    • Manipulating Data Frames with dplyr
    • Pivoting with tidyr
    • Functional programming with purrr
  • Introduction to scraping
    • Ethics
    • DOM
    • Developer Tools
    • CSS and XPath
    • robots.txt and site map
  • Scraping a static site with rvest
    • What happens under the hood
    • What the hell is curl?
    • Assisted Assignment: Movie information from IMDB

Day 2

  • Case Study: Investigating drug tests using rvest
  • Interacting with APIs
    • Using XHR to find an API
    • Building wrappers around APIs
  • Scraping a dynamic site with RSelenium
    • Why RSelenium is needed
    • Navigating web pages
    • Combining RSelenium with rvest
    • Useful JavaScript tools
    • Case Study
  • Deploying a Scraper in the Cloud
    • Launching and connecting to an EC2 instance
    • Headless browsers
    • Automation with cron

Register
