Data Tidying and Importing with R

Instructors: Dr. Elijah Meyer, Mine Çetinkaya-Rundel

Beginner Level • 1 week to complete at 10 hours a week • Flexible Schedule

What You'll Learn

  • Apply tidy data principles to manipulate and restructure data (e.g., subsetting, adding columns, and transforming data between wide and long formats)
  • Develop and implement code to join data sets and perform basic web scraping to collect data
  • Work with wide and long data formats, using code to convert between them as part of data preparation and analysis

Skills You'll Gain

Data Cleansing
Data Literacy
Data Ethics
Data Transformation
Data Import/Export
Exploratory Data Analysis
R Programming
Tidyverse (R Package)
Web Scraping
Data Wrangling
Data Manipulation

Shareable Certificate

Earn a shareable certificate to add to your LinkedIn profile

Outcomes

  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

There are 3 modules in this course

Tidy datasets have a specific structure: each variable is a column, and each observation is a row. In this module, we use functional verbs from the dplyr package in R to transform data into a ready-to-use tidy format and to manipulate data frames more generally.
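
As a rough illustration of what this looks like in practice, the sketch below reshapes a small wide table into tidy form and then applies a few dplyr verbs. The data set, its column names, and the use of tidyr's pivot_longer() for the reshaping step are assumptions made for this example, not material taken from the course. The native pipe (|>) requires R 4.1 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per student, one column per exam
scores_wide <- tibble(
  student = c("Ana", "Ben", "Cy"),
  exam_1  = c(90, 72, 85),
  exam_2  = c(88, 65, 91)
)

# Reshape to tidy (long) form: one row per student-exam observation
scores_long <- scores_wide |>
  pivot_longer(
    cols      = starts_with("exam_"),
    names_to  = "exam",
    values_to = "score"
  )

# dplyr verbs: subset rows, add a column, then summarise by group
exam_summary <- scores_long |>
  filter(score >= 70) |>
  mutate(high_score = score >= 80) |>
  group_by(exam) |>
  summarise(mean_score = mean(score), n = n())
```

In the long version each row holds a single score, so grouping by exam needs no column-by-column handling.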

A column in our data set can be stored as one of many different types, such as numbers or characters. These data types determine how R treats the data and whether a given function can be used with it. In this module, we discuss in more detail the data types and classes that R recognizes, as well as how to recode variables in a data set so that they have different types, classes, or values.
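
A minimal sketch of type conversion and recoding follows, assuming a hypothetical survey table in which every column was read in as text. The column names and the cutoff used for age_group are invented for illustration.

```r
library(dplyr)

# Hypothetical survey data where every column arrived as character
survey <- tibble(
  age       = c("34", "51", "27"),
  smoker    = c("yes", "no", "no"),
  education = c("hs", "college", "hs")
)

class(survey$age)  # "character", so mean(survey$age) would not work yet

survey_typed <- survey |>
  mutate(
    age       = as.numeric(age),                                  # character -> numeric
    smoker    = smoker == "yes",                                  # character -> logical
    education = factor(education, levels = c("hs", "college")),   # character -> factor
    age_group = if_else(age >= 40, "40+", "under 40")             # recode into new values
  )
```

Because mutate() evaluates its arguments in order, age_group is computed from the already converted numeric age.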

Web scraping is the process of automatically extracting information from web pages and transforming it into a structured dataset. In this module, we go over how to perform basic web scraping in R to make the abundance of data available online more easily accessible.
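
The sketch below shows the general shape of a scraping step using the rvest package, which is an assumption here since the description does not name a package. To keep the example self-contained, an inline HTML snippet with made-up content stands in for a downloaded page. With a real site, read_html() with the page's URL would take the place of minimal_html().

```r
library(rvest)

# Inline HTML stands in for a downloaded page (hypothetical content)
page <- minimal_html('
  <h1>Course catalog</h1>
  <table>
    <tr><th>course</th><th>weeks</th></tr>
    <tr><td>Data Tidying</td><td>1</td></tr>
    <tr><td>Data Visualization</td><td>2</td></tr>
  </table>
')

# Select elements with CSS selectors, then extract their contents
title   <- page |> html_element("h1") |> html_text2()
catalog <- page |> html_element("table") |> html_table()
```

Here html_table() returns the table as a data frame, so the scraped result can go straight into the same tidying steps used for any other data set.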