In this tutorial, we will cover the following topics:


  • Working with categorical data and factors using the forcats package
  • Creating and manipulating dates with the lubridate package
  • Manipulating text data with the stringr package
  • Making decisions in programming with the if_else() function
  • Iteration with the purrr package to apply functions to multiple elements of a list or data frame
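As a rough preview of these topics, the sketch below applies one function from each package to a small made-up tibble. The `toy` data and its column names are hypothetical and are not part of the course data sets. Each function is covered in detail later in the tutorial.

```r
library(tidyverse)
library(lubridate)

# Hypothetical toy data, used only for illustration
toy <- tibble(
  dept  = c("sales", "it", "sales"),
  hired = c("2014-05-02", "2015-01-15", "2014-11-30"),
  hours = c(38, 45, 41)
)

toy_summary <- toy %>%
  mutate(
    dept     = fct_infreq(factor(dept)),           # forcats: order levels by frequency
    hired    = ymd(hired),                         # lubridate: parse text into dates
    hire_yr  = year(hired),                        # lubridate: extract the year
    dept_lbl = str_to_upper(as.character(dept)),   # stringr: convert text to upper case
    overtime = if_else(hours > 40, "yes", "no")    # dplyr: vectorized if/else
  )

# purrr: apply mean() to every numeric column and return a named numeric vector
col_means <- map_dbl(select(toy_summary, where(is.numeric)), mean)

toy_summary
col_means
```

Here `map_dbl()` is used because `mean()` returns a single number per column. Other `map_*()` variants return lists or other vector types.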


Please click the button below to open an interactive version of all course R tutorials through RStudio Cloud.

Note: you will need to register for an account before opening the project. Please remember to use your GMU e-mail address.



Click the button below to launch a pre-configured RStudio environment in your browser using Binder.org. Unlike RStudio Cloud, this service has no monthly usage limits, but it may take up to 10 minutes to launch, and you will not be able to save your work.





First, let’s load the tidyverse and lubridate packages, then read the employee_data and home_sales tibbles into our R environment.


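# Load the tidyverse (includes forcats, stringr, dplyr, and purrr) and lubridate
library(tidyverse)
library(lubridate)

# Read both tibbles directly from the course website
employee_data <- read_rds(url('https://gmudatamining.com/data/employee_data.rds'))
home_sales <- read_rds(url('https://gmudatamining.com/data/home_sales.rds'))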


Data

We will be working with two data sets. In employee_data, each row represents an employee who either did or did not resign from a company, along with their attributes and work history. In home_sales, each row represents a home sale in the Seattle area between 2014 and 2015.

Take a moment to explore these data sets below.
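If you are working in the R console rather than with the interactive tables, a minimal sketch for a first look at both tibbles might be the following. It makes no assumptions about column names and reloads the data so that it can run on its own.

```r
library(tidyverse)

# Same files as loaded earlier in the tutorial
employee_data <- read_rds(url('https://gmudatamining.com/data/employee_data.rds'))
home_sales <- read_rds(url('https://gmudatamining.com/data/home_sales.rds'))

# Compact overview: one line per column with its type and first few values
glimpse(employee_data)
glimpse(home_sales)

# Number of rows and columns in each tibble
dim(employee_data)
dim(home_sales)

# First few rows of the home sales data
head(home_sales)
```

`glimpse()` is often more convenient than printing a wide tibble, since it shows every column even when they do not fit on screen.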



Employee Data

Seattle Home Sales