Before starting this assignment, please download and install R and
RStudio Desktop on your computer. Both are open-source and free to use.
Detailed installation instructions can be found here
To complete this assignment, students must download the
R notebook template and open the file in their RStudio application. Please click the button below to download the template.
After completing the assignment, please upload the template (.Rmd file) to Blackboard as your submission.
This semester we will be working with a data set from the field of Human Resources Analytics.
Broadly speaking, this field is concerned with using employee data within a company to optimize objectives such as employee satisfaction, productivity, project management, and most commonly, avoiding employee attrition.
Ideally, companies would like to keep attrition rates (the proportion of employees leaving a company for other opportunities) as low as possible due to the variable costs and business disruptions that come with having to replace productive employees on short notice.
The objective of this project is to perform an exploratory data analysis on the employee_data data set to uncover potential solutions for minimizing employee attrition rates.
The employee_data data frame is loaded below and consists of 1,470 employee records for a U.S.-based product company. Each row in this data frame represents one employee at this company, described by the variables listed in the table below.
| Variable | Definition | Data Type |
|---|---|---|
| left_company | Did the employee leave the company? (Yes/No) | Factor |
| department | Department within the company | Factor |
| job_level | Job Level (Associate - Vice President) | Factor |
| salary | Employee yearly salary (US Dollars) | Numeric |
| weekly_hours | Self-reported average weekly hours spent on the job (company survey) | Numeric |
| business_travel | Level of required business travel | Factor |
| yrs_at_company | Tenure at the company (years) | Numeric |
| yrs_since_promotion | Years since last promotion | Numeric |
| previous_companies | Number of previous companies for which the employee has worked | Numeric |
| job_satisfaction | Self-reported job satisfaction (company survey) | Factor |
| performance_rating | Most recent annual performance rating | Factor |
| marital_status | Marital status (Single, Married, or Divorced) | Factor |
| miles_from_home | Distance from employee address to office location | Numeric |
Executives at this company have hired you as a data science consultant to identify the factors that lead to employees leaving their company.
They would like for you to explore why employees are leaving their company and make recommendations on how to minimize this behavior.
You must think of at least 5 relevant questions that explore the relationship between left_company and the other variables in the employee_data data frame.
The goal of your analysis should be discovering which variables drive the differences between employees who do and do not leave the company.
You must answer each question and provide supporting data summaries with either a summary data frame (using tidyr) or a plot (using ggplot) or both.
In total, you must have a minimum of 3 plots and 3 summary data frames for the exploratory data analysis section. Among the plots you produce, you must have at least 3 different types (e.g., box plot, bar chart, histogram, heat map).
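As a rough illustration of what "different plot types" can look like, the sketch below builds a box plot and a proportional bar chart. The function names plot_hours_by_status and plot_attrition_by_department are hypothetical helpers invented for this example; the column names come from the variable table, and the choice of weekly_hours and department is only one of many reasonable pairings.

```r
library(ggplot2)

# Hypothetical helper: box plot comparing the distribution of a numeric
# variable (weekly_hours) between employees who left and those who stayed.
plot_hours_by_status <- function(df) {
  ggplot(df, aes(x = left_company, y = weekly_hours, fill = left_company)) +
    geom_boxplot() +
    labs(title = "Weekly Hours by Status (Left the Company - Yes/No)",
         x = "Left the Company",
         y = "Average Weekly Hours")
}

# Hypothetical helper: bar chart showing, within each department, the share
# of employees who left. position = "fill" scales each bar to a total of 1.
plot_attrition_by_department <- function(df) {
  ggplot(df, aes(x = department, fill = left_company)) +
    geom_bar(position = "fill") +
    labs(title = "Share of Employees Leaving by Department",
         x = "Department",
         y = "Proportion of Employees")
}
```

Calling plot_hours_by_status(employee_data) or plot_attrition_by_department(employee_data) would return a plot object that prints when evaluated in a notebook chunk.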
Each question must be answered with supporting evidence from your tables and plots. See the example question below.
Is there a relationship between employees leaving the company and their current salary?
Answer: Yes, the data indicates that employees who leave the company tend to have lower salaries than employees who do not. Among the 237 employees who left the company, the average salary was $76,625. This is over $20,000 less than the average salary of employees who did not leave the company.
Among the employees who did not leave the company, only 10% have a salary that is less than or equal to $60,000. Among employees who did leave the company, this increases to 34%.
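Figures like those quoted in this answer would typically come from a summary data frame. The sketch below shows one way to compute them, assuming the dplyr package is available (the text above names only tidyr and ggplot). The function name summarise_salary_by_status and its low_salary_cutoff argument are hypothetical, introduced only for this example.

```r
library(dplyr)

# Hypothetical helper: one row per attrition status with the employee count,
# the average salary, and the proportion of employees at or below a cutoff.
# Assumes `df` has a numeric `salary` column and a `left_company` column.
summarise_salary_by_status <- function(df, low_salary_cutoff = 60000) {
  df %>%
    group_by(left_company) %>%
    summarise(
      n_employees = n(),
      avg_salary = mean(salary),
      # mean() of a logical vector gives the proportion of TRUE values
      prop_low_salary = mean(salary <= low_salary_cutoff),
      .groups = "drop"
    )
}
```

Running summarise_salary_by_status(employee_data) would return a two-row data frame, one row for each level of left_company, which can be cited directly in a written answer.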
library(ggplot2)

# Density histogram of salary, one panel per attrition status
ggplot(data = employee_data, aes(x = salary, fill = left_company)) +
  geom_histogram(aes(y = after_stat(density)), color = "white", bins = 20) +
  facet_wrap(~ left_company, nrow = 2) +
  labs(title = "Employee Salary Distribution by Status (Left the Company - Yes/No)",
       x = "Salary (US Dollars)",
       y = "Proportion of Employees")
Write an executive summary of your overall findings and recommendations to the executives at this company. Think of this section as your closing remarks of a presentation, where you summarize your key findings and make recommendations to improve HR processes at the company.
Your executive summary must be written in a professional tone, with minimal grammatical errors, and should include the following sections:
1. An introduction where you explain the business problem and goals of your data analysis
   - What problem(s) is this company trying to solve? Why are they important to its future success?
   - What was the goal of your analysis? What questions were you trying to answer and why do they matter?
2. Highlights and key findings from your exploratory data analysis
   - What were the interesting findings from your analysis and why are they important for the business?
   - This section is meant to establish the need for your recommendations in the following section
3. Your recommendations to the company on how to reduce employee attrition rates
   - Each recommendation must be supported by your data analysis results
   - You must clearly explain why you are making each recommendation and which results from your data analysis support this recommendation
   - You must also describe the potential business impact of your recommendation:
     - Why is this a good recommendation?
     - What benefits will the business achieve?