This data project has been used as a take-home assignment in the recruitment process for data science positions at 23andMe.
Assignment
Please answer the questions below based on the data provided:
Plot daily sales for all 50 weeks.
It looks like there has been a sudden change in daily sales. What date did it occur?
Is the change in daily sales at the date you selected statistically significant? If so, what is the p-value?
Does the data suggest that the change in daily sales is due to a shift in the proportion of male-vs-female customers? Please use plots to support your answer (a rigorous statistical analysis is not necessary).
Assume a given day is divided into four dayparts:
night (12:00AM – 6:00AM),
morning (6:00AM – 12:00PM),
afternoon (12:00PM – 6:00PM),
evening (6:00PM – 12:00AM).
What is the percentage of sales in each daypart over all 50 weeks?
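One way to approach the daypart question is sketched below. It is a minimal illustration rather than a reference solution. It assumes each daypart is a half-open six-hour interval that includes its start time (so 6:00AM counts as morning), and the names DAYPARTS, daypart and daypart_percentages are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Assumption: the four dayparts are consecutive six-hour blocks starting at midnight.
DAYPARTS = ("night", "morning", "afternoon", "evening")


def daypart(when):
    """Map a timestamp to its daypart using the hour of day (0-5, 6-11, 12-17, 18-23)."""
    return DAYPARTS[when.hour // 6]


def daypart_percentages(timestamps):
    """Return the percentage of sales that fall in each daypart."""
    counts = Counter(daypart(t) for t in timestamps)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sales to summarise")
    return {name: 100.0 * counts[name] / total for name in DAYPARTS}
```

Because every daypart is exactly six hours long, integer division of the hour by 6 gives the daypart index directly, which avoids a chain of comparisons.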
Data Description
The datasets/ directory contains fifty CSV files (one per week) of timestamped sales data. Each row in a file has two columns:
sale_time – The timestamp at which the sale was made, e.g. 2012-10-01 01:42:22
purchaser_gender – The gender of the purchaser (male or female)
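As a starting point, the weekly files could be read and aggregated into daily counts along the following lines. This is a sketch using only the Python standard library (pandas would work equally well). It assumes each CSV has a header row naming the two columns, that all files end in .csv, and that timestamps follow the format of the example above. The function names load_sales and daily_sales are hypothetical.

```python
import csv
import glob
import os
from collections import Counter
from datetime import datetime

# Assumption: every timestamp looks like the documented example 2012-10-01 01:42:22.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def load_sales(directory="datasets"):
    """Read every CSV in `directory` into a list of (datetime, gender) tuples."""
    sales = []
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        with open(path, newline="") as handle:
            # Assumption: the first row is a header with the column names.
            for row in csv.DictReader(handle):
                when = datetime.strptime(row["sale_time"], TIME_FORMAT)
                sales.append((when, row["purchaser_gender"]))
    return sales


def daily_sales(sales):
    """Count sales per calendar date, returned as a date-sorted list of (date, count)."""
    counts = Counter(when.date() for when, _ in sales)
    return sorted(counts.items())
```

The date-sorted list returned by daily_sales can be passed to any plotting library. Dates with no sales do not appear in the result, so they would need to be filled in with zeros if a continuous series is required.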
Practicalities
Please work on the questions in the order shown. Make sure that your solution reflects your entire thought process: how the code is structured matters more than the final answers. You are expected to spend no more than 1-2 hours on this project.