Tidy Data
- Pre-workshop activities: 0-15 min
- Introductory presentation: 60 min
- Hands-on activities: 60-90 min
Why Tidy Data?
Tidy Data is data that is stored and formatted in a way that facilitates data analysis.
Do you collect or manage quantitative data, but struggle with how to structure your tables? The principles of “tidy data” are a standard way to organize the values within a dataset. By keeping your data tidy, you can spend less time on wrangling data, and more time getting the answers you need!
This workshop gives participants a framework for recognizing and tidying messy data, along with practical skills in Microsoft Excel and Microsoft Power Query for collecting data in a tidy format that is ready for analysis.
This workshop was designed as part of the “Demystifying Library Assessment: Professional Development to Expand Skills and Improve Practices” program, which seeks to improve data quality and analysis in the Libraries. Data can also be tidied with many other tools, such as the reshape2 and plyr R packages mentioned in Hadley Wickham's Tidy Data article (1); however, Excel with Power Query is familiar and available to UVic affiliates.
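To make the idea concrete outside of any one tool, here is a minimal sketch in plain Python of one common tidying step: turning column headers that are really values into rows. The table, the branch names, the visit counts, and the helper `tidy_rows` are all hypothetical and made up for illustration; the workshop itself uses Excel and Power Query, where the comparable operation is Unpivot Columns.

```python
# Hypothetical messy table: one row per branch, one column per month.
# The headers "Jan", "Feb", "Mar" are really values of a "month" variable.
messy = [
    {"branch": "Main", "Jan": 120, "Feb": 95, "Mar": 130},
    {"branch": "Law", "Jan": 40, "Feb": 55, "Mar": 38},
]


def tidy_rows(rows, id_column, variable_name, value_name):
    """Unpivot every column except id_column into one row per value."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on each new row instead
            tidy.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy


# Each output row is one observation: one branch in one month.
tidy = tidy_rows(messy, "branch", "month", "visits")
for record in tidy:
    print(record)
```

In the tidy version, every column is a variable (branch, month, visits) and every row is a single observation, which is the shape most analysis tools expect.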
Learning objectives
At the end of this workshop, you will be able to:
- Understand why tidy data is needed for data analysis
- Define an observation, a variable, tidy data, and messy data
- Recognize a dataset as either messy or tidy
- Name and identify the five most common types of messy datasets
- Apply techniques to tidy the five most common messy data problems using Excel and Power Query
References
(1) Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10
NEXT STEP: Pre-workshop Activities