It is often said that 80% of a data analysis pipeline is spent on the tedious process of cleaning and preparing data so that they can be used for analysis and visualization (Dasu & Johnson, 2003). Tidy data makes data transformation and visualization easier. It works hand in hand with the tools provided by the tidyverse collection of R packages, in a way that promotes reproducibility and efficiency. ggplot2 (Wickham, 2009) is one of the core members of the tidyverse and one of the best and most widely used R packages for data visualization. In this workshop, participants will learn the principles of tidy data, how to transform and combine datasets using the tools from the tidyverse, and how to generate advanced visualizations with the ggplot2 package.
Dasu, T., & Johnson, T. (2003). Exploratory Data Mining and Data Cleaning. https://doi.org/10.1002/0471448354
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. Retrieved from http://ggplot2.org.
Is this workshop for me?
- Do you routinely spend long days transforming and cleaning your Excel files to get them ready for analysis and/or making plots?
- Do you work with tricky data files (measurements from different types of equipment, normally provided as raw text files)?
- Do you often struggle to make sense of big, complex datasets?
- Do you often have to combine different datasets in order to perform your research?
- Do you want to communicate your findings in a beautiful and reproducible way by generating publication-ready plots?
Then this workshop will equip you with the skills to tackle the above use-cases and many more!
Assumed knowledge
Participants should be familiar with the concepts taught in the course “Introduction to Data Science with R and R Studio” and be comfortable working with:
- Vectors, lists and data frames
- Importing and saving data
- Using functions
Learning Outcomes
- Participants will be taught the principles of tidy data, how to best structure their data using the tools of the tidyverse and the concept of data analysis pipelines.
- Participants will learn the basic concepts of relational data and how to combine different datasets in a reproducible and efficient manner.
- Participants will learn the syntax and philosophy of the grammar of graphics as implemented by the ggplot2 package.
- Participants will learn how to make different types of visualizations using the ggplot2 package. They will be able to explore their own datasets using scatter plots (optionally with smoothed fitted lines), boxplots, bar charts, and more.
- Participants will learn how to customize their figures to achieve publication-level quality, by adjusting the labels, legends, colors, and coordinate systems, among others.
- Participants will be introduced to interactive data visualization and visualization of geographical data using interfaces in R to state-of-the-art web technologies.
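As a rough illustration of the kind of pipeline these outcomes describe (not actual course material), the sketch below reshapes a small wide table into tidy form, joins it to a lookup table, and plots the result with ggplot2. The data frames `yields_wide` and `plots`, their column names, and all values are hypothetical and invented purely for this example.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical "wide" table: one column per measurement year (made-up values)
yields_wide <- data.frame(
  plot_id = c("A", "B", "C"),
  `2021`  = c(4.1, 3.8, 5.0),
  `2022`  = c(4.4, 3.5, 5.3),
  check.names = FALSE
)

# Tidy form: one row per plot and year, one column per variable
yields_tidy <- yields_wide %>%
  pivot_longer(cols = -plot_id, names_to = "year", values_to = "yield")

# Hypothetical lookup table describing each plot
plots <- data.frame(
  plot_id   = c("A", "B", "C"),
  treatment = c("control", "fertilised", "fertilised")
)

# Relational step: attach the treatment to every measurement via the shared key
yields_joined <- left_join(yields_tidy, plots, by = "plot_id")

# Grammar of graphics: map variables to aesthetics, then add layers and labels
p <- ggplot(yields_joined,
            aes(x = year, y = yield, colour = treatment, group = plot_id)) +
  geom_point() +
  geom_line() +
  labs(x = "Year", y = "Yield", colour = "Treatment")

print(p)
```

Each step maps onto one outcome: `pivot_longer()` produces the tidy structure, `left_join()` combines two related tables on a key, and the `ggplot()` call builds the figure layer by layer so that labels and legends can be adjusted afterwards.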
The workshop consists of interactive presentations that are frequently interrupted by short do-it-yourself periods during which the participants solve exercises of increasing complexity. The participants will spend at least half of the time writing R code and thinking about data science problems.
In order to practice all the skills taught in the workshop, participants will work on a small project on the last day using a dataset from a real research project. The different challenges faced by the participants will then be discussed at the end of the course as a wrap-up.
| Target Group | The course is aimed at PhD candidates, postdocs, and academic staff |
| Group Size | 24 participants |
| Course duration | 4 days (from 9:00 to 17:00). Participants are expected to be present full time; only in urgent, exceptional cases is a few hours' leave from the course possible. |
| Prior knowledge | Participants should be familiar with all the concepts taught in the course “Introduction to R” offered by PE&RC |
| Lecturers | |
| Estimated self-study/practice time | 2 hours. A week before the start of the course, a short meeting will be planned to set up the computers with the required software. |
| Fee | |
| PE&RC/WIMEK/EPS/WASS/VLAG/WIAS PhD candidates with approved TSP and WU EngD candidates | €175,- |
| PE&RC postdocs and staff | €350,- |
| All other academic participants | €390,- |
| Non-academic participants | €740,- |
PE&RC Cancellation Conditions
IMPORTANT: ALWAYS read the Cancellation conditions for PE&RC courses and activities.
- Sanja Selakovic (PE&RC)
Email: sanja.selakovic@wur.nl
- PE&RC Office
Email: office.pe@wur.nl