Going deeper with dplyr: New features in 0.3 and 0.4 (video tutorial)
In August 2014, I created a 40-minute video tutorial introducing the key functionality of the dplyr package in R. dplyr continues to be my "go-to" package for data exploration and manipulation because of its intuitive syntax, blazing fast performance, and excellent documentation.
I recorded that tutorial using the latest version at the time (0.2), but there have since been two significant updates to dplyr (versions 0.3 and 0.4). Because those updates introduced a ton of new functionality, I thought it was time to create another tutorial!
This new tutorial covers the most useful new features in 0.3 and 0.4, as well as some advanced functionality from previous versions that I didn't cover last time. (If you have not watched the previous tutorial, I recommend you do so first since it covers some dplyr basics that are not covered in this tutorial.)
Table of contents
This new tutorial runs 37 minutes, but if you only want to watch a particular section, simply click the topic below and it will skip to that point in the video:
- Introduction (starts at 0:00)
- Loading dplyr and the nycflights13 dataset (starts at 1:12)
- Choosing columns: rename (starts at 2:28)
- Choosing rows: distinct (starts at 5:40)
- Adding new variables: add_rownames (starts at 12:38)
- Grouping and counting: ungroup (starts at 15:20)
- Creating data frames: data_frame (starts at 23:01)
- Joining (merging) tables: anti_join (starts at 25:28)
- Viewing more output: View (starts at 31:29)
- Resources (starts at 34:41)
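For a quick taste of a few of the functions named above, here is a minimal sketch rather than the code from the video. It assumes dplyr is installed and uses two tiny made-up data frames (`flights_toy` and `airlines_toy` are hypothetical names) instead of the nycflights13 dataset. It sticks to base `data.frame()` for construction because, as far as I know, `data_frame()` and `add_rownames()` were deprecated in later releases.

```r
library(dplyr)

# Hypothetical toy data; the video itself uses nycflights13
flights_toy <- data.frame(
  carrier = c("AA", "AA", "UA", "UA", "DL"),
  dest    = c("ORD", "ORD", "SFO", "DEN", "ATL"),
  stringsAsFactors = FALSE
)
airlines_toy <- data.frame(
  carrier = c("AA", "UA"),
  name    = c("American", "United"),
  stringsAsFactors = FALSE
)

# rename(): the syntax is new_name = old_name
renamed <- rename(flights_toy, destination = dest)

# distinct(): keep only unique rows (the repeated AA/ORD row is dropped)
unique_routes <- distinct(renamed)

# count(): shorthand for group_by() followed by tally()
per_carrier <- count(unique_routes, carrier)

# ungroup(): remove grouping so later operations apply to the whole table
grouped   <- group_by(unique_routes, carrier)
ungrouped <- ungroup(grouped)

# anti_join(): rows of the first table with no match in the second
unmatched <- anti_join(flights_toy, airlines_toy, by = "carrier")

print(per_carrier)
print(unmatched)
```

Here `anti_join()` acts as a filtering join: it returns only the `flights_toy` rows whose `carrier` is missing from `airlines_toy`, which makes it handy for spotting keys that would fail to match in a regular join.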
The video is embedded below, or you can view it on YouTube:
You can view the R Markdown document used in the video on RPubs, or you can download the source document from GitHub.
Here are the resources I mention in the video:
- Release announcements for version 0.3 and version 0.4
- dplyr reference manual and vignettes
- Two-table vignette covering joins and set operations
- RStudio's Data Wrangling Cheat Sheet for dplyr and tidyr
- dplyr GitHub repo and list of releases
My previous tutorial is embedded below, or you can view it on YouTube:
If you have any questions about dplyr, I'd love to hear them in the comments!
If you'd like to be notified when I release new videos, please subscribe to my YouTube channel. I also blog about a wide variety of data science topics, and have an email newsletter if you'd like to hear about that content!
Just released! 37-min tutorial on new features in dplyr 0.3 & 0.4: rename, slice, count, data_frame, joins, much more http://t.co/BoQXEvo81n - Kevin Markham (@justmarkham), March 9, 2015