Hands-on dplyr tutorial for faster data manipulation in R

Watch the follow-up tutorial: http://youtu.be/2mh1PqfsXVI
View the R Markdown document: http://rpubs.com/justmarkham/dplyr-tutorial
Download the source document: https://github.com/justmarkham/dplyr-tutorial
Read about why I love dplyr: http://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/

dplyr is a new R package for data manipulation. Using a series of examples on a dataset you can download, this tutorial covers the five basic dplyr “verbs” as well as a dozen other dplyr functions.

Tutorial contents:
1. Introduction to dplyr (starts at 0:00)
2. Loading dplyr and the example dataset (starts at 2:29)
3. Understanding “local data frames” (starts at 3:23)
4. Verb #1: `filter` (starts at 5:17)
5. Verb #2: `select`, plus `contains`, `starts_with`, `ends_with`, `matches` (starts at 7:54)
6. Using chaining syntax for more readable code (starts at 9:34)
7. Verb #3: `arrange` (starts at 12:53)
8. Verb #4: `mutate` (starts at 13:55)
9. Verb #5: `summarise`, plus `group_by`, `summarise_each`, `n`, `n_distinct`, `tally` (starts at 15:31)
10. Window functions: `min_rank`, `top_n`, `lag` (starts at 26:47)
11. Convenience functions: `sample_n`, `sample_frac`, `glimpse` (starts at 32:44)
12. Connecting to databases (starts at 34:21)
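
As a rough preview of what the five verbs look like when chained together, here is a minimal sketch. It assumes the built-in `mtcars` data rather than the tutorial's own dataset, and the names `cars`, `result`, `by_cyl`, and `kpl` are made up for illustration.

```r
library(dplyr)

# Assumption: mtcars (shipped with R) stands in for the tutorial's dataset.
# as_tibble() gives the modern equivalent of a "local data frame".
cars <- as_tibble(mtcars, rownames = "model")

result <- cars %>%
  filter(cyl == 4, mpg > 25) %>%                 # filter: keep rows matching both conditions
  select(model, mpg, wt, starts_with("c")) %>%   # select: pick columns by name or helper
  mutate(kpl = mpg * 0.425) %>%                  # mutate: add a derived column (approx. km per litre)
  arrange(desc(mpg))                             # arrange: sort rows, highest mpg first

# summarise collapses each group to one row; n() counts the rows per group
by_cyl <- cars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), n_cars = n())

print(result)
print(by_cyl)
```

Each step takes a data frame and returns a data frame, which is what makes the `%>%` chaining in section 6 read top to bottom.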

== RESOURCES ==

Reference manual and vignettes: http://cran.r-project.org/web/packages/dplyr/index.html
July 2014 webinar: http://pages.rstudio.net/Webinar-Series-Recording-Essential-Tools-for-R.html
July 2014 webinar code: https://github.com/rstudio/webinars/tree/master/2014-01
Tutorial by Hadley Wickham: https://www.dropbox.com/sh/i8qnluwmuieicxc/AAAgt9tIKoIm7WZKIyK25lh6a
GitHub repo: https://github.com/hadley/dplyr
List of releases: https://github.com/hadley/dplyr/releases

== LET’S CONNECT! ==

Blog: http://www.dataschool.io
Newsletter: http://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
GitHub: https://github.com/justmarkham

Comments

Rob van Mechelen says:

Excellent! Thank you

Jaded Hackneyed says:

Your tutorial is meticulous, clear and useful for those who are used to the basic R approach but feel a need to learn the dplyr package. This video is not lengthy, and it helps me write R code in an efficient and convenient manner. Thanks.

oregono says:

So question on the group_by:

I ran the following on data read from a CSV file, and it gives me aggregates – but they are all the exact same number 😐

exceldata %>%
group_by(Ability.I,Type.I) %>%
summarize(test1=mean(exceldata$HP,na.rm=TRUE))
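
A likely cause of the identical numbers, assuming the data looks roughly like the question describes: `exceldata$HP` refers to the entire column and bypasses the grouping, so every group receives the overall mean. Using the bare column name `HP` lets `summarise` see only the rows in each group. The sketch below uses a tiny made-up data frame; only the column names are taken from the question.

```r
library(dplyr)

# Hypothetical stand-in for the CSV data; the values are invented for illustration
exceldata <- data.frame(
  Ability.I = c("A", "A", "B", "B"),
  Type.I    = c("x", "x", "y", "y"),
  HP        = c(10, 20, 30, NA)
)

# exceldata$HP is the whole column, so every group gets the same overall mean
wrong <- exceldata %>%
  group_by(Ability.I, Type.I) %>%
  summarise(test1 = mean(exceldata$HP, na.rm = TRUE)) %>%
  ungroup()

# The bare name HP is evaluated within each group
right <- exceldata %>%
  group_by(Ability.I, Type.I) %>%
  summarise(test1 = mean(HP, na.rm = TRUE)) %>%
  ungroup()

print(wrong)
print(right)
```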

Punit kaur says:

38 mins well spent. Thanks for an awesome tutorial!!!

Dheeru Kura says:

Explanation was awesome. It’s changed and improved my perception of RStudio

Kumar Siddhartha says:

Hey, can you make a video on a reporting package (or several packages) that can display a database, or simply a table, however I want? For example, with column names containing formulas and things like that. In short, I want a reporting package that lets me manipulate the table and the contents I report as much as possible. Thanks.

anto cdt says:

THKS !

Leore Lavin says:

Really clear and informative, thank you!

Jiawei Hugo Zhou says:

The summarise_each() function used in the video was deprecated as of 2017; you can use the code below instead:

flights %>%
group_by(UniqueCarrier) %>%
summarise_at(.vars = c("Cancelled", "Diverted"), .funs = mean)
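
For readers on dplyr 1.0.0 or later, `summarise_at()` has itself been superseded by `across()`. A minimal sketch of the same summary follows, assuming a tiny made-up `flights` data frame that only borrows the column names used above.

```r
library(dplyr)

# Hypothetical miniature of the flights data; the values are invented for illustration
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "UA", "UA"),
  Cancelled     = c(0, 1, 0, 0),
  Diverted      = c(0, 0, 1, 0)
)

# across() applies mean to each listed column within each carrier group
by_carrier <- flights %>%
  group_by(UniqueCarrier) %>%
  summarise(across(c(Cancelled, Diverted), mean))

print(by_carrier)
```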

jordan Ndetcho says:

Very easy to understand, straight to the point and useful for beginners like me ^^
thank you very much !

AK2016 says:

Very good! Thank you.

John K says:

very helpful

Brenden Morley says:

Thank you… very detailed and informative

AliendaroN says:

Great Video, thank you a lot 😉

David Juarez says:

Was dplyr replaced by another package?

Bryan Wu says:

This is great! I really love the fact that you show the “base R” approach as a comparison. Looking forward to more vids.

Papercraftfreak3 says:

This tutorial is the most helpful resource I’ve found thus far for my Data Science project. Thank you so much for posting!!

HP says:

Thank you! Very clear and helpful.

Prasad Vittala says:

Hi, is it necessary to create a local data frame from the original data frame?

Robert Noble says:

Is there an AND/OR condition that you can use ?
,|
Would that work?

I merged two huge data frames. Some of the variables have the same name even though they are different: H.x (hits for batting) and H.y (hits for pitching), for instance. Other variables are exactly the same, for example Year.x (the year a player played as a hitter) and Year.y (the year a player pitched). When working with that huge data frame, I would not want to confuse hits off of a pitcher with hits that a batter made, because they are two distinct variables. But for the years that a player played, I would not want to miss any rows (players) for any given year.

Would that be And, Or, And/Or (if that exists), or wouldn’t it matter? I think Or could be a useful way, but is that the best choice? It seems as though it should matter. Thanks.
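
On the AND/OR part of the question: inside `filter`, conditions separated by a comma (or by `&`) must all hold, while `|` keeps rows where at least one holds. A minimal sketch, assuming the built-in `mtcars` data in place of the merged baseball tables; the variable names are made up for illustration.

```r
library(dplyr)

# Assumption: mtcars stands in for the merged data frame from the question
cars <- as_tibble(mtcars)

# AND: a comma and & are equivalent inside filter()
and_comma <- cars %>% filter(cyl == 4, gear == 4)
and_amp   <- cars %>% filter(cyl == 4 & gear == 4)

# OR: | keeps rows where either condition is TRUE
either <- cars %>% filter(cyl == 4 | gear == 4)

print(nrow(and_comma))
print(nrow(either))
```

With OR, a row is kept if either condition holds, so no matching year would be dropped; with AND, only rows satisfying both conditions survive.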

Nitin sethi says:

Need this dataset…

nakul menon says:

That was a well spent 40 minutes. Very neat, precise, and easy to understand. You have my additional gratitude for comparing dplyr to the base R functions, which helped in visualizing why dplyr is practical. Thank You!

Amey n says:

Excellent and very easy to understand

Enzo C. says:

Great, helps a lot.. Thanks!

christopher guth says:

Awesome tutorial, man. Very helpful and easy to understand!

Ashok Anumandla says:

Great video on dplyr. Helps a lot for data manipulations

John Sheehan says:

Thank you for putting these tutorials together. They are FANTASTIC for the R newbie. And I particularly love that you have the R Markdown version that we can keep for reference.
