By Tal Galili
There are tons of resources to help you learn the different aspects of R, and as a beginner you may find this overwhelming. R is also a dynamic, rapidly changing language, so it’s important to keep up with the latest tools and technologies.
That’s why R-bloggers and DataCamp have worked together to bring you a learning path for R. Each section points you to relevant resources and tools to get you started and to keep you engaged as you continue learning. The materials range from documentation to online courses, books, and more.
Just like R, this learning path is a dynamic resource. We want to continually evolve and improve the resources to provide the best possible learning experience. So if you have suggestions for improvement, please email firstname.lastname@example.org with your feedback.
Getting started: The basics of R
The best way to learn R is by doing. If you are just getting started with R, this free introduction to R tutorial by DataCamp is a great resource, as is its successor, Intermediate R programming (subscription required). Both courses teach you R programming and data science interactively, at your own pace, in the comfort of your browser. You get immediate feedback during exercises, with helpful hints along the way so you don’t get stuck.
Another free online interactive tutorial for R is Try R, available on O’Reilly’s Code School website. An offline interactive learning resource is swirl, an R package that makes it fun and easy to become an R programmer. You can take a swirl course by (i) installing the package in R and (ii) selecting a course from the course library. If you want to start right away without installing anything, you can also use the online version of swirl.
There are also some very good MOOCs on edX and Coursera that teach you the basics of R programming. On edX you can find Introduction to R Programming by Microsoft, an 8-hour course that focuses on the fundamentals and basic syntax of R. On Coursera there is the very popular R Programming course by Johns Hopkins. Both are highly recommended!
If you prefer to learn R from a written tutorial or book, there is plenty of choice: there is CRAN’s manual An Introduction to R, as well as some very accessible books like Jared Lander’s R for Everyone or R in Action by Robert Kabacoff.
Setting up your machine
You can download a copy of R from the Comprehensive R Archive Network (CRAN). There are binaries available for Linux, Mac and Windows.
Once R is installed you can choose to either work with the basic R console, or with an integrated development environment (IDE). RStudio is by far the most popular IDE for R and supports debugging, workspace management, plotting and much more (make sure to check out the RStudio shortcuts).
R packages are the fuel that drives the growth and popularity of R. R packages are bundles of code, data, documentation, and tests that are easy to share with others. Before you can use a package, you first have to install it. Some packages, like the base package, are installed automatically when you install R. Others, such as ggplot2, don’t come with the standard R installation and need to be installed separately.
Many (but not all) R packages are organized and available from CRAN, a network of servers around the world that store identical, up-to-date versions of code and documentation for R. You can easily install these packages from inside R using the install.packages() function. CRAN also maintains a set of Task Views that identify the packages associated with a particular task, such as TimeSeries.
Besides CRAN, there is Bioconductor, which hosts packages for the analysis of high-throughput genomic data, and many developers also publish R packages in their GitHub and Bitbucket repositories. You can easily install packages from these repositories using the devtools package.
Finding a package can be hard, but luckily you can search packages from CRAN, GitHub and Bioconductor using RDocumentation or inside-R, or have a look at this quick list of useful R packages.
Finally, once you start working with R, you’ll quickly find out that R package dependencies can cause a lot of headaches. When you run into that issue, make sure to check out packrat (see video tutorial) or checkpoint. When you need to update R on Windows, you can use the updateR() function from the installr package.
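As a rough sketch of the install-then-load workflow described above, the snippet below installs only those packages that are not already available and then attaches them with library(). The helper name install_if_missing and the default CRAN mirror URL are assumptions chosen for illustration; install.packages(), requireNamespace() and library() are standard base R functions.

```r
# Hypothetical helper (not part of base R): install any packages in `pkgs`
# that are not yet available, then attach all of them.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE instead of throwing an error
  # when a package is not installed
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    # Download and install only the packages that are absent
    install.packages(missing, repos = repos)
  }
  # Attach each package so its functions can be called without a pkg:: prefix
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(missing)
}
```

For example, install_if_missing(c("ggplot2", "readxl")) would fetch both packages from CRAN the first time and simply load them on later runs.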
Importing your data into R
The data you want to import into R can come in all sorts of formats: flat files, statistical software files, databases and web data.
Each type of data usually calls for its own import approach. For a general introduction to getting different data types into R, check out this online Importing Data into R tutorial (subscription required), this post on data importing, or this webinar by RStudio.
- Flat files are typically simple text files that contain tabular data. The standard distribution of R can import these flat files as a data frame with functions such as read.table() and read.csv() from the utils package. Specialized packages for importing flat files are readr, a fast and very easy to use package that is less verbose than utils and several times faster (more information), and data.table, whose fread() function is built for quickly importing and munging data.
- If you want to get your Excel files into R, it’s a good idea to have a look at the readxl package. Alternatively, there is the gdata package, which has a function that supports importing Excel data, and the XLConnect package. The latter acts as a real bridge between Excel and R, meaning you can perform actions you would normally do in Excel from inside R. Read more on importing your Excel files into R.
- Software packages such as SAS, STATA and SPSS use and produce their own file types. The haven package by Hadley Wickham can import SAS, STATA and SPSS data files into R and is very easy to use. Alternatively, there is the foreign package, which can import not only SAS, STATA and SPSS files but also more exotic formats such as Systat and Weka. It can also export data to various formats. (Tip: if you’re switching from SAS, SPSS or STATA to R, check out Bob Muenchen’s tutorial (subscription required).)
- The package you use to connect to and import from a relational database depends on the type of database. To connect to a MySQL database, for example, you need the RMySQL package; others include RPostgreSQL and ROracle. The functions you then use to access and manipulate the database are specified in another R package called DBI.
- If you want to harvest web data with R, you can connect R to online resources through APIs, or scrape web pages with packages like rvest. To get started with all of this, there is a great resource freely available on the blog of Rolf Fredheim.
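To make the flat-file case concrete, here is a minimal base R sketch that writes a small made-up data frame to a temporary CSV file and reads it back with read.csv() from utils. Only base R is assumed; the data values are invented for the example.

```r
# Made-up example data
df <- data.frame(
  id = 1:3,
  name = c("a", "b", "c"),
  score = c(2.5, 3.0, 4.5)
)

# Write it to a temporary CSV file (row.names = FALSE avoids an extra index column)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# read.csv() from utils parses the file back into a data frame;
# stringsAsFactors = FALSE keeps text columns as character vectors
imported <- read.csv(path, stringsAsFactors = FALSE)
str(imported)
```

If readr or data.table is installed, readr::read_csv(path) and data.table::fread(path) should read the same file, each returning its own data frame variant (a tibble or a data.table); their defaults for guessing column types differ, so it is worth checking the result with str().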
Data manipulation
Turning your raw data into well-structured data is important for robust analysis and makes the data suitable for processing. R has many built-in functions for data processing, but they are not always easy to use. Luckily, there are some great packages that can help you:
- The tidyr package allows you to “tidy” your data. Tidy data is data where each column is a variable and each row is an observation, a shape that is easy to work with. Check this excellent resource on how you can tidy your data using tidyr.
- If you want to do string manipulation, you should learn about the stringr package. The vignette is very understandable and full of useful examples to get you started.
- dplyr is a great package for working with data-frame-like objects (both in memory and out of memory). It combines speed with a very intuitive syntax. To learn more about dplyr, you can take this data manipulation course (subscription required) and check out this handy cheat sheet.
- When performing heavy data wrangling tasks, the data.table package should be your “go-to” package. It’s blazingly fast, and once you get the hang of its syntax you will find yourself using data.table all the time. Check this data analysis course (subscription required) to discover the ins and outs of data.table, and use this cheat sheet as a reference.
- Chances are you will have to work with dates and times at some point. This can be painful, but luckily lubridate makes it a bit easier. Check its vignette to better understand how you can use lubridate in your day-to-day analysis.
- Base R has limited functionality for handling time series data. Fortunately, there are packages like zoo, xts and quantmod. Take this tutorial by Eric Zivot to better understand how to use these packages and how to work with time series data in R.
If you want a general overview of data manipulation with R, you can read the book Data Manipulation with R or watch the Data Wrangling with R video by RStudio. In case you run into trouble handling your data frames, check 15 easy solutions to your data frame problems.
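To illustrate what “tidy” means, the sketch below converts a small made-up “wide” table (one column per year) into a long table with one row per country-year observation. It uses base R’s reshape() from the stats package as a stand-in so that it runs without extra packages; tidyr’s gather() (pivot_longer() in newer tidyr versions) does the same job with a friendlier interface.

```r
# A made-up "wide" table: one row per country, one column per year
wide <- data.frame(
  country = c("A", "B"),
  y2014 = c(10, 20),
  y2015 = c(12, 25)
)

# reshape() stacks the year columns so that each row holds a single
# observation: one (country, year, value) combination
long <- reshape(
  wide,
  direction = "long",
  varying = c("y2014", "y2015"),  # columns to stack
  v.names = "value",              # name of the stacked value column
  timevar = "year",               # name of the new column identifying the year
  times = c(2014, 2015),          # year label for each stacked column
  idvar = "country"               # column identifying each unit
)
rownames(long) <- NULL  # drop the generated "A.2014"-style row names
long
```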
Data visualization
One of the things that make R such a great tool is its data visualization capabilities. For visualization in R, ggplot2 is probably the best-known package and a must-learn for beginners! You can find all the relevant information to get started with ggplot2 on http://ggplot2.org/, and make sure to check out the cheat sheet and the upcoming book. Besides ggplot2, there are packages such as ggvis for interactive web graphics (see tutorial (subscription required)), googleVis to interface with Google Charts (learn to re-create this TED talk), Plotly for R, and many more. See the task view for some hidden gems, and if you have some issues with plotting your data, this post might help you out.
In R there is a whole task view dedicated to handling spatial data, covering packages that let you create beautiful maps.
To get started, look at a package such as ggmap, which allows you to visualize spatial data and models on top of static maps from sources such as Google Maps and OpenStreetMap. Alternatively, you can start playing around with maptools, choroplethr, and the tmap package. If you need a great tutorial, take this Introduction to visualising spatial data in R.
Visualizations in R often make use of magnificent color schemes that fit the graph or map like a glove. If you want to achieve this for your visualizations as well, dig into the RColorBrewer package and ColorBrewer.
One of the latest visualization tools in R is HTML widgets. HTML widgets work just like R plots, but they create interactive web visualizations such as dynamic maps (leaflet), time-series charts (dygraphs), and interactive tables (DataTables). There are some very nice examples of HTML widgets in the wild, and solid documentation on how to create your own (if you’re not in a reading mood, just watch this video).
If you want to get some inspiration on what visualization to create next, you can have a look at blogs dedicated to visualizations such as FlowingData.
Data science and machine learning
There are many beginner resources on how to do data science with R. Some of the available online courses:
- Andrew Conway’s Introduction to statistics with R (subscription required)
- Data Analysis and Statistical Inference
- Data Analysis for life sciences
- Data Science Specialization by Johns Hopkins
Alternatively, if you prefer a good read:
- Practical Data Science With R
- R for Data Science (upcoming, see progress)
- A Survival Guide to Data Science with R
Once you start doing some machine learning with R, you will quickly find yourself using packages such as caret, rpart and randomForest. Luckily, there are some great learning resources for these packages and for machine learning in general. If you are just getting started, this guide will get you going in no time. Alternatively, you can have a look at the books Mastering Machine Learning with R and Machine Learning with R. If you are looking for step-by-step tutorials that guide you through a real-life example, there is the Kaggle Machine Learning course, or you can have a look at Wiekvoet’s blog.
Reporting results in R
R Markdown is an authoring format that makes it easy to create dynamic documents, presentations, and reports from R. It is a great tool for reporting your data analysis in a reproducible manner, making the analysis more useful and understandable. R Markdown is based on knitr and pandoc. When an R Markdown document is rendered, R runs the embedded code and weaves its results into the final document, which can be in HTML, Word, PDF, ioslides, or other formats. You can even create interactive R Markdown documents using Shiny. This 4-hour tutorial on Reporting with R Markdown (subscription required) gets you going with R Markdown, and you can use this nice cheat sheet for future reference.
In addition to its free courses, DataCamp offers access to all of its advanced R courses for $25/month. These include:
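As a minimal example of the kind of workflow these resources cover, the sketch below fits a classification tree with rpart (a recommended package that ships with standard R distributions) on the built-in iris data, holding out 45 of the 150 rows for testing. The split size and seed are arbitrary choices for illustration, and the exact accuracy will vary with the random split.

```r
# rpart is a "recommended" package included with standard R distributions
library(rpart)

# Fix the random seed so the train/test split is reproducible
set.seed(42)
test_idx <- sample(nrow(iris), size = 45)  # hold out 45 of 150 rows (30%)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit a classification tree predicting Species from the four measurements
fit <- rpart(Species ~ ., data = train, method = "class")

# Predict class labels for the held-out rows and compute the share predicted correctly
pred <- predict(fit, newdata = test, type = "class")
accuracy <- mean(pred == test$Species)
accuracy
```

Packages such as caret wrap this same fit-then-predict pattern with resampling and tuning, which is where the guides and books above pick up.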
- R for SAS, SPSS and STATA Users
- A Hands-on Introduction to Statistics with R
- Intermediate R
- Importing Data Into R
- Data Manipulation in R with dplyr
- Data Analysis in R, the data.table Way
- Data Visualization in R with ggvis
- Data Visualization with ggplot2 (1)
- Data Visualization with ggplot2 (2)
- Introduction to Machine Learning
- Reporting with R Markdown
Another option is Udemy. While they do not offer video plus interactive sessions like DataCamp, they do offer extensive video lessons covering other topics in using R and learning statistics. For readers of R-bloggers, Udemy is offering access to its courses for $15-$30 per course; use the code RBLOGGERS30 for an extra 30% discount. Here are some of their courses:
- The Comprehensive Programming in R Course (25 Hours of video)
- Graphs in R (ggplot2, plotrix, base R) – Data Visualization with R Programming Language (5 Hours of video)
- Linear Mixed-Effects Models with R (11 Hours of video)
- Multivariate Data Visualization with R (7 Hours of video)
- Applied Multivariate Analysis with R (13 Hours of video)
- More Data Mining with R (11 Hours of video)
- Text Mining, Scraping and Sentiment Analysis with R (4 Hours of video)
- R Programming for Simulation and Monte Carlo Methods (12 Hours of video)
- Programming Statistical Applications in R (12 Hours of video)
- Comprehensive Linear Modeling with R (15 Hours of video)
- Bayesian Computational Analyses with R (12 Hours of video)
- Time Series Analysis and Forecasting in R (3 Hours of video)
Statistics.com is an online learning website with 100+ courses in statistics, analytics, data mining, text mining, forecasting, social network analysis, spatial analysis, etc.
They have kindly agreed to offer R-Bloggers readers a reduced rate of $399 for any of their 23 courses in R, Python, SQL or SAS. These are high-impact courses, each four weeks long (normally costing up to $589). They feature hands-on exercises and projects and the opportunity to receive answers online from leading experts like Paul Murrell (member of the R core development team), Chris Brunsdon (co-developer of the GISTools package), Ben Baumer (former statistician for the NY Mets baseball team), and others. These instructors will answer all your questions (via a private discussion forum) over the 4-week period.
You may use the code “R-Blogger16” when registering. You can register for any R, Python, Hadoop, SQL or SAS course starting on any date. Here is a list of the R-related courses:
Using R as a statistical package
- R for Statistical Analysis
- Modeling in R
- Visualization in R using ggplot2
- Graphics in R
- Logistic Regression
- Bayesian Statistics in R
Building R programming skills – for those familiar with R, or experienced with other programming languages or statistical computing environments
- R Programming – Intermediate (one year of daily R use required before taking this course)
- R Programming – Advanced (two years of daily R use required before taking this course)
Applying R to specific domains or applications
- Applied Predictive Analytics
- SQL and R – Introduction to Database Queries
- Biostatistics in R with Clinical Trial Applications
- Data Mining in R
- Mapping in R
- Spatial Analysis Using R
You may pick any of the R courses from their catalog page.
Once you become more fluent in writing R syntax (and consequently addicted to R), you will want to unlock more of its power (read: do some really nifty stuff). In that case, make sure to check out Rcpp, an R package that makes it easier to integrate C++ code with R, or RevoScaleR (start the free tutorial).
After spending some time writing R code (and becoming an R addict), you’ll reach a point where you want to start writing your own R package. Hilary Parker from Etsy has written a short tutorial on how to create your first package, and if you’re really serious about it, you need to read R Packages, an upcoming book by Hadley Wickham that is already available for free on the web.
If you want to learn about the inner workings of R and deepen your understanding of it, the best way to get started is by reading Advanced R.
Finally, come visit us again at R-bloggers.com to read the latest news and tutorials from bloggers of the R community.
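If you just want to see what a minimal package looks like, base R’s utils::package.skeleton() can generate one from objects in your session. The sketch below packages a toy function; the names hello and mypkg are hypothetical, and tools like devtools and roxygen2 automate more of this process, including documentation.

```r
# A toy function to put in the package (hypothetical example)
hello <- function(name = "world") paste0("Hello, ", name, "!")

# package.skeleton() from utils creates a source package directory containing
# a DESCRIPTION file, a NAMESPACE file, an R/ folder with the function's code,
# and man/ pages with documentation stubs. "mypkg" is a hypothetical name.
pkg_dir <- tempdir()
package.skeleton(
  name = "mypkg",
  list = "hello",
  environment = environment(),  # where to look up the objects named in `list`
  path = pkg_dir,
  force = TRUE                  # overwrite an existing mypkg/ directory
)

# Show the generated files
list.files(file.path(pkg_dir, "mypkg"), recursive = TRUE)
```

The generated DESCRIPTION and man/ files contain placeholders that should be edited before the package is checked and shared.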