Reproducible research on Coursera: Week 2 introduces knitr and R Markdown

I’m taking the free Coursera course on reproducible research by Johns Hopkins University to improve my own teaching. Week 2 introduces knitr and R Markdown, two core tools for creating reproducible research.

While week 1 defined what reproducibility is, week 2 shows how to achieve it.

Readable coding standards in R

In the first video, Roger Peng from Johns Hopkins University discusses basic standards for readable R code. Even if you don’t plan to use knitr or R Markdown and only provide the R code to replicate your work, clean code is essential.

  • Write your code in a text editor and save it as an R script so that you can provide this file to others.
  • Indent your code so that the reader can easily navigate the functions.
  • Write short functions that have a specific purpose so that others can understand what you did. With short functions you can also locate coding errors much faster.
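Here is a small sketch of what these points can look like in practice. The functions fahrenheit_to_celsius() and summarise_temps() are my own hypothetical examples, not code from the course. They use airquality, a data set that ships with R and records daily maximum temperatures in degrees Fahrenheit.

```r
# A sketch of readable R code: short functions, each with one purpose,
# consistently indented. Function names are hypothetical examples.

# Convert temperatures from degrees Fahrenheit to degrees Celsius.
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

# Return the mean and standard deviation of a numeric vector,
# ignoring missing values.
summarise_temps <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE))
}

# The airquality data set ships with base R (package "datasets").
temp_c <- fahrenheit_to_celsius(airquality$Temp)
print(summarise_temps(temp_c))
```

Suppose this is saved as a script file, for example temperature_summary.R (a hypothetical name). Anyone can then re-run it with source("temperature_summary.R"). Because each function does one thing, a wrong result can be traced to a single short function.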

These points may sound simplistic, but I know from my own experience that they are easily forgotten when you’re in the middle of writing code. When I go back to my own ‘old’ R code, even after just a few weeks, I’m often annoyed that I can’t understand what I did. I should print out these points and tape them above my desk.

Markdown

Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML). (John Gruber, developer of Markdown)

I was a bit scared of learning Markdown. However, after watching the video on Markdown syntax, I was surprised by how easy it is. You write plain text and mark up the formatting (bold, italics, etc.) with simple symbols instead of using layout buttons as in Word. The idea is similar to writing HTML or LaTeX, but the syntax is much lighter. For example:

**this text will appear bold**

*this text will appear italicized*

[this links to a great webpage](http://greatwebpage.com)

Ending a line with two spaces forces a line break.

Peng recommends looking at the official Markdown webpage or GitHub’s Markdown guide.

R Markdown

R Markdown goes just one small step further: you write Markdown but add R code chunks between the text. The code is ‘live’: when you compile the file, the R code runs and its results are inserted into the output. Peng calls R Markdown a “core tool in literate statistical programming.” The best way to learn it is to watch the video on R Markdown and the demonstration on Coursera (about 6 minutes).

Essentially, you write text in an editor (for example in RStudio, where you can create a new file of type “R Markdown”) and add code chunks:

```{r}
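library(datasets)    # built-in example data sets (attached by default)
data(airquality)     # daily air quality measurements, New York, May to September 1973
summary(airquality)  # summary statistics for each variable
pairs(airquality)    # scatterplot matrix of all pairs of variables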
```

Knitr

Once you have written the text and code in R Markdown, the workflow is as follows. First, the knitr package runs the code chunks and converts the file into a standard Markdown document. Then that Markdown document is converted into a readable HTML file. You don’t need to remember this, because RStudio does both steps in one go when you press the “Knit HTML” button. Peng also shows how to embed tables and figures of results, and how to show or hide the code and results in the output HTML file. Sometimes you only want the plot to appear, while the R code stays in the R Markdown file for reproducibility. The tutorial on Coursera walks through all of this.
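The two steps behind the “Knit HTML” button can also be run from the R console. The sketch below assumes the knitr and markdown packages are installed. knitr::knit() and markdown::markdownToHTML() are real functions, but the wrappers output_paths() and knit_to_html() and the file name report.Rmd are hypothetical. Newer releases of the markdown package recommend mark_html() instead of markdownToHTML().

```r
# Sketch of the two-step workflow behind RStudio's "Knit HTML" button.
# Assumes the knitr and markdown packages are installed.
# output_paths() and knit_to_html() are hypothetical helper names.

# Derive the .md and .html file names from an .Rmd file name.
output_paths <- function(rmd_file) {
  list(md   = sub("\\.Rmd$", ".md", rmd_file),
       html = sub("\\.Rmd$", ".html", rmd_file))
}

knit_to_html <- function(rmd_file) {
  paths <- output_paths(rmd_file)
  # Step 1: run the R code chunks and write a plain Markdown file.
  knitr::knit(rmd_file, output = paths$md)
  # Step 2: convert the Markdown file to HTML.
  markdown::markdownToHTML(paths$md, output = paths$html)
  invisible(paths$html)
}

# Usage, with a hypothetical file name:
# knit_to_html("report.Rmd")
```

The usual one-call equivalent today is rmarkdown::render("report.Rmd"). It uses pandoc rather than the markdown package for the conversion step.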

(Screenshot: R Markdown and knitr in RStudio)

One disadvantage is that all the code is re-run every time you compile the file, and this can take a long time. Peng recommends the ‘cache’ option, which stores the results of previous runs. This is useful when you change only the text, not the R code or computations.
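In R Markdown, options such as caching are set in the chunk header. The chunk below is a sketch. The label slow-model and the model itself are hypothetical; the quick lm() call stands in for a slow computation. cache=TRUE and echo=FALSE are standard knitr chunk options. cache=TRUE stores the results and reuses them while the chunk is unchanged. echo=FALSE hides the code but still shows the output.

```{r slow-model, cache=TRUE, echo=FALSE}
# Hypothetical example: a linear model of ozone on temperature and wind.
# With cache=TRUE, knitr stores the result and reuses it on later knits
# as long as this chunk's code and options are unchanged.
fit <- lm(Ozone ~ Temp + Wind, data = airquality)
summary(fit)
```

One caveat: knitr decides whether to re-run a cached chunk from the chunk’s code and options. If an external data file changes while the code stays the same, the old cached result may be reused.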

Tools

If you haven’t worked with R, RStudio, or GitHub before and want to do the peer-reviewed assignments, you can sign up for the Coursera course “The Data Scientist’s Toolbox,” also by Johns Hopkins University. It has videos that show you how to set up the software.

My teaching

As in week 1, what I liked about the videos was that they were short and covered only the basics. This makes it easy to get started and to follow the steps on your own computer. For those who are a bit scared of new statistical tools and keep putting off trying them, it’s perfect. I’m thinking of ways to set up a session on these tools at Cambridge, either within the Intro to R class or the Replication Workshop. If that doesn’t work, I will probably recommend that my students take the Coursera course and ask my TAs for help with the assignments if needed.
