Creating PDF Documents with R/RStudio


For something I’m currently working on, I wanted to create some PDF reports so that I can share the results with others via email. It turned out that creating PDFs that show the values in R dataframes is not so straightforward after all.

If it’s a ggplot-type graphic, the easiest approach is probably something like:

[Image: R code that writes the graphic to a PDF]

which produces the following PDF:

[Image: the resulting PDF]
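The original code is only available as a screenshot, so here is a minimal sketch of the same idea. It assumes the ggplot2 package is installed and uses R’s built-in `mtcars` data for illustration, not the data from this post. The sketch opens a PDF device, prints the plot into it and closes the device:

```r
library(ggplot2)

# Illustrative plot built from R's bundled mtcars dataset
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Hypothetical output location; any writable path works
out_file <- file.path(tempdir(), "plot.pdf")

# Open a PDF graphics device, draw the plot, then close the device to write the file
pdf(out_file, width = 7, height = 5)
print(p)  # ggplot objects are only drawn when printed; inside scripts this must be explicit
invisible(dev.off())
```

`ggsave(out_file, p, width = 7, height = 5)` is a one-line alternative that does the same job.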

That doesn’t look too bad, since you can still see the whole figure. A dataframe, however, is not so easy unless it’s a fairly short table:

[Image: R code that writes the dataframe to a PDF]

which produces the following single-page PDF, with the rows that don’t fit cropped out:

[Image: single-page PDF with the table cut off]
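Again, the original code is a screenshot. One common way to draw a dataframe onto a PDF page is `grid.table()` from the gridExtra package. Whether the post used this function is an assumption. The whole table is drawn as a single object on one page, so rows that don’t fit are simply cut off:

```r
library(gridExtra)

# Illustrative 100 x 4 dataframe of random values
set.seed(1)
df <- as.data.frame(matrix(round(rnorm(400), 3), nrow = 100, ncol = 4))

out_file <- file.path(tempdir(), "table_single.pdf")

# One letter-sized page: the 100-row table is taller than the page, so part of it is lost
pdf(out_file, width = 8.5, height = 11)
grid.table(df)
invisible(dev.off())
```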

There is still a workaround: you can specify the maximum number of rows per page to force the table onto multiple pages:

[Image: R code that sets a maximum number of rows per page]

This creates a decent-looking multi-page PDF:

[Image: the resulting multi-page PDF]
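Here is a sketch of the rows-per-page workaround, again assuming gridExtra’s `grid.table()`. It splits the dataframe into chunks of `rows_per_page` rows (a hypothetical name and value) and draws each chunk on a fresh page of the same PDF device:

```r
library(grid)
library(gridExtra)

set.seed(1)
df <- as.data.frame(matrix(round(rnorm(400), 3), nrow = 100, ncol = 4))

rows_per_page <- 30  # hypothetical setting; tune it to the page size and font

# Assign rows 1-30 to chunk 1, rows 31-60 to chunk 2, and so on
pages <- split(df, ceiling(seq_len(nrow(df)) / rows_per_page))

out_file <- file.path(tempdir(), "table_paged.pdf")
pdf(out_file, width = 8.5, height = 11)
for (i in seq_along(pages)) {
  if (i > 1) grid.newpage()  # start a new PDF page for every chunk after the first
  grid.table(pages[[i]])
}
invisible(dev.off())
```

With 100 rows and 30 rows per page, this gives four pages of 30, 30, 30 and 10 rows. `split()` keeps the original row names, so the row numbering continues across pages.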

This may be sufficient for most purposes, but I wanted more control over the layout, so it wasn’t going to be enough (although I kept pushing the above code to its limits, since I kept putting off looking at anything more complex!).

knitr seems to be the way to go. I picked up bits and pieces of the puzzle from various sources, but I also realized there was no “getting started” tutorial to help a newcomer through the whole process. That is why I put together this post, which should give anyone a decent quick start.

Knitr “was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + animation::saveLatex + R2HTML::RweaveHTML + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more)”.

Sweave, which uses the .Rnw extension, was typically used to:
– embed R code in LaTeX
– produce PDF and HTML files
– re-run the R code every time the document is compiled

knitr, on the other hand, was designed to allow any input language (e.g. R, Python and Awk) and any output markup language (e.g. LaTeX and HTML). From an .Rnw file you can create .tex/.pdf files, and from an .Rmd file you can create HTML.

RStudio supports knitr. See http://yihui.name/knitr/demo/rstudio/ to set up the environment with the right settings. Once that’s done, you can create a simple .Rnw file as follows:

[Image: creating a new .Rnw file in RStudio]

This produces a file with a basic template:

[Image: the basic .Rnw template]

Make sure you have TeX installed; otherwise RStudio will complain with “No TeX installation detected”. On Windows, MiKTeX is the way to go.

Now you can type some basic LaTeX code to see whether it works:

[Image: .Rnw file with some basic LaTeX code]

Clicking on ‘Compile PDF’ should now produce a PDF document that looks like:

[Image: the compiled PDF]

If the PDF was created successfully, your environment is set up for the more interesting stuff. The first time round, RStudio may prompt you to install missing packages. Just click ‘Yes’ and install whatever is needed.
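You can also check the same toolchain from the R console instead of the ‘Compile PDF’ button. The sketch below uses hypothetical file names. It writes a tiny .Rnw file with one R chunk, opened by `<<label>>=` and closed by `@`, and runs it through `knitr::knit()`. That step only needs knitr. Turning the resulting .tex into a PDF still needs a TeX installation:

```r
library(knitr)

# A minimal .Rnw document: LaTeX markup plus one R chunk opened by <<label>>= and closed by @
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Hello from \\LaTeX!",
  "",
  "<<simple-chunk>>=",
  "1 + 1",
  "@",
  "",
  "\\end{document}"
)

work_dir <- tempdir()  # hypothetical location for the demo files
rnw_file <- file.path(work_dir, "hello.Rnw")
writeLines(rnw_lines, rnw_file)

# knit() evaluates the R chunk and writes a .tex file; this step does not need TeX
tex_file <- knit(rnw_file, output = file.path(work_dir, "hello.tex"), quiet = TRUE)
```

With TeX installed, `knit2pdf(rnw_file)` would also run the LaTeX compiler and produce the PDF.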

Now to the real task. First I created a simple 100 x 4 matrix in a file called Main.R:

[Image: Main.R]

Obviously nothing fancy, but the purpose is to have a dataframe long enough to run to at least 2 pages.

[Image: the dataframe df]
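Main.R appears only as a screenshot, so here is a sketch of a script in the same spirit. Only the 100 x 4 shape and the name `df` come from the post. The random values and the rounding are assumptions:

```r
# Main.R: build a 100 x 4 dataframe that is too long for a single PDF page
set.seed(42)  # make the random values reproducible

m <- matrix(round(runif(400), 4), nrow = 100, ncol = 4)
df <- as.data.frame(m)  # columns are named V1 to V4 by default
```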

Now you can modify the .Rnw file to say “run the Main.R script and print the dataframe, df”. Note that this is a slightly more advanced alternative to including the R code directly in the .Rnw file. In this case you could easily put all the R code from Main.R directly in the .Rnw file (remove the ‘external-code’ option and replace source('Main.R') with the actual R code). However, I prefer to keep my R code separate, since I often want to run just the R code without creating any PDFs. This also keeps the ‘analysis’ and ‘publishing’ aspects separate.

[Image: .Rnw file that sources Main.R and prints df with xtable]

Make sure you also have the xtable package installed for the above to work. The xtable Gallery covers this package and its commands in detail. In short, xtable produces LaTeX-formatted tables. Running the above code (that is, clicking ‘Compile PDF’) produces:

[Image: PDF showing only the first page of the table]
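Here is a self-contained sketch of this setup. It assumes knitr and xtable are installed, and the file names and temporary directory are illustrative. The code writes a Main.R that builds the 100 x 4 dataframe, plus an .Rnw whose chunk sources that script and prints the table with `results='asis'`, so xtable’s LaTeX goes straight into the document. Using `external-code` as the chunk label is an assumption based on the name mentioned above. Note that `print(xtable(df))` wraps the table in a floating `table` environment by default. LaTeX never breaks a float across pages, which is why only one page of the table appears:

```r
library(knitr)

work_dir <- tempdir()  # hypothetical working folder for the demo files

# Analysis script: builds the dataframe df
writeLines(c(
  "set.seed(42)",
  "df <- as.data.frame(matrix(round(runif(400), 4), nrow = 100, ncol = 4))"
), file.path(work_dir, "Main.R"))

# Report: the chunk sources Main.R and emits xtable's LaTeX as-is
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<external-code, echo=FALSE, results='asis'>>=",
  "library(xtable)",
  "source('Main.R')",
  "print(xtable(df))",
  "@",
  "\\end{document}"
), file.path(work_dir, "report.Rnw"))

# By default knitr evaluates chunks in the .Rnw file's folder, so source('Main.R') is found there
tex_file <- knit(file.path(work_dir, "report.Rnw"),
                 output = file.path(work_dir, "report.tex"), quiet = TRUE)
```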

This is still not quite what we want, since we see only 1 page. This is where the longtable package comes to the rescue:

[Image: .Rnw file using the longtable package]

This (finally!) produces what we need:

[Image: the table spread over multiple pages]
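In xtable terms, the longtable fix comes down to two arguments to `print()`: `tabular.environment = "longtable"` and `floating = FALSE`. A `longtable` cannot sit inside a floating `table`. The .Rnw preamble also needs `\usepackage{longtable}`. The sketch below captures the generated LaTeX as a string with `print.results = FALSE` so that it can be inspected:

```r
library(xtable)

set.seed(42)
df <- as.data.frame(matrix(round(runif(400), 4), nrow = 100, ncol = 4))

# longtable can break across pages, but only outside a floating table environment
tex <- print(xtable(df),
             tabular.environment = "longtable",
             floating = FALSE,
             print.results = FALSE)  # return the LaTeX instead of printing it
```

Inside an .Rnw chunk with `results='asis'`, the same call without `print.results = FALSE` writes the table into the document.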

Finally, for the sake of completeness, you can also include all sorts of plots in the PDF document. I modified Main.R to include a basic plot:

[Image: Main.R with a basic plot added]

I also prettified the table so that the header is repeated on every page, there’s a line at the bottom of each page, and the header has some formatting.

This is what the final version of the code looks like:

[Image: the final .Rnw code]
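The final code is also a screenshot. One common xtable recipe produces the effects described, but it is an assumption about how they were achieved, not the author’s exact code. The recipe uses `add.to.row` to insert longtable’s `\endhead` and `\endfoot` markers, so that the header and its rules repeat on every page and each page ends with a horizontal line. It also uses `sanitize.colnames.function` to make the column names bold:

```r
library(xtable)

set.seed(42)
df <- as.data.frame(matrix(round(runif(400), 4), nrow = 100, ncol = 4))

# In longtable, rows before \endhead repeat at the top of every page,
# and rows between \endhead and \endfoot repeat at the bottom of every page
page_markup <- list(
  pos = list(0),  # insert right after the header row
  command = "\\hline\n\\endhead\n\\hline\n\\endfoot\n"
)

tex <- print(xtable(df),
             tabular.environment = "longtable",
             floating = FALSE,
             include.rownames = FALSE,
             hline.after = -1,  # top rule only; the other rules come from page_markup
             add.to.row = page_markup,
             sanitize.colnames.function = function(x) paste0("\\textbf{", x, "}"),
             print.results = FALSE)
```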

and here’s the output:

1st page: [Image]

2nd page: [Image]

3rd page: [Image]

Hope this helps!
