For something I’m currently working on I wanted to create some PDF reports so that I can share the results with others via email. It turned out that creating PDFs that show the values from R dataframes is not so straightforward after all.
If it’s a ggplot-type graphic, the easiest approach is probably to do something like:
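A minimal sketch of this, assuming the ggplot2 package and using the built-in mtcars data purely for illustration (the data set, labels and file name are my own choices):

```r
library(ggplot2)

# Hypothetical example data: the built-in mtcars data set
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# ggsave() picks the PDF device from the file extension
out_file <- file.path(tempdir(), "plot.pdf")
ggsave(out_file, plot = p, width = 7, height = 5)
```

Opening a `pdf()` device, printing the plot and calling `dev.off()` would work just as well.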
which produces the following PDF:
Doesn’t look too bad since you can still see the figure. However, if it’s a dataframe, it’s not so easy unless it’s a fairly short table.
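One way to try this is to draw the dataframe as a graphical table. The sketch below assumes the gridExtra package and uses the built-in iris data (150 rows) as a stand-in for a long table:

```r
library(gridExtra)

out_file <- file.path(tempdir(), "table_single_page.pdf")

pdf(out_file, width = 8.5, height = 11)
grid.table(iris)   # 150 rows: far more than fit on one page
dev.off()
```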
which produces the following cropped single-page PDF:
Still there’s a workaround – you can specify the maximum number of rows per page to force multiple pages as below:
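A rough sketch of the idea, again assuming gridExtra and the iris data. The value of 30 rows per page is an arbitrary choice that happens to fit a letter-sized page:

```r
library(gridExtra)
library(grid)

max_rows <- 30                                 # rows per page (assumed value)
n_pages  <- ceiling(nrow(iris) / max_rows)

out_file <- file.path(tempdir(), "table_multi_page.pdf")

pdf(out_file, width = 8.5, height = 11)
for (page in seq_len(n_pages)) {
  first <- (page - 1) * max_rows + 1
  last  <- min(page * max_rows, nrow(iris))
  if (page > 1) grid.newpage()                 # start a fresh page after the first
  grid.table(iris[first:last, ])
}
dev.off()
```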
This creates a decent looking multi-page PDF:
This may be sufficient for most purposes, but I wanted to have a bit more control over the layout and stuff, so this wasn’t going to do (although I kept pushing the above code to its limit since I kept postponing looking at anything more complex!).
Seems like knitr is the way to go – I picked up bits and pieces of the puzzle from various sources, but what I also realized was that there was no “getting started” type tutorial to help a newbie get started with the whole process. This is why I thought I’d put together this post which should give anyone a decent quick start.
Knitr “was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more)”.
Sweave, with a .Rnw extension, was typically used to:
– allow for R code to be embedded in LaTeX
– produce PDF and HTML files
– run the R code each time
On the other hand, knitr was designed to allow any input language (e.g. R, Python and Awk) and any output markup language (e.g. LaTeX and HTML). With a .Rnw extension you can create .tex/.pdf files and with a .Rmd file you can create HTML.
RStudio supports knitr – see http://yihui.name/knitr/demo/rstudio/ to set up the environment with the right settings. Once that’s done, you can create a simple .Rnw file by doing:
This produces a file with a basic template:
Make sure you have TeX installed; otherwise RStudio will complain saying “No TeX installation detected”. For Windows, MiKTeX would be the way to go.
Now you can type some basic LaTeX code to see if it works.
Clicking on ‘Compile PDF’ should now produce a PDF document that looks like:
If the PDF was created successfully, you have the environment all set up for the more interesting stuff. The first time around, RStudio may prompt you to allow installation of missing packages. Just click ‘Yes’ and install whatever is needed.
Now to the real task – first I created a simple 100 x 4 matrix in a file called Main.R.
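Something along these lines would do. The random values, the seed and the column names are assumptions for illustration; only the 100 x 4 shape and the name df come from the post:

```r
# Main.R (sketch): a 100 x 4 matrix of random numbers, converted to a dataframe
set.seed(1)                                    # assumed, for reproducibility
m  <- matrix(rnorm(400), nrow = 100, ncol = 4)
df <- as.data.frame(m)
names(df) <- c("A", "B", "C", "D")             # hypothetical column names
```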
Obviously nothing fancy, but the purpose is to have a dataframe that will run into at least 2 pages.
Now you can modify the .Rnw file to say “run the Main.R script and print the dataframe, df”. Note that this is a slightly more advanced version of including the R code directly in the .Rnw file. In this case you could easily include all the R code from Main.R directly in the .Rnw file (remove the ‘external-code’ option and replace source('Main.R') with the actual R code), but I prefer to keep my R code separate since I often want to run just the R code without creating any PDFs. This also keeps the ‘analysis’ and ‘publishing’ aspects separate.
Make sure you also have the xtable package installed for the above to work. The xtable Gallery contains all the details about this package and its commands; it’s basically a package that produces LaTeX-formatted tables. Running the above code (as in, clicking ‘Compile PDF’) produces:
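The R code inside such a chunk could look roughly like the sketch below. It assumes the chunk is opened with knitr options along the lines of `<<echo=FALSE, results='asis'>>=` and closed with `@`, so that the LaTeX printed by xtable is written into the document as-is. The dataframe is built inline here only so the snippet runs on its own:

```r
library(xtable)

# In the .Rnw chunk this line would be source("Main.R"); the dataframe is
# built inline here so the snippet is self-contained.
df <- as.data.frame(matrix(rnorm(400), nrow = 100, ncol = 4))

# print() on an xtable object emits a LaTeX tabular wrapped in a table float
print(xtable(df))
```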
Still not quite what we want since we see only 1 page. This is where the longtable package comes to the rescue:
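With xtable this comes down to two arguments of `print()`. The sketch below assumes `\usepackage{longtable}` has been added to the preamble of the .Rnw file, and again builds the dataframe inline as a stand-in for Main.R:

```r
library(xtable)

df <- as.data.frame(matrix(rnorm(400), nrow = 100, ncol = 4))

# longtable can break across pages; it must not sit inside a float,
# hence floating = FALSE
print(xtable(df),
      tabular.environment = "longtable",
      floating = FALSE)
```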
This would (finally!) produce what we need!
Finally, just for the sake of completeness, you can also include all sorts of plots in the PDF document as well. I modified the Main.R code to include a basic plot:
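A sketch of what that could look like, using a base graphics scatter plot. The column names and the helper name `plot_ab` are hypothetical; a knitr chunk that calls `plot_ab(df)` would have the figure included in the PDF automatically:

```r
# Main.R (sketch) extended with a basic plot
set.seed(1)                                    # assumed, for reproducibility
df <- as.data.frame(matrix(rnorm(400), nrow = 100, ncol = 4))
names(df) <- c("A", "B", "C", "D")             # hypothetical column names

# Wrapping the plot in a function lets the .Rnw file decide where it appears
plot_ab <- function(data) {
  plot(data$A, data$B, xlab = "A", ylab = "B", main = "A vs. B")
}
```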
I also prettified the table so that the header is repeated on all pages, there’s a line at the bottom of each page and the header has some formatting.
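One way to get this with xtable is to inject longtable’s `\endhead` and `\endfoot` markers through the `add.to.row` argument. This is a sketch under my own assumptions (bold headers as the “formatting”, hypothetical column names), not necessarily the exact styling used for the report:

```r
library(xtable)

df <- as.data.frame(matrix(rnorm(400), nrow = 100, ncol = 4))
names(df) <- c("A", "B", "C", "D")             # hypothetical column names

# LaTeX inserted right after the column-name row (pos = 0):
#   \hline \endhead  -> everything above is repeated as the header on each page
#   \hline \endfoot  -> a rule drawn at the bottom of each page
longtable_rules <- list(
  pos     = list(0),
  command = "\\hline\n\\endhead\n\\hline\n\\endfoot\n"
)

# Header formatting: wrap each column name in \textbf{}
bold_header <- function(x) paste0("\\textbf{", x, "}")

print(xtable(df),
      tabular.environment = "longtable",
      floating = FALSE,
      hline.after = c(-1),                     # single rule above the header
      add.to.row = longtable_rules,
      sanitize.colnames.function = bold_header)
```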
This is what the final code version looks like:
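To show how the pieces fit together, here is a sketch that writes a complete .Rnw file and a matching Main.R from R. The file names, chunk labels, section title and column names are all assumptions, and the raw string syntax needs R 4.0 or later. In RStudio you would simply keep the text between the raw string delimiters in a .Rnw file and click ‘Compile PDF’.

```r
# Contents of a hypothetical report.Rnw, held in an R raw string (R >= 4.0)
rnw <- r"---(\documentclass{article}
\usepackage{longtable}
\begin{document}

\section*{Report}

<<setup, echo=FALSE, message=FALSE>>=
library(xtable)
source("Main.R")   # assumed to define the dataframe df
@

<<scatter, echo=FALSE, fig.width=5, fig.height=4>>=
plot(df$A, df$B)
@

<<table, echo=FALSE, results='asis'>>=
rules <- list(pos = list(0),
              command = "\\hline\n\\endhead\n\\hline\n\\endfoot\n")
print(xtable(df), tabular.environment = "longtable", floating = FALSE,
      hline.after = c(-1), add.to.row = rules)
@

\end{document})---"

# Write the report and a minimal Main.R side by side in a scratch directory
report_dir <- file.path(tempdir(), "report")
dir.create(report_dir, showWarnings = FALSE)
writeLines(rnw, file.path(report_dir, "report.Rnw"))
writeLines(c('df <- as.data.frame(matrix(rnorm(400), nrow = 100, ncol = 4))',
             'names(df) <- c("A", "B", "C", "D")'),
           file.path(report_dir, "Main.R"))

# Compiling needs the knitr package and a TeX installation
compile_report <- function(dir) {
  old <- setwd(dir)
  on.exit(setwd(old))
  knitr::knit2pdf("report.Rnw")
}
# compile_report(report_dir)
```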
and here’s the output:
Hope this helps!