Creating PDF Documents with R/RStudio

For something I’m currently working on I wanted to create some PDF reports so that I can share the results with others via email. Turned out creating PDFs to output the values from R dataframes is not so straightforward after all.

If it’s a ggplot type graphic, easiest is probably to do something like:


which produces the following PDF:


Doesn’t look too bad since you can still see the figure. However, if it’s a dataframe, it’s not so easy unless it’s a fairly short table.


which produces the following cropped-out single page PDF:


Still there’s a workaround – you can specify the maximum number of rows per page to force multiple pages as below:


This creates a decent looking multi-page PDF:


This may be sufficient for most purposes, but I wanted to have a bit more control over the layout and stuff, so this wasn’t going to do (although I kept pushing the above code to its limit since I kept postponing looking at anything more complex!).

Seems like knitr is the way to go – I picked up bits and pieces of the puzzle from various sources, but what I also realized was that there was no “getting started” type tutorial to help a newbie get started with the whole process. This is why I thought I’d put together this post which should give anyone a decent quick start.

Knitr “was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + animation::saveLatex + R2HTML::RweaveHTML + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more)”.

Sweave, with a .Rnw extension, was typically used to:
– allow for R code to be embedded in Latex
– produce PDF and HTML files
– run the R code each time

On the other hand, knitr was designed to allow any input language (e.g. R, Python and Awk) and any output markup language (e.g. LaTeX and HTML). With a .Rnw extension you can create .tex/.pdf files and with a .Rmd file you can create HTML.

RStudio supports knitr. Set up the environment with the right settings first; once that’s done, you can create a simple .Rnw file by doing:


This produces a file with a basic template:


Make sure you have TeX installed; otherwise RStudio will complain saying “No TeX installation detected”. For Windows, MiKTeX would be the way to go.

Now you can type some basic Latex code to see if it works.


Clicking on ‘Compile PDF’ should now produce a PDF document that looks like:


If the PDF creation was successful, that means you have the environment all set up for the more interesting stuff. It’s possible that the first time RStudio will prompt to allow installation of missing packages. Just click ‘Yes’ and install whatever is needed.

Now to the real task – first I created a simple 100 x 4 matrix in a file called Main.R.


Obviously nothing fancy, but the purpose is to have a dataframe that will run into at least 2 pages.


Now you can modify the .Rnw file to say “run the Main.R script and print the dataframe, df” – note that this is a slightly advanced version of including the R code directly in the .Rnw file. In this case you can easily include all the R code from Main.R directly in the .Rnw file (remove the ‘external-code’ option and replace ‘source(‘Main.R’)’ with the actual R code), but I prefer to keep my R code separate since I often want to run just the R code without creating any PDFs. This also keeps the ‘analysis’ and the ‘publishing’ aspects separate.


Make sure you also have xtable package installed for the above to work. The xtable Gallery contains all the details about this package and its commands – it’s basically a package that produces LaTeX-formatted tables. Running the above code (as in, clicking ‘Compile PDF’) produces:


Still not quite what we want since we see only 1 page. This is where the longtable package comes to the rescue:


This would (finally!) produce what we need!


Finally, just for the sake of completeness, you can also include all sorts of plots in the PDF document as well. I modified the Main.R code to include a basic plot:


and also prettified the table so that the header is repeated on all pages, there’s a line at the bottom of each page and the header has some formatting. 

This is what the final code version looks like:


and here’s the output:

1st page:image

2nd page:image

3rd page:image

Hope this helps!


Launching Spyder on Windows

If you install Spyder using any of the standalone installers after installing Python, you’ll have trouble launching the Spyder IDE. If you go to python_dir/Scripts, you’ll see the following:


but clicking on spyder.bat will launch a command prompt for a split second and then disappear. This seems to be a common issue, as asked here and here (and other places).

What I did was take a quick screenshot of the screen that appears for a second. This is what it actually says:


It was pretty easy to fix the issue once I knew what the problem was. It was a matter of opening up a command prompt and typing:

pip install -U PySide

Once that was done, I created a new shortcut on the desktop and pointed it to spyder.bat (right click on Desktop –> New –> Shortcut, then browse to the bat file). Now Spyder should launch without any issues 🙂



Using Visual Studio 2013 for Python (3.4) with NumPy and SciPy on Windows

There seem to be various editors for Python and there are many articles online (e.g., this blog post) that discuss the features of the various editors. PyCharm by JetBrains seems pretty popular, but while I was Googling for Python editors, I came across Python Tools for Visual Studio. Coming from a C# background, I thought I’d give it a shot before trying out a totally new editor (I’ve moved onto Spyder now though).

The first thing you need to do is download PTVS from CodePlex. I downloaded PTVS 2.1 VS 2013.msi since I’m on VS2013. Of course you’ll need to install Python first if you haven’t done so already – I installed 3.4 (64-bit initially – but had to revert to 32-bit later).

At this point you should be able to create a Python project in Visual Studio – here is a good tutorial on how to create your first Python program in VS. Basically you create a new Python project, very similar to how you would create a .net application.


Creating a new project creates a new .py file with one line of code:


Now when you hit F5, it runs your Python code:


This is the easy part. What I was having trouble with was figuring out how to add external libraries and import them. The following is the simple Python code I was trying to run (from the Udacity Machine Learning class):

import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)
print(clf.predict([[-0.8, -1]]))

This is when I started having trouble. VS kept complaining about “No module named ‘numpy’.” and kept stopping at the import statement.
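One quick way to narrow down a “No module named …” error is to ask the interpreter itself what it can see. The following is a minimal sketch, not part of my original troubleshooting: it only uses the standard library (`sys.executable` and `importlib.util.find_spec`), and the helper name `module_available` is made up for illustration.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if this interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter actually running this script; it may not be the
    # same one that pip (or an installer) put the packages into.
    print("Interpreter:", sys.executable)
    for name in ("numpy", "scipy", "sklearn"):
        status = "found" if module_available(name) else "NOT found"
        print(name, status)
```

If a module shows up as “NOT found” here but pip claims it is installed, the package most likely went into a different Python installation than the one the IDE is running.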

If you google ‘visual studio no module named numpy’ or ‘no module named numpy’ you’ll find tons of threads with various suggestions, including installing/upgrading pip (pip comes with 3.4), using easy_install and running other registry updates. I couldn’t get any of these to work. With pip, it gave me false hope looking as if everything was fine:


but then stopped with errors (more on installing using the command line later).

After lots of searching, I decided to try the numpy Windows installer. There’s only a 32-bit version available on the official site, so I just tried installing that, but ended up with the following error saying Python was not found in the registry:


Here’s what finally worked –

I uninstalled my 64-bit Python and reinstalled Python 32-bit.

Then tried installing numpy – which worked!


Similarly, I could install other modules too. sklearn installed like a charm:


The previous code requires scipy, so I installed scipy too.


Now my Python code runs like a charm in Visual Studio!🙂


So just to recap, these are the steps to follow:

  1. Install Python 32-bit. Make sure ‘Add python.exe to path’ is enabled.
  2. Install Python Tools for Visual Studio.
  3. Install numpy 32-bit and any other external modules you need.
  4. Run your code!

Seems pretty straight-forward, but lots of people, including myself, seem to have trouble getting Visual Studio to work with Python, especially getting the external modules to work.

By the way, there are ‘unofficial’ versions of numpy available in 64-bit, and also Windows versions of Python available (like IronPython), but I haven’t really played around with these. I’m sure some of these combinations would work equally well.
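Since the 32-bit vs 64-bit mismatch was what tripped me up, it helps to confirm which kind of Python build you are actually running before picking an installer. A minimal sketch using only the standard library (the function name `interpreter_bits` is just for illustration):

```python
import struct
import sys


def interpreter_bits():
    """Return 32 or 64, based on the pointer size of this Python build."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    version = sys.version.split()[0]
    print("Python {} is a {}-bit build".format(version, interpreter_bits()))
```

A 32-bit module installer needs a 32-bit Python, regardless of whether Windows itself is 64-bit.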

Lastly, it seems pretty straightforward to install Python modules using pip. Essentially there are (at least?) 3 ways –

– if there’s a zip file (or a tar.gz), simply download and unzip into a folder (it should contain a file called setup.py) – see the YouTube video here. Then go to that folder in a command prompt and do:
python setup.py install

– if you downloaded a .whl file, just open a command prompt and type:
pip install some-package.whl

– install directly using pip (no need to change directory or anything, just open a command window):
pip install -U packageName (e.g., PySide)

You will need to use one of the above methods to install packages that don’t have Windows installers. For instance, matplotlib had an installer, but it depends on six, which doesn’t have one. Matplotlib also requires dateutil and pyparsing, and  was a great resource to download these modules from.
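If you have several Python installations, running pip through a specific interpreter (`python -m pip …`) makes sure the package lands in the installation you intend. Below is a small sketch of that idea, under the assumption that pip is available for the interpreter running the script; the helper `pip_install_command` is hypothetical and only builds the command, it does not run it unless you uncomment the last line.

```python
import subprocess
import sys


def pip_install_command(target, upgrade=False):
    """Build the argument list for installing `target` with pip.

    `target` can be a package name (e.g. "PySide") or a path to a .whl file.
    """
    # sys.executable is the interpreter running this script, so the
    # package is installed for exactly that Python.
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("-U")
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    cmd = pip_install_command("PySide", upgrade=True)
    print(" ".join(cmd))
    # Uncomment to actually run the install:
    # subprocess.check_call(cmd)
```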


Charting with WPF/C#

Visualization techniques have been evolving rapidly and with a growing trend towards ‘big data’ and ‘analytics’ there are so many technologies to choose from. If the development technology is not a constraint, D3 is arguably the best way to go, but if you are looking for some charting capabilities within a WPF/C# development environment your free choices are somewhat limited. Based on what I have tried out so far and read in various forums, here are some popular ones:

  1. MS Chart Controls – Pretty decent collection of controls, but you need to use ‘old’ Windows Forms to use these. You can host the Form within a WPF application if you really want.
  2. WPF Toolkit Charting Controls – I think this is the first real WPF charting control Microsoft released. Charting options are good, but they still look like the old Forms-based charts. Here is a good tutorial to get started.
  3. D3 1.0 – Pretty easy to get started, but I felt the number of options available was rather limited. First attempt to have D3 library capabilities in a WPF environment I think. There is a good tutorial here on how to get started with these charts. Seems like the WPF effort has somewhat been abandoned and there is more emphasis on having the D3 capabilities in Silverlight now which is called D3 2.0.
  4. Metro Charts – “This project provides a small library to display charts in Modern UI Style (formerly known as Metro) in WPF, Silverlight and Windows 8 applications”. This is the best I have seen so far and the UI looks much better and modern compared to all others. The trouble is, these are aimed towards Windows 8 applications and you need Visual Studio 2012 to run the sample code you get off the site. You still can get these to run on Windows 7 with Visual Studio 2010, and that’s what I’m going to focus on in this post.

– To get started, download the sample code from the site (if you don’t want to go through the steps yourself, just download the project which contains all the changes I’m discussing here and you should be good to go! Just rename to a .zip – WordPress doesn’t allow me to upload zip files). You should have the following folders and files:


– Open the solution file. Permanently remove source control since we won’t be needing this.


– At this point you’ll get a bunch of errors starting from:


and your Solution Explorer will look like this:


– Delete all projects except for the following three:


– Save the workspace and close Visual Studio

– Now, open Windows Explorer and open each .csproj file in a text editor (like Notepad++) and change the ‘TargetFrameworkVersion’ from 4.5 to 4.0. You need to do this for the 3 files under the De.TorstenMandelkow.MetroChart, De.TorstenMandelkow.MetroChart.WPF and TestApplicationWPF folders.


– If you want, you can clean up your folder a bit as well so that you have only the following on disk. This step is optional.


– Open the main solution (i.e., MetroChart.sln). Remove source control association bindings if a message pops up. You should see the following where the source code is now loaded into VS.


– Change startup project to TestApplicationWPF


– Hit F5, and you should see the following:


One of the things I wanted to do was find the underlying data when a user double clicks on a chart. To do this, you can add something like this in your xaml:


Then in your code-behind, you can handle this event in whatever way you want.


Play around. Using Metro Charts is fairly straightforward.


Accessing NCBO Annotator Web Service in C#

The NCBO Annotator allows you to get annotations for (biomedical) text from a number of standard ontologies. For instance, if I want to find the corresponding RadLex codes for ‘abdomen knee’, I can type them into the textbox, and restrict the ontology list to just RadLex.


This manual approach works fine if you just want a few sentences annotated, but if you need to annotate multiple sentences in a systematic manner you need to do this programmatically. This is where the annotate web services come in. There are several client examples as well where you can find some sample code in several languages. There is a Java annotator client example, but unfortunately there is no C# example. For something I’m trying to implement I need to (or rather, I prefer to) use C#. Translating some Java code to C# isn’t hard, but in case someone’s looking for a ready-to-use C# example, following is a function/method you can use:

// Requires: using System; using System.IO; using System.Net; using System.Text; using System.Xml;
private void GetData()
{
    // Annotator web service URL
    Uri address = new Uri("");

    // Create the web request
    HttpWebRequest request = WebRequest.Create(address) as HttpWebRequest;

    // Set type to POST
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.UserAgent = "Annotator Client Example - Annotator";

    String text = "abdomen knee";

    StringBuilder data = new StringBuilder();

    // Configure the form parameters
    data.Append("longestOnly=" + "false&");
    data.Append("wholeWordOnly=" + "true&");
    data.Append("filterNumber=" + "true&");
    data.Append("withDefaultStopWords=" + "true&");
    data.Append("isTopWordsCaseSensitive=" + "false&");
    data.Append("mintermSize=" + "3&");
    data.Append("scored=" + "true&");
    data.Append("withSynonyms=" + "true&");
    data.Append("ontologiesToExpand=" + "&");
    data.Append("ontologiesToKeepInResult=" + "1057&"); // comma-separated list of ontology ids
    data.Append("isVirtualOntologyId=" + "true&");
    data.Append("semanticTypes=" + "&");
    data.Append("levelMax=" + "0&");
    data.Append("mappingTypes=" + "&"); // null, Automatic, Manual
    // URL-encode the text so spaces and special characters survive the POST
    data.Append("textToAnnotate=" + Uri.EscapeDataString(text) + "&");
    data.Append("format=" + "xml&"); // Options are 'text', 'xml', 'tabDelimited'
    data.Append("apikey=" + "YOUR_KEY");

    // Create a byte array of the data we want to send
    byte[] byteData = Encoding.UTF8.GetBytes(data.ToString());

    // Set the content length in the request headers
    request.ContentLength = byteData.Length;

    // Write data
    using (Stream postStream = request.GetRequestStream())
    {
        postStream.Write(byteData, 0, byteData.Length);
    }

    // Get response
    using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
    {
        // Load the XML response into a document
        XmlDocument doc = new XmlDocument();
        doc.Load(response.GetResponseStream());

        XmlNodeList nodes = doc.SelectNodes("/success/data/annotatorResultBean/annotations/annotationBean");
        foreach (XmlNode node in nodes)
        {
            string radlexId = node.SelectSingleNode("concept/localConceptId").InnerText.Split('/')[1];
            string radlexDescription = node.SelectSingleNode("concept/preferredName").InnerText;
        }

        // Or print to console (comment out the doc.Load line above first,
        // since the response stream can only be read once)
        //StreamReader reader = new StreamReader(response.GetResponseStream());
        //Console.WriteLine(reader.ReadToEnd());
    }
}
You can set breakpoints at radlexId/radlexDescription to see that what you get programmatically is exactly the same as what you get when you type the text directly into the website. Per the comment in the code, pick the ontologies you are interested in and pass their IDs as a comma-separated list. For example, if I’m interested in RadLex and SNOMED, I’ll use:
data.Append("ontologiesToKeepInResult=" + "1057,1353&");
where 1353 is the ID for SNOMED.

Before you can use the web service, you’ll need to sign up and get an API key first (which you will then pass as the value for ‘apikey’).
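For anyone who wants to sanity-check the POST body outside of C#, here is a rough sketch of the same form encoding in Python. It only builds the body and does not call the web service. The parameter names are taken from the C# method above, only a subset is shown, and the function name `build_annotator_body` is made up for illustration.

```python
from urllib.parse import parse_qs, urlencode


def build_annotator_body(text, ontology_ids, api_key):
    """Build the form-urlencoded POST body as bytes."""
    params = {
        "longestOnly": "false",
        "wholeWordOnly": "true",
        # Comma-separated list of ontology ids, e.g. "1057,1353"
        "ontologiesToKeepInResult": ",".join(str(i) for i in ontology_ids),
        "isVirtualOntologyId": "true",
        "textToAnnotate": text,
        "format": "xml",
        "apikey": api_key,
    }
    # urlencode escapes spaces, commas and other special characters
    return urlencode(params).encode("utf-8")


if __name__ == "__main__":
    body = build_annotator_body("abdomen knee", [1057, 1353], "YOUR_KEY")
    print(body.decode("utf-8"))
    # Decoding the body gives back the original values
    print(parse_qs(body.decode("utf-8"))["textToAnnotate"])
```

The point of going through an encoder rather than concatenating strings is that the text to annotate can contain spaces, ampersands and so on, which would otherwise corrupt the form data.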
