Working with date fields in R and Power BI

I’m starting to become a big fan of Power BI and the monthly releases are certainly helping the tool get closer to prime time (I think Power BI has come a long way, but still has some catching up to do).  I love the Power BI-R integration which significantly enhances the charting capabilities. I’m yet to find out how to plot multiple series on the same axis without having to do all the reshaping of columns-to-rows. One would think these sort of charts should be supported out of the box – I’m not referring to plotting different values of a single column as series, rather, plotting the counts for two separate columns (e.g., 2 lines showing number of patients having hypertension as 1 line and #patients with diabetes as another over months – input data has columns: ID, hypertension yes/no, diabetes yes/no, date of visit).

I wanted to give Power BI-R integration a try and this is some sample code I was using in R which simply extracts month and year from a date field:

image

image

If we omit specifying the format of the raw data and try using date field directly, we may end up with the wrong month:

image

image

I pasted the code (with the format specified) into Power BI. In my dataset I have a date type column called CompletedDTTM and a column called Accession which is a unique identifier. CompletedDTTM was specified to be a ‘Date’ field and formatted to look like:

image

When I tried to plot the counts over months, I ended up with this:

image

Not quite the chart I had in mind (I have ignored grouping by year here). All the R code is trying to do is determine distinct accession counts by month.

It was a bit hard to debug since the code was running fine in RStudio but in Power BI I’m getting NAs. A few times I also ended up with the following while trying things out:

image

Above is not very helpful. You can click on ‘See details’ and try debugging, but it’s not the easiest. An easier approach is to print the dataframe values to a table to see what’s going on:

image

Now that we can see what’s going on, the next step is to fix the month extraction piece.

After playing around a bit, I managed to put the following (DAX) syntax together:

image

So I’m guessing that what’s getting passed to the underlying R script within the Power BI environment is not the same format the raw data file has, but rather, the format we have selected within Power BI after importing data.

This also makes the syntax a lot easier! We don’t have to worry about 2 and 4-digit years either since all of them are formatted the same way once we select the date display format🙂

Finally, now that we have the right values loaded into R, all we have to do is plot using ggplot instead of showing the table.

image

So it seems like you can’t just copy-paste code from standard R environment to Power BI directly, at least when it comes to working with dates.

I haven’t shown how to plot the 2 series on the same x-axis, but this should be pretty straightforward with ggplot.

Posted in General | Tagged , , , , | Leave a comment

Creating PDF Documents with R/RStudio

For something I’m currently working on I wanted to create some PDF reports so that I can share the results with others via email. Turned out creating PDFs to output the values from R dataframes is not so straightforward after all.

If it’s a ggplot type graphic, easiest is probably to do something like:

image

which produces the following PDF:

image

Doesn’t look too bad since you can still see the figure. However, if it’s a dataframe, it’s not so easy unless it’s a fairly short table.

image

which produces the following cropped-out single page PDF:

image

Still there’s a workaround – you can specify the maximum number of rows per page to force multiple pages as below:

image

This creates a decent looking multi-page PDF:

image

This may be sufficient for most purposes, but I wanted to have a bit more control over the layout and stuff, so this wasn’t going to do (although I kept pushing the above code to its limit since I kept postponing looking at anything more complex!).

Seems like knitr is the way to go – I picked up bits and pieces of the puzzle from various sources, but what I also realized was that there was no “getting started” type tutorial to help a newbie get started with the whole process. This is why I thought I’d put together this post which should give anyone a decent quick start.

Knitr “was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + animation::saveLatex + R2HTML::RweaveHTML + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more)”.

Sweave, with a .Rnw extension, was typically used to:
– allow for R code to be embedded in Latex
– produce PDF and HTML files
– run the R code each time

On the other hand, knitr was designed to allow any input language (e.g. R, Python and Awk) and any output markup language (e.g. LaTeX and HTML). With a .Rnw extension you can create .tex/.pdf files and with a .Rmd file you can create HTML.

RStudio supports knitr – see http://yihui.name/knitr/demo/rstudio/ to setup the environment with the right settings. Once that’s done, you can create a simple .Rnw file by doing:

image

This produces a file with a basic template:

image

Make sure you have Tex installed; otherwise RStudio will complain saying “No TeX installation detected”. For Windows, MiKTex would be the way to do.

Now you can type some basic Latex code to see if it works.

image

Clicking on ‘Compile PDF’ should now produce a PDF document that looks like:

image

If the PDF creation was successful, that means you have the environment all setup for the more interesting stuff. It’s possible that the first time RStudio will prompt to allow installation of missing packages. Just click ‘Yes’ and install whatever that’s needed.

Now to the real task – first I created a simple 100 x 4 matrix in a file called Main.R.

image

Obviously nothing fancy, but the purpose is to have a dataframe that will run into at least 2 pages.

image

Now you can modify the .Rnw file to say “run Make.R script and print dataframe, df” – note that this is a slightly advanced version of including the R code directly in the .Rnw file. In this case you can easily include all the R code from Main.R directly in the .Rnw file (remove ‘external-code’ option and replace ‘source(‘Main.R’)’ with the actual R code), but I prefer to have my R code separate since often I would want to run just the R code without creating any PDFs. Also, this sort of keeps the ‘analysis’ and the ‘publishing’ aspects separate.

image

Make sure you also have xtable package installed for the above to work. The xtable Gallery contains all the details about this package and its commands – it’s basically a package that produces LaTeX-formatted tables. Running the above code (as in, clicking ‘Compile PDF’) produces:

image

Still not quite what we want since we see only 1 page. This is where the longtable package comes to the rescue:

image

This would (finally!) produce what we need!

image

Finally, just for the sake of completeness, you can also include all sorts of plots in the PDF document as well. I modified the Main.R code to include a basic plot:

image

and also prettified the table so that the header is repeated on all pages, there’s a line at the bottom of each page and the header has some formatting. 

This is what the final code version looks like:

image

and here’s the output:

1st page:image

2nd page:image

3rd page:image

Hope this helps!

Posted in General | Tagged , , , , , , , , , , , , | Leave a comment

Launching Spyder on Windows

If you install Spyder using any of the standalone installers after installing Python, you’ll have trouble launching the Spyder IDE. If you go to python_dir/Scripts, you’ll see the following:

image

but clicking on spyder.dat will launch a command prompt for a split second and then disappear. This seems to be a common issues as asked here and here (and other places).

What I did was taking a quick screenshot of the screen that appears for a sec. This is what it actually says:

image

It was pretty easy to fix the issue once I knew what the problem was. It was a matter of opening up a command prompt and typing:

pip install -U PySide

Once that was done, I created a new shortcut on the desktop and pointed it to spyder.bat (right click on Desktop –> New –> Shortcut, then browse browse to bat file) . Now Spyder should launch without any issues🙂

image

Posted in General | Tagged , , , | Leave a comment

Using Visual Studio 2013 for Python (3.4) with NumPy and SciPy on Windows

There seem to be various editors for Python and there are many articles online (e.g., this blog post) that discuss the features of the various editors. PyCharm by JetBrains seems pretty popular, but while I was Googling for Python editors, I came across Python Tools for Visual Studio. Coming from a C# background, I thought I’d give it a shot before trying out a totally new editor (I’ve moved onto Spyder now though).

The first thing you need to do is download PTVS from CodePlex. I downloaded PTVS 2.1 VS 2013.msi since I’m on VS2013. Of course you’ll need to install Python first if you haven’t done so already – I installed 3.4 (64-bit initially – but had to revert to 32-bit later).

At this point you should be able to create a Python project in Visual Studio – here is a good tutorial on how to create your first Python program in VS. Basically you create a new Python project, very similar to how you would create a .net application.

image

Creating a new project creates a new .py file with one line of code:

image

Now you hit F5, it runs your Python code:

image

This is the easy part. What I was having trouble was figuring out how to add external libraries and import these. Following is the simple Python code I was trying to run (from the Udacity Machine Learning class):

import numpy as np X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) Y = np.array([1, 1, 1, 2, 2, 2]) from sklearn.naive_bayes import GaussianNB clf = GaussianNB() clf.fit(X, Y) print(clf.predict([[-0.8, -1]]))

This is when I started having trouble. VS kept complaining about “No module named ‘numpy’.” and kept stopping at the import statement.

If you google ‘visual studio no module named numpy’ or ‘no module named numpy’ you’ll find tons of threads with various suggestions, including installing/upgrading pip (pip comes with 3.4), using easy_install and running other registry updates. I couldn’t get any of these to work. With pip, it gave me false hope looking as if everything was fine:

image

but then stopped out with errors (more on installing using command line later).

After lots of searching, I decided to try the numpy Windows installer available at http://www.scipy.org/scipylib/download.html. There’s only 32-bit version available on this official site, so I just tried installing that, but ended up with the following error saying python was not found in the registry:

image

Here’s what finally worked –

I uninstalled my 64-bit Python and reinstalled Python 32-bit.

Then tried installing numpy – which worked!

image

Similarly, I could install other modules too. sklearn installed like a charm:

image

The previous code requires scipy, so I installed scipy too (from http://www.scipy.org/scipylib/download.html):

image

Now my Python code runs like a charm in Visual Studio!🙂

image

So just to recap, these are the steps to follow:

  1. Install Python 32-bit. Make sure ‘Add python.exe to path’ is enabled.
  2. Install Python Tools for Visual Studio.
  3. Install numpy 32-bit and any other external modules you need.
  4. Run you code!

Seems pretty straight-forward, but lots of people, including myself, seem to have trouble getting Visual Studio to work with Python, especially getting the external modules to work.

By the way, there are ‘unofficial’ versions of numpy available in 64-bit, and also Windows versions of Python available (like IronPython), but I haven’t really played around with these. I’m sure some of these combinations would work equally well.

Lastly, it seems pretty straightforward to install Python modules using pip. Essentially there are (at least?) 3 ways –

– if there’s a zip file (or a tar.gz), simply download and unzip into a folder (it should contain a file called setup.py) – see the YouTube video here. Then go to that folder in a command prompt and do:
python setup.py install

– if you downloaded a .whl file, just open a command prompt and type:
pip install some-package.whl

– install directly using pip (no need to change directory or anything, just open a command window):
pip install –U packageName (e.g., PySide)

You will need to use one of the above two methods to install packages that don’t have the Windows installers. For instance, matplotlib had an installer, but it’s got dependency on six which doesn’t have an installer. Matplotlib also requires dateutil and pyparsing, and http://www.lfd.uci.edu/~gohlke/pythonlibs  was a great resource to download these modules from.

Posted in General | Tagged , , , , , , | 3 Comments

Charting with WPF/C#

Visualization techniques have been evolving rapidly and with a growing trend towards ‘big data’ and ‘analytics’ there are so many technologies to choose from. If the development technology is not a constraint, D3 is arguably the best way to go, but if you are looking for some charting capabilities within a WPF/C# development environment your free choices are somewhat limited. Based on what I have tried out so far and read in various forums, here are some popular ones:

  1. MS Chart Controls – Pretty decent collection of controls, but you need to use ‘old’ Windows Forms to use these. You can host the Form within a WPF application if you really want.
  2. WPF Toolkit Charting Controls – I think this is the first real WPF charting control Microsoft released. Charting options are good, but they still look like the old Forms-based charts. Here is a good tutorial to get started.
  3. D3 1.0 – Pretty easy to get started, but I felt the number of options available was rather limited. First attempt to have D3 library capabilities in a WPF environment I think. There is a good tutorial here on how to get started with these charts. Seems like the WPF effort has somewhat been abandoned and there is more emphasis on having the D3 capabilities in Silverlight now which is called D3 2.0.
  4. Metro Charts – “This project provides a small library to display charts in Modern UI Style (formerly known as Metro) in WPF, Silverlight and Windows 8 applications”. This is the best I have seen so far and the UI looks much better and modern compared to all others. The trouble is, these are aimed towards Windows 8 applications and you need Visual Studio 2012 to run the sample code you get off the site. You still can get these to run on Windows 7 with Visual Studio 2010, and that’s what I’m going to focus on in this post.

– To get started, download the sample code from http://modernuicharts.codeplex.com/releases/view/106190 (if you don’t want to go through the steps yourself, just download the project

which contains all the changes I’m discussing here and you should be good to go! Just rename to a .zip – WordPress doesn’t allow me to upload zip files). You should have the following folders and files:

image

– Open the solution file. Permanently remove source control since we won’t be needing this.

image

– At this point you’ll get a bunch of errors starting from:

image

and your Solution Explorer will like this:

image

– Delete all projects except for the following three:

image

– Save the workspace and close Visual Studio

– Now, open Windows explorer and open the .csproj file in a text editor (like NotePad++) and change the ‘TargetFrameworkVersion’ from 4.5 to 4.0. You need to do this for the 3 files under De.TorstenMandelkow.MetroChart, De.TorstenMandelkow.MetroChart.WPF and TestApplicationWPF folders.

image

– If you want, you can clean up your folder a bit as well so that you have only the following on disk. This step is optional.

image

– Open the main solution (i.e., MetroChart.sln). Remove source control association bindings if a message pops up. You should see the following where the source code is now loaded into VS.

image

– Change startup project to TestApplicationWPF

image

– Hit F5, and you should see the following Smile

image

One of the things I wanted to do was find the underlying data when a user double clicks on a chart. To do this, you can add something like this in your xaml:

image

Then in your code behind, you can handle this event in whatever the way you want.

image

Play around. Using Metro Charts is fairly straight forward.

Posted in General | Tagged , , , , , , , , , | 9 Comments