Working with date fields in R and Power BI


I’m starting to become a big fan of Power BI and the monthly releases are certainly helping the tool get closer to prime time (I think Power BI has come a long way, but still has some catching up to do).  I love the Power BI-R integration which significantly enhances the charting capabilities. I’m yet to find out how to plot multiple series on the same axis without having to do all the reshaping of columns-to-rows. One would think these sort of charts should be supported out of the box – I’m not referring to plotting different values of a single column as series, rather, plotting the counts for two separate columns (e.g., 2 lines showing number of patients having hypertension as 1 line and #patients with diabetes as another over months – input data has columns: ID, hypertension yes/no, diabetes yes/no, date of visit).

I wanted to give Power BI-R integration a try and this is some sample code I was using in R which simply extracts month and year from a date field:

image

image

If we omit specifying the format of the raw data and try using date field directly, we may end up with the wrong month:

image

image

I pasted the code (with the format specified) into Power BI. In my dataset I have a date type column called CompletedDTTM and a column called Accession which is a unique identifier. CompletedDTTM was specified to be a ‘Date’ field and formatted to look like:

image

When I tried to plot the counts over months, I ended up with this:

image

Not quite the chart I had in mind (I have ignored grouping by year here). All the R code is trying to do is determine distinct accession counts by month.

It was a bit hard to debug since the code was running fine in RStudio but in Power BI I’m getting NAs. A few times I also ended up with the following while trying things out:

image

Above is not very helpful. You can click on ‘See details’ and try debugging, but it’s not the easiest. An easier approach is to print the dataframe values to a table to see what’s going on:

image

Now that we can see what’s going on, the next step is to fix the month extraction piece.

After playing around a bit, I managed to put the following (DAX) syntax together:

image

So I’m guessing that what’s getting passed to the underlying R script within the Power BI environment is not the same format the raw data file has, but rather, the format we have selected within Power BI after importing data.

This also makes the syntax a lot easier! We don’t have to worry about 2 and 4-digit years either since all of them are formatted the same way once we select the date display format 🙂

Finally, now that we have the right values loaded into R, all we have to do is plot using ggplot instead of showing the table.

image

So it seems like you can’t just copy-paste code from standard R environment to Power BI directly, at least when it comes to working with dates.

I haven’t shown how to plot the 2 series on the same x-axis, but this should be pretty straightforward with ggplot.

Advertisements
This entry was posted in General and tagged , , , , . Bookmark the permalink.

One Response to Working with date fields in R and Power BI

  1. arun says:

    Thanks for your post! Your grid.table idea was useful to me for debugging my code.

    When I import a date field into R script, it splits it up as year, quarter, month, day and gives these individual fields instead of a single date field. Then we need to paste those individual fields together to get an R date! This is just plain dumb!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s