Monday, April 06, 2015

Learning Tableau for data visualization

Tableau is a software package used for data visualization. I had heard of it before, and it came up again as part of the Emerging Technologies Working Group that I am leading at work. Normally I'm the kind of person who needs a practical reason to learn a software tool, and until recently I didn't have any data visualization needs, so Tableau was on the back burner. A few weeks ago, however, I started to have some questions about Supreme Court decisions and decided to try answering them with Tableau. My question is whether more cases have been decided recently with a five-vote majority opinion than in years past.

Step 1: Getting the data
The first step to answering my question is getting access to data on Supreme Court cases. I checked the Supreme Court's website, and while they do have data on decisions available, I did not find an easily exportable list of what I was hoping to visualize. After some web searching I found the Supreme Court Database, which has exactly what I need. It offers a dataset of decisions from the 1946 through 2013 terms that includes a ton of information about all cases brought before the court. The data is available as a CSV download in several different formats. I chose to download the case-centered data organized by court citation. Since I am not well versed in the workings of the court, it is possible that this isn't the best dataset to use, but for the purposes of learning the software it was sufficient.

Step 2: Getting the software
Tableau offers a 30-day trial of their desktop software, so I downloaded it and got started. They also offer a web-based version, but I wanted to experiment with the desktop version more.

Step 3: Creating a new workbook and adding data
Tableau is similar to Excel in that you have workbooks that contain data and graphs. When you create a new workbook, the first thing you need to do is add data. Unlike Excel, you need to connect to a data source, which can be something as simple as a file or as complex as a database server. My data is in a CSV file, so I directed Tableau to connect to the file I downloaded from the Supreme Court Database.

Tableau classifies each field in the data as either a Dimension or a Measure. Basically, dimensions are headers (text, dates, etc.) and measures are axes on a graph (numbers).

Step 4: Creating my first graph
Once Tableau is connected to your data you can create a worksheet. Tableau's interface lets you set up rows and columns for creating either tables or graphs. Since I was interested in votes over time, I used the "Date Decision" dimension for my column and the "Maj Votes" measure for my rows. Tableau defaults to calculating the sum of majority votes, which really isn't useful, so I switched this to the average, which provides a little more information.

This graph shows the average majority vote per year. If more cases were being decided with a five-vote majority, then this graph should show the average vote approaching five over time. As you can see, that is not clearly the case. In fact, if you turn on trend lines (Analysis, Trend Lines, Show Trend Lines), Tableau shows that the average majority vote count is going up over the years, not down.
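For anyone who wants to double-check this kind of aggregation outside Tableau, here is a minimal Python sketch of the same "average majority votes per year" calculation. The column names `dateDecision` and `majVotes` and the month/day/year date format are my assumptions about how the CSV is laid out, and the three sample rows are made up for illustration, not taken from the real dataset.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def average_majority_votes_by_year(rows, date_col="dateDecision",
                                   votes_col="majVotes", date_fmt="%m/%d/%Y"):
    """Return {year: average majority votes} for an iterable of dict rows.

    The default column names and date format are assumptions about the CSV.
    """
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of votes, case count]
    for row in rows:
        if not row[votes_col]:
            continue  # skip rows with no recorded vote count
        year = datetime.strptime(row[date_col], date_fmt).year
        totals[year][0] += int(row[votes_col])
        totals[year][1] += 1
    return {year: vote_sum / n for year, (vote_sum, n) in sorted(totals.items())}


# Made-up sample rows standing in for the downloaded file.
sample_csv = io.StringIO(
    "dateDecision,majVotes\n"
    "11/18/1946,8\n"
    "12/9/1946,5\n"
    "1/6/1947,6\n"
)
averages = average_majority_votes_by_year(csv.DictReader(sample_csv))
print(averages)
```

To run it against a real download, replace the `io.StringIO` sample with an opened file handle passed to `csv.DictReader`, and adjust the column names if they differ.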

But what if I wanted a graph showing the number of cases decided by five votes per year? This is a different graph that requires counting majority votes, but only those with a value equal to five. I created a new worksheet with "Date Decision" as my column and "Maj Votes" as my rows, but this time I asked Tableau to show the Count of "Maj Votes". On its own, this graph should show the number of dockets before the court each year, since each docket has a number in the "Maj Votes" column of the data. Next I dragged the "Maj Votes" measure into the Filters pane to apply a filter to the data. I filtered for a range of values between five and five and turned on Trend Lines to generate the following graph:

So, it looks like the number of cases decided by five votes is not increasing over time.
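The filter-then-count step can be sketched in Python as well. As before, the `dateDecision` and `majVotes` column names and the date format are assumptions on my part, and the sample rows are invented purely to show the mechanics.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def count_cases_with_votes(rows, votes=5, date_col="dateDecision",
                           votes_col="majVotes", date_fmt="%m/%d/%Y"):
    """Return {year: number of cases whose majority vote count equals `votes`}."""
    counts = Counter()
    for row in rows:
        # Equivalent to filtering "Maj Votes" to the range five through five.
        if row[votes_col] and int(row[votes_col]) == votes:
            counts[datetime.strptime(row[date_col], date_fmt).year] += 1
    return dict(sorted(counts.items()))


# Invented sample rows, not real court data.
five_vote_sample = io.StringIO(
    "dateDecision,majVotes\n"
    "11/18/1946,5\n"
    "11/25/1946,8\n"
    "12/9/1946,5\n"
    "1/6/1947,5\n"
    "2/3/1947,9\n"
)
five_vote_counts = count_cases_with_votes(csv.DictReader(five_vote_sample))
print(five_vote_counts)
```

Note that a year with no five-vote cases simply does not appear in the result, which mirrors what a filtered count would show.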

There is a major issue with my data and conclusion, however. I am simply counting the number of cases decided by five votes, not the percentage of cases. My graphs don't include the total number of cases per year, so my conclusion may not be accurate. Tableau does include functionality to create calculated fields, but I haven't figured that out yet. Stay tuned to see whether my initial graphs were misleading.
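In the meantime, the arithmetic behind that percentage is simple enough to sketch in Python: divide each year's count of five-vote cases by that year's total number of cases. This is not the Tableau calculated field, just the underlying calculation. The `dateDecision` and `majVotes` column names and the date format are again assumptions, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def share_of_cases_with_votes(rows, votes=5, date_col="dateDecision",
                              votes_col="majVotes", date_fmt="%m/%d/%Y"):
    """Return {year: fraction of that year's cases decided by `votes` votes}."""
    tallies = defaultdict(lambda: [0, 0])  # year -> [matching cases, all cases]
    for row in rows:
        if not row[votes_col]:
            continue  # skip rows with no recorded vote count
        year = datetime.strptime(row[date_col], date_fmt).year
        tallies[year][1] += 1
        if int(row[votes_col]) == votes:
            tallies[year][0] += 1
    return {year: hits / total for year, (hits, total) in sorted(tallies.items())}


# Invented sample rows, not real court data.
share_sample = io.StringIO(
    "dateDecision,majVotes\n"
    "11/18/1946,5\n"
    "11/25/1946,8\n"
    "12/9/1946,5\n"
    "12/16/1946,7\n"
    "1/6/1947,5\n"
    "2/3/1947,9\n"
    "3/10/1947,9\n"
    "4/7/1947,9\n"
)
five_vote_share = share_of_cases_with_votes(csv.DictReader(share_sample))
print(five_vote_share)
```

Multiplying each value by 100 gives the percentage, and a flat or falling share over the years would back up the raw counts, while a rising share would contradict them.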