Intro to data visualization

Readings

  • Business Analytics: Chapter 2 - Describing the Distribution of a Single Variable

  • Business Analytics: Chapter 3 - Finding Relationships Among Variables

Both Ch 2 and Ch 3 are chock full of good stuff for using Excel to start exploring data. Hopefully some of it is review for you, but I’m sure much of it will be new for most of you.

The following compressed folder contains a few PDFs discussing principles of graphical excellence and the development of effective business dashboards. These PDFs are also included in the Downloads-DataViz.zip file.

Downloads

Screencasts and other activities

Start with a general introduction to data visualization principles.

Now we’ll go into more detail using Excel-specific examples.

First, some general table and graph principles.

A tour of the Conditional Formatting features in Excel (including creating formula-based formats).

Summary statistics and plots such as histograms and box plots are among the ways we visualize the distribution of a dataset. In these next few screencasts, we’ll use the Data Analysis ToolPak (DAT) along with Excel formulas to compute descriptive statistics. In addition, I’ll show how range names or Excel Tables make formula creation more efficient. Then I’ll show you three different ways to create histograms (we’ll see a few more in later parts of the course): using the Data Analysis ToolPak, using the FREQUENCY() array function, and using the newish Excel histogram chart type. Histograms can also be created using Pivot Tables and Charts, and I’ll show that in the upcoming session on multidimensional data modeling and analysis. In the screencast on histograms, I’ll also demo the newish box & whisker plots.
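If it helps to see the binning logic spelled out, here is a minimal Python sketch of what a FREQUENCY()-style count does: each bin value is treated as an upper bound, and one extra count at the end catches values above the largest bound. The function name `frequency` and the sample numbers are made up for illustration; this is not Excel's actual implementation, just the same counting rule using only the Python standard library.

```python
from bisect import bisect_left
from statistics import mean, median, stdev


def frequency(data, bins):
    """Count values per bin, in the style of Excel's FREQUENCY().

    `bins` holds ascending upper bounds. The result has len(bins) + 1
    counts; the last one is the overflow count for values above the
    largest bound.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left finds the first bound that is >= value, so each
        # bin is closed on the right: lower < value <= upper.
        counts[bisect_left(bins, value)] += 1
    return counts


# Made-up sample values, for illustration only.
wait_times = [3, 7, 8, 12, 15, 15, 21, 24, 30, 42]
upper_bounds = [10, 20, 30]

counts = frequency(wait_times, upper_bounds)  # [3, 3, 3, 1]

# A few descriptive statistics; stdev() is the sample standard deviation.
summary = {
    "mean": mean(wait_times),
    "median": median(wait_times),
    "stdev": stdev(wait_times),
}

# Quick text histogram: one row per bin, one '#' per observation.
labels = [f"<= {b}" for b in upper_bounds] + [f"> {upper_bounds[-1]}"]
for label, count in zip(labels, counts):
    print(f"{label:>6} | {'#' * count}")
```

Note that the value 30 lands in the third bin, not the overflow bin, because each bound is inclusive on the upper end.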

Now, see some advanced chart techniques.

The final few slides introduce motion charts, small multiples and some future possibilities. Check out the links on those slides. In particular, the notion of small multiples has become quite important in the field of data visualization. We’ll see that these are quite easy to create with tools like Tableau, but are much more tedious to do in Excel. Creating small multiples with programmatic tools like R or Python is also quite easy. Here’s an example from a blog post I did on Great Lakes water level analysis with R.

_images/great_lakes.png
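To give a feel for how little code small multiples take in a programmatic tool, here is a minimal Python sketch using matplotlib. It is not the R code from the blog post; the group names, the synthetic series, and the output filename `small_multiples.png` are all made up for illustration. The key idea is a grid of panels, one per group, with shared axes so the panels can be compared directly.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt

# Synthetic series, made up for illustration only (not real measurements).
years = list(range(2000, 2020))
groups = {
    "Group A": [10 + 0.5 * math.sin(i / 2) for i in range(len(years))],
    "Group B": [10 + 0.8 * math.cos(i / 3) for i in range(len(years))],
    "Group C": [10 + 0.05 * i for i in range(len(years))],
    "Group D": [11 - 0.05 * i for i in range(len(years))],
}

# One panel per group; sharex/sharey keep the scales identical across panels.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
for ax, (name, values) in zip(axes.flat, groups.items()):
    ax.plot(years, values)
    ax.set_title(name)

fig.suptitle("Small multiples: one panel per group, shared axes")
fig.tight_layout()
fig.savefig("small_multiples.png")  # hypothetical output filename
```

Adding a fifth or sixth group only means adding an entry to `groups` and changing the grid shape, which is exactly the part that gets tedious when each panel is a hand-built Excel chart.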

Explore (OPTIONAL)