Visualizing Data

1 / 36

Recap

  • We have spent a lot of time working with ggplot2

  • Everything covered this week can be applied with ggplot2

  • Goal of this week is to get you to stop and think about your plots before you make them. What are the kinds of things you should be considering?

2 / 36

Stuff you already know

Data visualization is both an art and a science

Art:

  • Aesthetically pleasing
  • The story you're trying to tell

Science:

  • Graphical representation of specific types of data
  • The story you're trying to tell
3 / 36

Warning

Making figures for academic purposes ≠ making figures for pure data viz purposes

If you go into data science (and aren't constrained by academia), you'll want to check out R Shiny (for interactive plots), web design, typography, etc.

4 / 36

Today

  • What kinds of things should you be thinking about when it comes to data viz for academic papers?

  • What not to do (I'm gonna rant a bit)

  • Hopefully helpful resources (no memorizing!)

5 / 36

What should we be thinking about?

  1. Telling your story

  2. Contrast

  3. Accessibility

6 / 36

Telling your story

  • Raw data are not intuitive. For the most part, you can't look at a spreadsheet of numbers and decipher any patterns. Especially with really big spreadsheets!

  • We need a way to graphically show the data so that our human eyes can try to make sense of the data.

  • It is so easy to LIE with data! Balancing act of conveying your message and not lying.

  • As our datasets become more complex and high dimensional, data visualization can become more challenging.

  • The goal is NOT to show every bit of data you have collected. It's to show the relationship you care about in an honest manner.

7 / 36

What makes a good figure?

  • Clear, descriptive title

  • Axes are clearly labeled with variables and units of measurement

    • Label. Your. Damn. Axes.
  • Scale is:

    • consistent along each axis
    • easily interpreted
    • chosen so that data are evenly distributed (remember restricted range from correlation?)
  • Data points are represented clearly, with a good key/legend if needed

  • Graph is the appropriate type for your data (nominal, ordinal, interval, ratio etc.)
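
A minimal sketch of the checklist above in ggplot2. The built-in mtcars data and the wording of the title and labels are illustrative choices, not something from these slides:

```r
library(ggplot2)

# mtcars ships with R: wt is weight in 1000 lbs, mpg is miles per gallon
p_labeled <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = "Heavier cars travel fewer miles per gallon",  # descriptive title
    x = "Weight (1000 lbs)",                               # variable + unit
    y = "Fuel efficiency (miles per gallon)"               # variable + unit
  )

p_labeled
```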

8 / 36

Appropriateness

There are no 100% right answers, but there are wrong ones...

  • Pie charts are never the answer; 3D pie charts are the worst of the worst

    • Instead, try a stacked bar plot or, even better, a stacked dot plot
  • 3D bar plots are never the answer

  • Typically, you should show either the raw data or at least a distribution

    • Shelly is anti bar plot, generally
    • Shelly is anti box plots on their own (nice when combined)
    • Shelly is skeptical of lines -- need to be careful
9 / 36

Why Bar Plots Can Suck

  • Alternatives exist, with dot plots probably being the best option

  • Bars are more appropriate when you have proportions or counts; but even still -- dotplots

  • Tracey Weissgerber has an excellent thread with many resources on why barplots suck and how to maximize the utility of dotplots: https://tinyurl.com/3e6222a4
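
One way to sketch the dot plot alternative in ggplot2: show every observation and mark the group mean on top, instead of drawing a bar that hides the distribution. The built-in ToothGrowth data, the bin width, and the red diamond for the mean are all assumptions made for illustration:

```r
library(ggplot2)

# ToothGrowth ships with R: len is tooth length, dose is vitamin C in mg/day
p_dots <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  # one dot per observation, stacked symmetrically around each dose
  geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 1) +
  # overlay the group mean so the summary is still visible
  stat_summary(fun = mean, geom = "point", shape = 18, size = 4, color = "red") +
  labs(x = "Vitamin C dose (mg/day)", y = "Tooth length")

p_dots
```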

10 / 36

What to do?

11 / 36

Why adding lines can be misleading

13 / 36

Lines Cont...

14 / 36

Simpson's Paradox
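
A small simulation can make the paradox concrete. This is only a sketch with made-up data (the group offsets, sample size, and noise levels are arbitrary assumptions): within every group the slope is negative, yet a single line fit to the pooled data slopes upward.

```r
library(ggplot2)

set.seed(1)
n <- 50
group  <- rep(c("A", "B", "C"), each = n)
offset <- rep(c(0, 3, 6), each = n)

# Groups are shifted along x; within a group, y falls as x rises,
# but groups further right also sit higher on y.
x <- offset + rnorm(3 * n)
y <- 2 * offset - x + rnorm(3 * n, sd = 0.5)
dat <- data.frame(group, x, y)

# Slope ignoring groups vs. slope within each group
pooled_slope  <- coef(lm(y ~ x, data = dat))[["x"]]
within_slopes <- sapply(split(dat, dat$group),
                        function(d) coef(lm(y ~ x, data = d))[["x"]])

p_simpson <- ggplot(dat, aes(x, y)) +
  geom_point(aes(color = group)) +
  # black line: one fit to all the data
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  # colored lines: one fit per group
  geom_smooth(aes(color = group), method = "lm", formula = y ~ x, se = FALSE)

p_simpson
```

Comparing `pooled_slope` with `within_slopes` shows the sign flip numerically; the plot shows why a single line through grouped data can mislead.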

15 / 36

Things to avoid

  • Chartjunk

  • Misleading text/axes

  • Inaccurate plotting

  • So many COVID-19 data visualizations...

16 / 36

Chartjunk

17 / 36

Chartjunk is anything that gets in the way of reading the information displayed. This is stuff like unnecessary lines or grids behind the graph; patterns within the graph, like diagonal lines, that create the impression of movement or vibration; or something called "ducks", which are any decorations added to the graph that distract from it, like this weird monster or the clowns. It can also be fancy fonts or 3D effects. Again: anything that detracts from the data itself.

Misleading Text/Axes

18 / 36

Inaccurate Plotting

This strange plot was put out by Georgia's Health Department. It's trying to show that basically there haven't been any real changes in COVID-19 statewide. But look at the values in the legend...They've changed them to basically keep the same graph. WUT?!

19 / 36

Misleading Shapes

20 / 36

  • Kevin Hart: 5'2
  • Shaquille O'Neal: 7'1
  • Yao Ming: 7'6
21 / 36

Accessibility is IMPORTANT

  • Colorblindness sucks. ~1 in 12 men are colorblind (much lower in women)

  • People have poor vision (glasses, anyone?)

  • Journals scale your figures down so that they fit within the article (like within a column of text)

What can we do?

  • Colors
  • Contrast
  • Big text size (the bigger/bolder the text, the less color contrast you can get away with)
22 / 36

Color Palettes

Also super helpful! RColorBrewer and ggsci are great. But there are millions of others.

Ex: the color palette for all of the slides on this website? The Aussie color palette from https://flatuicolors.com/palette/au

All you need are hex codes (6 hexadecimal digits: 0-9 and A-F). This is true for all color palettes (including monochromatic).

Different types of palettes (unordered, sequential, divergent etc.). Look at this blog post to learn more about these
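
A sketch of both routes in ggplot2: supplying your own hex codes, or picking a named RColorBrewer palette. The three hex codes below are from the Okabe-Ito colorblind-friendly set and "Dark2" is one ColorBrewer palette; both are illustrative picks, not the palettes used in these slides:

```r
library(ggplot2)

base_plot <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 2)

# Route 1: your own hex codes, named so each species gets a fixed color
my_hex <- c(setosa = "#E69F00", versicolor = "#56B4E9", virginica = "#009E73")
p_manual <- base_plot + scale_color_manual(values = my_hex)

# Route 2: a ready-made qualitative (unordered) palette from RColorBrewer
p_brewer <- base_plot + scale_color_brewer(palette = "Dark2")

p_manual
p_brewer
```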

23 / 36

Contrast

When you have something side-by-side, you can have different colors. OR you can have the same color but a different shade/tone/tint.

24 / 36

Bad use of colors

25 / 36

Grayscale

The most obvious example of this is grayscale (technically, gray isn't a hue, and what varies is lightness rather than saturation, but that's completely unimportant for this intro):

26 / 36

Providing Contrast is Important

It lets the reader easily extract meaningful information.

Colors, shapes, size (as in bubble plots), sometimes transparency etc.

Need help picking different shades/tones/tints of the same color? A ton of websites can help! Ex: https://www.colorhexa.com/

27 / 36

Text Size

Don't make your font size super uber small!

Some of us are getting old and our eyesight is fading (I'm not bitter...yes I am...)

Also, academic journals scale down your figures. Better to make the text size larger so that when it gets scaled down, it's still readable.
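
In ggplot2, one simple way to do this is the `base_size` argument of the built-in themes, which scales all text at once. The value 18 below is an arbitrary assumption; pick it by checking the figure at its final printed size:

```r
library(ggplot2)

p_big_text <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_bw(base_size = 18) +                        # larger text everywhere
  theme(axis.title = element_text(face = "bold"))   # bold axis titles

p_big_text
```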

28 / 36

Aspect Ratios

You want to display your figures faithfully, but you don't want to take up extra space you don't need. Think about the aspect ratio of your plots!

Tufte, Visual Explanations. Graphics Press, Cheshire, Conn, 1997
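
In practice, the aspect ratio of a saved ggplot2 figure is set by the `width` and `height` you pass to `ggsave()`. A sketch, assuming a wide 7 x 3 inch time series and a throwaway file in the session's temporary directory (the dimensions and file name are illustrative):

```r
library(ggplot2)

# economics ships with ggplot2: unemploy is the number unemployed, in thousands
p_wide <- ggplot(economics, aes(date, unemploy)) +
  geom_line() +
  labs(x = "Date", y = "Unemployed (thousands)")

# Wide and short suits a long time series; change width/height to compare
out_file <- file.path(tempdir(), "unemployment.pdf")
ggsave(out_file, p_wide, width = 7, height = 3, units = "in")
```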

29 / 36

Rainbow Colormaps

What is this?

32 / 36

Don't Use Rainbow Colormaps

Rogowitz & Treinish, IEEE Spectrum, 35(12):52-59. 1998
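
One commonly used alternative (not named on these slides, so treat it as a suggestion) is the viridis family of sequential scales built into ggplot2, which change smoothly in lightness and remain readable in grayscale. A sketch using the `faithfuld` data that ships with ggplot2:

```r
library(ggplot2)

# faithfuld: a 2D density estimate of Old Faithful eruption and waiting times
p_viridis <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster() +
  scale_fill_viridis_c() +   # continuous sequential scale instead of a rainbow
  labs(x = "Waiting time (min)", y = "Eruption duration (min)", fill = "Density")

p_viridis
```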

33 / 36

Self-evaluation

How do you know if you've made a good figure?

  • Does it EASILY communicate what you want?

  • Do readers need to read and re-read your figure legend, or is your message clear?

  • Is it accessible to people with poor eyesight or colorblindness?

  • Does it faithfully reflect your data? Beauty + truth

34 / 36

Rules

  • Don't make a plot when a table will do
  • Represent data with appropriate significant figures
  • Use appropriate plot types for the data types
  • Label your axes
  • Title your figures
  • Whenever possible, show all the data (or at least the distribution)
  • Don't rely on a legend or caption/text
  • Don't rely on default plotting conventions
  • If possible, show outliers rather than removing them
  • Sort categorical data accordingly
  • Exploit small multiples to great effect (in R, use faceting)
  • Strive to maintain the same color conventions/palettes across all figures
  • Start with what you want your plot to look like, then work backwards
  • SHOW. YOUR. CODE.
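
Two of these rules are sketched below with the `mpg` data that ships with ggplot2: small multiples via `facet_wrap()`, and categories sorted by a summary of the data via `reorder()`. The choice of variables and of the median as the sorting statistic are assumptions for illustration:

```r
library(ggplot2)

# Small multiples: one panel per vehicle class, all on shared axes
p_facets <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ class) +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

# Sorted categories: order classes by median highway mileage, not alphabetically
p_sorted <- ggplot(mpg, aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_boxplot() +
  geom_jitter(width = 0.15, alpha = 0.4) +   # show the raw data on top
  labs(x = "Vehicle class", y = "Highway miles per gallon")

p_facets
p_sorted
```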
35 / 36

Helpful Things in R

  • geom_dotplot; better than bar plots, typically (see earlier slide)
  • Revisit our section on ggplot2; I bet you missed a lot...
  • Raincloud plots; blog post here
  • Using the same theme modifications for all plots? Make a function to store that theme and call it later (more on functions soon!)
  • Google is your friend; Google is your professor
  • Interested in making generative art in R? Check out the Twitter accounts of Danielle Navarro and Ijeamaka Anyene; then look at their websites!
  • Want examples of good COVID plotting with open R code? Check out Chris Prener's Twitter account. If you'd like this as a weekly newsletter, check out his River City Data tracking site!
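
A sketch of the reusable theme idea from the list above. The function name `theme_paper` and every setting inside it are hypothetical; swap in whatever modifications you keep repeating:

```r
library(ggplot2)

# Hypothetical helper: store your theme tweaks once, reuse them on every plot
theme_paper <- function(base_size = 14) {
  theme_bw(base_size = base_size) +
    theme(
      panel.grid.minor = element_blank(),   # drop minor grid lines
      legend.position  = "bottom"
    )
}

p_themed <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  theme_paper()

p_themed
```

Because the tweaks live in one function, changing the font size or legend position for every figure in a paper is a one-line edit.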
36 / 36
