We have spent a lot of time working with ggplot2
Everything covered this week can be applied with ggplot2
The goal of this week is to get you to stop and think about your plots before you make them. What kinds of things should you be considering?
Data visualization is both an art and a science
Art:
Science:
If you go into data science (and aren't constrained by academia), you'll want to check out R shiny (for interactive plots), web design, typography, etc.
What kinds of things should you be thinking about when it comes to data viz for academic papers?
What not to do (I'm gonna rant a bit)
Hopefully helpful resources (no memorizing!)
Telling your story
Contrast
Accessibility
Raw data are not intuitive. For the most part, you can't look at a spreadsheet of numbers and decipher any patterns. Especially with really big spreadsheets!
We need a way to graphically show the data so that our human eyes can try to make sense of the data.
It is so easy to LIE with data! It's a balancing act between conveying your message and not lying.
As our datasets become more complex and high dimensional, data visualization can become more challenging.
The goal is NOT to show every bit of data you have collected. It's to show the relationship you care about in an honest manner.
Clear, descriptive title
Axes are clearly labeled with variables and units of measurement
Scale is:
Data points are represented clearly, with a good key/legend if needed
Graph is the appropriate type for your data (nominal, ordinal, interval, ratio etc.)
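Here is a minimal ggplot2 sketch of that checklist. It uses the built-in mtcars data set, which is an illustrative choice and not something from these slides. labs() sets the title, the axis labels (with units), and the legend title in one place.

```r
library(ggplot2)

# mtcars ships with R: wt is weight in 1000 lbs, mpg is miles per US gallon
p_labels <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 2) +
  labs(
    title = "Heavier cars get fewer miles per gallon",  # clear, descriptive title
    x = "Weight (1000 lbs)",                            # variable + unit
    y = "Fuel economy (miles per gallon)",              # variable + unit
    color = "Cylinders"                                 # readable legend title
  )

print(p_labels)
```

Without the labs() call the axes would just read "wt" and "mpg", which means nothing to a reader who hasn't seen the data.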
There are no 100% right answers, but there are wrong ones...
Pie charts are never the answer; 3D pie charts are the worst of the worst
3D bar plots are never the answer
Typically, you should show either the raw data or at least a distribution
Alternatives exist, and dot plots are probably the best option
Bars are more appropriate when you have proportions or counts, but even then, consider dot plots
Tracey Weissgerber has an excellent thread with many resources on why barplots suck and how to maximize the utility of dotplots: https://tinyurl.com/3e6222a4
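To make the bar-versus-dot-plot point concrete, here is a small sketch that plots the same data both ways. The mtcars data set and the binwidth of 1 are assumptions picked for illustration; tune the binwidth for your own data.

```r
library(ggplot2)

# Bar of group means: hides the spread and the sample size in each group
p_bar <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_summary(fun = mean, geom = "bar") +
  labs(x = "Cylinders", y = "Miles per gallon")

# Dot plot: every car is one dot, so the distribution is visible
p_dot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 1) +
  labs(x = "Cylinders", y = "Miles per gallon")

print(p_bar)
print(p_dot)
```

The bar version shows three numbers. The dot version shows all 32 cars, including how uneven the groups are.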
https://www.autodesk.com/research/publications/same-stats-different-graphs
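You can see the "same stats, different graphs" idea without installing anything. The sketch below swaps in Anscombe's quartet, which is built into R, rather than the data from the link above. The four x/y pairs have nearly identical means and correlations but look nothing alike once plotted.

```r
# anscombe is a built-in data frame with columns x1..x4 and y1..y4
x_means <- sapply(anscombe[, c("x1", "x2", "x3", "x4")], mean)
y_means <- sapply(anscombe[, c("y1", "y2", "y3", "y4")], mean)

# Correlation of each x column with its matching y column
correlations <- mapply(cor, anscombe[, 1:4], anscombe[, 5:8])

print(round(x_means, 2))
print(round(y_means, 2))
print(round(correlations, 2))

# Plotting the four sets shows how different they really are
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```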
Chartjunk
Misleading text/axes
Inaccurate plotting
So many COVID-19 data visualizations...
Chartjunk is anything that gets in the way of reading the information displayed. This includes unnecessary lines or grids behind the graph, and patterns within the graph (like diagonal lines) that create the impression of movement or vibration. It also includes "ducks": any dressing added to the graph that is distracting, like this weird monster or the clowns. It can be fancy fonts, 3D effects, or anything else that detracts from the data itself.
This strange plot was put out by Georgia's Health Department. It's trying to show that basically there haven't been any real changes in COVID-19 statewide. But look at the values in the legend...They've changed them to basically keep the same graph. WUT?!
Colorblindness sucks. ~1 in 12 men are colorblind (much lower in women)
People have poor vision (glasses, anyone?)
Journals scale your figures down so that they fit within the article (like within a column of text)
What can we do?
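One possible answer, sketched in ggplot2 (this is a suggestion, not necessarily what the original slide showed): use a viridis scale, which is designed to stay distinguishable under common forms of colorblindness, encode the groups by shape as well as color, and bump up the base text size. The mtcars data and the specific sizes are assumptions for illustration.

```r
library(ggplot2)

p_viridis <- ggplot(mtcars, aes(x = wt, y = mpg,
                                color = factor(cyl), shape = factor(cyl))) +
  geom_point(size = 3) +
  # Discrete viridis scale; end = 0.9 avoids the palest yellow on white
  scale_color_viridis_d(end = 0.9) +
  # Same title for both legends so ggplot2 merges them into one key
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       color = "Cylinders", shape = "Cylinders") +
  # Larger base font so text survives being scaled down
  theme_bw(base_size = 16)

print(p_viridis)
```

Because the groups differ in shape too, the figure still works if it gets printed in black and white.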
Also super helpful! RColorBrewer and ggsci are great. But there are millions of others.
Ex: the color palette for all of the slides on this website? The Aussie color palette from https://flatuicolors.com/palette/au
All you need are hex codes: a # followed by 6 hexadecimal characters (0-9 and A-F), e.g. #FFFFFF for white. This is true for all color palettes (including monochromatic).
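A minimal sketch of both routes in ggplot2: hand the hex codes to scale_fill_manual(), or ask for a ColorBrewer palette by name with scale_fill_brewer(). The three hex codes below are the first three colors of ColorBrewer's "Dark2" palette, used purely as an example; they are not the slide palette mentioned above.

```r
library(ggplot2)

# Example hex codes (first three colors of ColorBrewer "Dark2"),
# named by the factor level each one should color
my_palette <- c("4" = "#1B9E77", "6" = "#D95F02", "8" = "#7570B3")

base_plot <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar() +
  labs(x = "Cylinders", y = "Number of cars", fill = "Cylinders")

# Route 1: supply your own hex codes
p_manual <- base_plot + scale_fill_manual(values = my_palette)

# Route 2: name a ColorBrewer palette and let ggplot2 look up the colors
p_brewer <- base_plot + scale_fill_brewer(palette = "Dark2")

print(p_manual)
print(p_brewer)
```

Naming the vector entries ties each color to a specific group, so the colors stay put even if the factor levels get reordered.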
There are different types of palettes (qualitative/unordered, sequential, diverging, etc.). Look at this blog post to learn more about them
When you have something side-by-side, you can have different colors. OR you can have the same color but a different shade/tone/tint.
The most obvious example of this is grayscale (technically, gray isn't a hue, and you're really varying lightness, but that's completely unimportant for this intro):
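In ggplot2, a grayscale fill is one line. This sketch assumes the mtcars data and start/end values chosen by eye; in scale_fill_grey(), 0 is black and 1 is white.

```r
library(ggplot2)

p_grey <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot() +
  # Shades run from mid gray (0.4) to light gray (0.9), skipping pure black
  # so the median line inside each box stays visible
  scale_fill_grey(start = 0.4, end = 0.9) +
  labs(x = "Cylinders", y = "Miles per gallon", fill = "Cylinders") +
  theme_bw()

print(p_grey)
```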
Contrast lets the reader easily extract meaningful information.
Colors, shapes, size (as in bubble plots), sometimes transparency etc.
Need help picking different shades/tones/tints of the same color? A ton of websites can help! Ex: https://www.colorhexa.com/
Some of us are getting old and our eyesight is fading (I'm not bitter...yes I am...)
Also, academic journals scale down your figures. Better to make the text size larger so that when it gets scaled down, it's still readable.
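The quickest way to enlarge all the text in a ggplot2 figure at once is the base_size argument that the built-in themes accept. A small sketch, where 18 is an arbitrary example value and not a journal requirement:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Default theme text (base_size = 11): often too small after scaling down
p_small <- p + theme_grey()

# Larger base size: titles, axis text, and legend text all scale with it
p_large <- p + theme_grey(base_size = 18)

print(p_small)
print(p_large)
```

Compare the two at the size the figure will actually be printed, not at full-screen size.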
You want to display your figures faithfully, but you don't want to take up extra space you don't need. Think about the aspect ratio of your plots!
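One way to control the aspect ratio is to set the width and height explicitly when you save the figure. This sketch uses the economics data set that ships with ggplot2. The 3.5 x 2 inch size is an assumed single-column size, so check your target journal's author guidelines for the real number.

```r
library(ggplot2)

# A time series usually reads best wider than it is tall
p_ratio <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = "Year", y = "Unemployed (thousands)") +
  theme_bw()

# Save at the final printed size; temp file path used here for illustration
out_file <- file.path(tempdir(), "figure1.pdf")
ggsave(out_file, plot = p_ratio, width = 3.5, height = 2, units = "in")
```

Saving at the final size also tells you right away whether your text is still readable.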
Tufte, Visual Explanations. Graphics Press, Cheshire, Conn, 1997
https://www.juiceanalytics.com/writing/better-know-visualization-small-multiples
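In ggplot2, small multiples are made with faceting. A minimal sketch, again assuming the mtcars data for illustration: facet_wrap() draws one panel per group, and all panels share the same axes so they can be compared directly.

```r
library(ggplot2)

p_facets <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # One panel per cylinder count; axes are shared across panels by default
  facet_wrap(~ cyl) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p_facets)
```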
https://twitter.com/chrisprener/status/1375937857840353280/photo/2
What is this?
Rogowitz & Treinish, IEEE Spectrum, 35(12):52-59. 1998
How do you know if you've made a good figure?
Does it EASILY communicate what you want?
Do readers need to read and re-read your figure legend, or is your message clear?
Is it accessible to people with poor eyesight or colorblindness?
Does it faithfully reflect your data? Beauty + truth
[...] R, use faceting)
[...] R [...] geom_dotplot; better than bar plots, typically (see earlier slide)
[...] ggplot2 [...]; I bet you missed a lot...
[...] R? Check out the Twitter accounts of Danielle Navarro and Ijeamaka Anyene; then look at their websites!
[...] R code? Check out Chris Prener's Twitter account. If you'd like this as a weekly newsletter, check out his River City Data tracking site!