We have just got a new dog (Ted) and he is getting to know my Mum’s dog (Belle). The new boy gets attention that would otherwise have been only Belle’s, and understandably there can be a bit of jealousy at times.
So, I was interested to see this story yesterday morning – “dogs experience human-like jealousy” – reporting an investigation of dogs’ behaviours when their owners are occupied with different objects. Later on this blog post popped up on Twitter. The post looks at a graph from the original PLoS paper and a re-design created by the NY Times. The figures show how many dogs exhibited different behaviours when their owners were playing with something else – a stuffed dog toy, a halloween pumpkin or a book.
In the JunkCharts post, Kaiser Fung poses the question “Is data visualization worth paying for?” Will the NYT graphic (made with the direct or indirect influence of their awesome data journalism-visualisation-design team, producing content for a broad audience of 1,250,000 paper and 760,000 digital readers) be better than that of the scientists (perhaps dealing with a smaller readership and maybe less visual design expertise)? Is visualisation expertise worth paying for?
Yes and No. Not all costs were considered.
Amongst the key differences between the figures (developing narrative, adding icons…) Junk Charts notes that the NYT “Removed technical details of p-values, not important to NYT readers”. Er, yes they are important. They are part of the story, if slightly technical. The p-values are part of the data and are not superfluous.
Not everyone may be familiar with a graph like the one in the PLoS article; they are generally found in scientific and statistical publications:
Above the blocks of the bar chart are stars and brackets that indicate which values are statistically different (given the statistical test). If two values are linked by a bracket, the difference between them is statistically significant; the more stars there are, the more significant the difference. This means that even if the bars look different, without the bracket those differences could just be chance. Whilst the visual representation of stars and brackets isn’t exactly elegant, it’s a convention that can be quickly learnt (you can also use letters above the bars to indicate which values are different). But even if you know the convention it isn’t always easy to decode, especially when there are 3 or more categories/groups. Stars and brackets aren’t that salient, as they are misaligned with the values, so you have to think about it. But they are important.
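For readers who haven’t met the convention before, here is a tiny sketch of the usual p-value-to-stars mapping. The thresholds below are the common defaults, not necessarily the exact ones used in the paper, and the example p-values are made up for illustration:

```python
def p_to_stars(p):
    """Map a p-value to the conventional star annotation:
    * p < 0.05, ** p < 0.01, *** p < 0.001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."  # not significant: no bracket would be drawn

# Hypothetical pairwise comparisons for one behaviour
pairs = {("toy", "book"): 0.0004, ("toy", "pumpkin"): 0.03, ("pumpkin", "book"): 0.4}
for (a, b), p in pairs.items():
    print(a, "vs", b, "->", p_to_stars(p))
```

The point of the sketch: the stars carry real information (how confident we can be that two bars genuinely differ), which is exactly what gets lost when the annotation is dropped.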
In this case, why does it matter that the NYT left the values out? The NYT graphic says “Jealous Dogs: In a study, dogs reacted more strongly when their owners paid attention to stuffed dogs than to more generic objects.” Well, that isn’t entirely true. The p-values in the PLoS figure say that dogs raise their tail just as much with a stuffed dog toy, a pumpkin and a book (i.e. there are no stars or brackets, and so no significant differences). This is the first behaviour the NYT graphic reports, and it doesn’t come under their “more strongly” tag line. There is no statistical difference. The same goes for barking. Even though it looks like dogs bark more, they don’t according to the statistics and the “technical details of p-values”.
It may seem like nit-picking, but let’s follow it through a bit further.
Dogs push or bite at the pumpkin and book about the same amount, but less than the stuffed dog (according to the stats), yet tend to whimper at or block the same amount for the stuffed dog and pumpkin, but more than the book (according to the stats). Seems a bit complicated. These results, which are decipherable but not immediately obvious in the original graph, tell us that the principle the NYT reports is a bit off the mark. There is in fact only one behaviour – touch or push owner – that is truly represented within the style the NYT has adopted. Three of the seven behaviours – touch or push owner, touch or push object, and bite or snap at object – have statistical support for the dogs reacting more strongly to the stuffed dog toy than to the more generic objects. Yet a majority of cases don’t support that explanation.
So what to do? Is there a better solution than the stars and brackets? I am home alone with the dog while the family have a week’s holiday, and had a bit of inspiration. So here is my quick go at the problem. Why not apply a bit of force to the bars if they are different?
Bars would then be statistically indistinguishable if they are touching (even if they look different); all values are statistically different if none are touching; and different degrees of separation (proximity) indicate different levels of significance.
This way the secondary characteristic of the data is easy to include in the same encoding. Bars that are further away indicate greater significance (smaller p-values), as long as the category groupings are clear. The bars in any group could also be placed at different distances for different levels of significance (not shown here though 🙂).
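As a rough sketch of how that “force” might be computed – my own hypothetical rule, with made-up p-values, not anything from the paper – bars in a group touch when a pairwise comparison is not significant, and are pushed apart by a gap that grows as the p-value shrinks:

```python
import math

def separation(p, alpha=0.05, max_gap=30.0):
    """Gap (in arbitrary plot units) between two bars, given the p-value
    of their pairwise comparison. Touching (gap 0) when not significant;
    otherwise the gap grows as the p-value shrinks, capped at max_gap."""
    if p >= alpha:
        return 0.0
    return min(max_gap, 10.0 * -math.log10(p / alpha))

def bar_positions(order, pairwise_p, width=20.0):
    """x-positions for the bars in `order`, using the gap between each
    adjacent pair of bars to encode the significance of their difference."""
    xs = [0.0]
    for left, right in zip(order, order[1:]):
        xs.append(xs[-1] + width + separation(pairwise_p[(left, right)]))
    return xs

# Made-up p-values: book vs pumpkin not significant, pumpkin vs toy significant
p = {("book", "pumpkin"): 0.4, ("pumpkin", "toy"): 0.005}
print(bar_positions(["book", "pumpkin", "toy"], p))
```

A real implementation would then draw the bars at these x-positions; the log mapping, the cap and the 20-unit bar width are arbitrary choices for the sketch, and only adjacent pairs are handled here.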
Here is the original figure redrawn in the new house style (with little icons for the legend):
Here it is with the force separation rather than the stars and brackets (original data order):
And here with the NYT ordering:
But we can do a bit better. Firstly, by highlighting the fact that there are two groups in the data (significant and insignificant results) and shading out the insignificant differences:
When you look at this you will see that the stuffed dog toy always gets a greater response than the book. But the pumpkin is sometimes the same as the book, and sometimes the same as the toy dog. That isn’t as good a headline, though: “Dogs can appear more jealous of stuffed toys than books, with the exception of 2 insignificant results.” But it is a straightforward interpretation of the data (information from statistical tests is also data and can’t be chucked away).
I’m not entirely keen on the horizontal layout, as it makes it difficult to pick out the value of the middle category in the graphs, so we could make a vertically aligned version of the graph. And one step more: we could group the findings across the patterns of significance. This is far from perfect but remains closer to the truth of the paper. And it illustrates something about the science too. You could increase the detail of the groupings (e.g. book response significantly lower than toy and pumpkin; toy significantly higher than book and pumpkin) but for the purposes of this post I think this makes the point.
What does it mean for this study that there are no significant differences in the ‘raise tail’ and ‘bark’ behaviours? Are dogs just as jealous of books, and do they bark regardless? It is hard to tell without a control test investigating what the dogs would do ‘normally’ and what they would do if their owners just ignored them. It could be that dogs raise their tails a lot, and bark a bit, whenever things happen. In scientific publications the graphs are rarely independent of the whole text, or of the possibilities the authors did not include. Would it be ridiculous to indicate in the figure that there are things we don’t know?