On cherry trees, as in data, there will be some pieces of fruit that look great – so plump and red that you want to sink your teeth right in (after taking a pic for Instagram, of course). There will be other pieces that are a bit wonky, or unripe, or asymmetrical, or rotten. They look unappetizing, and you don’t really want to pick them at all.
It’s tempting to pick only the freshest, juiciest-looking fruit. This is called ‘cherry picking’ and it’s a great move if you’re a jam-maker, but a terrible idea if you’re a researcher.
In a research context, ‘cherry picking’ refers to the practice of choosing only the pieces of data or evidence that support your hypothesis. In other words, you have a point you want to make, and you have a whole lot of data – but you pick only the ‘juiciest’ pieces to report.
Researchers who ‘cherry pick’ aren’t using false data; they are simply using incomplete data. But it’s dishonest all the same.
As researchers, we have to take the good data with the bad. Sometimes we get an outlier that seems completely rotten, but it’s still part of our data and it still needs to factor into our findings.
For qualitative researchers working with focus groups, interviews, and surveys, that means taking care to represent the breadth of your participants’ responses (and including a representative range of quotations).
For quantitative researchers, that means reporting your results in a way that accurately represents your data as a whole, without avoiding any inconvenient outliers or data points that don’t support your hypotheses.
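To see why dropping an inconvenient data point matters, here's a minimal sketch with made-up numbers (hypothetical cherry "sweetness ratings", not real data): excluding a single low outlier noticeably inflates the reported mean.

```python
# Hypothetical data: five sweetness ratings, one of which is a "rotten cherry" outlier.
ratings = [8.1, 7.9, 8.4, 8.0, 2.3]

def mean(xs):
    return sum(xs) / len(xs)

honest_mean = mean(ratings)  # all data points included
cherry_picked_mean = mean([x for x in ratings if x > 5])  # quietly dropping the outlier

print(round(honest_mean, 2))         # 6.94
print(round(cherry_picked_mean, 2))  # 8.1
```

The honest mean (6.94) and the cherry-picked one (8.1) tell quite different stories, which is exactly why outliers need to be reported, and their exclusion justified, rather than silently discarded.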
And for all researchers, it’s important to synthesize others’ research (for instance, in the literature review) in a way that represents the breadth of knowledge available on a topic. Science writer Ben Goldacre has pointed out a major paradox in the research world: individual experiments are designed to eliminate as much bias as possible, but the way that multiple sets of research findings are synthesized is far less controlled.
For instance, if I want to draw on existing research to prove that cherries are red, I could easily amass a bibliography full of articles about the ruby-coloured Bing, Brooks, and Tartarian varieties. But I’d better not ignore the fact that the Rainier variety is yellow. In fact, my analysis would be stronger for including the yellow variety, not ignoring it.
Treating unexpected data responsibly is not only ethical – it’s also an opportunity for discovery. Case in point: the team on the Cancer Genome Project at the Wellcome Trust Sanger Institute were analyzing genomes of leukaemia patients when they spotted something anomalous in their data. They saw some drastic structural changes in chromosomes from one patient that didn’t fit the scientific understanding (at the time) of how DNA is damaged.
Did the researchers conveniently ignore the strange data? No. Was it wrong? No. It was the basis of a huge discovery: cancer, which was previously understood to be caused by a slow accumulation of genetic mutations over time, could also be caused by a single chromosomal ‘explosion’ (Stephens et al., 2011). That’s an absolute game-changer of a revelation, and it came from data that might have looked, to some, like a rotten cherry.
So I say pick all the cherries and make use of each one, regardless of how much bird poo, mould, or rot infects it. Unless, as I’ve said, you’re making jam. In which case please, for the love of crumpets, pick the nice ones and pile on the sugar!
Stephens, P. J., Greenman, C. D., Fu, B., Yang, F., Bignell, G. R., Mudie, L. J., … & McLaren, S. (2011). Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell, 144(1), 27–40. https://doi.org/10.1016/j.cell.2010.11.055