Thursday, 22 December 2016

Controversial statues: remove or revise?



The Rhodes Must Fall campaign in Oxford ignited an impassioned debate about the presence of monuments to historical figures in our Universities. On the one hand, there are those who find it offensive that a major university should continue to commemorate a person such as Cecil Rhodes, given the historical reappraisal of his role in colonialism and suppression of African people. On the other hand, there are those who worry that removal of the Rhodes statue could be the thin end of a wedge that could lead to demands for Nelson to be removed from Trafalgar Square or Henry VIII from King’s College Cambridge. There are competing online petitions to remove and to retain the Rhodes statue, and both have similar numbers of supporters.

The Rhodes Must Fall campaign was back in the spotlight last week, when the Times Higher ran a lengthy article covering a range of controversial statues in Universities across the globe. A day before the article appeared, I had happened upon the Explorers' Monument in Fremantle, Australia. The original monument, dating to 1913, commemorated explorers who had been killed by 'treacherous natives' in 1864. As I read the plaque, I was thinking that this was one-sided, to put it mildly.

Source: https://en.wikipedia.org/wiki/Explorers%27_Monument
But then, reading on, I came to the next plaque, below the first, which was added to give the view of those who were offended by the original statue and plaque. 

Source: https://en.wikipedia.org/wiki/Explorers%27_Monument
I like this solution.  It does not airbrush controversial figures and events out of history. Rather, it forces one to think about the ways in which a colonial perspective damaged many indigenous people - and perhaps to question other things that are just taken for granted. It also creates a lasting reminder of the issues currently under debate – whereas if a statue is removed, all could be forgotten in a few years’ time. 
Obviously, taken to extremes, this approach could get out of control – one can imagine a never-ending sequence of plaques like the comments section on a Guardian article. But used judiciously, this approach seems to me to be a good solution to this debate.

Friday, 16 December 2016

When is a replication not a replication?


Replication studies have been much in the news lately, particularly in the field of psychology, where a great deal of discussion has been stimulated by the Reproducibility Project spearheaded by Brian Nosek.

Replication of a study is an important way to test the reproducibility and generalisability of the results. It has been a standard requirement for publication in reputable journals in the field of genetics for several years (see Kraft et al, 2009). However, at interdisciplinary boundaries, the need for replication may not be appreciated, especially where researchers from other disciplines include genetic associations in their analyses. I’m interested in documenting how far replications are routinely included in genetics papers that are published in neuroscience journals, and so I attempted to categorise a set of papers on this basis.

I’ve encountered many unanticipated obstacles in the course of this study (unintelligible papers and uncommunicative authors, to name just two I have blogged about), but I had not expected to find it difficult to make this binary categorisation. But it has become clear that there are nuances to the idea of replication. Here are two of those I have encountered:

a)    Studies which include a straightforward Discovery and Replication sample, but which fail to reproduce the original result in the Replication sample. The authors then proceed to analyse the data with both samples combined and conclude that the original result is still there, so all is okay. Now, as far as I am concerned, you can’t treat this as a successful replication; the best you can say of it is that it is an extension of the original study to a larger sample size.  But if, as is typically the case, the original result was afflicted by the Winner’s Curse, then the combined result will be biased.
b)    Studies which use different phenotypes for Discovery and Replication samples. On the one hand, one can argue that such studies are useful for identifying how generalizable the initial result is to changes in measures. It may also be the only practical solution if using pre-existing samples for replication, as one has to use what measures are available. The problem is that there is an asymmetry in terms of how the results are then treated. If the same result is obtained with a new sample using different measures, this can be taken as strong evidence that the genotype is influencing a trait regardless of how it is measured. But when the Replication sample fails to reproduce the original result, one is left with uncertainty as to whether it was type I error, or a finding that is sensitive to how it is measured. I’ve found that people are very reluctant to treat failures to replicate as undermining the original finding in this circumstance.
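
The bias described in (a) can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true effect size, the sample sizes and the function name simulate_winners_curse are all hypothetical, and each study's effect estimate is simply drawn from a normal distribution around the true effect. It keeps only those simulated studies whose Discovery sample reached significance, then compares the average estimate from the Discovery sample, the Replication sample, and the two combined.

```python
import math
import random
import statistics


def simulate_winners_curse(true_effect=0.1, n_disc=100, n_rep=100,
                           n_studies=20000, z_crit=1.96, seed=1):
    """Average effect estimates among studies with a significant Discovery result.

    All parameter values are illustrative assumptions. Effects are in
    standardised units, so an estimate from n cases has standard error
    1 / sqrt(n).
    """
    rng = random.Random(seed)
    disc, rep, comb = [], [], []
    for _ in range(n_studies):
        d = rng.gauss(true_effect, 1 / math.sqrt(n_disc))
        # Selection step: only significant Discovery results get followed up.
        if d * math.sqrt(n_disc) <= z_crit:
            continue
        # The Replication sample is independent, so it is not selected on size.
        r = rng.gauss(true_effect, 1 / math.sqrt(n_rep))
        disc.append(d)
        rep.append(r)
        # Pooling both samples is a sample-size-weighted average of the two.
        comb.append((n_disc * d + n_rep * r) / (n_disc + n_rep))
    return {
        "discovery": statistics.mean(disc),
        "replication": statistics.mean(rep),
        "combined": statistics.mean(comb),
        "n_selected": len(disc),
    }


if __name__ == "__main__":
    for name, value in simulate_winners_curse().items():
        print(name, round(value, 3))
```

Under these assumptions the Discovery estimate is inflated by the selection on significance, the Replication estimate is not, and the combined estimate sits in between, so it still overstates the true effect.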

I’m reminded of arguments in the field of social psychology, where failures to reproduce well-known phenomena are often attributed to minor changes in the procedures or lack of ‘flair’ of experimenters. The problem is that while this interpretation could be valid, there is another, less palatable, interpretation, which is that the original finding was a type I error.  This is particularly likely when the original study was underpowered or the phenotype was measured using an unreliable instrument. 

There is no simple solution, but as a start, I’d suggest that researchers in this field should, where feasible, use the same phenotype measures in Discovery and Replication samples. Where that is not feasible, they could pre-register their predictions for a Replication Sample prior to looking at the data, taking into account the reliability of the measures of the phenotype and the power of the Replication Sample to detect the original effect, based on the sample size.
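
As a rough guide to the kind of power calculation this involves, here is a minimal sketch for the case where the association is expressed as a correlation. It uses the standard Fisher z approximation; the function name replication_power and the reliability adjustment (the classical attenuation formula, applied to the phenotype only) are assumptions for illustration, not a procedure taken from any of the papers reviewed.

```python
import math
from statistics import NormalDist


def replication_power(effect_r, n_rep, reliability=1.0, alpha=0.05):
    """Approximate two-sided power to detect a correlation in a Replication sample.

    effect_r    : correlation assumed for an error-free phenotype (an assumption;
                  if taken from a Discovery sample it may be inflated)
    n_rep       : Replication sample size (must be greater than 3)
    reliability : reliability of the phenotype measure, between 0 and 1
    """
    norm = NormalDist()
    # Classical attenuation: measurement error shrinks the observable correlation.
    expected_r = effect_r * math.sqrt(reliability)
    # Fisher z: atanh(r) is roughly normal with standard error 1 / sqrt(n - 3).
    ncp = math.atanh(expected_r) * math.sqrt(n_rep - 3)
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability of a significant result in either direction.
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)


if __name__ == "__main__":
    # Hypothetical inputs: r = 0.2 and a phenotype reliability of 0.8.
    for n in (50, 100, 200, 400):
        print(n, round(replication_power(0.2, n, reliability=0.8), 2))
```

If the effect size is taken straight from a Discovery sample, the Winner's Curse means it is likely to be too large, so a figure computed this way is best read as an upper bound on the power of the Replication sample.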

Tuesday, 13 December 2016

When scientific communication is a one-way street


Together with some colleagues, I am reviewing a set of papers that combine genetic and neuroscience methods. We had noticed wide variation in methodological practices and thought it would be useful to evaluate the state of the field. Our ultimate aim was to identify both problems and instances of best practice, so that we could make some recommendations.

I had anticipated that there would be wide differences between studies in statistical approaches and completeness of reporting, but I had not realised just what a daunting task it would be to review a set of papers. We had initially planned to include 50 papers, but we had to prune it down to 30, on realising just how much time we would need to spend reading and re-reading each article, just to extract some key statistics for a summary.

In part the problem is the complexity that arises when you bring together two or more subject areas, each of which deals with complex, big datasets. I blogged recently about this. Another issue is incomplete reporting. Trying to find out whether the researchers followed a specific procedure can mean wading through pages of manuscript and supplementary material: if you don’t find it, you then worry that you may have missed it, and so you re-read it all. The search for key details is not so much looking for a needle in a haystack as being presented with a haystack which may or may not have a needle in it.

I realised that it would make sense to contact authors of the papers we were including in the review, so I sent an email, copied to each first and last author, attaching a summary template of the details that had been extracted from their paper, and simply asking them to check if it was an accurate account. I realised everyone is busy and I did not anticipate an immediate response, but I suggested an end of month deadline, which gave people 3-4 weeks to reply. I then sent out a reminder a week before the deadline to those who had not replied, offering more time if needed.

Overall, the outcome was as follows:
  • 15 out of 30 authors responded, either to confirm our template was correct, or to make changes. The tone varied from friendly to suspicious, but all gave useful feedback.
  • 5 authors acknowledged our request and promised to get back but didn’t.
  • 1 author said an error had been found in the data, which did not affect conclusions, and they planned to correct it and send us updated data – but they didn’t.
  • 1 author sent questions about what we were doing, to which I replied, but they did not confirm whether or not our summary of their study was correct.
  • 8 did not reply to either of my emails.
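
With only 30 papers, the proportion of usable responses is itself quite uncertain. As a quick check on the tallies above, the sketch below adds up the categories and attaches a 95% Wilson score interval to the 15 out of 30; the choice of the Wilson interval and the variable names are assumptions for illustration.

```python
import math

# Counts copied from the list above; the labels are shortened paraphrases.
responses = {
    "confirmed or corrected the summary": 15,
    "promised to get back but did not": 5,
    "reported a data error, no update received": 1,
    "asked questions, did not confirm": 1,
    "no reply": 8,
}


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


total = sum(responses.values())
usable = responses["confirmed or corrected the summary"]
low, high = wilson_interval(usable, total)

if __name__ == "__main__":
    print(f"{usable}/{total} usable responses")
    print(f"95% Wilson interval: {low:.2f} to {high:.2f}")
```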

I was rather disappointed that only half the authors ultimately gave us a useable response. Admittedly, the response rate is better than has been reported for people who request data from authors (see, e.g. Wicherts et al, 2011) – but providing data involves much more work than checking a summary. Our summary template was very short (effectively fewer than 20 details to check), and in only a minority of cases had we asked authors to provide specific information that we could not find in the paper, or confirmation of means/SDs that had been extracted from a digitised figure.

We are continuing to work on our analysis, and will aim to publish it regardless, but I remain curious about the reasons why so many authors were unwilling to do a simple check. It could just be pressure of work: we are all terribly busy and I can appreciate this kind of request might just seem a nuisance. Or are some authors really not interested in what people make of their paper, provided they get it published in a top journal?




Friday, 28 October 2016

The allure of autism for researchers

Data on $K spend on neurodevelopmental disorder research by NIH: from Bishop, D. V. M. (2010). Which neurodevelopmental disorders get researched and why? PLOS One, 5(11), e15112. doi: 10.1371/journal.pone.0015112

Every year I hear from students interested in doing postgraduate study with me at Oxford. Most of them express a strong research interest in autism spectrum disorder (ASD). At one level, this is not surprising: if you want to work on autism and you look at the University website, you will find me as one of the people listed as affiliated with the Oxford Autism Research Centre. But if you look at my publication list, you find that autism research is a rather minor part of what I do: 13% of my papers have autism as a keyword, and only 6% have autism or ASD in the title. And where I have published on autism, it is usually in the context of comparing language in ASD with developmental language disorder (DLD, aka specific language impairment, SLI). And indeed, in the publication referenced in the graph above, I concluded that there were disproportionate amounts of research, and research funding, going to ASD relative to other neurodevelopmental disorders.

Now, I don’t want to knock autism research. ASD is an intriguing condition which can have major effects on the lives of affected individuals and their families. It was great to see the recent publication of a study by Jonathan Green and his colleagues showing that a parent-based treatment with autistic toddlers could produce long-lasting reduction in severity of symptoms. Conducting a rigorous study of this size is hugely difficult to do and only possible with substantial research funding.

But I do wonder why there is such a skew in interest towards autism, when many children have other developmental disorders that have long-term impacts. Where are all the enthusiastic young researchers who want to work on developmental language disorders? Why is it that children with general learning disabilities (intellectual retardation) are so often excluded from research, or relegated to be a control group against which ASD is assessed?

Together with colleagues Becky Clark, Gina Conti-Ramsden, Maggie Snowling, and Courtenay Norbury, I started the RALLI campaign in 2012 to raise awareness of children’s language impairments, mainly focused on a YouTube channel where we post videos providing brief summaries of key information, with links to more detailed evidence. This year we also completed a study that brought together a multidisciplinary, multinational panel of experts with the goal of producing consensus statements on criteria and terminology for children’s language disorders – leading to one published paper and another currently in preprint stage. We hope that increased consistency in how we define and refer to developmental language disorders will lead to improved recognition.

We still have a long way to go in raising awareness. I doubt we will ever achieve a level of interest to parallel that of autism. And I suspect autism fascinates because it does not appear just to involve cognitive deficits, but rather a qualitatively different way of thinking and interacting with the world. But I would urge those considering pursuing research in this field to think more broadly and recognise that there are many fascinating conditions about which we still know very little. Finding ways to understand and eventually ameliorate language problems or learning disabilities could help a huge number of children and we need more of our brightest and best students to recognise this potential.

Sunday, 16 October 2016

Bishopblog catalogue (updated 16 Oct 2016)

Source: http://www.weblogcartoons.com/2008/11/23/ideas/

Those of you who follow this blog may have noticed a lack of thematic coherence. I write about whatever is exercising my mind at the time, which can range from technical aspects of statistics to the design of bathroom taps. I decided it might be helpful to introduce a bit of order into this chaotic melange, so here is a catalogue of posts by topic.

Language impairment, dyslexia and related disorders
The common childhood disorders that have been left out in the cold (1 Dec 2010) What's in a name? (18 Dec 2010) Neuroprognosis in dyslexia (22 Dec 2010) Where commercial and clinical interests collide: Auditory processing disorder (6 Mar 2011) Auditory processing disorder (30 Mar 2011) Special educational needs: will they be met by the Green paper proposals? (9 Apr 2011) Is poor parenting really to blame for children's school problems? (3 Jun 2011) Early intervention: what's not to like? (1 Sep 2011) Lies, damned lies and spin (15 Oct 2011) A message to the world (31 Oct 2011) Vitamins, genes and language (13 Nov 2011) Neuroscientific interventions for dyslexia: red flags (24 Feb 2012) Phonics screening: sense and sensibility (3 Apr 2012) What Chomsky doesn't get about child language (3 Sept 2012) Data from the phonics screen (1 Oct 2012) Auditory processing disorder: schisms and skirmishes (27 Oct 2012) High-impact journals (Action video games and dyslexia: critique) (10 Mar 2013) Overhyped genetic findings: the case of dyslexia (16 Jun 2013) The arcuate fasciculus and word learning (11 Aug 2013) Changing children's brains (17 Aug 2013) Raising awareness of language learning impairments (26 Sep 2013) Good and bad news on the phonics screen (5 Oct 2013) What is educational neuroscience? (25 Jan 2014) Parent talk and child language (17 Feb 2014) My thoughts on the dyslexia debate (20 Mar 2014) Labels for unexplained language difficulties in children (23 Aug 2014) International reading comparisons: Is England really doing so poorly? (14 Sep 2014) Our early assessments of schoolchildren are misleading and damaging (4 May 2015) Opportunity cost: a new red flag for evaluating interventions (30 Aug 2015)

Autism
Autism diagnosis in cultural context (16 May 2011) Are our ‘gold standard’ autism diagnostic instruments fit for purpose? (30 May 2011) How common is autism? (7 Jun 2011) Autism and hypersystematising parents (21 Jun 2011) An open letter to Baroness Susan Greenfield (4 Aug 2011) Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011) Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012) The ‘autism epidemic’ and diagnostic substitution (4 Jun 2012) How wishful thinking is damaging Peta's cause (9 June 2014)

Developmental disorders/paediatrics
The hidden cost of neglected tropical diseases (25 Nov 2010) The National Children's Study: a view from across the pond (25 Jun 2011) The kids are all right in daycare (14 Sep 2011) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Changing the landscape of psychiatric research (11 May 2014)

Genetics
Where does the myth of a gene for things like intelligence come from? (9 Sep 2010) Genes for optimism, dyslexia and obesity and other mythical beasts (10 Sep 2010) The X and Y of sex differences (11 May 2011) Review of How Genes Influence Behaviour (5 Jun 2011) Getting genetic effect sizes in perspective (20 Apr 2012) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Genes, brains and lateralisation (22 Dec 2012) Genetic variation and neuroimaging (11 Jan 2013) Have we become slower and dumber? (15 May 2013) Overhyped genetic findings: the case of dyslexia (16 Jun 2013) Incomprehensibility of much neurogenetics research ( 1 Oct 2016)

Neuroscience
Neuroprognosis in dyslexia (22 Dec 2010) Brain scans show that… (11 Jun 2011) Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012) Neuronal migration in language learning impairments (2 May 2012) Sharing of MRI datasets (6 May 2012) Genetic variation and neuroimaging (11 Jan 2013) The arcuate fasciculus and word learning (11 Aug 2013) Changing children's brains (17 Aug 2013) What is educational neuroscience? ( 25 Jan 2014) Changing the landscape of psychiatric research (11 May 2014) Incomprehensibility of much neurogenetics research ( 1 Oct 2016)

Reproducibility
Accentuate the negative (26 Oct 2011) Novelty, interest and replicability (19 Jan 2012) High-impact journals: where newsworthiness trumps methodology (10 Mar 2013) Who's afraid of open data? (15 Nov 2015) Blogging as post-publication peer review (21 Mar 2013) Research fraud: More scrutiny by administrators is not the answer (17 Jun 2013) Pressures against cumulative research (9 Jan 2014) Why does so much research go unpublished? (12 Jan 2014) Replication and reputation: Whose career matters? (29 Aug 2014) Open code: not just data and publications (6 Dec 2015) Why researchers need to understand poker ( 26 Jan 2016) Reproducibility crisis in psychology ( 5 Mar 2016) Further benefit of registered reports ( 22 Mar 2016) Would paying by results improve reproducibility? ( 7 May 2016) Serendipitous findings in psychology ( 29 May 2016) Thoughts on the Statcheck project ( 3 Sep 2016)

Statistics
Book review: biography of Richard Doll (5 Jun 2010) Book review: the Invisible Gorilla (30 Jun 2010) The difference between p < .05 and a screening test (23 Jul 2010) Three ways to improve cognitive test scores without intervention (14 Aug 2010) A short nerdy post about the use of percentiles (13 Apr 2011) The joys of inventing data (5 Oct 2011) Getting genetic effect sizes in perspective (20 Apr 2012) Causal models of developmental disorders: the perils of correlational data (24 Jun 2012) Data from the phonics screen (1 Oct 2012) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Flaky chocolate and the New England Journal of Medicine (13 Nov 2012) Interpreting unexpected significant results (7 June 2013) Data analysis: Ten tips I wish I'd known earlier (18 Apr 2014) Data sharing: exciting but scary (26 May 2014) Percentages, quasi-statistics and bad arguments (21 July 2014) Why I still use Excel ( 1 Sep 2016)

Journalism/science communication
Orwellian prize for scientific misrepresentation (1 Jun 2010) Journalists and the 'scientific breakthrough' (13 Jun 2010) Science journal editors: a taxonomy (28 Sep 2010) Orwellian prize for journalistic misrepresentation: an update (29 Jan 2011) Academic publishing: why isn't psychology like physics? (26 Feb 2011) Scientific communication: the Comment option (25 May 2011)  Publishers, psychological tests and greed (30 Dec 2011) Time for academics to withdraw free labour (7 Jan 2012) 2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012) Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012) Communicating science in the age of the internet (13 Jul 2012) How to bury your academic writing (26 Aug 2012) High-impact journals: where newsworthiness trumps methodology (10 Mar 2013)  A short rant about numbered journal references (5 Apr 2013) Schizophrenia and child abuse in the media (26 May 2013) Why we need pre-registration (6 Jul 2013) On the need for responsible reporting of research (10 Oct 2013) A New Year's letter to academic publishers (4 Jan 2014) Journals without editors: What is going on? (1 Feb 2015) Editors behaving badly? (24 Feb 2015) Will Elsevier say sorry? (21 Mar 2015) How long does a scientific paper need to be? (20 Apr 2015) Will traditional science journals disappear? (17 May 2015) My collapse of confidence in Frontiers journals (7 Jun 2015) Publishing replication failures (11 Jul 2015) Psychology research: hopeless case or pioneering field? (28 Aug 2015) Desperate marketing from J. Neuroscience ( 18 Feb 2016) Editorial integrity: publishers on the front line ( 11 Jun 2016)

Social Media
A gentle introduction to Twitter for the apprehensive academic (14 Jun 2011) Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011) Will I still be tweeting in 2013? (2 Jan 2012) Blogging in the service of science (10 Mar 2012) Blogging as post-publication peer review (21 Mar 2013) The impact of blogging on reputation ( 27 Dec 2013) WeSpeechies: A meeting point on Twitter (12 Apr 2014) Email overload ( 12 Apr 2016)

Academic life
An exciting day in the life of a scientist (24 Jun 2010) How our current reward structures have distorted and damaged science (6 Aug 2010) The challenge for science: speech by Colin Blakemore (14 Oct 2010) When ethics regulations have unethical consequences (14 Dec 2010) A day working from home (23 Dec 2010) Should we ration research grant applications? (8 Jan 2011) The one hour lecture (11 Mar 2011) The expansion of research regulators (20 Mar 2011) Should we ever fight lies with lies? (19 Jun 2011) How to survive in psychological research (13 Jul 2011) So you want to be a research assistant? (25 Aug 2011) NHS research ethics procedures: a modern-day Circumlocution Office (18 Dec 2011) The REF: a monster that sucks time and money from academic institutions (20 Mar 2012) The ultimate email auto-response (12 Apr 2012) Well, this should be easy…. (21 May 2012) Journal impact factors and REF2014 (19 Jan 2013) An alternative to REF2014 (26 Jan 2013) Postgraduate education: time for a rethink (9 Feb 2013) Ten things that can sink a grant proposal (19 Mar 2013) Blogging as post-publication peer review (21 Mar 2013) The academic backlog (9 May 2013) Discussion meeting vs conference: in praise of slower science (21 Jun 2013) Why we need pre-registration (6 Jul 2013) Evaluate, evaluate, evaluate (12 Sep 2013) High time to revise the PhD thesis format (9 Oct 2013) The Matthew effect and REF2014 (15 Oct 2013) The University as big business: the case of King's College London (18 June 2014) Should vice-chancellors earn more than the prime minister? (12 July 2014) Some thoughts on use of metrics in university research assessment (12 Oct 2014) Tuition fees must be high on the agenda before the next election (22 Oct 2014) Blaming universities for our nation's woes (24 Oct 2014) Staff satisfaction is as important as student satisfaction (13 Nov 2014) Metricophobia among academics (28 Nov 2014) Why evaluating scientists by grant income is stupid (8 Dec 2014) Dividing up the pie in relation to REF2014 (18 Dec 2014) Shaky foundations of the TEF (7 Dec 2015) A lamentable performance by Jo Johnson (12 Dec 2015) More misrepresentation in the Green Paper (17 Dec 2015) The Green Paper’s level playing field risks becoming a morass (24 Dec 2015) NSS and teaching excellence: wrong measure, wrongly analysed (4 Jan 2016) Lack of clarity of purpose in REF and TEF ( 2 Mar 2016) Who wants the TEF? ( 24 May 2016) Cost benefit analysis of the TEF ( 17 Jul 2016) Alternative providers and alternative medicine ( 6 Aug 2016)

Celebrity scientists/quackery
Three ways to improve cognitive test scores without intervention (14 Aug 2010) What does it take to become a Fellow of the RSM? (24 Jul 2011) An open letter to Baroness Susan Greenfield (4 Aug 2011) Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011) How to become a celebrity scientific expert (12 Sep 2011) The kids are all right in daycare (14 Sep 2011)  The weird world of US ethics regulation (25 Nov 2011) Pioneering treatment or quackery? How to decide (4 Dec 2011) Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012) Neuroscientific interventions for dyslexia: red flags (24 Feb 2012) Why most scientists don't take Susan Greenfield seriously (26 Sept 2014)

Women
Academic mobbing in cyberspace (30 May 2010) What works for women: some useful links (12 Jan 2011) The burqua ban: what's a liberal response (21 Apr 2011) C'mon sisters! Speak out! (28 Mar 2012) Psychology: where are all the men? (5 Nov 2012) Should Rennard be reinstated? (1 June 2014) How the media spun the Tim Hunt story (24 Jun 2015)

Politics and Religion
Lies, damned lies and spin (15 Oct 2011) A letter to Nick Clegg from an ex liberal democrat (11 Mar 2012) BBC's 'extensive coverage' of the NHS bill (9 Apr 2012) Schoolgirls' health put at risk by Catholic view on vaccination (30 Jun 2012) A letter to Boris Johnson (30 Nov 2013) How the government spins a crisis (floods) (1 Jan 2014)

Humour and miscellaneous
Orwellian prize for scientific misrepresentation (1 Jun 2010) An exciting day in the life of a scientist (24 Jun 2010) Science journal editors: a taxonomy (28 Sep 2010) Parasites, pangolins and peer review (26 Nov 2010) A day working from home (23 Dec 2010) The one hour lecture (11 Mar 2011) The expansion of research regulators (20 Mar 2011) Scientific communication: the Comment option (25 May 2011) How to survive in psychological research (13 Jul 2011) Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011) 2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012) The ultimate email auto-response (12 Apr 2012) Well, this should be easy…. (21 May 2012) The bewildering bathroom challenge (19 Jul 2012) Are Starbucks hiding their profits on the planet Vulcan? (15 Nov 2012) Forget the Tower of Hanoi (11 Apr 2013) How do you communicate with a communications company? ( 30 Mar 2014) Noah: A film review from 32,000 ft (28 July 2014) The rationalist spa (11 Sep 2015) Talking about tax: weasel words ( 19 Apr 2016)

Saturday, 1 October 2016

On the incomprehensibility of much neurogenetics research


Together with some colleagues, I am carrying out an analysis of methodological issues such as statistical power in papers in top neuroscience journals. Our focus is on papers that compare brain and/or behaviour measures in people who vary on common genetic variants.

I'm learning a lot by being forced to read research outside my area, but I'm struck by how difficult many of these papers are to follow. I'm neither a statistician nor a geneticist, but I have nodding acquaintance with both disciplines, as well as with neuroscience, yet in many cases I find myself struggling to make sense of what researchers did and what they found. Some papers have taken hours of reading and re-reading just to get at the key information that we are seeking for our analysis, i.e. what was the largest association that was reported.

This is worrying for the field, because the number of people competent to review such papers will be extremely small. Good editors will, of course, try to cover all bases by finding reviewers with complementary skill sets, but this can be hard, and people will be understandably reluctant to review a highly complex paper that contains a lot of material beyond their expertise.  I remember a top geneticist on Twitter a while ago lamenting that when reviewing papers they often had to just take the statistics on trust, because they had gone beyond the comprehension of all but a small set of people. The same is true, I suspect, for neuroscience. Put the two disciplines together and you have a big problem.

I'm not sure what the solution is. Making raw data available may help, in that it allows people to check analyses using more familiar methods, but that is very time-consuming and only for the most dedicated reviewer.

Do others agree we have a problem, or is it inevitable that as things get more complex the number of people who can understand scientific papers will contract to a very small set?

Saturday, 3 September 2016

Some thoughts on the Statcheck project



Yesterday, a piece in Retraction Watch covered a new study, in which results of automated statistics checks on 50,000 psychology papers are to be made public on the PubPeer website.
I had advance warning, because a study of mine had been included in what was presumably a dry run, and this led to me receiving an email on 26th August as follows:
Assuming someone had a critical comment on this paper, I duly clicked on the link, and had a moment of double-take when I read the comment.
Now, this seemed like overkill to me, and I posted a rather grumpy tweet about it. There was a bit of to and fro on Twitter with Chris Hartgerink, one of the researchers on the Statcheck project, and with the folks at PubPeer, where I explained why I was grumpy and they defended their approach; as far as I was concerned it was not a big deal, and if nobody else found this odd, I was prepared to let it go.
But then a couple of journalists got interested, and I sent them some more detailed thoughts.
I was quoted in the Retraction Watch piece, but I thought it worth reporting my response in full here, because the quotes could be interpreted as indicating I disapprove of the Statcheck project and am defensive about errors in my work. Neither of those is true. I think the project is an interesting piece of work; my concern is solely with the way in which feedback to authors is being implemented. So here is the email I sent to journalists in full:
I am in general a strong supporter of the reproducibility movement and I agree it could be useful to document the extent to which the existing psychology literature contains statistical errors.
However, I think there are 2 problems with how this is being done in the PubPeer study.
1. The tone of the PubPeer comments will, I suspect, alienate many people. As I argued on Twitter, I found it irritating to get an email saying a paper of mine had been discussed on PubPeer, only to find that this referred to a comment stating that zero errors had been found in the statistics of that paper.
I don't think we need to be told that - by all means report somewhere a list of the papers that were checked and found to be error-free, but you don't need to personally contact all the authors and clog up PubPeer with comments of this kind.
My main concern was that during an exceptionally busy period, this was just another distraction from other things. Chris Hartgerink replied that I was free to ignore the email, but that would be extremely rash because a comment on PubPeer usually means that someone has a criticism of your paper.
As someone who works on language, I also found the pragmatics of the communication non-optimal. If you write and tell someone that you've found zero errors in their paper, the implication is that this is surprising, because you don't go around stating the obvious*. And indeed, the final part of the comment basically said that your work may well have errors in it and even though they hadn't found them, we couldn't trust it.
Now at the same time as having that reaction, I appreciate this was a computer-generated message, written by non-native English speakers, that I should not take it personally, and no slur on my work was intended. And I would like to know if errors were found in my stats, and it is entirely possible that there are some, since none of us is perfect. So I don't want to over-react, but I think that if I, as someone basically sympathetic to this agenda, was irritated by the style of the communication, then the odds are this will stoke real hostility for those who are already dubious about what has been termed 'bullying' and so on by people interested in reproducibility.
2. I'll be interested to see how this pans out for people where errors are found.
My personal view is that the focus should be on errors that do change the conclusions of the paper.
I think at least a sample of these should be hand-checked so we have some idea of the error rate - I'm not sure if this has been done, but the PubPeer comment certainly gave no indication of that - it just basically said there's probably an error in your stats but we can't guarantee that there is, putting the onus on the author to then check it out.
If it's known that on 99% of occasions the automated check is accurate, then fine. If the accuracy is only 90% I'd be really unhappy about the current process as it would be leading to lots of people putting time into checking their papers on the basis of an insufficiently sensitive diagnostic. It would make the authors of the comments look frankly lazy in stirring up doubts about someone's work and then leaving them to check it out.
In epidemiology the terms sensitivity and specificity are used to refer to the accuracy of a diagnostic test. Minimally if the sensitivity and specificity of the automated stats check is known, then those figures should be provided with the automated message.

The above was written before Dalmeet drew my attention to the second paper, in which errors had been found. Here’s how I responded to that:

I hadn't seen the 2nd paper - presumably because I was not the corresponding author on that one. It's immediately apparent that the problem is that F ratios have been reported with one degree of freedom, when there should be two. In fact, it's not clear how the automated program could assign any p-value in this situation.

I'll communicate with the first author, Thalia Eley, about this, as it does need fixing for the scientific record, but, given the sample size (on which the second, missing, degree of freedom is based), the reported p-values would appear to be accurate.
  I have added a comment to this effect on the PubPeer site.
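
The point in the email above about sensitivity and specificity can be made concrete with a little arithmetic. The sketch below is illustrative only: the 50,000 papers come from the Retraction Watch piece, but the 10% of papers assumed to contain a real error is a made-up figure, the function name flag_outcomes is hypothetical, and "accuracy" is read as meaning that sensitivity and specificity are both equal to the stated percentage.

```python
def flag_outcomes(n_papers, error_rate, sensitivity, specificity):
    """Expected results of an automated check applied to n_papers.

    error_rate  : assumed proportion of papers with a real statistical error
    sensitivity : probability that a paper with an error is flagged
    specificity : probability that an error-free paper is not flagged
    Returns (true_flags, false_flags, ppv), where ppv is the proportion of
    flagged papers that really do contain an error.
    """
    with_error = n_papers * error_rate
    without_error = n_papers - with_error
    true_flags = with_error * sensitivity
    false_flags = without_error * (1 - specificity)
    ppv = true_flags / (true_flags + false_flags)
    return true_flags, false_flags, ppv


if __name__ == "__main__":
    for accuracy in (0.99, 0.90):
        true_flags, false_flags, ppv = flag_outcomes(50_000, 0.10, accuracy, accuracy)
        print(f"accuracy {accuracy:.0%}: {true_flags:.0f} correct flags, "
              f"{false_flags:.0f} false alarms, PPV {ppv:.2f}")
```

On these assumptions, dropping from 99% to 90% accuracy multiplies the number of authors sent off to check an error-free paper tenfold, which is why the sensitivity and specificity figures matter so much.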


* I was thinking here of Gricean maxims, especially the maxim of relation.