Why Sabotage Is Rarely an Issue in Citizen Science

Following a recent Nature editorial, the citizen science researcher-practitioner community has been having a lively discussion. Muki Haklay wrote a great initial analysis of the editorial, and you should read that before reading on.

OK, now that you know the back story: a related comment from Sam Droege on the cit-sci-discuss listserv included this observation:

Statistically, to properly corrupt a large survey without being detected would require the mass and secret work of many of the survey participants and effectively would be so complicated and nuanced that it would be impossible to manage when you have such complex datasets as the Breeding Bird Survey.

I agree, and I frequently have to explain this to reviewers in computing, who are often concerned about the risk of vandalism (as seen all over Wikipedia).

Based on a very small number of reports from projects with very large contributor bases (projects that are statistically more likely to attract malcontents due to sheer size, and anti-science saboteurs due to visibility), only around 0.0001% of users, if that, are blacklisted for deliberately (and repeatedly) submitting “bad” data.

If we presume that we’re failing to detect such behavior in considerably more people than we actually catch, say a couple of orders of magnitude more, we’d still only be talking about 0.01% of the users, who pretty much always submit less than 0.01% of the data (these are not your more prolific “core” contributors). In no project that I’ve ever encountered has this issue been considered a substantial problem; it’s just an annoyance. Most ill-intentioned individuals quickly give up their trolling ways when they are repeatedly shut down without any fanfare. From a few discussions with project leaders, it seems that each of those individuals has a rather interesting story, and their unique participation profiles make their behaviors obvious as…aberrant.
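
To put rough numbers on that intuition, here is a minimal back-of-envelope simulation (the record counts, the tenfold exaggeration, and the pessimistic 0.01% saboteur rate are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented numbers for illustration: one million records, of which the
# pessimistic 0.01% come from saboteurs exaggerating counts tenfold.
n_obs = 1_000_000
bad_fraction = 0.0001

good = rng.poisson(lam=10, size=round(n_obs * (1 - bad_fraction)))
bad = rng.poisson(lam=100, size=round(n_obs * bad_fraction))

clean_mean = good.mean()
polluted_mean = np.concatenate([good, bad]).mean()

print(f"mean without sabotage: {clean_mean:.4f}")     # ~10.00
print(f"mean with sabotage:    {polluted_mean:.4f}")  # ~10.01
```

Even with every undetected saboteur exaggerating tenfold, the estimate moves by about 0.1%, well inside the noise of any field survey.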

In fact, the way most citizen science projects work makes them unlikely targets for malicious data-bombing in the first place. Why? For better or worse, a lot of citizen science sites provide relatively little support for social interaction: less access to an audience means trolls aren’t going to get a rise out of anyone. Those projects that do have vibrant online communities rarely tolerate that kind of thing; their own participants quickly flag such behavior, and if the project is well-managed, the traces are gone in no time, further disincentivizing additional vandalism.
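
As a toy illustration of how such aberrant profiles stand out, here is a sketch that flags users whose rejection rate is wildly out of line with everyone else’s; the heuristic, thresholds, and data are all invented, and real projects layer community flags and expert review on top of anything like this:

```python
import numpy as np

def flag_aberrant_users(rejects, subs, min_subs=10, max_rate=0.5):
    """Flag users with enough history whose rejection rate is extreme.

    `rejects` and `subs` are parallel per-user arrays of rejected and
    total submissions; both thresholds are invented for illustration.
    """
    rates = rejects / np.maximum(subs, 1)
    return np.flatnonzero((subs >= min_subs) & (rates > max_rate))

rng = np.random.default_rng(0)
subs = rng.integers(20, 200, size=10_000)       # 10,000 hypothetical users
rejects = rng.binomial(subs, 0.02)              # honest users: ~2% rejected
rejects[:3] = (subs[:3] * 0.9).astype(int)      # three trolls: ~90% rejected

print(flag_aberrant_users(rejects, subs))       # -> [0 1 2]
```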

From a social psychological standpoint, it seems that the reality of the situation is actually more like this:

  1. convincingly faking scientific data is (usually a lot) more work than collecting good data in the first place;
  2. systematically undermining data quality for any specific nefarious purpose requires near-expert knowledge and skill to accomplish, and people fitting that profile are unlikely to be inclined to pull such shenanigans;
  3. anyone who genuinely believes their POV is scientifically sound should logically be invested in demonstrating it via sound science and good data quality;
  4. most citizen science projects do not reward this kind of behavior well enough to encourage ongoing sabotage, as discussed above; and
  5. as Sam noted, effectively corrupting a large-scale project’s data without detection requires a lot of smarts and more collaboration than it is reasonable to assume anyone would undertake, no matter how contentious the project’s subject matter (see the sketch after this list). They’d be more likely to succeed in producing misleading results by starting their own citizen-counter-science project than by trying to hijack one. And frankly, such a counter-science project would probably be easy to identify for what it was.

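Just how much coordinated fakery would point 5 require? A quick back-of-envelope calculation (all numbers invented; the 5% detectability threshold is deliberately generous):

```python
# How much fake data does it take to visibly skew a survey mean?
true_mean = 10.0                  # e.g., average count per record (invented)
fake_value = 100.0                # a tenfold exaggeration per fake record
target_shift = 0.05 * true_mean   # a 5% bias, generously assumed detectable

# Mixing in a fraction f of fakes shifts the mean by
# f * (fake_value - true_mean), so solve that for f.
f = target_shift / (fake_value - true_mean)
print(f"fraction of records that must be fake: {f:.2%}")  # ~0.56%

records = 5_000_000   # BBS-scale dataset (invented order of magnitude)
print(f"fake records needed: {round(f * records):,}")     # ~27,778
```

Tens of thousands of mutually consistent fake records, coordinated in secret across many accounts: that is the bar Sam is describing.
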
Seriously, under those conditions, who’s going to bother trying to ruin your science?