Why Sabotage is Rarely an Issue in Citizen Science

Following a recent Nature editorial, the Citizen Science researcher-practitioner community has been having a lively discussion. Muki Haklay made a great initial analysis of the editorial, and you should read that before continuing on.

OK, now that you know the back story, a related comment from Sam Droege on the cit-sci-discuss listserv included the following observation:

Statistically, to properly corrupt a large survey without being detected would require the mass and secret work of many of the survey participants and effectively would be so complicated and nuanced that it would be impossible to manage when you have such complex datasets as the Breeding Bird Survey.

I agree, and I frequently have to explain this to reviewers in computing, who are often concerned about the risk of vandalism (as seen all over Wikipedia).

Based on a very small number of reports from projects with very large contributor bases—projects that are statistically more likely to attract malcontents due to size and anti-science saboteurs due to visibility—only around 0.0001% of users (if that) are blacklisted for deliberately (and repeatedly) submitting “bad” data.

If we presume that we’re failing to detect such behavior for at least a few more people than the ones we actually catch, say at the level of a couple orders of magnitude, we’d still only be talking about 0.01% of the users, who pretty much always submit less than 0.01% of the data (these are not your more prolific “core” contributors). In no project that I’ve ever encountered has this issue been considered a substantial problem; it’s just an annoyance. Most ill-intentioned individuals quickly give up their trolling ways when they are repeatedly shut down without any fanfare. From a few discussions with project leaders, it seems that each of those individuals has a rather interesting story and their unique participation profiles make their behaviors obvious as…aberrant.
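To put that arithmetic in concrete terms, here’s a minimal back-of-envelope sketch in Python. The contributor and record counts are hypothetical round numbers chosen purely for illustration; only the rates (roughly 0.0001% of users blacklisted, a 100× allowance for undetected cases, and under 0.01% of the data affected) come from the discussion above.

```python
# Back-of-envelope sketch with made-up round numbers; only the rates
# (~0.0001% of users blacklisted, a 100x allowance for undetected cases,
# <0.01% of data affected) are taken from the discussion above.
users = 1_000_000        # hypothetical contributor base
records = 10_000_000     # hypothetical total submissions

blacklisted_rate = 0.0001 / 100   # ~0.0001% of users caught and blocked
undetected_factor = 100           # pessimistic: assume we miss 100x more than we catch

bad_users = users * blacklisted_rate * undetected_factor   # = 100 users
bad_records = records * (0.01 / 100)                       # at most ~1,000 records

print(f"~{bad_users:.0f} problem users ({bad_users / users:.2%} of contributors)")
print(f"~{bad_records:.0f} suspect records ({bad_records / records:.2%} of data)")
```

Even under that deliberately pessimistic assumption, the affected slice of both the contributor base and the dataset stays vanishingly small.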

In fact, the way most citizen science projects work makes them unlikely targets for malicious data-bombing anyway. Why? For better or worse, a lot of citizen science sites provide relatively little support for social interaction: less access to an audience means would-be vandals aren’t going to get a rise out of anyone. Those projects that do have vibrant online communities rarely tolerate that kind of thing: their own participants quickly flag such behavior, and if the project is well-managed, the traces are gone in no time, further discouraging repeat vandalism.

From a social psychological standpoint, it seems that the reality of the situation is actually more like this:

  1. convincingly faking scientific data is (usually a lot) more work than collecting good data in the first place;
  2. systematically undermining data quality for any specific nefarious purpose requires near-expert knowledge and skill to accomplish, and people fitting that profile are unlikely to be inclined to pull such shenanigans;
  3. anyone who genuinely believes their POV is scientifically sound should logically be invested in demonstrating it via sound science and good data quality;
  4. most citizen science projects do not reward this kind of behavior well enough to encourage ongoing sabotage, as discussed above; and
  5. as Sam noted, effectively corrupting a large-scale project’s data without detection requires a lot of smarts and more collaboration than is reasonable to assume anyone would undertake, no matter how potentially contentious the content of the project. They’d be more likely to succeed in producing misleading results by starting their own citizen-counter-science project than trying to hijack one. And frankly, such a counter-science project would probably be easy to identify for what it was.

Seriously, under those conditions, who’s going to bother trying to ruin your science?

Citizen Science 2015 Conference Attendance Plans

I can’t wait to get to San Jose and catch up with all of my citizen science colleagues at the Citizen Science Conference this week!

But if other attendees are anything like me, they’re torn in a dozen directions for every session. So much good content, so many people to catch up with, and so very little time…

To facilitate meeting planning, I’m publishing my conference attendance plan below. Feel free to contact me to set up a time to talk if you’re one of the many people who’ve said “let’s meet up” but haven’t nailed down a time yet. For any early risers (or anyone from the East Coast), I’m also available over breakfast every day.

2/10: Travel day, arriving late evening, with a possible late dinner

2/11: Conference Day 1

  • 7:30 – 8:30: Coffee/registration (arriving as early as I can manage)
  • 8:30 – 9:45: Welcome & Keynote
  • 9:55 – 11:15: Session 1E (Best Practices)
  • 11:15 – 11:45: Coffee. Yes, more will be needed by then.
  • 11:50 – 1:10: Session 2E (Best Practices)
  • 1:10 – 2:30: Lunch, still finalizing plans
  • 2:40 – 4:00: either Session 3F (eBird) or 3G (Grand Challenges)
  • 4:10 – 5:30: Session 4E (Digital)
  • 5:30 – 7:30: Posters, Reception, Hackfest
  • 7:30: Dinner, currently free for ad hoc plans

2/12: Conference Day 2

  • 7:10 – 8:10: Coffee & Meet/Greet CSA Board (arriving as early as I can manage)
  • 8:10 – 9:30: Session 5F (Digital)
  • 9:40 – 11:00: Session 6E (Digital) – come hear our awesome speakers!
  • 11:00 – 1:00: Open Space and lunch, no current plans
  • 1:00 – 4:00: Session 8B (Digital & Best Practices) – I’ll be talking about a human computation perspective on citizen science data quality
  • 4:10 – 5:30: Keynote & Closing
  • 5:30 – 7:00: FREE
  • 7:00: Dinner with Biocubes project partners

2/13: Travel day, departing in afternoon

  • AM is free through lunch. If left to my own devices, I’ll go birding and then do some work until around noon, have lunch, and head out to SJC. Also happy to have company or meetings in the AM or over a prompt 12-1 PM lunch.

Citizen Science Data Quality is a Design Problem

I’ve been giving talks for years that boil down to, “Hey citizen science organizers, it’s up to you to design things so your volunteers can give you good data.” I genuinely believe that most data quality issues in citizen science stem from either 1) a mismatch between research question and methodology, or 2) design problems. In either case, the onus should fall on the researcher to know when citizen science is not the right approach or to design the project so that participants can succeed in contributing good data.

So it’s disheartening to see a headline like this in my Google alerts: Study: Citizen scientist data collection increases risk of error.

Well. I can only access the abstract for the article, but in my opinion, the framing of the results is all wrong. I think that the findings may contribute a useful summary–albeit veiled–of the improvements to data quality that can be achieved through successive refinements of the study design. If you looked at it that way, the paper would say what others have: “after tweaking things so that normal people could successfully follow procedures, we got good data.” But that’s not particularly sensational, is it?

Instead, the news report makes it sound like citizen science data is bad data. Not so, I say! Bad citizen science project design makes for bad citizen science data. Obviously. (So I was really excited to see this other headline recently: Designing a Citizen Science and Crowdsourcing Toolkit for the Federal Government!)

The framing suggests that the authors, like most scientists and by extension most reviewers, probably aren’t very familiar with how most citizen science actually works. This is also completely understandable. We don’t yet have much in the way of empirical literature warning of the perils, pitfalls, and sure-fire shortcuts to success in citizen science. I suspect a few specific issues probably led to the unfortunate framing of the findings.

The wrong demographic: an intrinsically-motivated volunteer base is typically more attentive and careful in their work. The authors saw this in better results from students in thematically aligned science classes than from students in general science classes. The usual self-selection that occurs in most citizen science projects that draw upon volunteers from the general public might have yielded even better results. My take-away: high school students are a special participant population. They are not intrinsically-motivated volunteers, so they must be managed differently.

The wrong trainers and/or training requirements: one of the results was that university researchers were the best trainers for data quality. That suggests that the bar was too high to begin with, because train-the-trainer works well in many citizen science projects. My take-away: if you can’t successfully train the trainer, your procedures are probably too complicated to succeed at any scale beyond a small closely-supervised group.

The wrong tasks: students struggled to find and mark the right plots; they also had lower accuracy in more biodiverse areas. There are at least four problems here.

  1. Geolocation and plot-marking are special skills. No one should be surprised that students had a hard time with those tasks. As discussed in gory detail in my dissertation, marking plots for participants in advance is a much smarter approach; using distinctive landmarks like trail junctions is also reasonable.
  2. Species identification is hard. Some people are spectacularly good at it, but only because they have devoted substantial time and attention to a taxon of interest. Most people have limited skills and interest in species identification, and therefore probably won’t get enough practice to retain any details of what they learned.
  3. There was no mention of the information resources the students were provided, which would also be very important to successful task completion.
  4. To make this task even harder, it appears to be a landscape survey in which every species in the plot is recorded. That means that species identification is an extra-high-uncertainty task; the more uncertainty you allow, the more ways you’re enabling participants to screw up.

On top of species identification, the students took measurements, and there was naturally some variation in accuracy there too. There are a lot of ways the project could have supported data quality, but I didn’t see enough detail to assess how well they did. My take-away: citizen science project design usually requires piloting several iterations of the procedures. If there’s an existing protocol that you can adopt or adapt, don’t start from scratch!

To sum it up, the citizen science project described here looks like a pretty normal start-up, despite the slightly sensational framing of the news article. Although one of the authors inaccurately claims that no one is keeping an eye on data quality (pshah!), the results are not all that surprising given some project design issues, and most citizen science projects are explicitly structured to overcome such problems. For the sharp-eyed reader, the same old message shines through: when we design it right, we can generate good data.

Stitch Fix: Efficient Fashion for the Professoriate

Over the last few months, I’ve had to really up my game in a number of categories, including personal appearance. PhD students and even postdocs pretty much all wear utilitarian, cheap clothing, and when I got a faculty job, I knew my well-worn and overly casual wardrobe wasn’t going to cut it anymore.

I forced myself to do some shopping, all the while cringing at how much time it took to find just one or two items. Let’s face it, the last thing a new junior faculty member has time for is clothes shopping. As the semester progresses–and the weather gets colder in spite of my lack of appropriate layers–this becomes even more true.

So like many of you, I’d heard of this thing called Stitch Fix. When I looked more closely at the details, I figured it was worth a gamble: if even one item worked out for me in a shipment, it would be an improvement over trying to find it myself. And when my first Fix arrived this week, I actually kept three items–a total win!

Here’s why I think Stitch Fix is a great solution for academics:

  1. Academics need to look professional (at least occasionally), but rarely have the interest, patience, fashion sense, or time to go shopping. They usually have enough disposable income to selectively acquire items priced above fast fashion rates. Their time is worth enough to them that it’s easy to make a strong economic argument for outsourcing clothing selection.
  2. There’s an adequately extensive style profile to ensure that you get appropriate items, but it won’t take all day to fill out. You can also send your stylist short notes for each Fix (I told mine that I need some items in school colors, for example).
  3. Internet-and-USPS powered. No trip to stores or malls. No crowds or pressure. Shipping prepaid in both directions. Super efficient!
  4. You try on the clothes at home, under normal lighting, at your leisure (within 3 days of receipt). This is wonderful. It’s a zero-pressure environment and you can make a much more confident purchase decision once you’ve tried pairing items with what’s in your closet already.
  5. They send things you wouldn’t have picked, but which you should try anyway. Since there are only 5 things to try on, you might as well try all of them–and you might even like them! I scored two of those in my first Fix.
  6. The higher per-item cost is completely and utterly worthwhile because #3. I also immediately realized how much I was limiting myself by using price as a first-round filter for what I try on, so this provides a counterbalance.
  7. The style cards are awesome: they show each item you got in a couple of different configurations, to give you ideas on how to wear them. As a result, I pulled out my leather knee boots for the first time in years, and they looked great with my new blouse and skirt! (Note for any librarians in the house: the style cards accumulate into a catalog of your wardrobe!)
  8. There’s a feedback cycle to improve your selections over time and let your stylist know if you need something special for an upcoming event or want to try something new.
  9. Did I mention that it saves a ton of time?

I can think of no better testament than pointing out that they sent a pair of (skinny!) jeans that fit really well on my very first Fix. As any woman knows, the search for good jeans can be a lifelong quest, so having someone I’ve never met send me a pair that fits beautifully? Simply amazing!

If you’re adequately convinced to try Stitch Fix for yourself, please do me a solid in return and use my referral link: http://stitchfix.com/sign_up?referrer_id=4201271

Responding to Reviewers

“Revise and resubmit” is really the best outcome of academic peer review – acceptance for publication as submitted is so rare it may as well not exist, and most papers are genuinely improved through the peer review and revision processes. Generally speaking, an additional document detailing changes must accompany the revised submission, but the conventions for writing these “change logs” are a little opaque because they’re not typically part of the public discussion of the research.

[Image: San Antonio Botanical Gardens during CSCW 2013]

There are a couple of great examples of change logs from accepted CSCW 2013 papers from Merrie Morris, and I’m offering my own example below as well. It’s no secret that my CSCW 2013 paper was tremendously improved by the revision process. I wrote the initial submission in the two weeks between submitting my final dissertation revisions and graduation. For a multitude of reasons, it wasn’t the ideal timing for such an endeavor, so I’m glad the reviewers saw a diamond in the rough.

My process for making revisions starts with not getting upset about criticism to which I willingly subjected myself – happily, a practice that becomes easier with time and exposure. (If needed, you can substitute “get upset/rant/cry in private, have a glass of wine, cool off, sleep on it, and then come back to it later,” which is a totally valid way to get started on paper revisions too.) Hokey as it sounds, I find it helpful to remind myself to be grateful for the feedback. And that I asked for it.

Then I print out the reviews, underline or highlight the items that need attention, and summarize them in a few words in the margin. Next, I annotate a copy of the paper to identify any passages that are specifically mentioned, and start to figure out where I need to make changes or could implement reviewers’ suggestions. I find these tasks much easier to do on paper, since being able to spread out all the pages around me sometimes helps when working on restructuring and identifying problem points.

During or after that step, I create a new word processing document with a table and fill it in with terse interpretations of the comments, as you’ll see in the example below. In the process, I sort and group the various points of critique so that I’m only responding to each point once. This also ensures that I’m responding at the right level, e.g., “structural problems” rather than a more specific indicator of structural problems.

The actual columns of the table can vary a little, depending on the context – for example, a table accompanying a 30-page journal manuscript revision in which passages are referenced by line number would naturally include a column with the affected line numbers to make it easier for the reviewer to find and evaluate the updated text. In the example below, I made such substantial changes to the paper’s structure that there was no sense in getting specific about section number, paragraph, and sentence.

As a reviewer, I’m all for process efficiency; I strongly prefer concise documentation of revisions. At that stage, my job is to evaluate whether my concerns have been addressed, and the documentation of changes should make that easier for me, rather than making me wade through unnecessary detail. Likewise, as an author, I consider it a problem with my writing if I need to include a lengthy explanation of why I’ve revised the text, as opposed to the text explaining itself. That heuristic holds under most circumstances, unless the change defies expectations in some fashion, or runs counter to a reviewer’s comment — which is fine when warranted, and the response to reviewers is the right place to make that argument.

Therefore, the response to reviewers is primarily about guiding the reviewer to the changes you’ve made in response to their feedback, as well as highlighting any other substantive changes and any points of polite disagreement. In a response to reviewers, the persuasive style of CHI rebuttals, the closest parallel practice with which many CSCW authors have experience, seems inappropriate to me because the authors are no longer in a position of persuading me that they can make appropriate revisions, but are instead demonstrating that they have done so. Ergo, I expect (their/my) revisions to stand up to scrutiny without additional argumentation.

Finally, once all my changes are made and my table is filled in, I provide a summary of the changes, which includes any other substantive changes that were not specifically requested by the reviewers, and note my appreciation for the AC/AE and reviewers’ efforts. A jaded soul might see that as an attempt at flattering the judges, but it’s not. I think that when the sentiment is genuine, expressing gratitude is good practice. In my note below, I really meant it when I said I was impressed by the reviewers’ depth of knowledge. No one but true experts could have given such incisive feedback and their insights really did make the paper much better.

——————————

Dear AC & Reviewers,

Thank you for your detailed reviews on this submission. The thoroughness and depth of understanding that is evident in these reviews is truly impressive.

To briefly summarize the revisions:

  • The paper was almost completely rewritten and the title changed accordingly.
  • The focus and research question for the paper are now clearly articulated in the motivations section.
  • The research question makes the thematic points raised by reviewers the central focus.
  • The analytical framework is discussed in more depth in the methods section, replacing less useful analysis process details, and is followed up at the close of the discussion section.
  • The case comparison goes into greater depth, starting with discussion of case selection.
  • The case descriptions and comparison have been completely restructured.
  • The discussion now includes an implications section that clarifies the findings and applicability to practice.

Below are detailed the responses to the primary points raised in the reviews; I hope these changes meet with your approval. Regardless of the final decision, the work has unquestionably benefited from your attention and suggestions, for which I am deeply appreciative.

Reviewer | Issue | Revisions
AC | No clear research question/s | A research question is stated toward the end of page 2.
AC, R1, R3 | Findings are “obvious” | The focus of the work is reframed as addressing obvious assumptions that only apply to a limited subset of citizen science projects, and the findings – while potentially still somewhat obvious – provide a more useful perspective.
AC, R2 | Conclusions not strong/useful | A section addressing implications was added to the discussion.
AC | Improve comparisons between cases | Substantial additional comparison was developed around a more focused set of topics suggested by the reviewers.
AC | Structural problems | The entire paper was restructured.
R1 | Weak title | The title was revised to more accurately describe the work.
R1 | Does not make case for CSCW interest | Several potential points of interest for CSCW are articulated at the end of page 1.
R1 | Needs stronger analytic frame & extended analysis | The analytic framework is described in further detail in the methods section, and followed up in the discussion. In addition, a section on case selection criteria sets up the relevance of these cases for the research question within this framework.
R1 | Quotes do not add value | Most of this content was removed; new quotes are included to support new content.
R1, R3 | Answer the “so what?” question & clarify contributions to CSCW | The value of the work and implications are more clearly articulated. While these implications could eminently be seen as common sense, in practice there is little evidence that they are given adequate consideration.
R1 | Include case study names in abstract | Rewritten abstract includes project names.
R1 | Describe personally rewarding outputs in eBird | These are described very briefly in passing, but with the revised focus are less important to the analysis.
R2 | Compare organizational & institutional differences | Including these highly relevant contrasts was a major point of revision. A new case selection criteria section helps demonstrate the importance of these factors, with a table clarifying these contrasts. The effects of organizational and institutional influences are discussed throughout the paper.
R2 | Highlight how lessons learned can apply to practice | The implications section translates findings into recommendations for strategically addressing key issues. Although these are not a bulleted list of prescriptive strategies, the reminder they provide is currently overlooked in practice.
R2 | Comparison to FLOSS is weak | This discussion was eliminated.
R2 | Typos & grammatical errors | These errors were corrected; hopefully new ones were not introduced in the revision process (apologies if so!).
R3 | Motivation section does not cite related work | Although the rewritten motivation section includes relatively few citations, they are more clearly relevant. For some topics, there is relatively little research (in this domain) to cite.
R3 | Motivation section does not discuss debated issues | The paper now focuses primarily on issues of participation and data quality.
R3 | Consistency in case description structure | The case descriptions are split into multiple topics, within which each case is discussed. The structure of case descriptions and order of presentation is consistent throughout.
R3 | Include key conclusions about each case with descriptions | The final sentence of the initial descriptions for each case summarizes important characteristics. I believe the restructuring and refocusing of these revisions should address this concern.
R3 | Does not tie back to theoretical framework used for analysis | The Implications section specifically relates the findings back to the analytical framework, now discussed in greater detail in the methods section.
R3 | No discussion of data quality issues | This is now one of the primary topics of the paper and is discussed extensively. In addition, I humbly disagree that expert review is unusual in citizen science (although the way it was conducted in Mountain Watch is undoubtedly unique). Expert data review has been shown to be one of the most common data validation techniques in citizen science.
R3 | No discussion of recruitment issues | This topic is now one of the primary topics of the paper and is discussed extensively.
R3 | Introduce sites before methods | The case selection criteria section precedes the methods and includes overview descriptions of the cases. They are also given a very brief mention in the motivation section. More detailed description as relevant to the research focus follows the methods section.
R3 | Do not assume familiarity with example projects | References to projects other than the cases are greatly reduced and include a brief description of each project’s focus.
R3 | Tie discussion to data and highlight new findings | While relatively few quotes are included in the rewritten discussion section, the analysis hopefully demonstrates the depth of the empirical foundation for the analysis. The findings are clarified in the Implications section.
R3 | Conclusions inconsistent with other research, not tied to case studies, or both | To the best of my knowledge, the refocused analysis and resultant findings are no longer inconsistent with any prior work.


Big Issues in CSCW session, CSCW 2013

ACM Conference on Computer Supported Cooperative Work and Social Computing
26 February, 2013
San Antonio, TX

——

Ingrid Erickson – Designing Collaboration: Comparing Cases Exploring Cultural Probes as Boundary-Negotiating Objects

Cultural probes around boundary negotiating objects – prompts via Twitter, e.g. “Take a picture of sustainability (post it with #sustainability and #odoi11)” – leveraged existing platforms. Lots of very interesting images came from this. Content not profound, but prompts engendered communication with people on the street, people in teams, dialogues that generated new hashtags besides those requested. Led into a design workshop.

Another instance of using cultural probes with the Light in Winter event (in Ithaca, NY). Found that probes have several properties that make them generative.

  • Exogenous: probes act like exogenous shocks to small organizational systems, an interruption to normal practice that requires attention and an initiating mechanism for collaboration.
  • Scaffolding: directed but unbounded tasks; hashtags and drawings act as scaffolds, directed boundary work to prompt engagement, informal structure that supports exploration over accuracy/specificity.
  • Diversity: Outputs improved by diversity – diverse inputs increased value, acted as funnel for diversity to become collective insight.

Think about designing collaboration – taboo topic with inherent implication of social engineering, but we’ve been doing it all along. As designed activities, cultural probes were oblique tasks to invite interpretation and meaning-making, build on exogenous shock value, give enough specificity for mutual direction, salient to context but easy to understand.

Potential to use distributed boundary probes? Online interaction space/s – assemble, display inputs; organize w/ hashtag/metadata; easy way to revise organizational schemes as they are negotiated; allow collaborators to hear thinking-aloud of fellow collaborators; can be designed as a game or casually building engagement over longer periods of time.

——

Steve Jackson – Why CSCW Needs Science Policy (and Vice-Versa)

CSCW impact means making findings relevant to new and broader communities, making the work more effective and meaningful in the world.

We’re all used to “implications for design” and maybe even “implications for practice”, but need to start including more “implications for policy” in our work moving forward. Often fail to make connections in a useful way, need to learn from policy and policy research. Not immediately relevant to all CSCW research, but relevant at the higher level. The connections are just underdeveloped relative to potential value.

Particularly important around collaborative science, scientific practices – Atkins report as a prime example. Separate European trajectory covered in the paper along with history of science policy as it relates to CSCW. So-called “supercomputing famine” in the 1970s (drew laughs) reflected ambition of transforming science with technologies. Leading examples – CI generation projects – may also be misleading as these are the big-money CIs. Ethnographic studies now include up to 250 informants, but all projects are examples from MREFC (Major Research Equipment and Facilities Construction) projects.

CSCW & policy gap – institutional tensions in funding, data sharing practices and policies, software provision & circulation.

Social contract with science – support, autonomy, and self-governance in exchange for goods, expertise, and the (applied) fruits of (basic) science – this was the attitude after WWII. Stepping away from the pipeline model, moved toward post-normal science. Identified 3 modes of science, which are culturally specific. Can’t wait to read this paper!