Crowdsourcing session, CSCW 2013

ACM Conference on Computer Supported Cooperative Work and Social Computing
26 February, 2013
San Antonio, TX



Tammy Waterhouse – Pay by the Bit: Information-theoretic metric for collective human judgment

Collective human judgment: using people to answer well-posed objective questions with right/wrong answers. Collective human computation in this context – related questions grouped into tasks, e.g., the birthday of each Texas legislator.

Gave example of Galaxy Zoo. Issues with measuring human computation performance: speed? Rewarding it encourages poor quality. Accuracy? Percent correct isn't always useful or meaningful.

Using information entropy – the self-information of a random outcome is the surprise associated with that outcome; the entropy of a random variable is its expected self-information. Resolving collective judgment – the model uses Bayesian techniques, then looks at the entropy remaining after conditioning on responses – conditional entropy. Used Galaxy Zoo data to study question scheduling; the new approach improved overall performance.
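A minimal sketch of the definitions mentioned above (function names are my own, not from the talk): self-information of an outcome with probability p is -log2(p), and entropy is its expectation.

```python
import math

def self_information(p: float) -> float:
    """Surprise of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(probs) -> float:
    """Expected self-information of a random variable, in bits."""
    return sum(p * self_information(p) for p in probs if p > 0)

# A fair coin flip carries 1 bit of information; a certain outcome carries 0.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([1.0]))       # 0.0
```

The talk's metric conditions on worker responses: the drop from prior entropy to conditional entropy measures how much a worker's answer actually told you.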


Shih-Wen Huang – Enhancing reliability using peer consistency evaluation in human computation

Human computation is not reliable – when tested, many people couldn't correctly count the nouns in a 15-word list. Without quality control, workers achieved only 70% accuracy. Believes quality control is the most important issue in human computation.

Gold standard evaluation: comparing against objectively determined correct answers [notably, not always possible]. Favored by researchers but not scalable, because gold standard answers are costly to generate.

Peer consistency in GWAP (games with a purpose): some games use inter-player consistency to score and reward players, and the mechanism significantly improves outcomes. Can peer consistency evaluation work as a scalable quality-control mechanism? Used Amazon Mechanical Turk to test it. Concludes peer consistency is scalable and effective for quality control.
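One simple way to implement the peer-consistency idea (a sketch under my own assumptions, not the paper's exact mechanism): score each worker by whether their answer matches the majority of their peers, with no gold standard needed.

```python
from collections import Counter

def peer_consistency_scores(answers: dict) -> dict:
    """Score each worker on one question by agreement with the peer majority.

    answers: mapping of worker id -> that worker's answer (hypothetical format).
    Returns worker id -> 1 if the answer matches the majority of the *other*
    workers' answers, else 0.
    """
    scores = {}
    for worker, answer in answers.items():
        peers = [a for w, a in answers.items() if w != worker]
        majority = Counter(peers).most_common(1)[0][0]
        scores[worker] = 1 if answer == majority else 0
    return scores

print(peer_consistency_scores({"w1": "cat", "w2": "cat", "w3": "cat", "w4": "dog"}))
```

Each worker is excluded from their own majority vote, so a lone dissenter can't inflate their own score; ties would need an explicit tie-breaking rule in practice.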


Derek Hansen – Quality Control Mechanisms for Crowdsourcing: Peer Review, Arbitration, & Expertise at FamilySearch Indexing

FamilySearch Indexing is one of the largest crowdsourcing projects around. Volunteers transcribe old records – 400K contributors.

Looked at several models to improve quality while reducing added time. Volunteers use a downloaded package to do tasks, so keystroke logging with idle time can be used to evaluate task efficiency. Compared the arbitration process with a simple review. A-B agreement varied by form field; experienced contributors showed better agreement.

Implications: retention is important – experienced workers are faster and more accurate; encourage both novices and experts to do more; some tasks need contextualized knowledge and specialized skills. There is tension between recruitment and retention in crowdsourcing – the assumption that more people make up for losing an experienced person is not always true. In this context it would take 4 new recruits to replace 1 experienced volunteer.

Findings: no need for a second round of review/arbitration – it yields only a slight reduction in error, and arbitration adds more time than it's really worth.

Implications: peer review brings considerable efficiency gains, with nearly as good quality as the arbitration process. Can prime reviewers to find errors, highlight potential problems (e.g., flagging), etc. Integrate human and algorithmic transcription – use algorithms on easy fields, integrated with human review.

Outcomes for and Benefits to Participants

Conference on Public Participation in Scientific Research, Day 1 Session 3, 8/4/2012


Building Evaluation Capacity for PPSR
Tina Philips

Focus on evaluation and why it's needed. Running NestWatch piqued her interest in evaluation; it happens in many contexts – pretty much every sector does it.

Many reasons to evaluate – so why not do it? Much to gain from better understanding impacts. What evaluation is not: an audit; an assessment; a survey – the biggest misconception, since surveys are key to the process of evaluation but not the whole thing; research – the goals, audience, and end products are very different. The end goal of evaluation is improving something. Evaluators are not dementors!

Evaluation is the systematic collection of data to determine the strengths and weaknesses of programs, policies, and products, so as to improve their overall effectiveness. It involves planning, implementation, and reporting out – similar to scientific research, and it does use similar methodologies. It takes all stakeholders into account. Because stakeholders and contexts are unique, every evaluation is different.

When to evaluate? Many times – front-end, formative, summative. What is evaluated? Individual outcomes – cognitive, affective, and behavioral. Also programmatic and community-level outcomes, but the focus here is individuals. The reason to look at this: participants are people, not technicians or laborers; they come to interact and do something meaningful. We owe it to them to let them know what they'll get out of participating, and evaluation is needed to understand that.

Challenging work – the main reasons it doesn't happen are time and money constraints, and many PPSR leaders are interdisciplinary and not trained in evaluation. Hence the development of the DEVISE toolkit, to help non-evaluators conduct quality evaluations.

For evaluation and design, it's really important to know goals, outcomes, and indicators. Goals are broad, outcomes are more specific, and indicators are the evidence of outcomes. Common pitfalls: wishy-washy outcomes, not aligning outcomes with activities, expecting too much of a project, expecting learning through osmosis, and not providing support for learning – including behavior change.

Intro to the basic DEVISE framework: behavior & stewardship; skills of science inquiry; knowledge of the nature of science; motivation; efficacy; interest in science & the environment. A work in progress, but the toolkit will address these domains & constructs. Don't try to evaluate them all – choose the ones aligned with the project itself.

Takeaways: evaluation is doable; it can improve your program, improve chances for sustainability, lead to best practices, and demonstrate impact as a field.

Understanding the Connection Between Participant Motivation and Program Outcomes for Effective Program Design
Kris Stepenuk

Started working with water quality monitoring as a kid – a family activity based on concern for kids' health. The goal was identifying contamination hotspots along the river, which it did. Now she coordinates the program, looking to understand motivations, outcomes based on the literature, and what we don't know. Challenge: become researchers of the discipline.

Presented motivations for her project; social outcomes are important parts of motivation. In general, motivations tend to be altruistic and/or related to personal learning.

Indian Country 101: Tribal Communities as Partners in Environmental Restoration
Chris Shelley

If you want to do PPSR on tribal lands, you need to understand the needs and context. In the Columbia River Basin, salmon is critical – 30% of calories in the diet, 300 lbs/person/year; they consider themselves salmon people and they take care of the fish. But salmon are in crisis, and so are the communities – what makes them who they are is disappearing.

Was part of the "Salmon Corps," which has 7 site locations and is part of AmeriCorps – showed a map of 4 reservations and the lands given up in the treaties of 1855. The treaty tribes didn't cede all lands, and also retained fishing rights. Salmon Corps did restoration work – fencing pastures to keep cattle out of streams. Members also got college credit for the work, so it wasn't just labor but also education.

The Corps members embraced hip-hop culture, wearing jeans halfway down their ass, but doing restoration work helped re-connect them with their culture. Held a culture camp where they learned traditional tribal skills, and they did a lot of cool work that provided important services: wolf reintroduction, native plantings, habitat restoration, assisting people during flash floods, etc.

But again, the tribes didn't give up fishing rights despite ceding lands, so they have the right to co-manage the salmon resources. They get to do cool work off-reservation to help manage salmon, which sometimes butts up against what scientists think is right, due to a cultural gap over what is appropriate in science. There were no salmon in the Umatilla River because of land usage, but they restored it to a natural salmon spawning stream. Great quotes from participants about the meaningfulness of this work: "I know I need an education, but I also want to help the environment and help my people." – Jeanine Jim-Bluehorse

Most people live near a reservation for which tribes still retain some off-reservation rights due to Supreme Court interpretations of treaties. Tribes still have access to resources like water, so they have the right to manage those resources – so how does this intersect with PPSR? He hopes there are things you want to know about the traditional lands of indigenous people, and that you'll collaborate with them: help them manage their resources, and learn things about those resources that you couldn't know otherwise. Believes the salmon crisis cannot be solved without tribal partners being central. Their input will upset some scientists because it's based in traditional knowledge, not Aristotelian science. It will be hard to reconcile, but it's still worth doing.

Working with these groups will be frustrating to outsiders but incredibly mutually beneficial. Wherever you have cultural diversity in a stable community, you also have biodiversity – this needs to be preserved and supported.

Citizen Science: Science as if People Mattered
Raj Pandya

Very funny intro! We should look at participants as partners in science, not as people doing our science. Science developed with communities, in the context of communities, doing things that communities can live with.

Whatever you call it, PPSR demographics show that under-represented groups participate less than majority groups, and less affluent participants are outnumbered by affluent ones. A huge group of people is not at the table – and if you're not at the table, you're often on the menu. Why?

Many issues are ones of access; these are the easiest to fix. It gets harder as you go down the list – cultural barriers can be solved with time and effort. Relevance is the most difficult: are the problems investigated by citizen science aligned with community priorities? If we keep on this way, we'll continue developing "whitey" programs, no offense intended.

Student project in the Louisiana Delta, called Vanishing Points, with a mobile phone app where people can collect stories, images, etc. for culturally, personally, and economically important places, and look at what's likely to happen to those places. Another set of projects around wild rice in the White Earth nation. A third project on managing meningitis in the Sahel: meningitis is epidemic in this area – every few years cases spike, with lots of mortality and disability. Everyone who lives there tells you it's a dry-season problem; when the rain arrives, the problem goes away. Using this knowledge is really important for effectively distributing limited vaccine supplies.

Steps to take: Align research with community priorities – requires working in interdisciplinary teams and talking to a lot of different community members. Plan for co-management – something is going to go wrong at some point, and you need a plan for trying to deal with that. Incorporate multiple kinds of knowledge – Chris already covered this, just need to harken back to that sense of humility and make space for other knowledge to be relevant and important to the project. Communicate: often has to happen in really small settings, constant work day after day in community settings. It’s really all about engaging the community at every step of the process, deciding what counts as data, what data means, how and when data will be collected, what data is appropriate to share, and working with communities to apply that data to their needs.

By paying even more attention to doing science with people, citizen science can provide a model for making science more relevant and useful.