Trip Report: London Citizen Cyberscience Summit, Day One

My notes from the first day of the London Citizen Cyberscience Summit (#LCCS2) are pasted below; the full program is online at http://cybersciencesummit.org/35-2. Note that these notes are just that – my own personal notes on the talks. Apologies for bad formatting; Posterous is really terrible for posting from a mobile device if you don’t do it the way that they expect you to!

Ordinarily I just send these to the PhD listserv at Syracuse, but I think a much broader audience could benefit from access to these notes. It’s been a really interesting event. I am very pleased to have been able to attend because the exposure to a much broader range of participatory science projects, initiatives, and interpretations has been just great.

—-

Francois Grey – coordinator of Citizen Cyberscience Center @ CERN, introducing program.

Citizen cyberscience relies on the Internet and web to broaden participation and expand beyond prior audiences. Science is too important to be left to scientists alone. The public wants to participate actively in science, not as a passive consumer. This matters in politics and journalism, and even more so in science. The summit exists because this is largely a grassroots effort, which means people end up reinventing the wheel because they are not aware of what others are doing. Second idea is being able to bring together citizens and scientists, looking at how it can be transformative to individuals.

—–

Silvertown – Evolution MegaLab and iSpot

Citizens pay for all our science, whether we allow them to participate or not – yet another reason to incorporate them. Lots of interest in citizen science, still getting a sense of what all the flavors are. Sharing experiences with a couple projects he’s been involved in. Thinking about what people know and how to work with what they know.

3 types of crowd knowledge: wisdom of the crowd, where everyone has an opinion and can vote, and each vote has equal weight; wisdom in the crowd, where opinions are weighted by expertise and more diverse audiences are better; wisdom from the crowd, where collective knowledge and interactions make the whole more than the sum of the parts. Corresponding types of network topologies: hierarchical, radial, interconnected.
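To make the difference between the first two types concrete, here is a minimal sketch of equal-weight voting versus expertise-weighted voting. It is an illustration only, not anything shown in the talk: the function names, the voter names, and the expertise scores are all hypothetical.

```python
from collections import defaultdict


def equal_weight_vote(votes):
    """Wisdom OF the crowd: every vote counts once."""
    tally = defaultdict(float)
    for _voter, label in votes:
        tally[label] += 1.0
    return max(tally, key=tally.get)


def expertise_weighted_vote(votes, expertise, default_weight=1.0):
    """Wisdom IN the crowd: each vote counts in proportion to the voter's expertise."""
    tally = defaultdict(float)
    for voter, label in votes:
        tally[label] += expertise.get(voter, default_weight)
    return max(tally, key=tally.get)


# Hypothetical identifications of one snail photo by three volunteers.
votes = [
    ("ann", "Cepaea nemoralis"),
    ("bob", "Cepaea hortensis"),
    ("cat", "Cepaea hortensis"),
]
# Hypothetical expertise scores; "ann" is the experienced identifier.
expertise = {"ann": 5.0, "bob": 1.0, "cat": 1.0}

print(equal_weight_vote(votes))
print(expertise_weighted_vote(votes, expertise))
```

With these made-up numbers the two schemes disagree: the plain majority picks one species, while the weighted tally follows the single experienced voter.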

Evolution MegaLab – hands-on part of a larger series of projects. Polymorphic snails with different colors and banding patterns. Already studied by population geneticists since 1920s, hypothesis that polymorphism had changed in response to global warming. Tendency in 1970s for more light colored snails in southern parts of Europe. Climate has warmed more than 1.3 degrees C in Europe, so has this affected the snails? Temperature warms the snails inside and can kill them if it gets too high, and yellow snails are more reflective of sunlight than brown or pink ones. Gave people a variety of explanations on the website, gave them layered access to as much information as they wanted. Major advertising campaign in 14 languages, operated as franchise in each country. Took advantage of 80 years of study, digitized all the data on 8K populations and mapped them, can drill down to individual populations so you can see who collected data and when. So they can direct people to return to the same sites. Translation was quite a task. Replicated data entry sheets online, fill in data, and get immediate feedback looking for data within 5km for comparison.
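The "data within 5km" feedback step can be pictured as a great-circle distance filter over the digitized samples. This is a minimal sketch under my own assumptions, not the MegaLab code: `haversine_km`, `nearby_samples`, the record layout, and the coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_samples(lat, lon, samples, radius_km=5.0):
    """Return the stored samples that lie within radius_km of the new observation."""
    return [s for s in samples
            if haversine_km(lat, lon, s["lat"], s["lon"]) <= radius_km]


# Hypothetical historical samples: one about 2 km away, one about 11 km away.
samples = [
    {"id": "a", "lat": 51.52, "lon": -0.12},
    {"id": "b", "lat": 51.60, "lon": -0.12},
]

print(nearby_samples(51.50, -0.12, samples))
```

A linear scan like this is fine for a few thousand populations; a real site would more likely use a spatial index or a database with geographic queries.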

Verification an issue with everyone from kids to retired professors – how to know if people are IDing the right species? Online quiz for training and weighting data – original plan – what really happened is that most people didn’t take the quiz and it seemed to be a rainy-day classroom task. Found out of 2K-3K responses that they got 62% right in first try. Location-specific field guides for disambiguation. 33% able to tell adult/juvenile, 84% got adult subspecies right, 95% scored banding correctly, 94% scored yellow correctly. Data cleaning showed that juvenile C. nem might be mistaken for adult C. hort. Cleaned data came out to 3K samples. Very expensive to get public to do this, not a cheap way to gather this many samples, but you get public engagement out of it as well.

Results – MegaLab data different from historical data, public collect in cities where they live, whereas professionals would collect in countryside. Big gap in data from France. Statistical comparison saw no heterogeneity, but 10K population showed that % yellow didn’t really increase over time for full population, but significant increase for populations located in dunes. Behaviorally, if it’s too hot, the snails move away under vegetation, but that is less of an option in dunes so there was change where it was forced.

3 lessons from MegaLab: difficult to evaluate skill, indirect (quiz) evidence can be used in verification, creating a self-help network among users might have improved data quality.

iSpot – more of a tool to help people identify species and put names to tags, key in ecology and biology. Part of OPAL, lottery funded. Goal is creating new generation of naturalists, used to be big in England but there’s a sense that this has been lost and there’s no education at any level to help people learn to ID things. Topology of network – has expert-based networks, but because it covers all species, no one is an expert in everything, so you get a synergistic network where everyone is an expert and a beginner at the same time, only as hierarchical as it needs to be. Uses reputation and badges to show expertise. Can gather albums, get points for getting names right, move to other groups and start to learn from them, and move up the learning curve. Some people joining societies and becoming specialists, very successful use of algorithms to show collective expertise for verification. Creates a virtuous circle: as people earn reputation, they are gently moved up the knowledge curve.

Highly successful, had no idea how or whether it would work. 36K submissions without a name, within half an hour 38% get a name, 54% in an hour, and it’s very scalable. Reason is the nature of the network. Now the question is what to do as it becomes global. About 100K observations at the moment, question of how you globalize expertise, but the answer is already there in the network. Someone offers an ID and qualifies it by saying that there may be other species in a different location – experts know what they don’t know, won’t stray outside of their expertise.

3 lessons from iSpot: reputation system is key, is increasing, and the recognition of it gives scope to increase through learning; multiple roles are very important, help it become a learning environment; experts are aware of what they don’t know. Citizen science is as much about the sociology of science as the domain science.

Qs: not many repeat observations, though some set out to see every grasshopper or tree. Major costs? At the end of the day it’s the staff time, had to handcraft everything to create things that were adequately specific. Cheaper ways of doing it now.

—–

Tokumine – No citizens, no science.

Works for Vizzuality, specializing in data visualization and analysis. Talking about people power, member of Citizen Science Alliance. Worked on Old Weather project, using historical data for climate models, and Planet Hunters to find planets around distant stars.

Old Weather: distribution of weather stations and so on leave big holes in knowledge of historical weather data. Royal Navy kept logbooks all over the planet, but in a format that is hard to liberate, script is difficult to read, lends itself to humans doing transcription. Users enter data, coordinates, weather details like wind speed, temperature, and incidental details like events that happen on board. In a year, 24K+ volunteers have transcribed over 800K logbook pages.

How to motivate them? Topical and interesting, captures the imagination. Site is easy to understand, clear call to action, explanations of purpose. Gamification as well, earn badges to become “captain,” and people get quite competitive, so those elements drive some participation. Observations on forum about infectious feedback, people are really into it, huge teams of people are engaging with each other. 255 ships’ logs complete.

Important to give participants immediate feedback, e.g., statistical feedback. Needs repackaging of data and feeding it back to them in a useful form. Animations of the ships traveling across the seas and showing temperature, really brings it alive to people. Visualization that shows tracks along with event data. These are really important to motivating people. Event analysis, shows that ships are very sad places (deaths at sea), most popular events included sports like cricket. Allowing people to contribute flexibly is important, there was more variety in ways people contribute than they expected. Showing preliminary results as feedback is really valuable for motivating. Individual transcriptions 97% accurate: out of 1K logbook entries, 3 lost because of transcription errors, 10 illegible logs, 3 are errors in logs themselves. Impressive given how hard it is to read handwriting.

Planet Hunters: uses NASA Kepler data, looking at light coming from a star over time, if you’re looking for planets, blips in light curves are how you identify them. Computers are good at detecting these, except in noisy data, which humans are good at. People identify potential transits in data, over 10M classifications in 18 months. Who wouldn’t want to discover a planet? Make it really easy to start, show scientific results, give coauthorship to discoverers. Another way to get incredible results is to use traditional media, BBC Stargazing coverage really drove participation up very quickly.
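For a sense of what a "blip" in a light curve looks like to a computer, here is a toy sketch that flags runs of samples sitting more than a chosen fraction below the median brightness. It is not the Planet Hunters or Kepler pipeline: `find_dips`, the 1% depth, and the flux values are all made up for illustration, and a simple threshold like this is exactly the kind of approach that breaks down on noisy data.

```python
import statistics


def find_dips(flux, depth=0.01):
    """Return (start, end) index pairs for runs of samples below the dip threshold.

    The threshold is the median flux reduced by the fractional `depth`.
    """
    baseline = statistics.median(flux)
    threshold = baseline * (1 - depth)
    dips = []
    start = None
    for i, value in enumerate(flux):
        if value < threshold:
            if start is None:
                start = i  # a new dip begins here
        elif start is not None:
            dips.append((start, i - 1))  # the dip ended on the previous sample
            start = None
    if start is not None:
        dips.append((start, len(flux) - 1))  # dip runs to the end of the series
    return dips


# Made-up normalized brightness values with one transit-like dip.
flux = [1.000, 1.001, 0.999, 0.985, 0.984, 0.986, 1.000, 1.002, 0.999, 1.001]

print(find_dips(flux))
```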

3 points: interesting problems, real research, respect for contributors. CSA offering support to new projects.

Qs: any chance for naming after discoverers? Touchy subject, but there is potential. Highlighted that there is an exchange, not just looking for cheap labor, this tension may be one of the big points of making it mutually beneficial. How did you explore motivations of participants? Understanding motivations can be done several ways, A/B testing presentation and messaging, also through forums which seemed too free form but valuable way to engage. How much are users involved during project design? Dictatorial approach, believe that project team are the people who can create the best product, once there is a first release, there’s an element of evolution based on user feedback. Statistical validity is important, so pipelines of data have to be incredibly clean.

—–

Dunn – Community-sourced structured metadata of English place names
King’s College, Digital Humanities

Mass-digitization of data. Place names are complicated and dynamic, they change over time. They are attested in documentary resources, e.g., in archives, but these are diverse and difficult to get to. They are also contested, with disagreement over place names and their etymology. They are documented in a variety of ways and are also researched. Project is digitizing a survey of English place names. Enormous and enormously complicated document, not just in subject matter but also in the ways it is structured. Each county has its own editor so structure changes from volume to volume because guidelines are loose.

About 80 years of data for 32 counties, 86 volumes, 6157 elements, 30517 pages, 4M individual place-name forms, uncounted bibliographic references (a lot!). Survey is aggregation of a lot of different documentary evidence. In 2010 a pilot used NLP to create XML records, great for getting volumes out there in machine-readable form, but they want to do more than that, harmonize and enrich it. Current work is to mark up the hierarchies of place names that occur, linking and exposing the data. Will be published as RDF-enabled gazetteer, point-based historic geographic reference with authority of official commissions.

Points, polygons, and lines are problematic, little data on geographic association of place names, points are arbitrary dependent on scale, administrative geographies change over time, and even natural features can mislead, e.g., rivers move over time. Hope is to integrate with other data sources. Need crowdsourcing to correct errors and omissions in NLP and OCR, validate output with local knowledge, add geographic data where missing, identify crossovers with users of other sources, enrich point data with raster and string data, learn more about what communities are interested in.

Qs: thought large part of England had Scandinavian names? Most strongly felt in Northeast. Did names change with Norman conquest? Yes.

—–

Arazy – Citizen Science a Motivational Perspective

Context of research: HCI, focuses on knowledge sharing in a variety of online settings; organizational perspective – to what extent can peer-production models be applied in organizational settings? Study approach is outside observation, participation.

What’s in it for the dude? Studied Stardust@Home, classifying images of collector plates from NASA’s Stardust spacecraft, searching for interstellar dust particles. RQ: what are motivational drivers of quantity and quality of contributions made by volunteers? Quantity is easy to measure, but how does motivation affect quality? Differences from open source, goal there is solving their own problems, rational economic decision argument, not the case for citizen science. Can’t assume the motivations from open source or Wikipedia are the same as those in citizen science.

Motivation framework from social movements theory: collective motives, identification with the group. These have to do with cost-benefit motives and norm-benefit motives. Relationship between quality and quantity: cost-benefit suggests tradeoff between the two but collective/identification orientation suggests both increase together. Cost-benefit analysis shows opposite of expectations – social benefit increases quantity, reputation increases quality, thought there would be negative influences of social benefit on quality and reputation would have negative effects on quantity. This depends on the project. Self-actualization increases both quality and quantity.

Fundamental differences between cost-benefit and self-actualization models. Quantity and quality differ in antecedents, which explains prior failure to find significant effect for collective and identification motives in OSS and Wikipedia. Overall, self-actualization is more important. Insights and recommendations: pay special attention to collective, identification motives. Be careful with benefits model, incentivizing one outcome may cause the other to suffer.

—–

Munyaradzi – transcription of bushman historical text

Bushman people of southern Africa – earliest inhabitants of Earth, unique worldview, most language speakers are dead. Digital Bleek and Lloyd collection contains notebooks, art, and dictionaries that preserve encoded bushman culture. Text contains complex diacritics (over 137, more still being found) with no Unicode representation. RQ is whether volunteer thinking can be used to crowd source the translation, and then how does volunteer thinking compare to machine learning techniques? Also whether cell phones can be used for this.

Using Bossa framework – OSS framework for volunteer thinking projects, minimizes effort of creating and operating projects, supports variance of volunteer skill. Near completion of segmentation and transcription application for the project. Text segmentation is challenging aspect of preparing images for analysis, e.g., diacritics under a character might get cut off by lines. Transcription application shows the image, type in the representation, convert to LaTeX and see the translation – have found ways to create these representations of the diacritics that way. Goals: generic solution for other historical applications, preservation of Bushman historical texts (important to linguists), make text searchable, reprint text into books, text-to-speech.

—–

Wu – CAS@home

First and only Chinese volunteer citizen science project in mainland China, focus on volunteer computing and volunteer thinking, CAS@home. Different groups using this infrastructure, so they made a task management interface for scientists. Projects in high energy physics, clean water filtering through nanotubes, infectious respiratory diseases involving proximity contact network, protein structure prediction. Huge Chinese population, 500M Internet users, most using mobile phones, enormous potential for harnessing volunteer computing cycles. Chinese contributors are 90% male, mostly students in IT professions. Lots of swift growth, worth over $2.2 if the CPU time were purchased from Amazon EC2.

—-

ForestWatchers

Tropical forests provide habitat for most of the world’s terrestrial plant and animal species, deforestation is 20% of greenhouse gas emissions. Project started at last year’s CCS. Collaboration with Brazil’s space research agency as they are a leader in deforestation monitoring, responsible for detection system, train monitoring teams, support open data policies. Building PyBossa framework demo for micro-tasking: show an image to volunteers, let them examine it over time, ask if they think it is a deforestation area and if so they mark the area, then the experts evaluate to see if it is so. Short-term goal is having a working alpha by June.
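The volunteers-then-experts workflow above suggests a simple check: take the majority answer from several volunteers per image tile and see how often it matches the expert evaluation. The sketch below is my own illustration, not ForestWatchers or PyBossa code; the function names, tile IDs, and answers are hypothetical.

```python
from collections import Counter


def majority_label(answers):
    """Most common answer among the volunteers who looked at one tile."""
    return Counter(answers).most_common(1)[0][0]


def agreement_with_experts(volunteer_answers, expert_labels):
    """Fraction of expert-labelled tiles where the volunteer majority agrees.

    Returns None when no tile has both volunteer answers and an expert label.
    """
    shared = [tile for tile in expert_labels if tile in volunteer_answers]
    if not shared:
        return None
    hits = sum(
        majority_label(volunteer_answers[tile]) == expert_labels[tile]
        for tile in shared
    )
    return hits / len(shared)


# Hypothetical answers to "is this a deforestation area?" (True/False).
volunteer_answers = {
    "tile_1": [True, True, False],
    "tile_2": [False, False, True],
    "tile_3": [True, True, True],
}
# Hypothetical expert evaluation of the same tiles.
expert_labels = {"tile_1": True, "tile_2": False, "tile_3": False}

print(agreement_with_experts(volunteer_answers, expert_labels))
```

An odd number of volunteers per tile avoids ties in the majority vote; a real project would also want per-volunteer statistics rather than just one overall figure.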

Qs: relying only on experts? Will compare volunteer data to expert gold standard, evaluating their ability to do the task is a primary goal.

—-

Chen – Quake catcher network @ Asia

ASGC is context for development – eScience collaborations, tech development, dissemination and training. Want to support/stimulate eScience collaborations, focus on earthquake sensing for disaster mitigation. Taiwan is in convergent plate boundary zone between Eurasia and Philippine Sea plates, lots of earthquakes. Earthquake simulations take a lot of computing power. Most of talk focuses on the seismic research. Want to use citizen cyberscience for educational initiatives, focused on classrooms, but mostly just volunteer computing.

—–

Ala-Mutka – e-Infrastructure and citizen science

EC program officer, describes focus of e-Infrastructure funding projects. One goal is seamless access, use, and reuse of data. Geant 2020 is European communication commons: deepen relationship between science and society, reinforce public confidence, promote science education, make scientific knowledge more accessible, informed citizen engagement. What can e-Infrastructure do for citizen science and vice versa? Data collections infrastructure, QA mechanisms, human computing resources, computing/science resources, scientific software and innovations development. To date they support a lot of open access efforts, tools and models for citizen science (Gloria – network of practitioners?, Discover the Cosmos), enhancing awareness.

Qs: ordinary guy running 2 citizen science projects – how can he get access to funding and HPC?

—–

Taddei – Citizens, Science, Education, and Technology: new synergies to be explored

IT is developing fast, open source hardware allows hacking cell phones to create new scientific instruments. Volume of scientific publication increasing on a log scale. Reinventing Discovery – Michael Nielsen – already outdated, how do you keep updated on the evolution of science?

Education is evolving slowly. Can we imagine innovative solutions? The rules are changing. Biology Letters – Blackawton bees, all the first authors are primary school participants, last author is a UCL scientist. Open source citizen science a focus – not just open source but also open wetware. 14yo developed first line earthquake detection for Chile. Games like Foldit not only advance science, but also methodologies and education. Learning through research – main difference between prior and future models is how knowledge production is catalyzed. Instead of being pushed by teachers for years, people can pick an interest and move it forward much more quickly and with engagement of society.

Scientific understanding progresses through feedback between experiments, analysis, and models. In many science contexts, students or citizen scientists are engaged in only one part of the research – just crowdsourcing or crowd computing. Wants to develop a citizen science of citizen science. Open questions for citizen science: top-down, bottom-up, or co-constructed? Who will benefit? Science, companies, contributors, society? How to maximize learning through research while contributing to CCS? How to do experiments, analyze and model CCS? Necessary and sufficient conditions for advancing as identify field? What is optimal division of labor between man and machine? Citizens and professionals?

Chess as a metaphor for the future – Kasparov being beaten by computer, if your job is chess, get ready to find a new job. Game of Kasparov against the world, 15yo worked with software for collective intelligence and challenged Kasparov so much he never wanted to repeat that game. Chess is not a simple task, and collective intelligence can accomplish complex tasks, but it has to be managed well.

How can citizens collectively maximize the synergies between citizen cyber science, OSS, and empowering knowledge creation environments?

—–

Pintea – Community based monitoring of chimpanzees and forest habitats using ODK (Jane Goodall Institute)

Mission is protecting chimpanzees and their habitats. Community-centered philosophy, community manages own resources. TACARE project focuses on conservation with local communities, numerous goals related to ownership of projects. Started from participatory mapping project in 2002, found need to incorporate indigenous knowledge in their planning around Gombe National Park. Led to village forest monitoring, valuable not only for providing input but also reporting. Trained monitors learned GPS, were assigned village government, they provide tools and small stipend. Collected 36K observations. GPS had some problems, limited data points captured, relies on GIS which also had issues.

In 2009, ODK + Google tools to build systems for forest monitoring, testing on multiple devices. Villagers store data on mobile phones, upload data when they visit town for other errands every couple of weeks. Very difficult to train people in the field until they used ODK, this was a huge improvement to be able to get people up and running on it quickly. Google Android ODK training in Tanzania brought together villagers, leaders, and developers. Incorporating the monitoring into ongoing JGI projects using ODK, multi-million dollar projects, reporting a variety of data points, training via train-the-trainer either onsite or online.

Results include detailed information around chimp habitats, not just chimp presence but other wildlife, snares and traps, threats to chimps. Suspected forest monitors knew about this before but didn’t have the tools to report the data, lets JGI ask new questions and refocus interventions.

Collaborators in REDD project in Tanzania collect more scientific data, and focus not on village land but public land. Other examples from Uganda where chimps still survive in degraded habitats. Locals can collect data for REDD credits that give them incentives to help preserve the habitats.

—–

Parsons – Global Canopy Programme

Working in Guyana; previously did phenology work with Woodland Trust and project called Nature’s Calendar, which started with digitizing a box of paper records. Slider maps of horse chestnut fruit ripening and swallows arriving, some data go back to 1700s. Oak first leafing is now happening 2 weeks later than it was 60 years ago. Phenology used in reporting for IPCC. Phenology important to daily life. Also had a project called Ancient Tree Hunt, oaks have 900-year life cycle, 300 years each of growing, then resting, then dying; dying stage is when they are considered ancient. Have had over 75K ancient trees mapped, have trained verification volunteers. Each tree has its own web page, some have multiple photos over time, so getting phenological biography of the tree, or simply indicate that you have visited this tree, so they know how many people have visited them. Shows a tree from Henry VIII’s hunting grounds on an old map, and then current location on satellite map where it’s now in the middle of a city.

Now working with Global Canopy Programme, MRV – monitoring, recording, and verification. Focus on community management of their resources, policy impact as partners, technology to help record data. Project in center of Guyana in the rainforest, lots of interest from policy makers. Habitats include mountains, rainforests, and savannahs. Recruited and paid 32 contributors for 10 days/month, village leader has to get involved or nothing happens, need total buy-in from local community. Capturing data on paper, on farms, food, river, wetlands, fish, hunting, building, logging, savannah, and social activity in villages. Introduced them to handheld devices, using a variety of smartphones. Trained project leaders, need rugged phones and tested them in difficult conditions. Youngsters really like it and already know Bluetooth, but older people have never seen these devices. One guy’s fingers were so dry from working in the field that the touch screen wouldn’t register his touch! But the older people really want to learn the technology.

Using ODK, download forms, collect data, upload it, analyze and validate it, map it. Simple interfaces but numerous steps. They leave the phones with villagers to assess the device usability and durability. Next goal is using their data collection for ground-truthing satellite data on deforestation and forest degradation. Part of community forest monitoring working group, putting together an experience blog of organizing. Needs help setting up satellite Internet access point.

Q: what do villagers do with data, or is it just for you? Collecting data that they can use for managing their own communities, this is why village leader was so enthusiastic about it. Issues of jealousy with who gets paid and gets to do the work? Monitors selected by village council through democratic system, some questions around who gets paid, but data types being collected were inspired by collaboration with the communities so very driven by community interests. Community monitors are being verified and selected locally, a lot of awareness in community that this is beneficial for managing the community so that is alleviating issues of jealousy.

—–

Lewis – Congo Basin Citizen Science

Heartbreaking situation in Congo Basin: deforestation, bushmeat trapping, abuse of local people. Local indigenous people don’t mind sharing resources but don’t want tombs of ancestors bulldozed, destruction of freshwater sources, loss of medicinal trees, and especially loss of the trees that a special delicacy caterpillar requires – they taste like meaty prawns, have a high commercial value. Worked with hunter gatherers to find out what trees they most want to preserve. Set up ugly Excel spreadsheet, but there are problems with analysis and accuracy. Worked with a company to set up monitoring with icons developed with the locals, who don’t speak European languages and are not literate. Have been using these on military grade devices since 2006, takes 10 minutes to learn, and only poor-sighted elders have a hard time seeing the icons, but then they get partnerships between youth and elders to do the mapping. Screens show social groups – so they can indicate who cares – then they are shown the kind of resource (e.g. forest spirits) and then they choose a subtype, it beeps, and they know data are recorded. Now they have records of sacred trees, caterpillar trees, medicinal trees, and cemetery sites, and the forestry company is now able to protect the locally valued trees. The method has spread like wildfire because it’s so effective. Allows peaceful communication and meeting all interests via maps, which predate writing by a long time. After forest company accepts trees for preservation, the locals go back and physically mark the trees to make sure they are not harmed accidentally. Lumber company now uses this approach for all projects. Have also done some modifications to evaluate illegal logging elsewhere. When they visited people, they showed icons and asked what they meant, refined icons with collaborative feedback, locals also provide feedback about other features they want or need. Iterative development with localized input.

Early project challenges involved consent so they spent a lot of up-front time explaining before asking for consent, which is a process and not a contract. Worked to reduce issues with accessibility of information, training and support, self-definition of roles and resources, community protocol and ceremony that defined who would do what, including withdrawal of data and consent. Key challenges: time and technical support, developing long term sustainable data collection strategies, conflicts of interests among some participants, and difficulties ensuring effective advocacy due to being ignored, corruption, and inertia. Recent win: locals recognized value of these approaches, came to them and asked for software to track poachers so law enforcement can address issues more effectively. Initial gadgets robust but very expensive, need to find cheaper hardware options. Hackfest challenge is to design portable device that can meet specific requirements, e.g., accurate geo-ref under rainforest canopy, withstand heat and humidity, disguise its purpose, be able to tolerate a week without charge and before upload, ability to update software quickly when needed, etc. Another challenge has to do with climate change and mining concessions, hacking industrial sensors so they can live in landscapes and record changes for long-term monitoring of changes, develop analytic tools for visualizing and analyzing results themselves, and building lobbying partnerships to build for action. Third challenge is developing intelligent maps: new analytic and visualization tools, experimenting with tablets for recording and visualizing.

Data collection devices now reality, but marginalized and poor remain sidelined. Analysis dominated by scientists, need to develop accessible analytic tools if citizen science is to reach potential. Methods for motivating and ensuring effective participation resolved for some user groups, but those most in need of support still largely excluded: rural, semi/non-literate, women, urban poor.

—–

Haklay – Kickoff of ExCiteS

Began with noise monitoring, used for environmental justice. Concern over air quality, looked at ozone sampling using litmus-like sheets, dust sampling, and Leicester University collected leaves because you know where you got them and how long they were out, so you can find out how much copper is in the air because it’s magnetic. Also put up diffusion tubes for mapping the air quality.

What makes it extreme citizen science? Who can participate – everyone can participate, not just educated people with domain knowledge. Moving locations from populated rich parts of planet to everywhere. Moving people from just data collection and entry to shaping the problem and analyzing data. Levels of participation, from basic crowd sourcing (citizens as sensors), to citizens as interpreters, to participation in problem definition, to participation in entire process.

Goals of ExCiteS include development of theoretical and methodological frameworks, developing core technology platforms, usability of GIS and related technologies particularly for non experts, etc. Coming projects include adaptable suburbs to learn about how suburbs are changing; intelligent maps like those in the Congo, spatial mobile games for citizen science; Google Earth Tours and communicating geographical concepts; OSM and user-generated GIS data; EveryAware to enhance awareness through social information technologies using mobile technologies to collect, analyze, and visualize local environmental info, and analyze changes in behavior based on the info.