Andrea Wiggins

Academia and Other Adventures


Monthly Archives: October 2012

Trip Report: iDigBio workshop on Public Participation in Digitization of Museum Specimens, Day 2

Posted on October 3, 2012 by Andrea

iDigBio workshop on Public Participation in Digitization of Museum Specimens, 9/29/2012, Gainesville, FL

——

Watson, Bill. Chief of Onsite Learning, Smithsonian Institution, Washington, D.C.

The Smithsonian’s public engagement projects.

Smithsonian has daunting number of specimens, data, and other assets – existing and not yet collected – to digitize. They need help! Millions of people are hungry for relevant, exciting, and meaningful experiences, but main assets – collections and data – mostly lie dormant for public engagement.

What kinds of things do we do at natural history museums? Look at objects, hands-on exhibits to understand objects, classes and programs to learn more from scientists and others. Museum community agrees that they haven’t done a very good job of bringing research and data to the public. In the 21st century, still using 20th and 19th century approaches to public outreach.

Groundbreaking programs have still only scratched the surface. What do NMNH visitors want? Relevant, authentic, personalized, immersive, one of a kind, “awesome”. [personalized = autonomy] People enjoy experiences that provoke awe – dinosaurs and so on. Can’t think of better way to promote that than access to collections – people are interested in fieldwork and specimens.

Lots of excitement in natural history museums about bringing out data, science, research for engaging people. Thesis for conference on 21st century learning in NHM: NHM have unparalleled records of the natural world; used in very limited ways in science education and literacy; can’t do our work without inviting public to be part of the action, especially for digitization. The other side of “our work” is getting the public engaged and excited about what we do.

New possibilities: public use of digitized collections, public participation in digitizing collections. What would teachers need? Give access to the stuff and allow us to tailor to our/students’ needs. Built a way to display assets related to a taxon – example of ants – showed specimens, photos, digitized record cards, etc. Collections.si.edu – repository in collections browser, includes only objects, no digital assets like videos, archive records, etc. Still have to know what you’re looking for. Now building interfaces for public to access this without knowing what they’re looking for.

Part of 10K sq ft ed center building is digitizing 20K objects in the public collection. Providing a more intuitive interface, can add collections you find to your own “fieldbook” so you can have a self-curated collection. From digitization standpoint, using objects in a different way than before, opportunities for people to tag and submit their ideas to and about the objects. Way to test outcomes for people involved in adding tags, what does it mean for “real people” and “regular people” to have this experience? Maybe it inspires the thought that they can be a scientist too.

Smithsonian Wild – camera trapping project – more traditional cit sci project. People set up camera traps and capture data.

Checkpoint: is this public participation in digitizing collection? SI digitization strategic plan: “digitization…creates the potential for people the world over to add impressions, associations, and stories to the permanent record.” That democratizes digitization, not easy to think through logistically and philosophically but they’re trying. Conservative model of overlap between digitized collections and science literacy, underscores opportunity for public contribution and building their own experiences. More daring model: overlap between collection and PPSR. What’s a collections-based PPSR project?

Some of what people get out of PPSR is knowledge of research design and data interpretation, people start asking and answering their own questions. One of most expedient means for engaging people in science in a fun way, helps people see that their interests connect to science and science connects to their interests – connecting back to relevance and authenticity.

Collections-based projects are perfect entry-level PPSR for NHM. They bring in new audiences – untapped resources and new audiences for the field, like tech geeks, photographers, statisticians, who else? Models – untapped potential for unique, fun, huge, meaningful work; connect collections to real problems, connect digitization to those problems, digitization for democracy = groundwork for participation. Exciting new opportunities here, much to learn about and figure out how best to do this together.

——

Resources available to new projects that engage the public in science

—

Newman, Greg. CitSci.org Project, Natural Resource Ecology Laboratory, Colorado State Univ., Fort Collins, Colorado.

The power of many: many people, many programs, and a common goal.

See USGS presentation.

Where is there help? Brief overview of places for support in creating projects. Resources – citizenscience.org, scistarter.com, citizenscienceacademy.org. Best practices – citizenscience.org, caise.insci.org; wiatri.net/cbm. Tools/platforms – many mentioned yesterday. Related working groups – DataONE WG on PPSR, USGS CDI cit sci working group.

Citizen science is complicated!

——

Lessons learned while developing successful public engagement projects

—

Flemons, Paul. Team Lead, Atlas of Living Australia Biodiversity Volunteer Portal. Australian Museum, Sydney, Australia.

EMu’s in the cloud—ruminations on interesting interfaces, efficient workflows and building an infrastructure for crowdsourced digitising that is open and integrated.

Two-stage approach – use onsite volunteers to image specimen labels, online volunteers to transcribe the information on those labels [which is a structure I keep advocating…]

Stage 1: DigiVol – volunteers capture images and “partial records” – species name, image, catalog number. Stage 2: put those data online, complete the record and georeference. Goes back out to the atlas as a full validated record.
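The two-stage split above can be sketched as a record that starts out partial and gets completed later. This is only an illustration: the class, field, and function names are hypothetical and are not taken from DigiVol or the Atlas of Living Australia.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SpecimenRecord:
    # Stage 1 fields, captured onsite (hypothetical field names).
    catalog_number: str
    species_name: str
    image_file: str
    # Stage 2 fields, filled in online by transcribers and georeferencers.
    label_text: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    validated: bool = False

    def is_complete(self) -> bool:
        """True once the label is transcribed, georeferenced, and validated."""
        return (
            self.label_text is not None
            and self.latitude is not None
            and self.longitude is not None
            and self.validated
        )


def complete_online(record, label_text, latitude, longitude):
    """Stage 2: add the transcription and georeference to a partial record."""
    return replace(
        record, label_text=label_text, latitude=latitude, longitude=longitude
    )


def validate(record):
    """Mark a transcribed, georeferenced record as checked."""
    if (
        record.label_text is None
        or record.latitude is None
        or record.longitude is None
    ):
        raise ValueError("record must be transcribed and georeferenced first")
    return replace(record, validated=True)
```

Keeping the stage 1 record immutable and returning a new record at each step means the onsite capture is never overwritten by online edits, which seems in keeping with the concern about maintaining control over the process.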

Stage 1 required getting a room dedicated to volunteer workstations for imaging. Online manuals and training videos. Recruitment through traditional museum networks, customized training, and coordination and supervision by 2 PT staff. Current onsite volunteers – 60-70 volunteers, about 12 at any given time, low dropout – people passionate about it; 2:1 female/male, wide range of ages. Output of 1.2 FTE staff + volunteers = value of 3 staff.

Barriers to having lab of volunteers do label digitization – database permissions, union considerations about staff jobs, etc. Why not use staff for direct digitization? Productivity increase, engaging public and important outcome/goal of project.

Lessons learned: initially, mgt and collections staff were uncomfortable, unsupportive, and hostile – big change for them, trust issues. Ideally have process managed and incorporated into mgt structure of collection. Change mgt process – small steps, address concerns consistently, regular F2F communication, inclusivity in developing training materials and process. Start w/ activities that are least controversial (easily handled groups of specimens that aren’t as easily broken); as relationships grow and staff become more comfortable, begin moving into more controversial activities like more fragile groups.

Volunteers can be very dedicated and passionate; important to get balance right between ownership, sense of community, meaningfulness, and maintaining enough control over the process. Can improve engagement through sense of community – increasing understanding and appreciation of collections and associated science through tours of collections and talks by staff and scientists. Rewards and tokens of membership – haven’t tried – t-shirts, birthday cards, etc.

Crowdsourcing lessons learned: at face value, crowdsourcing transcription and georeferencing of collections seems insane when considering mismatched tasks and resources. Key is balance between what institutions want and what volunteers want – great lists of priorities for each group. Ways to achieve such balance – low-level gamification like expedition themes, contribution-based team roles, leaderboard; FB group (not that functional); regular emails (takes too much time). Wants forums [very effective way to get volunteers to support one another.] Need effective workflow for volunteers. Doing “challenges” can really boost participation; some volunteers cross over between onsite and online. Volunteers don’t tolerate errors or bugs in software; they bail if you don’t respond.

Data quality – georeferencing much harder than simple transcription; variable understanding.

—

Bonter, David. eBird and FeederWatch, Lab of Ornithology, Cornell Univ., Ithaca, New York.

Engaging the public in ornithological research: Lessons learned from 50 years of citizen science at Cornell.

Primarily responsible for Project Feederwatch, have a whole suite of projects going back to the 1960s, nearly all focused on birds, national in scope and increasingly global.

Contributors are not robots, offer lots of resources to increase their own knowledge. Just celebrated 25th anniversary of PFW, 1.8M checklists, >152K count sites, 4.2M hours of volunteer time, 2 books, 26 scientific papers. Important to build around a protocol with structure to make data valuable. Repeated counts every weekend all winter of max number of each species with weather & effort data – very valuable scientifically due to repeat locations.
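A minimal sketch of the "max number of each species" tally described above. The function name and the input format (a list of species and number-seen-at-once pairs) are assumptions for illustration, not FeederWatch's actual data model.

```python
from collections import defaultdict


def tally_checklist(sightings):
    """Return, for each species, the largest number seen at one time.

    `sightings` is an iterable of (species, number_seen_at_once) pairs
    recorded during one count period (hypothetical input format).
    """
    maxima = defaultdict(int)
    for species, count in sightings:
        if count < 0:
            raise ValueError("counts cannot be negative")
        # Keep the maximum rather than the sum, so a bird that visits the
        # feeder several times is not counted more than once.
        maxima[species] = max(maxima[species], count)
    return dict(maxima)


# Illustrative data only.
sightings = [
    ("Evening Grosbeak", 2),
    ("Eurasian Collared-Dove", 1),
    ("Evening Grosbeak", 5),
    ("Evening Grosbeak", 3),
]
checklist = tally_checklist(sightings)
```

Here `checklist` records 5 Evening Grosbeaks and 1 Eurasian Collared-Dove. Reporting a maximum instead of a running total gives a conservative count that is comparable from weekend to weekend.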

Using these data, can now track invasive species (Eurasian collared dove), species of concern (Evening Grosbeaks disappearing). Range expansion evident in feeder birds, linked to habitat and climate change. Research platform to ask new, unanticipated questions – realtime tracking of avian conjunctivitis. PFW peeps noticed it first, turned them onto the disease showing up, so they started tracking it. Participants are key – data collected by public, fully funded with participant fees, 70% annual retention rate. 30% of participants give more than the participation fee – $75K a year in excess of costs. Recruitment a challenge – even with high retention, have to recruit 3K-5K new people per year. Try everything for recruitment; demographics are women over 50 and highly educated – Martha Stewart crowd.

PFW participants get resources to support identification, region-specific posters, online & print instructions and multimedia production for videos that are getting a lot more traction; “tricky bird” identification tips. Also provide participant support: online field guides, participant forum, photo IDs – 1500 emails/day, 15-20 phone calls during season, and in forums some vols become experts and help support one another, which takes burden off staff.

Complex and evolving data validation system. Feedback is critical – people don’t put data into black holes forever, they want something out of it. All programs have explore data section to see past counts, all data and summaries completely open to the public. People love having their photos featured for rare birds – if they get featured, they have them for life! Lots of maps to query for what they’re interested in, top 25 feeder birds by state & province (media loves this), training graphs. Annual season summary that goes out to 50K people.

Global coverage is all well and good, but people care about their local areas. Partner with other groups to create portals for other orgs, people more likely to submit to local projects.

—

Newman, Sarah. Citizen Science Coordinator, NEON, Inc., Boulder, Colorado.

Lessons Learned from NEON’s Project BudBurst: a national citizen science program.

Main focus is education, want to engage public and make it accessible. Provide local focus through partnerships – hard to be relevant locally when you’re national. Primary goal – education/outreach, secondary – useful scientific data. Entirely online – don’t give talks to small groups very often. Data freely available to all (not the same as accessible), contributory category – data collection. Variety of ways that people can make observations.

Lots of great tidbits for lessons learned from yesterday – key observations from experience based on comments by speakers. Techniques that work well at national and local levels. Know your audience, different audiences for online and offline participation, different ages need different structures, etc. Start small – pilot, see how it works, test it with the audience, see what opportunities and needs come from participants. Word spreads and expansion happens organically, especially through partners. Pilot ideas – citizenscienceacademy.org.

Reduce barriers to participation, hopefully come through in pilot – things emerge that you did and didn’t think about, e.g. available technology, skills to use tech, language, terminology, web usability, time commitment, family resources and engagement, culture, make the work transparent and simple. Example of participant in Timbuktu who has a really slow dial-up connection. Raj Pandya’s article – family engagement key for some audiences such as underserved communities, a problem many cit sci projects haven’t fully tackled.

Retention, always a challenge. Must find middle ground, shared community and discussion forums seem to work – newsletters, meetings/conferences, social media (time intensive), tiered participation (level up!), attach research question to specimens being transcribed. Recognition of contributors – always important. Lots of ways to do it, different strategies for different audiences – certificates for kids, awards for outstanding achievements for adults, acknowledgments in article credits, massive list of contributors. Partnerships – TCNs already doing this; leverage those existing networks, adding on smaller communities. Examples of building on existing models, e.g. BudBurst at the Refuges replicated in BudBurst at the Gardens and BudBurst at the Parks – perhaps at museums. Partners are bread and butter for providing local relevance. Speak multiple languages – not just language translation, but must speak the language of the communities you reach out to and work with, e.g. formal educators, informal educators, kids, scientists, other languages and cultures.

Other techniques – gamification, targeted events, make it fun, blog with fun content of BudBurst roadshow – people have spun off their own versions of this. An idea for making transcription more exciting, e.g. transcription trekker. Challenges of foresight with technologies – be nimble.

—

Young, Alison. Citizen Science Educator, California Academy of Sciences, San Francisco, California.

Documenting California biodiversity with citizen scientists: lessons learned through a year of planning.

Cal Academy lessons from planning project. #1: Focus – defining citizen science at the Academy, specific goals, criteria, and guidelines for what they wanted to accomplish – what’s their brand? #2: Equal collaboration between education and research. Collaborate between these divisions directly, coordinator on both sides of the coin, core team with weekly meetings. #3: careful planning – don’t just launch a program; got a year-long planning grant from Bechtel foundation to figure out ways to answer real RQs about CA biodiversity, use historical collections, test cases, series of meetings to learn from others.

Each test case (2 of them, terrestrial and intertidal) had specific goals to test how well they can meet their goals, partner goals, and produce usable project. Volunteers recruited through existing networks, local colleges, meetup.com conservation photography group, word of mouth. Not currently wide open since it’s a test case, not ready to handle full load and didn’t want to start entirely from scratch. That really skewed participant demographics – still female, older, caucasian, 50% with grad degrees, 70% had collected data for other studies, etc.

Successes so far – 3-day events. Example of water district: had 650 observations, 350 species, documented 1/3 of watershed, more than 450 specimens and 80+ participants.

#4: don’t be afraid to get it wrong! Listen to volunteers and check data. Pouring rain on first day surfaced some problems right away. Smartphone thing didn’t catch with their vols, but other ways of participation worked just fine. Originally thought they could cover entire watershed in a year, realized it would take 3 years. Number one complaint on first day wasn’t rain, it was figuring out where to go and how far to stray from trail. Offered opportunity to work with herbarium specimens – no one volunteered, very different groups want to be out hiking versus detailed work mounting herbarium specimens.

#5: feedback, reinforcement, & appreciation. Rapid feedback reinforces role in bigger picture – data viz, results, emails, photos. Wanted responsive sophisticated system but couldn’t do it for a one-year project. Volunteers really liked email updates on progress and outcomes, even though they can’t really read map labels, they liked to see how much was accomplished in just 3 days. Showed them comparison of herbarium specimens to show value of modern data compared to sparsity of historical data. Appreciation: lots of gratitude messages, swag, unique opportunities like camping on Mt Tamalpais, appreciation events.

#6: Evaluate! Get detailed feedback – are we really increasing science literacy, do they understand why the project is being done, do they understand the process, etc? Regular questions like motivation – contribute to scientific research; curiosity about local environment, plants, animals; spend time outside; connect to/support colleagues; meet others with similar interests. How important to see analysis or results? 44% said very important, 0% said not important. People want to know how they are contributing to bigger picture.

#7: Learn from others. Don’t reinvent the wheel – 3 days of cit sci meetings. Invited practitioners, biodiversity researchers, conservation organizations, data managers, citizen scientists. Goals were learning from others’ experiences, discuss best practices, identify common goals, needs, logical next steps. Different topics each day with speakers and panels, spent every afternoon discussing what did and didn’t work in their own experiences – turned into 60-pg proceedings.

What’s next? Hoping 3-year grant next time, expanding current projects, create additional projects, focus on goals not fully addressed in case studies (tiered involvement, multiple entry points, mobile/digital media). Want to design strategy for digitizing CA specimens, especially those from survey locations, eventually for specimens, for research and engagement. Want to help connect people to the history of their place. Want to use specimens not only for research, but for further engaging and motivating participants. Ongoing evaluation – meeting goals, meeting participant needs? 2014 biodiversity exhibit, want to include citizen science component and get people engaged from the public floor of the Academy. Future: CA regional cit sci network, including science centers; don’t want to do this alone. Eventually national/international if they find a model that works really well.

—

Hill, Andrew. Vizzuality.

Stories in the data: lessons from developing citizen science applications at Vizzuality.

Work with Zooniverse, small company that started with a blog. Mission-driven company, that got them into citizen science, working with Citizen Science Alliance. Works on small set of their projects.

Things they have noticed… #1: the rewards you can offer help define what you can do – task complexity is related to that. Planet Hunters example: draw boxes around graphs – so how to motivate people to find planets in graphs? Discovery alone is major motivator, leads to coauthorship. Good press goes a long way. Worked with BBC Stargazing, in 48 hours, got over 1M classifications, went a long way.

#2: long tail curve of participation. Worked with NASA to mark up underwater astronaut training images, lasted only about 1.5 weeks in conjunction with the mission of astronauts. Made a game of it, in 1 week got 450+ volunteers, 15K biodiversity observations. But found that a few people at the top did so much more than others, #1 volunteer did more than double the tasks of #3. NASA staff and Vizzuality staff couldn’t keep up with volunteers. Made it a race to the end, let people show others also marking up. Sometimes most dedicated volunteers are experts themselves.
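One simple way to put a number on a long-tail participation curve is the share of all tasks done by the top few volunteers. A minimal sketch follows; the function name and the task counts are made up for illustration and are not data from the NASA project.

```python
def top_share(task_counts, top_n):
    """Fraction of all tasks completed by the `top_n` most active volunteers."""
    if top_n < 0:
        raise ValueError("top_n cannot be negative")
    ordered = sorted(task_counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:top_n]) / total


# Made-up per-volunteer task counts: #1 did more than double the tasks of #3.
counts = [1200, 800, 500, 40, 30, 20, 10]
share = top_share(counts, 3)
```

With these invented numbers the top three volunteers account for roughly 96% of the work. Tracking a figure like this over the life of a project would show how much the output depends on a handful of people.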

#3: things we think will motivate users are NOT what motivates users. Example of Old Weather, digitizing ship logs. Some people motivated by science, others interested in genealogy and search for relatives thrown overboard, naval history buffs, normal history buffs (e.g. one interested in sports being discussed, movement of Spanish influenza), data buffs. Forums were really powerful, let people ask others if they’d seen relatives mentioned, allowed people to organize themselves around their own interests and expectations of what they would get out of it.

Where will we learn from next? Starting with Notes from Nature: expert interfaces, badging, user profiles. Also interested in other models, like EcoHack NYC, started 3 years ago. Idea is that people have more to contribute than data and analysis, great software engineers and hardware hackers who are interested in working on scientific problems, so they partnered with an org to make these opportunities available. Lets people think outside the box in ways scientists haven’t done about how to design interesting projects. Had ignite-style 5-minute presentations from scientists about the tasks they were interested in, then groups formed organically around what they were interested in, and got a lot done in a day.

Thing that is hardest about EcoHacks is getting scientists, please spread the word! Great opportunity for participants to learn what scientists are doing.

Posted in Trip Report | Tagged biodiversity, curation, digitization, iDigBio, information systems, preservation, public engagement, transcription, trip report, workshop

Trip Report: iDigBio workshop on Public Participation in Digitization of Museum Specimens, Day 1

Posted on October 3, 2012 by Andrea

iDigBio workshop on Public Participation in Digitization of Museum Specimens, 9/28/12, Gainesville, FL

——

Introduction to NSF’s Advancing Digitization of Biodiversity Collections Program

Anne Maglia (U.S. National Science Foundation).

ADBC – challenge is to mobilize “dark data” in collections. Most data inaccessible, inconsistent, can’t be captured, etc. Feds get it, have been working on figuring it out for a few years now. NIBA (Network Integrated Biocollections Alliance) goal is centralized integration through research-based thematic networks. ADBC (Advancing Digitization of Biodiversity Collections) is NSF program, 10 year, $10M/year – coordinating resource (iDigBio), TCNs (thematic collection networks). Lots of acronyms. Goals from the recommendations: understanding and appreciation of biodiversity through education and outreach, drive well-informed environmental and economic policies.

Updated view of NSF Broader Impacts: encompasses benefit to society and achievement of specific desired societal outcomes. New broader impacts should be achieved through the research itself, activities directly related to research, or activities that are supported by but complementary to the project. Meaningful assessment should be based on appropriate metrics, sometimes assessing effectiveness of activities is best done at a higher, more aggregated level, than the individual project – e.g. via a community.

Opportunity for engaging society in iDigBio – through products and processes, involvement in data capture, verification, and meta-analysis. Assessment at multiple levels can create model for community-wide impact.

—

Introduction to iDigBio by Larry Page (iDigBio; Florida Museum of Natural History, Univ. of Florida). 

The thematic collections networks: overview of project goals and digitization methods, with recognition of steps that could involve the public

ADBC intended to facilitate use of biodiversity data for scientific, environmental, and economic challenges. 7 TCNs are involving 130 institutions. Goal of iDigBio is enabling digitization of biodiversity collections data, with efficient/effective digitization standards and workflows. Digitization – specimen-based, label data (georeferenced), images, metadata and ancillary data. Activities include databasing, georeferencing (major activity), imaging (sometimes how databasing is being accomplished).

Data portal includes specimen search, linking collections to ecology, paleontology, genomics, living collections (zoos?), other repos. NSF requiring collaboration with iDigBio on collections-based projects. Doing tool development and integration, host workshops, convene working groups, visiting scholar program, education and outreach.

—

Nash, Thomas. Lichens and Bryophytes Thematic Collections Network Project. Univ. of Wisconsin, Madison, Wisconsin.

The lichen and bryophyte and climate change (LBCC) TCN: an overview, current progress and relationship to the American Bryological and Lichenological Society.

2.3 million specimens, 65 institutions, 1 year after TCN founded. Bryophytes and lichens dominate arctic and northern boreal regions, are common in many other ecosystems, store a major part of world’s organic carbon. American Bryological and Lichenological Society. Specimens include 900K lichens, 1.4M bryophytes, 16 digitization centers working with 65 institutions (herbaria) that include 95% of non-governmental collections. Complex workflow for digitization includes OCR, NLP, geo-referencing, transcription, etc.

Imaging stations required camera stands, jewelry lighting boxes, black coverings; requirement for imaging resolution is 20px high for small letter “x”, camera connected to computer, adequate battery life. Using barcode reader as well. Setup took several months to establish. Imaged over 40K specimens during first 6 months with 2 undergrads. Imaging 50K-80K records/year is doable.

Metadata – barcode each specimen, latest species name (requires taxonomic knowledge), collector and collection number. Sometimes also a few other fields, e.g. major geographic region, not particularly detailed. Now have two portals, one for lichens and one for bryophytes. Portal allows management of collections and access to stuff, you have to be an expert to use them.

Transcription – extensive data forms to get labels transcribed. Crowdsourcing transcription – national coordinator and Missouri Botanical Garden, volunteer programs, members of ABLS. Sophisticated user and workflow management system in SYMBIOTA – transcription, geo-referencing, professional quality control. Sounds too complicated for average contributor.

—

Brinda, John. Lichens and Bryophytes Thematic Collections Network Project. Missouri Botanical Garden, St. Louis, Missouri.

Digitization of Bryophyte Labels at the Missouri Botanical Garden.

Institutions have individual collections representing different collectors, time periods, geographical regions, taxonomic groups. Challenges: Exsiccatae – duplicates; recognizing historically important collections; handwritten labels. Connecting labels to collections allows inference of contextual metadata.

Historically important records – no one without a PhD can recognize them. Wildly variable information on them, only people who know what to look for will be able to identify them. Wants to work with crowdsourcing for handwriting analysis, language translation, georeferencing, nomenclature.

—

Speelman, Julie A. InvertNet Thematic Collections Network Project. Purdue Univ., West Lafayette, Indiana.

Community assisted digital imaging of insect specimens.

InvertNet – entomology. Staff includes “systemicist.” Goals are digitizing over 50M specimens at 22 Midwestern collections plus Hawaii. Specimen images and metadata (label info) – specimen drawers, vials, and slides. Advanced imaging – including 3D – target goal of $0.10/image. Want everyone to be able to browse/search/view specimens through web interface. Developing tools for data mining and analysis; community building, collaboration & support; education, outreach, & reference.

Different workflows for slides, vials, drawers. Scanning for slides includes loading slides into tray, scanning, saving, uploading to InvertNet. Vials are more complicated – curate specimens (taxonomy outdated), remove labels, replace parts if needed, place on scanner tray, scan, save, upload. Think volunteers could help with all steps except curation. Drawer workflows – have a robot for scanning! Curate specimens, digitize image, upload metadata.

Hard to recruit/retain volunteers. What communities can they tap? High school students, organizations like Audubon, Master naturalist (“citizen science groups”), retirees (but many don’t like IT!), undergrads. Vol program: needs assessment, determine objectives, written proposal, volunteer coordinator staff support, job descriptions, recruitment & selection of volunteers, training and implementation, reward staff and volunteers. (missing: ongoing care and feeding of volunteers)

Expect that volunteers can be integral to digitization with potential for huge cost savings, but requires organization and coordination.

—

Seltmann, Katja. Tri-Trophic Thematic Collections Network Project. American Museum of Natural History, New York, NY.

Plants, Herbivores, and Parasitoids: A Model System for the study of Tri-Trophic Associations

Goal is digitizing 3.5M specimens – transform data on specimen labels and get records georeferenced. Specimens mostly insect-related, 30+ institutions across US. Working with volunteers and paid interns, lots of experience with this across collaborating institutions. A lot of the work requires being onsite, workflow for bugs includes multiple specific steps: organizing specimens, identifying sex and exemplars, barcoding.

Volunteers can be managed to include easy entry level work with minimal supervision, identify the skills and step them up in the process with more autonomy and responsibility. One-day volunteers could do work like cutting apart labels. If they want to have more involvement, they can do more interesting stuff. Most of AMNH’s volunteers are recruited through social media incl. “dorkbot”, radio interviews, etc.

Potential for mutual benefit with other TCNs – software development, crowdsourcing georeferencing and transcribing label data from images. Outreach in terms of professional participation – symposium at Entomological Society of America, specimen-level data information management course at AMNH (opportunity for reusing DataONE materials and customizing them), workshop using collection level data in research – hands-on working through a small project.

Training participants takes 1-2 hours to start, then constant supervision and trying to get them chitter-chattering with one another through online chat to relieve supervisor burden. Need to start with listing volunteer programs at TCN groups.

—

Thiers, Barbara. Macrofungi Thematic Collections Network Project. New York Botanical Garden, New York, New York.

The Macrofungi Collection Consortium TCN and North American mycophiles: enhancing a long-standing relationship.

Macrofungi are mushrooms and related stuff – used for food, pharma, recreation, forest health, products, etc. Need to digitize specimen data, fieldnotes, photos – 700K specimen records, 70K specimen images, many ancillary data that need to be linked to specimen records. Working with MyCoPortal for searching and e-publications, can host a lot of different types of info about the organisms.

Amateur mycologists – mycophiles, help with data editing and adding content to portal. Hope they will use data from portal for their own education and biodiversity documentation projects – offer opportunity to share their work. Crowdsourcers for help with transcription of specimen labels, opportunity to learn more about fungi and natural history collections. Objectives for public participation – develop a corps of expert volunteers for specimen record digitization, outlet for publication of info gathered by public, build closer relationships for mutual benefit.

Amateur Mycology Community – 72 nationwide clubs, often do field outings, documenting fungal biota through observations and collections, educate general public, communicate through meetings and publish. “Fungus Fairs” involve collecting a bunch of material to show grand displays to interested public.

Relationship between amateur and professional communities: pros serve as lecturers, identifiers at club events, publish field guides for amateur use. They all share info through Mushroom Observer – brilliant use of social media to make pro-am connections. Amateurs make and maintain collections.

Objectives of crowdsourcing: opportunity for participation beyond amateur mycologists, opportunity to expand these groups beyond gray-hairs. Hope to help MaCC project meet and exceed promised deliverables, sustain regional herbaria, improve science literacy and appreciation of value of collections. Challenges: volunteers not interested in participating, or participating but not feeling appreciated; mission creep – enthusiasm feeds ambition for larger initiatives that eclipse main objectives.

Digitization projects never go away. In progress of introducing communities and incorporating amateur content into portal; next steps include implementing crowdsourcing component and preparing guidelines and best practices for incorporating crowdsourcer feedback into collections records.

—

Sweeney, Patrick. New England Vascular Plants Thematic Collections Network Project. Yale Univ., New Haven, Connecticut.

Mobilizing New England vascular plant data to track environmental change: an overview and preliminary thoughts on engaging the public.

Volunteer pool – regional and state level botanical clubs and societies. Rationale for TCN: goal is providing data to support study of consequences of climate change and land use history in New England. Themes are climate change – plant phenology – and land use history – herbarium specimens, habitat data for subset of target taxa, developing vocabularies for both phenology and habitat data.

Developing organizational network to support these activities. Workflow plan: collection preparation (pre-capture), primary digitization, data enhancement (secondary digitization). Precapture involves developing labels and barcodes to associate with folders; need some “special” volunteers to work with specimens directly, but that’s not impossible – still not a place for huge numbers of volunteers.

Primary digitization is image capture with barcoding and subset of label info and additional details. Currently testing a high throughput digitization apparatus in collaboration with an industrial engineer for conveyor belt system to mediate process. Can use voice recognition software to improve process.

Secondary digitization: georeferencing, town level probably adequate. May not need volunteers to do this for New England due to density of towns and size of states. Mobilization – all images and data available through Symbiota portal. Training and outreach – undergrad and grad students (paid) and interns. Plan to establish network of citizen science observers across New England for phenology data collection; Primack working with this. Key issues with public participation – recruitment, management, training, turn-over, quality control.

—

Basham, Melody. Southwest Collections of Arthropods Network Thematic Collections Network Project. Arizona State Univ., Phoenix, Arizona.

SCAN survey results: engaging the public with insect digitization workflows.

SCAN – 10 institutions. Arthropods – ground-dwelling insects like beetles, as opposed to butterflies. Plans to develop strategy and sustainable model to allow for more specimens to be entered into database, increase rates of identification, adopt and encourage broader virtual collaboration. Two websites: an informative site and a Symbiota portal.

Survey of 10 respondents from SCAN community – Challenges for engaging volunteers in insect digitization – meaning/purpose, task limitations, QC/training, need for verification, skills/temperament for task. What would be easiest to engage public: data entry, imaging. Most difficult: taxonomy clearest challenge. Most potential for integration as a citizen science project: data entry & imaging. Most viewed it important to do crowdsourcing/citizen science. To what extent should it have meaningful significance – less agreement about importance. Specific groups that would be valuable – retired systematists, taxonomists, mostly retired professionals.

Interesting comments from open response items – capturing specimen label images; make participation fun/meaningful; separate database or class of data for cit sci data; digitization workflow separate from citizen science.

Results to focus on: task should be purposeful and meaningful; data entry & imaging easiest to integrate people, most viewed citizen science as important component of public involvement; most groups not currently engaging the public; need to make insect labels accessible and user friendly.

Looking at potential for mobile app – iphoneographers – taking macro images that could be contributed to SCAN collections.

—

Hendricks, Jonathan. Paleoniches Thematic Collections Network Project. San Jose State Univ., San Jose, California.

Digital atlases of fossil collections: new resources for the public to identify and understand ancient biodiversity.

Goals are databasing several major invertebrate fossil collections, georeferencing to enable study of biogeographic patterns over time, generate digital atlases of fossil life for general public and scientific community – digital images of ancient biodiversity, paleogeographic maps for individual species at multiple time intervals.

Interested in 3 different time periods from the Phanerozoic eon: Neogene, Pennsylvanian, Ordovician. Specimens for each period come from several different collections. Digital atlas goals – field guides to the past for several fossil-rich regions, with online webpages, mobile app. Intended audiences are scientists and avocational fossil collectors. These resources currently don’t exist. Hope to include over 800 species in the atlases.

—

Martin, Elizabeth. Core Science Analytics and Synthesis Program, U.S. Geological Survey, Gainesville, FL.

Biodiversity Information Serving Our Nation (BISON): a national resource for species occurrence data.

National unified resource for discovery, linkage, and reuse of species occurrence data. Goals: develop large integrated data store of fully indexed species occurrence records for the US; incorporate federal and non-federal datasets of observations and specimens; develop resource capabilities tailored to US needs. BISON will serve as repo for digitized federal collections data, biodiversity hub of EcoINFORMA. Staff participate in IWGSC – Interagency Working Group on Scientific Collections.

BISON will include: powerful, flexible data search and filter capabilities; easy download of data; GIS and data viz component w/ high res base layers for US; easily spun off web & map services, widgets, templates, stats for partners & users; social science component; citizen science component.

Data: over 106M records, mostly from GBIF. Initial new data input emphases – federal data sets, invasive species data. New data to be added this year – amphibians, birds, fish, invasive species. Can download data and documentation.

Cit Sci observation platform – mobile devices and social media for recording and delivering data, based on curated Twitter submissions and Twitter data mining/stream API – funded by USGS CDI. Demo projects include USGSted, DC/Baltimore Cricket Crawl with DiscoverLife. Hawaii Bee Bowl surveys – K-12 students for bee collection, specimens sent to Patuxent for IDs, some donated to museums, data integrated in USGS Native Bee Inventory DB and will be integrated into BISON.

——

Biodiversity collections software tools: primary purpose and unique contribution of each tool, as well as functionality that could involve members of the public

—

Nelson Rios

Georeferencing in FishNet 2

Global network of fish collections, 48 data providers, 2M jars of fish. GeoLocate – software and services for georeferencing of biodiversity data. Performs well in US with automatic identification – 95% of locations found within 6km. Outside the US, could find most localities in Australia but huge distance off w/ high standard error, even after refinement.
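
A figure like "95% of locations found within 6km" can be checked by comparing automatic results against manually verified coordinates. The following is a minimal sketch of that arithmetic, not GeoLocate's own code: the function names, the haversine distance choice, and the sample coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(pairs, threshold_km=6.0):
    """Share of (automatic, reference) coordinate pairs no farther apart than the threshold."""
    if not pairs:
        return 0.0
    hits = sum(
        1 for (auto, ref) in pairs
        if haversine_km(auto[0], auto[1], ref[0], ref[1]) <= threshold_km
    )
    return hits / len(pairs)


# Made-up example pairs: (automatic result, manually verified location)
pairs = [
    ((29.65, -82.32), (29.66, -82.33)),  # roughly 1.5 km apart
    ((29.65, -82.32), (29.65, -82.32)),  # identical points
    ((29.65, -82.32), (30.65, -82.32)),  # about 111 km apart
]
print(fraction_within(pairs))
```

Two of the three invented pairs fall inside the 6 km threshold, so the sketch reports two thirds. A real evaluation would also need to handle the uncertainty radius attached to each georeference.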

Collaborative georeferencing is the next step – take advantage of similarities across collections, distribute workloads appropriately. Working to build georeferencing communities, create data sources, add new users easily. Task assignment, e.g. assigning records from African regions to known experts.

—

Beach, Jim. Biodiversity Institute, Univ. of Kansas, Lawrence, Kansas.

Specify & Lifemapper: breaking away from narcissistic science.

Specify: bio collections data management platform, modular for plug-ins. Well funded by NSF, pretty good staffing. Represents “all natural history disciplines”, 15% annual adoption rate with 435 collections in 29 countries and 247 institutions using the tool. Several related applications, including a version without MySQL installation, others: Schema Mapper, collection wizard to define new databases, iReport for designing labels and reports, Scatter Gather Reconcile to find dupes in GBIF. Under development: thin client, portal upgrade pipeline with Specify/Symbiota/FilteredPush, image management plugin, specify insight – mobile platform for “consuming activities.”

Lifemapper project: copy of GBIF with web services for geospatial data, computing niche models, presence-absence matrices for biodiversity pattern analysis. Emphasis on researcher workflow, tools, and metadata archives. Several related grants for further development. Had BOINC-style screensavers for a while with similar outcomes in terms of competing sysadmins.
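
For readers unfamiliar with presence-absence matrices, here is a small sketch of the idea: rows are sites, columns are species, and each cell is 1 if the species was recorded at that site. This is an illustration only and not Lifemapper's implementation; the function name and the sample records are invented.

```python
def presence_absence_matrix(occurrences):
    """Build a sites-by-species 0/1 matrix from (site, species) records."""
    sites = sorted({site for site, _ in occurrences})
    species = sorted({sp for _, sp in occurrences})
    seen = set(occurrences)
    matrix = [[1 if (site, sp) in seen else 0 for sp in species] for site in sites]
    return sites, species, matrix


# Made-up occurrence records: (grid cell id, species name)
records = [
    ("cell_A", "Quercus alba"),
    ("cell_A", "Acer rubrum"),
    ("cell_B", "Acer rubrum"),
    ("cell_B", "Acer rubrum"),  # duplicate records collapse to a single presence
]
sites, species, pam = presence_absence_matrix(records)
for site, row in zip(sites, pam):
    print(site, row)
```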

—

Gilbert, Edward. Symbiota Software Project.

Symbiota: using specimen data to support community inventories

FLOSS biodiversity portals – specimen search engine, allows creation of biotic inventories, e.g. species checklists and BioBlitz surveys. Also includes ID key, image library, distribution maps, descriptions, taxonomic info. Specimen-based model – collections are central partners, focus on scientific integrity and other priority features. Is actually a CMS for specimen data management, includes stuff like specimen processing data entry form that displays image to be processed, the OCR results, etc.

SEINet – plants of the southwest with flora inventory projects. Multiple types of biotic inventories, including student lists, native plant societies, and personal checklists. Can do personal checklist management, create own checklists (public or private), become editor for other checklists. Being used for several community projects in AZ. Challenges to date include correct identifications, misspellings, and coordinating volunteers.

Specimens are considered central – backbone of biodiversity research with vouchers and verification, etc. Need to “prove” that something was where you claimed it was. Voucher conflicts – expert review by herbarium staff, visiting taxonomists, exchanges; annotations; identification changes; checklist vouchers; ID conflicts. Can manage personal specimen collections prior to herbarium submission, can handle both specimen and observation data; functionality includes data entry, data management, label printing, cloud management with browser-based tool with no special software. Linked records in voucher network between original observation and physical specimens.

—

Giddens, Michael. SilverBiology Software Project.

HelpingScience.org label processing method.

7-step workflow for identifying and assembling label data for herbarium specimens. Step 1: click-and-drag image annotation for labels – can get 300 labels/hr/person. Step 2: OCR with Evernote, costs about $0.001 per label, includes NLP. Step 3: ID words and associate them with Darwin Core fields for basic details. Step 4: lexical grouping and analysis – compare words to OCR values and if distinct, assign to lexical set, then send image to data entry. Bulk validation step based on value similarity of related images. Software gets better at seeing variations with time due to training. Data entry through multiple interfaces, users get virtual tokens to use in the store for correct words. Visual choices offered for lat-long to pick right format. Fields then have to be verified – some by computer, some by volunteers.
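
Step 3 above (associating recognized words with Darwin Core fields) can be pictured with a rough sketch like the one below. It is not the HelpingScience.org method: the regular expressions and the sample label text are hypothetical, and only the field names (`eventDate`, `recordedBy`, `decimalLatitude`, `decimalLongitude`) are actual Darwin Core terms. Lines that match nothing are set aside, as they would be for human data entry.

```python
import re

# Hypothetical patterns; a real label set would need many more variants.
PATTERNS = {
    "eventDate": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "recordedBy": re.compile(r"^Coll\.\s*(.+)$"),
    "decimalLatitude": re.compile(r"\bLat\.?\s*(-?\d{1,2}\.\d+)"),
    "decimalLongitude": re.compile(r"\bLong?\.?\s*(-?\d{1,3}\.\d+)"),
}


def assign_fields(ocr_text):
    """Return (matched, unmatched): Darwin Core fields found, and leftover lines."""
    matched, unmatched = {}, []
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        found = False
        for term, pattern in PATTERNS.items():
            m = pattern.search(line)
            if m and term not in matched:
                matched[term] = m.group(1).strip()
                found = True
        if not found:
            unmatched.append(line)  # left for a person to handle
    return matched, unmatched


# Invented OCR output for a single label
sample = """Quercus alba L.
Alachua Co., Gainesville
Lat. 29.6516 Long. -82.3248
Coll. A. Smith
2012-09-29"""

fields, leftovers = assign_fields(sample)
print(fields)
print(leftovers)
```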

Once the data are received, they can be exported as CSV, through RESTful services for export into other software, as Darwin Core Archive, or in other formats on request. Also lets you filter by Darwin Core fields. Sustainability is a major consideration – symbiotic relationships with fee-for-service, so volunteers receive tokens to spend at HS store – contributors are not paid directly, since that causes problems. Instead, people can direct funding to fundraisers like micro loans to botany undergrads for research, sponsorships for students to attend conferences, K-12 equipment funding for science departments. Also allows donations to charitable orgs and funding small herbaria digitization.

—

Denslow, Michael. Appalachian State Univ.

Notes from Nature: a scalable citizen science platform for transcribing records from natural history collections.

Challenges: natural history collection data not used to full potential, and only about 1/3 is digitized. Lots of heterogeneity. Want to promote public engagement, most people know nothing about this stuff, and want to use specimens in research. Solutions for transcription – success in other domains, e.g. Zooniverse.

Definition of citizen science – contribution of data, analyses, or solutions toward scientific research by volunteers – not a new thing. Not just a way to gather data, but mutually beneficial partnership. Trying to engage people in a new way using technology. Existing efforts for transcription of natural history collections – herbaria@home for UK herbaria, Atlas of Living Australia portal. Excellent models to build upon, but want to create generalized solution that works at high volume and is scalable.

Phase 1 of development: proposal to Citizen Science Alliance, who asked several proposers to work together on similar project. No money, but some software development time. Initial goals: transcription interface prototype for simple, direct interaction with specimens; address complexities of multiple collections; plan for recruiting new pool of volunteers. Progress so far – private beta, expect a public beta prototype in November 2012. Currently focusing on project needs for SERNEC, CalBug, Natural History Museum London. More of an interactive task, trying to improve on transcription tool and find ways to engage new volunteers.

Phase 2 ambitions: proposed innovations – accuracy assessment, user engagement, OCR integration, scalable solution (and more). Model based on multiple transcriptions with different volunteers repeating the task for accuracy. When there is low agreement, report the field as needing further attention. [what is rate of flagging?] But data is pretty complex, low likelihood of identical entry for longer text strings. Another goal is more engagement – badging system, inline tutorials, advanced interfaces, etc. Create/select downloadable curricular materials based on grade, locations, etc.
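
The repeated-transcription model can be sketched roughly as follows. This is an assumption-laden illustration, not the Notes from Nature implementation: the normalization rule, the majority-vote approach, and the 0.6 agreement threshold are all invented here. It also shows why long free-text fields are hard, since a single typo counts as a separate answer.

```python
from collections import Counter


def normalize(text):
    """Collapse case and whitespace so trivial differences do not count as disagreement."""
    return " ".join(text.lower().split())


def consensus(transcriptions, min_agreement=0.6):
    """Return (value, agreement, needs_review) for one field's repeated transcriptions."""
    if not transcriptions:
        return None, 0.0, True
    counts = Counter(normalize(t) for t in transcriptions)
    value, votes = counts.most_common(1)[0]
    agreement = votes / len(transcriptions)
    return value, agreement, agreement < min_agreement


# Invented entries from three volunteers for the same label field
locality = consensus(["Gainesville, FL", "gainesville,  fl", "Gainsville, FL"])
habitat = consensus(["oak hammock", "oak hammock near pond", "Oak hammok"])
print(locality)
print(habitat)
```

In this made-up example the locality reaches two-thirds agreement and passes, while the longer habitat string gets three different answers and is flagged for further attention.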

OCR integration – two strategies, in the wild or word spotting. Each has advantages. Workflow includes machine readable parts and human-in-the-loop approaches. Also want OCR web API. Scalability – want to set up partner portals to engage people in missions to complete tasks, so new content can be entered into the system easily.

Want to make sure this works in parallel with other activities going on, e.g. iDigBio, TCNs, other OCR efforts like SALIX.

—

Best, Jason. Botanical Research Institute of Texas, Fort Worth, Texas.

The Apiary Project: a workflow for herbarium specimen digitization.

Doesn’t really have much to do with apiaries! Funded in part by IMLS. BRIT has 312 active volunteers this year with over 10K hours in 2012, about 15 actively involved in digitization. Goal is transcribing data into structured format, bringing together people and machines to leverage best abilities of each. Currently have 3700 specimens in the workflow, beta launching Apiary Lite.

15 minute intro video for training, other training for various workflows at different levels of complexity. Main workflow is analyze specimen, transcribe text, and parse text into fields (using keyboard shortcuts). Uses on-screen markup to highlight text matched to fields. Verbatim parsing versus inferred. Currently working with on-site interns and volunteers, would like to involve people more ad hoc but it would likely require a different approach than the current interface.

Main concern is how to keep people engaged. Takes about 5 minutes to fully transcribe all labels for one specimen, without any OCR.

—

Flemons, Paul. Team Lead, Atlas of Living Australia Biodiversity Volunteer Portal. Australian Museum, Sydney, Australia.

Atlas of Living Australia’s Volunteer Portal: open model for crowdsourced capture of biodiversity information.

Portal concept: open scalable, distributed, standards (DwC) compliant, browser based, asynchronous application for crowdsourcing capture, and enabling the digital repatriation, of biodiversity data. Supports: template-based creation and management of transcription expeditions, 3 levels of permissions-based activity, including transcription, validation, and administration. Tutorials for getting started in each expedition.
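
One simple way to picture three levels of permissions-based activity is as ordered roles, where each level can do everything the level below can. The sketch below assumes that ordering, which the talk did not state, and the action names are hypothetical rather than taken from the Volunteer Portal.

```python
from enum import IntEnum


class Role(IntEnum):
    TRANSCRIBER = 1
    VALIDATOR = 2
    ADMIN = 3


# Minimum role needed for each action (hypothetical action names)
REQUIRED_ROLE = {
    "transcribe_task": Role.TRANSCRIBER,
    "validate_task": Role.VALIDATOR,
    "create_expedition": Role.ADMIN,
}


def can(role, action):
    """True if the role meets or exceeds the level required for the action."""
    return role >= REQUIRED_ROLE[action]


print(can(Role.VALIDATOR, "transcribe_task"))
print(can(Role.TRANSCRIBER, "validate_task"))
```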

Virtual expeditions – theme-based tasks. Includes leaderboard, expedition stats of number of tasks, volunteer transcribers, level of completion, progress bars for each expedition. [when you show completion info, do more complete expeditions attract more attention – accumulative advantage?] Roles in expeditions are similar to Old Weather.
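
The expedition statistics described here (task counts, number of transcribers, completion level, leaderboard, progress bar) are straightforward to derive from task records. The sketch below is a guess at the shape of that calculation, with an invented task structure; it is not based on the portal's code.

```python
from collections import Counter


def expedition_stats(tasks):
    """Summarize tasks given as dicts with 'transcriber' set to a name or None."""
    done = [t for t in tasks if t["transcriber"] is not None]
    completion = len(done) / len(tasks) if tasks else 0.0
    leaderboard = Counter(t["transcriber"] for t in done).most_common()
    return {
        "total_tasks": len(tasks),
        "transcribers": len({t["transcriber"] for t in done}),
        "completion": completion,
        "leaderboard": leaderboard,
    }


def progress_bar(fraction, width=20):
    """Render a text progress bar for a completion fraction between 0 and 1."""
    filled = round(fraction * width)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {fraction:.0%}"


# Invented tasks for one expedition
tasks = [
    {"id": 1, "transcriber": "alice"},
    {"id": 2, "transcriber": "bob"},
    {"id": 3, "transcriber": "alice"},
    {"id": 4, "transcriber": None},  # not yet transcribed
]
stats = expedition_stats(tasks)
print(stats["leaderboard"])
print(progress_bar(stats["completion"]))
```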

Templates include field notes, issue is that they have to generate a new template for each different type that comes up. Existing templates are reusable, but new ones require ad hoc development. Originally wanted to make the template wizard-based for selecting fields and laying out templates, but not enough resources to do that. Showed some interfaces.

——

Engaging the public in science

—

Wiggins, Andrea. DataONE, Univ. of New Mexico, and Cornell Lab of Ornithology, Cornell Univ.

Citizen science phenotypes: typologies and implications of project design.

—

Zelt, Jessica. North American Bird Phenology Program, U.S. Geological Survey, Laurel, Maryland.

How to successfully engage the public in science.

See notes from USGS workshop.

—

Wilson, Nathan. Director of Biodiversity Informatics, Encyclopedia of Life, Marine Biological Laboratory, Woods Hole, Massachusetts.

Mushroom Observer and the Role of Observers

Created Mushroom Observer for himself – software professional and naturalist by avocation, brought him his dream job. Focus is Western US mushrooms, 3500 observers in the last 6 years, 100K observations, 250K photos. Scratch your own itch, start with an existing crowd – OSS concepts apply more broadly. “If you treat them as your most important asset, they will return the favor by becoming your most important asset.”

Engaging the public: Embrace laziness, accept garbage, deal gently with conflicts, avoid anonymity and privacy – don’t let people hide, make it easy to become an expert. Important to evolution of Mushroom Observer: start off with licensing and data reuse policies in place, automatic data sharing. Noncommercial CC licenses not accepted by Wikipedia! Offers people reuse options for licenses, both NC and not.

Issues with observations: didn’t keep herbarium specimen, best guess ID may not be accurate but is based on current knowledge. Most people don’t collect herbarium specimens. Rule of thumb: anything that takes some work loses 90% of the population – clicking one button will cause that level of drop out. Thought very few Mushroom Observer users would have specimens, since it’s a lot of extra work. Turned out 15-20% of Mushroom Observer data had herbarium specimens. 28% had made herbarium specimens – truly amazing. Next steps – more collaboration with professional herbaria; validate and assess the numbers.

Another goal – improving IDs in the system – 28% of observations above species level, but 65% of those are below “expert” confidence. 56% have no notes at all, average note length is 179 characters. Don’t really know a whole lot about fungi, actually, and people have recognized new taxa that have not yet been described – organism from CA with 50 observations, can’t figure out what it is – they’ve ruled out all possibilities. Needs better review process, documenting of observations [do you tell people what notes are useful?], computable descriptions, standardization of “provisional” names (one was named “Carl”).

What is an observation? You see a specimen, not a species! Just the facts; observed features – macroscopic, microscopic, molecular. Concept and definition of a species are somewhat divergent – type specimen is the definition of the species. But there are lookalikes, cryptic species, paraphyletic & polyphyletic groups, convergent evolution, and changing circumscriptions. Diagram connecting people to observations, observations to circumscriptions and barcodes, magical jump to scientific names; barcodes go to types to species names; circumscriptions to scientific names. Maybe there’s a way to do something in between the observations and scientific names to keep the distinction of an observation separate from their species label.

Needs: names for shared observational experiences; peer reviewed, distinct, unique, and memorable. Computable definitions – duck typing (looks like a duck, quacks like a duck), semantic web technology. Connections to traditional scientific names – moving target.
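
As a loose illustration of the "duck typing" idea for computable definitions: a provisional name could be expressed as a set of tests on observed features, and any observation that passes all of them is treated as that thing, whatever its eventual scientific name. The feature names, values, and tests below are entirely made up; only the provisional name "Carl" comes from the talk.

```python
# Hypothetical computable definition: a provisional name is a set of feature tests.
CARL = {
    "cap_color": lambda v: v in {"brown", "tan"},
    "spore_print": lambda v: v == "white",
    "cap_width_cm": lambda v: 2 <= v <= 6,
}


def matches(observation, definition):
    """Duck typing: the observation fits if every required feature is present and passes."""
    return all(
        feature in observation and test(observation[feature])
        for feature, test in definition.items()
    )


# Invented observations; extra features are simply ignored
complete = {"cap_color": "tan", "spore_print": "white", "cap_width_cm": 4, "substrate": "oak litter"}
incomplete = {"cap_color": "tan", "cap_width_cm": 4}  # no spore print recorded

print(matches(complete, CARL))
print(matches(incomplete, CARL))
```

The observation keeps its raw features either way, which fits the point above about keeping observations separate from the species label attached to them.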

Posted in Trip Report | Tagged biodiversity, citizen science, curation, digitization, ecoinformatics, iDigBio, information systems, preservation, public engagement, transcription, workshop
