Trip Report: London Citizen Cyberscience Summit, Day 3

My final set of notes from the summit on 18 February are below. They only cover the morning talks as I spent the afternoon in discussions with other attendees. Apologies for typos or bad formatting – typing on an iPad leads to weird autocorrects or missing spaces, and Posterous – well, don’t get me started. I hope I have time to migrate out of it sometime soon.

Enjoy!

—–

Open Knowledge Foundation

Supporting reuse and remixing of data and content; permission is a major impediment to innovation. Artificially intelligent chemical software can extract data from publications but can't use it due to publisher licensing – technology is stalled by antiquated IP. OKF is a call for people to gather around the meaning of openness and how to make knowledge open. Not a campaigning group, but looking at how to create tools and infrastructure to get information out to as many people as possible. Example of malaria research: frustration that no one can read the literature for free. Many people don't read the literature because they can't afford to. This is relevant to medical, climate, and development contexts. Trying to change the culture so that it becomes the norm that people have the right to access the scientific literature. Working on this through scientific tools; one tool is Open Bibliography, which makes reference collections completely open – just the list of references. Making reference lists available is a valuable resource in itself. Emphasis on high-quality research creation and software for infrastructure across disciplines. OKCon2012, datahub.org

—–

Cabell – Online Collaboration and Legal Concerns

Legal overview: global collaboration is confusing; laws are local, not global, which means some are regional and others are enforceable in different areas. Default rules exist around ownership and control of distribution of work; you have to actively change these settings, usually in writing. The rules are inconsistent, with limited interoperability; generally, law provides the ability to exclude but not to engage, and prior rights may limit use of your own work.

IP basics – patents: very expensive, cost about $12K. Has to be new, useful, and not obvious, which is a lower bar than it sounds like. What is protected is a method or process. Newness leads to embargoes on publication – you have to file before publishing or you lose rights, so this holds up dissemination of research. Rights are to make, use, and sell, unless blocked by existing patents. Patent ownership goes to the actual inventor; if you invent on the job, it's subject to shop rights. Lasts for 20 years.

Trademark is about identifying the source of goods or services. Usually a registration process, but not in the US; have to show public recognition of the brand, and it's limited to a class of goods and services. Term is as long as people recognize the brand.

Trade secret – lasts as long as it is secret. No legal definition; an NDA keeps secrets but provides no right to use information, only prevents exposure.

Copyright protects original expression – not ideas, not statements of bare fact. The eligibility bar is low, but differs by jurisdiction: intellectual effort vs. sweat of the brow. The annual Bulwer-Lytton contest for the worst possible writing you can produce is an example of expression versus fact. Facts are free to use without attribution, and therefore not copyrightable, but data aren't necessarily limited to facts, so beware of underlying rights, which may be wildly different depending on the type of work involved – e.g., a Db of photos has different rights than a Db of numbers, or CDs and songs. Not (always) true that data aren't copyrightable: collections of facts are not copyrightable, but collections of X-rays are.

Rights are prevention of copying, distribution, derivatives, translations, display, and public performance, plus related rights like moral rights, including integrity (kept intact w/o change). Rights differ based on the type of work, e.g., artistic versus literary, which includes software, Dbs, texts. Dbs can be copyrighted as a compilation – a collection of elements which are not individually copyrightable. Copyright is automatic the moment the creator fixes it into a tangible format.

Contributing thought is not the same as expression, so coauthors who write nothing have no right to copyright (watch out, PhD advisors!). Works for hire automatically belong to the employer if created as part of job duties. Universities have different policies in this regard, e.g., as to theses. Funding sources can impose ownership and publication restrictions, e.g., funders requiring deposit of data or outputs. Specially commissioned works, e.g., from freelancers and consultants – wedding photographers own the copyright; the work does not belong to the commissioner unless there's an agreement in writing. Government works – federal work is in the PD in the US, so no one owns it; Crown copyright in the UK.
Types of joint ownership – unless a group of collaborators thinks of the work as a single work, they are not really joint authors; if they do, each author has the right to sell the work. If only some authors consent to combined use, then it's a compilation or collective work, and only the combination or the newly created part is owned by the compiler. Duration of copyright is very complicated! SGDR (sui generis database right) – parallel to copyright, subject to abuse.

Other issues to consider: privacy is tightly restricted in the UK but hardly protected in the US – only limited, piecemeal protection there. Discrete bits of info may not reveal an individual, but a combination of sources can, which runs a risk when combining databases. The main question is where you operate: if you operate in the US but take data from the UK, you are subject to UK law.

Other related random acts: human subjects research, public sector info, species and environmental info acts, import/export acts e.g., software is an armament, child protection laws, national security, institutional and professional ethics.

Implications for citizen science: usually there is no legal entity for a voluntary collaboration, which means no centralized management or ownership can take control of IP – only a person or business can own something. Default settings may be inconsistent with the community's intended uses of works; individual contributors can make decisions without consulting the whole; piecemeal and distributed rights. In addition, law treats collaborators as joint offenders; individuals are not protected from liability for harm done by others in the collaboration, e.g., copyright infringement. So one member can be held legally responsible for harm done by others in the collective. Best legal practices: know your own rights; document each contribution as well as possible, e.g., with version tracking that helps ID author, location, and date; where possible, formalize the collaborative organization to simplify legal application; carefully specify collaboration rights.

Open sharing has lots of standard public licenses, like OKF and GPL and CC. OKF has a reference list that shows how open various licenses are and what they apply to. Linked Open Data efforts are being used to facilitate sharing. CC is applicable in 75 jurisdictions. Recommends CC0 (public domain) for data, since attribution becomes too difficult. Natural history observations are considered statements of fact and not copyrightable, but comments about them would be copyrightable.

—–

Plantin – radiation mapping in the age of bad data

Post-Fukushima: initially no data, but then bad data. Worries about sensitivity and then mishandling – data produced by entities whose motivations could be questioned. Several radiation mapping mashups. Mapping radiation was a 4-step process: 1. Scrape it directly from websites – initially unstructured, had to read through source code. 2. Measure it – many people tried this; could be done by many different groups or organizations. 3. Aggregate it, e.g., with Pachube, a platform for online aggregation and redistribution through API calls. 4. Map it. Examples of only official or only alternative data, but more interesting is the mashup using both sources – also useful for verification. Focus on one monitoring group, SafeCast, an ad hoc group of engineers in Tokyo. Hard to know if it is science; not planning to intervene, only trying to provide data and trigger reflexivity. Not activists. Hackers but not hackers, tinkering in a DIY way, but close to the community – crossing the dynamics of science, activism, hacking, community.
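The aggregation step (3) could be sketched like this – a minimal illustration of merging official and alternative feeds for cross-verification, not Pachube's actual API; the feed names and data shapes are my assumptions:

```python
from statistics import mean

def aggregate_readings(sources):
    """Merge radiation readings (e.g. uSv/h) from multiple feeds, keyed by
    location. `sources` maps a feed name ('official', 'diy', ...) to a dict
    of {location: reading}. Returns the per-location mean plus the spread
    between feeds -- the spread is what makes a mashup of both official and
    alternative sources useful for verification."""
    by_location = {}
    for feed, readings in sources.items():
        for loc, value in readings.items():
            by_location.setdefault(loc, {})[feed] = value

    result = {}
    for loc, feeds in by_location.items():
        values = list(feeds.values())
        result[loc] = {
            "mean": mean(values),
            "spread": max(values) - min(values),  # large spread -> worth checking
            "n_feeds": len(values),
        }
    return result
```

A location reported by only one feed shows `n_feeds == 1`, flagging that the value cannot be cross-checked.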

—–

Ishigaki – Radiationwatch.org

Making radiation data available to the public. Created a device that is housed in a "Frisk" candy box – used it because they had no time or money for plastic injection molding. Hooks into a smartphone, 4 color variations! (much laughter) Much better than a 40-lb Geiger counter. Free iPhone app, Pocket Geiger, takes 5-10 minutes to analyze your data. Factory is right outside the tsunami disaster area, but income went down, so their nonprofit organization is creating many jobs for disaster recovery. Socially inclusive: 3 core members, 5 professionals (pedologist, Dutch DoD, Dutch NIST, NASA, Japanese CERN), 12 hackers, 10K+ users. User reports on the FB group: radiation levels high in a children's park, in drainpipes, very high in flight. Have millions of data points but now running into privacy problems. Cities are creating monitoring posts for radiation, and specialists are going to create high-accuracy devices, but people need to know radiation levels in their own homes as well. Hates this governmental model where citizens have no access to data.

Issues about inconsistent measurement by contributors – need metadata or it's not usable. Not just units of measurement, but also the environment in which measurements were made.
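A contributed reading with the metadata the talk calls for might look like this – the field names are hypothetical, not Radiationwatch's actual schema:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class RadiationReading:
    # Hypothetical record shape: fields are illustrative, not the project's schema.
    value: float
    unit: str            # e.g. "uSv/h" vs "cpm" -- readings are meaningless without this
    lat: float
    lon: float
    taken_at: datetime
    device: str          # sensor model affects calibration
    environment: str     # indoors/outdoors, height above ground, shielding

reading = RadiationReading(
    value=0.12, unit="uSv/h", lat=38.27, lon=140.87,
    taken_at=datetime(2012, 2, 18, tzinfo=timezone.utc),
    device="Pocket Geiger", environment="outdoors, 1 m above ground",
)
```

Without `unit` and `environment`, two contributors' numbers simply cannot be compared, which is the talk's point.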

—–

Maisonneuve – Public analysis of satellite images

300K damaged buildings to assess (from the earthquake?), difficult to do by professionals due to scale. Organizational issue: how to organize non-trained volunteers to enforce quality and analyze a large area, either remotely or in the physical world.

Parallel model: n volunteers monitoring the same area for inter-rater reliability. Another model is iterative: annotation and progressive improvement, like Wikipedia. Experimented with these approaches on 3 maps. Types of errors: false negatives and false positives. The parallel model is a reduction of false detection rates – redundancy is useless if at the individual level p=1; at p=0.5, want only consensual results. Doesn't solve problems of omission: agreement on obvious buildings but not difficult ones. Sensitive to aggregation parameters. In the iterative model it's somewhat reversed: less omission of buildings, so better area completeness. Sensitive to destruction of knowledge in a basic implementation (last=best), and very sensitive to initial conditions, so the first player is very important – maybe need experts on this part.

Skill is an issue: how many volunteers are needed to reach a certain level of quality? At some point you can add more people but there are problems of scale; quality can be replaced by quantity. Issue of complementarity: aggregating the results of the best contributors – individually not all that great, but together you get much more value.

Second question is about training volunteers, an ongoing effort. Difficulty of a task can be assessed according to agreement: easy tasks have high agreement, but difficult ones have more spread. Last point is that crowd learning can happen through others' mistakes – can identify the most common errors and use this error density to educate people according to previous contributors' errors.
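The agreement-based difficulty estimate could be formulated like this – a hypothetical formulation for illustration, not the speaker's exact metric:

```python
from collections import Counter

def difficulty(labels):
    """Estimate task difficulty from annotator agreement: 1 minus the share
    of annotators who chose the majority label. Easy tasks (high agreement)
    score near 0; difficult tasks (labels spread across options) score higher."""
    counts = Counter(labels)
    majority = counts.most_common(1)[0][1]
    return 1 - majority / len(labels)
```

Four annotators all saying "damaged" gives 0.0; a 2-2 split gives 0.5, signaling a task worth routing to experts or to targeted training.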

—–

Foster – Project Seahorse: advancing marine conservation

Based out of UBC, committed to sustainable marine ecosystems. Know what is wrong, but have to figure out how to fix it.

Why seahorses? They are really cool fish! Most people don't know that they are fish; they have a bunch of cool evolutionary features (horse heads, marsupial pouches, prehensile tails) and are the only species where the male gets pregnant. Seahorses are like pandas: no one cares about mudflats or mangroves until you tell them that seahorses live there. So saving seahorses means saving their habitats.

Threats include overfishing and targeted catching by small-scale fishers; the majority are caught as bycatch by shrimp trawlers, which discard thousands of tons a year. Like using a bulldozer in a forest to catch a deer. Threats to seahorses are threats to oceans and other marine life.

Captured seahorses are traded internationally, especially for traditional Chinese medicine, curios, and the aquarium trade. They retain their shape when dried, so they make interesting curios – like seahorses with fish fins glued to them like wings, clutching mini tequila bottles. The trade is large and global, 10M seahorses around the world in 80 countries, so it is one of the biggest species trade problems. CITES regulates international species trade; all 46+ species are listed in Appendix II, which means international trade is permitted but regulated, so you have to prove sustainability. Seahorses are among the most important fishes on CITES – the first fish listed. Previously fish were not considered species for international regulation; immediately after seahorses were listed, several other internationally traded fishes were added.

The problem is a lack of location-specific information about seahorses to help groups meet mandates for demonstrating sustainability. The IUCN Red List shows most of the species (28) are DD – data deficient; the 8 that have been assessed are all EN or VU, basically very threatened or endangered, and chances are good this is true for the other species.

She can't spend her life diving to find seahorses around the world, for lots of reasons – most importantly lack of time, because we need to act now. Fortunately people are diving the world already, and can send them info about seahorses. This is being done for other taxa already, but is new for the ocean; few other projects focus on marine species. The best examples of citizen science are birds (eBird). They have done it so well that they have monitored conservation status over 40K species. The challenge is addressing marine problems; wants to "give seahorses wings."

The marine environment is extreme for monitoring: can't get GPS, and most electronics don't work underwater, in part due to pressure. SCUBA surveys overcome the location issue by tethering to floats on the surface. Another problem is that a seahorse is easy to identify, but telling which species it is is very difficult. Wants to start a project that will be so sexy that every diver will give them data and feel they are making a difference for ocean stewardship. Critical because divers won't be able to enter data immediately – maybe they make notes on the dive record, but they still have to want to enter it back at their hotel before getting into Coronas on the beach. Made progress here already! Trying to work with EpiCollect; someone else has offered to help with branding, but she also needs a protocol and to build a toolkit for monitoring seahorse populations, and to train worldwide groups to assess these trends locally as partners. Needs help with feedback tools beyond point maps and bar charts, like overlaying seahorse info with other marine data. If they can map where threats are, this helps communicate a sense of urgency and conservation needs, and prioritize monitoring locations. Needs info on lessons learned and experiences.

—–

Jones – iBats: using smartphones and citizen networks to globally monitor bats

Many indicators of recent declines in global biodiversity. Looking into smart monitoring: bats are a good indicator – a fifth of all mammal species, widespread and sensitive to global change (behaviors sensitive to temperature), and important ecosystem service providers. Cool animated radar graph of bat emergence in Texas, saving a third of crop pesticide costs by eating up bugs. Bats are also interesting because they emit radiation in the form of echolocation, using ultrasound to communicate and locate objects; can sense bats based on this radiation leaking. They created a database of bat acoustic biodiversity and want to use it to classify bat species. If acoustic monitoring could be done with these identifiers, why do this and where? Combined index map showing areas where the current potential for using the tool is highest due to call similarity, e.g., very different calls in certain areas.

Have tested acoustic species classification tools; most call types can be identified at over 97% accuracy, except one group of calls. So continental tools would be the ideal. Using an ultrasonic microphone (very expensive, 400 GBP; trying to hack a cheaper version they think they can do for 10 GBP) plugged into the smartphone headphone jack; need a special microphone because high-frequency sound requires high-speed sampling. Have developed portals/versions in different locations and different languages. Started off in Romania (of course!) but the effort has moved around the world.

From there, building distribution maps, using machine learning, and creating hotspot maps to inform conservation policies. Can do trendlines to show bat populations as a headline indicator of ecological health. Latest project is also doing this for frogs and insects like crickets. Got Zooniverse funding for Bat Detectives, so they're currently working on noir branding for their project.