Citizen Science Data Quality is a Design Problem

I’ve been giving talks for years that boil down to, “Hey, citizen science organizers, it’s up to you to design things so your volunteers can give you good data.” I genuinely believe that most data quality issues in citizen science stem from either 1) a mismatch between the research question and the methodology, or 2) design problems. In either case, the onus should fall on the researcher to recognize when citizen science is not the right approach, or to design the project so that participants can succeed in contributing good data.

So it’s disheartening to see a headline like this in my Google alerts: Study: Citizen scientist data collection increases risk of error.

Well. I can only access the article’s abstract, but in my opinion, the framing of the results is all wrong. I think the findings may offer a useful, if veiled, summary of the improvements to data quality that can be achieved through successive refinements of the study design. Viewed that way, the paper says what others have: “after tweaking things so that normal people could successfully follow procedures, we got good data.” But that’s not particularly sensational, is it?

Instead, the news report makes it sound like citizen science data is bad data. Not so, I say! Bad citizen science project design makes for bad citizen science data. Obviously. (So I was really excited to see this other headline recently: Designing a Citizen Science and Crowdsourcing Toolkit for the Federal Government!)

The framing suggests that the authors, like most scientists and by extension most reviewers, probably aren’t very familiar with how most citizen science actually works. This is also completely understandable. We don’t yet have much in the way of empirical literature warning of the perils, pitfalls, and sure-fire shortcuts to success in citizen science. I suspect a few specific issues probably led to the unfortunate framing of the findings.

The wrong demographic: an intrinsically motivated volunteer base is typically more attentive and careful in its work. The authors saw this in the better results from students in thematically aligned science classes than from those in general science classes. The usual self-selection that occurs in most citizen science projects drawing on volunteers from the general public might have yielded even better results. My take-away: high school students are a special participant population. They are not intrinsically motivated volunteers, so they must be managed differently.

The wrong trainers and/or training requirements: one of the results was that university researchers were the best trainers for data quality. That suggests that the bar was too high to begin with, because train-the-trainer works well in many citizen science projects. My take-away: if you can’t successfully train the trainer, your procedures are probably too complicated to succeed at any scale beyond a small closely-supervised group.

The wrong tasks: students struggled to find and mark the right plots; they also had lower accuracy in more biodiverse areas. There are at least four problems here.

  1. Geolocation and plot-making are special skills. No one should be surprised that students had a hard time with those tasks. As discussed in gory detail in my dissertation, marking plots in advance, so that participants only need to find them, is a much smarter approach; using distinctive landmarks like trail junctions is also reasonable.
  2. Species identification is hard. Some people are spectacularly good at it, but only because they have devoted substantial time and attention to a taxon of interest. Most people have limited skills and interest in species identification, and therefore probably won’t get enough practice to retain any details of what they learned.
  3. There was no mention of the information resources the students were provided with, which would also be very important to successful task completion.
  4. To make this task even harder, it appears to be a landscape survey in which every species in the plot is recorded. That means that species identification is an extra-high-uncertainty task; the more uncertainty you allow, the more ways you’re enabling participants to screw up.

On top of species identification, the students took measurements, and there was naturally some variation in accuracy there too. There are a lot of ways the project could have supported data quality, but I didn’t see enough detail to assess how well they did. My take-away: citizen science project design usually requires piloting several iterations of the procedures. If there’s an existing protocol that you can adopt or adapt, don’t start from scratch!

To sum it up, the citizen science project described here looks like a pretty normal start-up, despite the slightly sensational framing of the news article. Although one of the authors inaccurately claims that no one is keeping an eye on data quality (pshaw!), the results are not all that surprising given some project design issues, and most citizen science projects are explicitly structured to overcome such problems. For the sharp-eyed reader, the same old message shines through: when we design it right, we can generate good data.

Award-Winning Poster Design

Since winning a best poster award at the 2011 iConference, I’ve had several requests for guidance on how to design a good poster. It should be obvious that the content comes first, and that’s something I can’t help with. Assuming the content is solid and the poster abstract is accepted, the poster design process I’ve been using lately seems to yield better results in less time, at least compared to the way I used to do it! I’m sharing it because I think many people would find it useful, and because part of managing your workload effectively is developing strategies for complex tasks that can take an inordinate amount of time.

As far as tools go, I recommend using what works for you. Some people use Adobe products like InDesign, Illustrator, or Photoshop; of these, InDesign is actually the best suited for the purpose. Unless you have extra time to learn your way around the software, however, use what’s familiar. I’ve heard that a lot of people design posters in PowerPoint. I personally would never do this, but I completely understand why others might.

I used to use InDesign, but it became much too difficult to use after CS2. My tool of choice lately is OmniGraffle: it provides many of the same awesome layout capabilities as fancier tools but is easier to use, has a great interface and fantastic diagramming functionality, and gives access to a wide variety of stencils from Graffletopia, which come in handy when creating posters. No matter what you use, set up a 1/2″ border around the edge, since large-format printers can’t print all the way to the edge, and save your custom-size document as a template (e.g., 36×48-poster-template.ppt) to make it easier next time.

My colleague Jaime Snyder gave a presentation on poster design last semester in our PhD seminar. This process is based on her tutorial and recommendations. Conceptually, it’s based around congruence of content hierarchy and visual hierarchy, which is to say, the focus of the poster content should also be the focus of the visual design.

This approach makes the design more cohesive, and it can be faster as well. I’ve included rough estimates of how much time I spend on each step. The total maxes out at about 10 hours for a poster, which might sound like a lot, but that’s actually much less time than I used to spend! If you tend to spend too much time on any of these details, set a timer; you don’t need to spend all week to turn out a great poster.

1. Decide upon the ONE point you want to communicate with the poster. (1 minute, plus 10 minutes for steps 1.a and 1.b)

What’s the essence of the work? This is the main thing to highlight with layout. Nothing else should overshadow it. Come on, you know this!

1.a. Make a short list of visual props that help make this point, like graphics, diagrams, example data, etc. I try to use images from my fieldwork whenever possible, and I do my own stock photography a lot of the time, mostly because I like to, I have the skills, and I get exactly what I want. You can get images from Flickr (my favorite source for stock) and many other sources, but be a good soul and make sure they’re CC-licensed.

For print quality, if you’re using screenshots, raise the resolution from 72 dpi to somewhere between 200 and 300 dpi, but make sure the total file size does not exceed the original file size. In other words, change the dpi without letting your software add pixels, so the image prints smaller rather than blurrier. The same goes for resizing any other images. You can take bits out, but you can’t put more in without losing quality. (5 minutes)
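To make the dpi arithmetic concrete, here is a minimal Python sketch. It is not part of my actual workflow: the function names and the 1440 × 900 px screenshot are hypothetical, and it assumes the resolution is changed without resampling, so the pixel count stays fixed and only the printed size changes.

```python
# Minimal sketch (hypothetical helpers): relating pixels, dpi, and print size.
# Changing dpi without resampling keeps every pixel, so the file size stays the
# same and only the physical print size changes.


def print_size_inches(width_px: int, height_px: int, dpi: float) -> tuple[float, float]:
    """Printed (width, height) in inches for an image of the given pixel size at `dpi`."""
    return width_px / dpi, height_px / dpi


def pixels_needed(width_in: float, height_in: float, dpi: float) -> tuple[int, int]:
    """Pixel dimensions needed to print at the given physical size and dpi."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    # A hypothetical 1440 x 900 px screenshot printed at different resolutions.
    for dpi in (72, 200, 300):
        w_in, h_in = print_size_inches(1440, 900, dpi)
        print(f"{dpi:>3} dpi: {w_in:.1f} x {h_in:.1f} in")
```

At 72 dpi that screenshot would print at 20 × 12.5 in, but at 300 dpi the same pixels cover only 4.8 × 3 in. Going the other way, `pixels_needed(6, 4, 300)` gives `(1800, 1200)`, so a smaller screenshot would have to print smaller than 6 × 4 in to stay at 300 dpi.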

1.b. Make a short list of the other details needed to link it all together – a few content items I always include are listed below. (5 minutes)

2. Pick a font family, serif OR sans serif – usually one shouldn’t mix them. No diagonal text. (10 minutes)

Font families are much easier than choosing separate fonts for each purpose, and they look better. They match perfectly, and provide a wide variety of expressiveness along with cohesion. If you care about these things, the most vehemently hated font seems to be Comic Sans, so you might want to avoid that one. Choose text sizes and styles from your selected font family for: title, headers, subheaders, body text, graphs/charts, captions. You might not need all of them.

Use no more than two (maybe three) text colors; only one or two of the font styles should be in an accent color. I usually use my accent colors in the title and headers or subheaders, but never for the body text. The minimum font size should be 36 pt. I usually have to adjust my font sizes upward once I actually lay the poster out digitally, so in practice I end up with a series of font styles at 96, 72, 54, 48, and 36 pt.

3. Pick a palette. Use a color wheel if you need a reference for picking complementary colors. (15 – 20 minutes)

You should have a primary/main, a neutral, a background (lightest), and an accent (brightest) color. Sometimes you can have a couple of accents, but they shouldn’t overwhelm. Use light text on a dark background OR dark text on a light background; there’s no acceptable in-between. Be cautious with gradients.
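If you’d rather compute a complementary color than eyeball one on a color wheel, the sketch below rotates a color’s hue by 180 degrees. It assumes the RGB/HSV hue wheel used by screen software (via Python’s standard `colorsys` module), which pairs red with cyan; a traditional painter’s red-yellow-blue wheel pairs red with green instead, so treat the result as a starting point rather than a rule. The hex values are placeholders, not colors from my posters.

```python
import colorsys


def hex_to_rgb(hex_color: str) -> tuple[float, float, float]:
    """Convert '#rrggbb' to (r, g, b) floats in [0, 1]."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return r, g, b


def rgb_to_hex(rgb: tuple[float, float, float]) -> str:
    """Convert (r, g, b) floats in [0, 1] back to '#rrggbb'."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def complement(hex_color: str) -> str:
    """Rotate the hue 180 degrees on the HSV wheel, keeping saturation and value."""
    h, s, v = colorsys.rgb_to_hsv(*hex_to_rgb(hex_color))
    return rgb_to_hex(colorsys.hsv_to_rgb((h + 0.5) % 1.0, s, v))


if __name__ == "__main__":
    # Placeholder colors: pure red and a teal.
    for color in ("#ff0000", "#008080"):
        print(color, "->", complement(color))
```

Keeping saturation and value fixed means the complement has the same intensity as the original, which is usually what you want for an accent; you can then lighten or darken it by hand for the background or neutral.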

For my iConference poster, I chose colors from the eBird interface so that the screenshots would blend well into the composition. For a poster that I presented in Sanibel, FL, I chose colors that remind me of Florida – peach, teal, lime, and orange – which might sound riotous but actually worked beautifully.

4. Pick a grid. (5 – 10 minutes)

Sketching helps and is fun, too. I use an oversized artist’s sketch book. You can use three equally sized horizontal or vertical bars, a 2×2 or 3×3 grid, a title strip with a 3×2 grid beneath it, etc. When you put together the layout, you will align content within the grid and on the intersection points of the grid (see the rule of thirds).
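When you move from the sketch to your layout software, it can help to know exactly where the guides go. This Python sketch assumes a 36 × 48 in poster in landscape orientation (48 in wide) with the 1/2″ unprintable border, splits the printable area into a 3×3 grid, and lists the four interior intersections where the rule of thirds suggests placing focal content. The function names and poster size are illustrative assumptions.

```python
from itertools import product


def grid_lines(length_in: float, divisions: int, margin_in: float = 0.5) -> list[float]:
    """Positions (inches from the paper edge) of lines dividing the printable span equally."""
    usable = length_in - 2 * margin_in
    return [margin_in + usable * i / divisions for i in range(divisions + 1)]


def thirds_intersections(width_in: float, height_in: float,
                         margin_in: float = 0.5) -> list[tuple[float, float]]:
    """The four interior intersections of a 3x3 grid (rule-of-thirds focal points)."""
    xs = grid_lines(width_in, 3, margin_in)[1:-1]  # drop the outer margin lines
    ys = grid_lines(height_in, 3, margin_in)[1:-1]
    return list(product(xs, ys))


if __name__ == "__main__":
    width, height = 48, 36  # hypothetical landscape poster, in inches
    print("vertical guides:", [f"{x:.2f}" for x in grid_lines(width, 3)])
    print("horizontal guides:", [f"{y:.2f}" for y in grid_lines(height, 3)])
    for x, y in thirds_intersections(width, height):
        print(f"focal point at ({x:.2f} in, {y:.2f} in)")
```

For this size, the interior vertical guides fall at roughly 16.17 and 31.83 in and the horizontal ones at roughly 12.17 and 23.83 in, measured from the paper edge. Passing a different `divisions` value to `grid_lines` gives the 2×2 or 3×2 variants.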

5. Lay out poster elements on the grid using grayscale blocks, i.e., with pencil! (1 hour)

No images, no text, no detail. Just blocks. The darker the gray, the more important the content. Pay attention to balance, alignment, etc. Think about how your layout leads the eye around the space, and how the “order” of the content aligns with that visual flow. Don’t forget that you can use geometric shapes, arrows, drop shadows (all in the same direction) and bounding boxes to call attention to content. I think this part is a lot of fun.

Sketch for a Poster Layout

6. Match the labels of your content to the elements in the sketch, and then translate to digital. (2 – 3 hours)

Sometimes, as you lay out the elements on the screen, you find that some part of the concept doesn’t work as well as you imagined. For example, my original design for the eBird poster split the center panel in two, with the “Birding” header at the top of the left half and the “eBirding” header at the bottom of the right half. That doesn’t work well for visual flow, so in the final poster, both headers were at the top of the center panel.

7. Get some colleagues to critique your layout. Listen to them. No matter how much you love your diagonal text, gradients, and dense paragraphs full of citations, if they say it needs to go, take them seriously! (30 minute discussion)

8. Revise the layout. Just until it’s good enough – don’t fiddle around with it all day. (1 – 3 hours)

Besides these steps, there are a couple of other things that I do consistently, no matter what my other layout choices are. You basically can’t go wrong when you:

  • Put the title at the top in the biggest font size that you can fit – you want people to see this loud and clear.
  • Put the author names with affiliations and email addresses directly below the title, in header-sized font – no point in hiding from your own work!
  • Put acknowledgements in fairly small text along the bottom – this won’t hurt your funder’s feelings.

Here are a few additional rules of thumb that I apply to selecting poster content:

  • Always include a research question, background and/or motivation section (any lit review goes here), methods summary, and next steps or future work. Usually my focus for the poster is primarily on the stuff I find most interesting: analysis, discussion, case descriptions, etc. Not citations.
  • Once you’ve chosen the text you want to use, cut it by at least 50% because you have too much. Seriously. Don’t be the person who pastes the entire text of the extended abstract on a big piece of paper. That’s not designing a poster. That’s pasting your abstract on a big piece of paper. It’s bad.
  • Don’t go too far in the other direction, with no substantive textual content at all. It may look pretty, but if you have to refresh your beverage or duck out to the restroom during a poster session, the poster should stand on its own.
  • Don’t copy verbatim from your paper or abstract, with the exception of research questions or hypotheses – which might still need paraphrasing if they’re long and convoluted – and quotes. People love quotes!
  • Include no more than 3 sentences in any given block of text, and keep them short.
  • Take advantage of bulleted/numbered lists. Use phrases rather than complete sentences for list items.
  • You should be able to read almost everything easily from 10 feet away. If you print your poster out on a sheet of notebook paper, all the text should be readable without a magnifying glass.
  • Remember that the poster is a conversation starter, not the full detail you put in your abstract, but it needs to have all the basic pieces so that it can be relatively self-explanatory.
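The notebook-paper test is easy to check numerically. This sketch assumes “notebook paper” means US letter (8.5 × 11 in) and a 36 × 48 in poster scaled uniformly to fit, ignoring margins; the function name is hypothetical. With the font sizes listed earlier, the smallest 36 pt text lands at about 8.25 pt on paper, small but readable without a magnifying glass.

```python
def fit_scale(poster_w: float, poster_h: float,
              paper_w: float = 8.5, paper_h: float = 11.0) -> float:
    """Largest uniform scale that fits the poster on the paper, in either orientation."""
    upright = min(paper_w / poster_w, paper_h / poster_h)
    rotated = min(paper_h / poster_w, paper_w / poster_h)  # paper turned sideways
    return max(upright, rotated)


if __name__ == "__main__":
    scale = fit_scale(36, 48)  # 36 x 48 in poster on US letter paper
    for pt in (96, 72, 54, 48, 36):
        print(f"{pt} pt on the poster -> {pt * scale:.1f} pt on paper")
```

Checking both orientations means the same scale comes out whether the poster is portrait or landscape, since the printer can rotate the page either way.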

All of these rules and heuristics can be bent or broken, but do so carefully!

For those without a background in design, a nice reference for visual design is The Non-Designer’s Design Book, because it covers all the basics in a really digestible format. It’s well worth the minor investment if you design posters more than once in a lifetime. I also like Color Index for color palette ideas, and some of the other books in that series by Jim Krause are really nice for design ideas.

Update: Gosh, wouldn’t it be helpful if I included the finished poster so you could see what I got out of this process?

The finished poster – a full-res version can be downloaded here.