Qualitative Analysis Tools

In part three of my review of software that I use for my academic work, I’m covering that all-time favorite, qualitative analysis tools! I have never seen a topic that gets so many requests for help (mostly from doctoral students) with so few useful answers. So here are a handful of tools that I have found helpful for my dissertation work, which involves qualitative analysis of semi-structured interviews, field notes, and documents.

As always, my main caveat is that these are Mac OS X programs. In fact, almost exclusively. If you’re spending a lot of time with a piece of software, having it behave like an OS-native application is not something worth compromising on. And as usual, I tend to favor open source, free, or low-cost options. For the work that I’ve done, the applicable categories include data capture, transcription, coding, and theorizing (which might also apply to some quantitative work, depending on the nature of the beast).

Data Capture

Sometimes you need screen shots. For this, I just use the Mac OS X built-in tool, Grab (may be under “Utilities” in your Applications folder), which works with keyboard shortcuts – my favorite! However, it grabs TIFFs, which aren’t the most friendly format, and no matter what tool you use, screen captures are almost always 72 dpi = not print quality. So I resize to 300 dpi with Photoshop, making sure not to exceed the original pixel dimensions (interpolated bits look just as bad as low dpi).
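The dpi arithmetic is easy to check before opening Photoshop. Here is a minimal Python sketch, assuming a hypothetical 1440 × 900 pixel capture (the function name and the numbers are mine, purely for illustration). Printed size in inches is just pixels divided by dpi, so the same pixels at 300 dpi give a much smaller print rather than more detail.

```python
def print_size_inches(width_px, height_px, dpi):
    """Return the (width, height) in inches at which an image prints at a given dpi."""
    return (width_px / dpi, height_px / dpi)


# Hypothetical screen capture: 1440 x 900 pixels (an assumption for illustration).
capture = (1440, 900)

at_screen_dpi = print_size_inches(*capture, dpi=72)
at_print_dpi = print_size_inches(*capture, dpi=300)

print(f"At 72 dpi:  {at_screen_dpi[0]:.1f} x {at_screen_dpi[1]:.1f} in")
print(f"At 300 dpi: {at_print_dpi[0]:.1f} x {at_print_dpi[1]:.1f} in")
```

If the figure has to be wider on the page than the 300 dpi size, the only honest fix is a larger capture, not upsampling.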

Sometimes you need to record a whole session of computer-based interaction. For that, nothing rivals Silverback for functionality and cost. It’s pretty cheap, works like a dream, and is great for capturing your own experiences, or those of participants. It uses your Mac’s built-in camera and mics to pick up the image and sound of the person at the keyboard, while logging and displaying keystrokes and mouse clicks. And it doesn’t make you record your settings until the end, so that’s one less thing to screw up when you’re setting up your session. Brilliant! I have to thank the WattBot CHI 2009 student design competition finalists from Indiana State for this discovery, since I never would have thought to look for something like this. I use Silverback to log my own use of online tools for participant observation. It’s really entertaining to watch myself a year ago as I was just starting to use eBird. OK, more like painful. But compared to now, it’s really valuable to have those records of what the experience used to be like.

Transcription

I record all my interviews with a little Olympus digital recorder. It’s probably no longer on the market, but it was about $80 in 2007 and well worth every penny, even though at that time I mistakenly thought that I’d never do qualitative research. It was the second-best product from Olympus at the time, and has a built-in USB to move the files to a computer. Great. Except that all the files are in WMA format. All2MP3 to the rescue – free software is hard to beat. For a while, I used a different audio converter, but it stopped working with an OS update and then I found this one. It’s dead simple, and despite the warnings that it always gives me about suboptimal formats, it works like a charm, every time.

But once those interviews are converted into a playable format, I still have to transcribe them. It’s good data review, of course, besides being cheaper than hiring someone – depending on your calculations. MacSpeech Dictate (now called Dragon Dictate) is my tool of choice for this task; it’s the Mac equivalent of Dragon Naturally Speaking, for you Windows users out there. Both products are owned by the same company, and you basically shouldn’t waste your time with anything else, because it is the market leader for a reason.

To use the voice recognition software, I listen to my audio recordings with earbuds and use the included headset to dictate the interview. The separate audio and voice systems are truly necessary, because if I can hear myself talking, it distracts me from what I’m dictating. It’s not flawless, but once the software was trained and so was I, it has worked pretty well. The big drawback is that it costs about $200. The big plus is that I went from 4-5 hours of transcription time for each hour of recording to 2-3 hours, and that’s a nontrivial improvement! I have definitely saved enough hours to make it a good deal for the grant that paid for it.

If you’re using dictation software, you have to dictate into some other software. And something has to play your audio files, too. Surprisingly enough (or not?), I have found open source software from IBM that works pretty well: it’s called IBM Video Note Taking Utility. Although it was originally PC-native, I begged the developer to encode Mac keyboard shortcuts as well, which he did – awesome!

The software was created for video transcription, but I just use it for audio. It’s very simple: you load up an mp3, it makes a text file, and you can use keyboard shortcuts to skip forward, backward, pause, and speed up or slow down the recording (plus some other stuff I don’t use). There are a couple of quirks, but the price is right and it does exactly what I want without lots of extra confusing stuff going on. Most of my transcription happens at 0.6 times normal speed, so when you take into account some correction time, the fact that I’m transcribing an hour of recording in 2-3 hours means it’s nearly real-time transcription and there’s very little additional overhead. It’s just not possible to do any transcription at normal speaking speed, because unless you’re a court reporter, you just can’t keep up with what people are saying!
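For anyone who wants to budget their own transcription time, the arithmetic behind that claim can be sketched in a few lines of Python. The function name is my own invention, and the only inputs are the figures given above (0.6× playback, 2-3 hours per recorded hour).

```python
def playback_minutes(recording_minutes, speed):
    """Minutes needed just to listen to a recording once at a given playback speed."""
    return recording_minutes / speed


recording = 60   # one hour of interview audio
speed = 0.6      # playback at 0.6x normal speed

listening = playback_minutes(recording, speed)  # about 100 minutes

# Whatever is left of the 2-3 hours is correction and pausing time.
overheads = {}
for total_hours in (2, 3):
    overheads[total_hours] = total_hours * 60 - listening
    print(f"{total_hours} h total: about {overheads[total_hours]:.0f} min beyond listening")
```

So at the fast end of the range, only about twenty minutes per recorded hour goes to anything other than a single slowed-down pass through the audio.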

Coding/Annotation

When I first started working on qualitative research, one of my initial tasks was finding coding software that I liked. If you’re not using software for this task, consider joining the digital age. There are better options out there than innumerable 3×5 cards or sticky notes, even if you have to pay for them and spend a little time learning how to use them; the time you save is worth much more than the software costs. After some fairly comprehensive web searching, I was kind of horrified at how bad the options were for Mac-native software. $200 for what? Not much, I’ll tell you that. And from what I’ve seen looking over others’ shoulders, I don’t think the PC stuff is a ton better.

But there was something better than the modernized HyperCard option that I found, and pretty much everything else. And it, too, is open source! TAMS Analyzer has got my back when it comes to qualitative data analysis. It’s super-flexible, has a lot of power for searching, matching, and even visualizing your code sets, and can produce all the same intercoder reliability stats as the pricey licensed software. There’s a bit of a learning curve, but I expect that’s true of any fully-featured annotation software. Plus, there’s a server version that has check-in/check-out control, which is awesome if you have multiple coders working on the same texts, and it’s pretty easy to set up (all things considered; you do have to be able to set up a MySQL database). I have barely scratched the surface in terms of using its full capabilities. I’m constantly finding yet another awesome thing it can do, and I learn the functionality as I need it – all the really powerful stuff it can do doesn’t interfere with using it out of the box, so to speak.
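If “intercoder reliability stats” sounds mysterious, here is a rough Python sketch of one common measure, Cohen’s kappa, for two coders. I’m assuming kappa only as a familiar example; this is not TAMS Analyzer’s code or necessarily the statistic it reports, and the code labels and data are made up.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per passage."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty set of passages")
    n = len(coder_a)
    # Observed agreement: share of passages where the two codes match.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal proportions, summed over codes.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical code throughout; kappa is undefined,
        # so this sketch reports it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes applied to four interview passages.
coder_a = ["barrier", "barrier", "motivation", "motivation"]
coder_b = ["barrier", "motivation", "motivation", "motivation"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

The point of the statistic is that raw agreement (three out of four here) overstates things, because some matches would happen by chance even if the coders were guessing.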

And after you’ve spent some quality time with your coding, the time will come to sort those codes. For this, I use OmniOutliner, another product from the awesome OmniGroup. Once you have a huge heap of codes, the drag-and-drop hierarchical outline functionality is a highly convenient, fairly scalable way to handle getting your codes in order. I’ve done this with note cards, and it’s a big mess, excessively time-consuming by comparison to using digital tools, and wastes a lot of paper that is then hard to store. I also like keeping an “audit trail” of my work, so having the digitally sorted codes (in versioned documents) is a great way to do it.

Theorizing

Ah, theory. That’s what we’re all doing this academic thing for, right? Well, that or fame and glory, and we all know which one of those is more likely.

Everyone has their own way of thinking about this. I draw diagrams. And when I draw diagrams, whether for a poster, paper, or to sort out my own thinking, I use OmniGraffle. I can’t begin to say how awesome this software is, and how much mileage I’ve gotten out of my license cost. Enough that I should pay for it again, that’s how good it is. My husband calls OmniGraffle my “theory software” because when I’m using it, he knows I’m probably working on theory. I find it really useful for diagramming relationships between concepts and thinking visually about abstractions. Depending on the way you approach theorizing, it might be worth a try (free trials!)

So that’s the end of my three-part series on software to support academic work. I hope someone out there finds it useful, and if you do, please give one of these posts a shout-out on your social network of choice. You’ll be doing your academic pals a favor, because we all know that’s how people find information these days. :)

Tools of the Trade: Quantitative Analysis

Following up on my last post about the tools that I prefer for organizing and writing in academic work, today I’m going to review my preferred software for quantitative analysis. Yep, there’s enough that falls under “analysis” to merit two posts. This will be the easier of the two posts to write on analysis tools, because I find that qualitative analysis takes a much more complex assembly of technical tools to support the work.

All of these tools are cross-platform (except the SNA software) so although the view on my Mac OS X screen may look a little different than it would on other platforms, the essential functionality is all the same. Isn’t that nice? So let’s begin with the tool that makes the research world go ’round: Excel.

Yes, Excel is a Microsoft product, which I usually avoid. But it’s so functional that it’s hard to use anything else, and I have extensive experience doing some very fancy tricks with Excel. You know, the “power user” kind of stuff, like PivotTables in linked workbooks with embedded ODBC lookups (yep, fancy!) The simple fact of the matter is that a lot of science is done with Excel, so almost no one doing quantitative research can completely avoid it. However, the advice that I offer when working with a spreadsheet tool for research is:

  1. Keep a running list of the manipulations you’ve done on your data. Embed explanations on your worksheets. It’s way too easy for a worksheet to become decontextualized and then you have no idea how you got those results or why you have two sets of results and which one is the right one. This is a pain to do, but trust me, keeping a record like this will save your hide at some point.
  2. Take the time to learn how to use named ranges and linked worksheets. This dramatically improves your ability to do data manipulation in a separate worksheet without touching the original copy, meaning you always have the initial version to return to. This is more important than I can possibly emphasize. Don’t mess with your raw data in Excel unless you have another (preferably uneditable) copy elsewhere!
  3. Customize your toolbars for maximum utility if you’re a frequent user. For example, I have added a button on the toolbar for “paste values” because this is a really useful function that doesn’t have an adequate keyboard shortcut, even though I’ve tried to program one. And for that matter, programming custom keyboard shortcuts for commonly used commands is also a really good idea if you use Excel often.
  4. Install the Analysis ToolPak for grown-up statistics. Use the Formula Viewer to understand what the heck is supposed to go into the formulae. I’ve found this helpful for data interpretation on more than one occasion.
  5. VLOOKUP. Learn it. Love it.
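For readers who think better in code than in cells, the habits in this list can be sketched in Python: leave the raw data alone, do the lookup into a new copy, and write down what you did. This is only an illustration of the idea, not of anything Excel does internally; the site names, the lookup table, and the function name are all hypothetical.

```python
# Raw data: never edited in place (tip 2).
raw_rows = [
    {"site": "A", "count": 12},
    {"site": "B", "count": 7},
    {"site": "C", "count": 3},
]

# Hypothetical lookup table, playing the role of the range VLOOKUP searches.
site_regions = {"A": "North", "B": "South"}

# Running list of manipulations (tip 1).
manipulation_log = []


def add_region(rows, lookup, default="#N/A"):
    """Return new rows with a 'region' column; the input rows are not modified."""
    return [{**row, "region": lookup.get(row["site"], default)} for row in rows]


derived_rows = add_region(raw_rows, site_regions)
manipulation_log.append(
    "Added 'region' by exact-match lookup on 'site'; unmatched sites marked #N/A"
)

for row in derived_rows:
    print(row["site"], row["count"], row["region"])
```

Site C has no entry in the lookup table, so it comes back marked #N/A, much as an exact-match VLOOKUP would flag it, and the raw rows still look exactly the way they did when they arrived.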

R is my go-to tool for statistical analysis, including network analysis. If you don’t know R, it’s basically a robust, free answer to (very expensive and limited time licenses for) SAS or SPSS. It can do just about anything you want, and it has a core-and-package structure that lets you download and activate packages at will to do specialized kinds of analysis. R is well supported in the research community and you’re sure to find a package that does what you need. Like the other major statistical analysis tools, it has its own sort of syntax, but I suspect it’s no harder to learn than the other stuff. R is a great tool, and it hooks into other analysis tools very nicely.

Tools like Taverna, which is a scientific workflow tool. I’ve used this for replicable, self-documenting, complex data retrieval, manipulation, and analysis routines. I’ve written papers about it and spent time with the myGrid team in the UK helping them evaluate usability. I’m definitely a fan of Taverna and I found it really useful for the kind of complex secondary data analysis that I worked on for free/libre open source software research. I’ll even be teaching a course this fall on eScience workflow tools, including Taverna.

Protege is an ontology editor. Ontologies aren’t exactly quantitative analysis, but they can be really useful in doing quantitative analysis of large data sets with semantic properties. If for any reason you need to build an ontology, Protege is a really nice tool.

Finally, the ultimate irony – buying proprietary software to run open source software. I use VMWare Fusion to run Windows XP so I can use Pajek for social network analysis. VMWare Fusion is extremely satisfactory software for the purpose and doesn’t cost much; I have been very happy with it. Windows XP is, well, Windows.

Pajek is nothing but ugly, interface-wise, but don’t let that put you off because it does the job well and has a lot of really detailed options for SNA. It has the most insanely deep menus I’ve ever seen, but to be fair, there’s a lot of analytical complexity under the hood. It also does visualizations, but they aren’t the prettiest thing you’ve ever seen. There are a lot of tools that you can choose for SNA, and this software choice reflects the fact that what I usually need is statistics, not pretty pictures. There’s even a great book for learning how to use Pajek – it was worth every penny when I was learning SNA, because it not only shows you how to use the software, but explains the SNA concepts pretty effectively as well.

Getting It Done: Tools for Organizing and Writing

Some people believe that I never sleep, but that’s really not true. I do sleep, at least sometimes, and I’m also fairly productive.

Achieving a relatively high level of productivity depends in part upon having good tools to support your work, and tools that work well for your working style. So this is the first of two posts on the subject of software that supports academic work. “What software should I use for X?” is a perennial question posed by PhD students everywhere, and software is now pretty essential to academic productivity. This post focuses on tools for organizing, writing, and presenting (I covered poster design previously); the follow-up post will describe my favorite research tools.

The big disclaimer: I use a Mac. If you don’t use a Mac, your mileage may vary, but some of these programs do have versions for Windows and other operating systems. I generally avoid Microsoft software in favor of Apple software (much cheaper and generally good design) and open source software (generally awesome, and free!)

Organizing

Everyone has to stay organized somehow. Some of us make a lot of lists. I definitely make an excessive number of lists. To the point where I’ve made lists of lists. It eventually becomes unsupportable; at some points in time, I spent more effort on keeping lists updated than doing the stuff on the lists. But there’s a reason to make lists – it gets all that stuff out of your head, leaving your brain free to think about more important stuff!

My main tool for keeping all my to-do items in order is OmniFocus, which is wonderful Mac-specific software from OmniGroup. I’m a big fan of OmniGroup software; they make very well designed and thoughtful tools. There are versions of OmniFocus for desktop, iPhone, and iPad – and I use them all. This is one of those tools that can be really useful for supporting GTD, if that’s your thing. If you have your own way of doing things, you can still adapt your use of OmniFocus to do things your own way. So now I get to have my to-do list readily synced across my digital devices at all times. And the OmniGroup ninjas (that’s actually what they call their tech support) are responsive and have a sense of humor. How much better can it get?

Another aspect of keeping your stuff together is keeping files synced, which becomes a problem the second you start using more than one machine. At this point, I work on my (dying) 15″ MacBook Pro, a beautiful zippy still-new 27″ iMac, plus my iPad. And sometimes my iPhone 4, when I have no other options. I use MobileMe to sync the Apple-specific stuff, like Contacts and Mail, but I use Dropbox (platform agnostic) for everything else.

Recent security hullabaloos aside, it’s a very usable solution, and that’s why so many people have adopted it. Although I already pay for MobileMe, it doesn’t behave the way I want, with the exception of the Apple-specific syncing, so now I pay for Dropbox storage as well. Without any additional effort or change to thoroughly-ingrained file management behaviors on my part, I can manage all my files locally in the same fashion that I always have, with the only change being that I use my Dropbox folder as my primary storage space instead of my Documents folder. And everything is then magically synced across machines.

Dropbox also gives me double-plus file backup: my files are now backed up three ways from Sunday, because they’re synced on every machine I use, they’re backed up on my Time Capsule (a simply brilliant piece of personal computing infrastructure), and they’re backed up to Dropbox in the cloud. That adds up to serious peace of mind when it comes to irreplaceable research data. Even better, there’s version control, so if I really screw something up and don’t have a Time Machine backup for whatever reason, there’s a Dropbox backup. On top of everything else, there’s nice file sharing with Dropbox, so that’s also been very handy for research collaboration, particularly when concurrent editing is not a concern.

Writing

Once you’ve got your stuff in order, you have to write about it; this is how we produce new knowledge (the part where I talk about producing the stuff to write about will be in the next post…) Everyone has their preferences for organizing ideas to write, and for word processing. Some of us even eschew word processing altogether, and go for the gold with typesetting.

OmniOutliner is my favorite tool for organizing ideas. It’s also from the OmniGroup (obviously, I think?) and is a really simple but highly functional program for making, what else, outlines! I find it more useful than most other tools when it’s time to start organizing ideas for writing. The interface is simple enough to be non-distracting, and I like the ease of the drag-and-drop interface. It doesn’t export to other formats as easily as I want, but cut-and-paste will always save the day.

When doing collaborative writing where concurrent editing may occur (e.g., last minute papers with crazy late jam sessions), Google Docs is a winner. It’s browser-based, so it doesn’t matter what kind of operating system you’re using. The interface has really improved, since there’s now embedded chat, commenting, and a view of other people’s cursor positions. Google seems to have taken the best features from EtherPad and integrated them with the existing Google Docs functionality for a hands-down winner. Sadly, one of the only things it doesn’t do to my satisfaction is support LaTeX, but that’s only an inconvenience and easy enough to work around.

Some of you don’t know what LaTeX is. That’s OK, you probably don’t need to know. But I’m going to tell you anyway. It’s a free/open source software document preparation system with structural markup, much like HTML, but for making beautifully typeset documents, and it too is platform agnostic. Note that document preparation is not the same as “word processing.” LaTeX is what I prefer to use to write my papers, largely because Word has a tendency to crash on me, is inordinately slow, and is badly behaved in innumerable other ways. And I hate that stupid ribbon. I only use Word when my collaborators are unable to use anything else. I won’t lie – there’s a definite learning curve with LaTeX. It takes a little work, but I’ve found it completely worthwhile.

LaTeX is also nice because having structural markup means you can use style sheets, so you can change the appearance of multiple documents, and link documents, with relative ease. You can use any text editor to write a .tex file, so you can have a completely minimalist interface or something with lots of distracting buttons all over, whatever you prefer. Another benefit is the easy availability of many packages to do just the thing you want, and it is the only system that I have yet encountered that does any justice to mathematical equations. Math rendered in LaTeX looks like math ought to look. One of the few things I think it does really poorly, however, is tables. You can make great looking, tightly controlled tables in LaTeX, but it requires some patience. Even if you don’t want to get all control-freaky over your tables, you’re probably going to have to do that anyway.

Working with LaTeX becomes a little easier with the use of macros and a nice editing environment. You can edit your .tex files in emacs or vim (as I’ve done in the past) but I really like TeXShop for the easy, non-intrusive GUI. It comes with the MacTeX distro, so if you just download that nice big package, you’ll have all the pieces in one place. Another essential tool, for when an editor tells you to submit a final copy in Word after you’ve prepared the original submission in LaTeX, is latex2rtf. This tool lets you use your command-line (e.g., Terminal) interface to produce a Word-readable .rtf file out of your nice pretty LaTeX file. It won’t look as good as it once did, but all the stuff will be there in the right places, more or less. It’s the fastest way that I’ve yet found to convert a LaTeX file into Word, even if it does require a little post-hoc cleanup.
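If you end up doing this conversion often, it can be wrapped in a few lines of Python. This is a minimal sketch, assuming latex2rtf is installed and on your PATH. The only option used is `-o` to name the output file, and `paper.tex` and both function names are placeholders I made up.

```python
import shutil
import subprocess
from pathlib import Path


def latex2rtf_command(tex_path):
    """Build the latex2rtf command that writes an .rtf file next to the .tex file."""
    tex = Path(tex_path)
    rtf = tex.with_suffix(".rtf")
    return ["latex2rtf", "-o", str(rtf), str(tex)]


def convert(tex_path):
    """Run latex2rtf on a .tex file, failing loudly if the tool is missing."""
    if shutil.which("latex2rtf") is None:
        raise FileNotFoundError("latex2rtf is not on the PATH")
    subprocess.run(latex2rtf_command(tex_path), check=True)


# Show the command without running it; call convert("paper.tex") to actually convert.
command = latex2rtf_command("paper.tex")
print(" ".join(command))
```

Typing the same command straight into Terminal works just as well; the wrapper only earns its keep when there is a whole folder of papers to convert.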

Reference Management

There are really only a few robust reference management software options out there. I’m sure Zotero has improved substantially since I last used it, but it was just plain inadequate when I last tried it, and I don’t have the time or energy to wait around for software to evolve. I am not about to spend a lot of time with Mendeley either, because it actually does way too much for what I want out of reference management software, and I prefer my tools as simple and reliable as possible.

I started off my academic career using EndNote, before I became a LaTeX convert. EndNote is nice enough, and has a bunch of good features, but I haven’t spent the additional $100 per update since EndNote X, largely because BibDesk is free (open source), works with LaTeX, and pretty much all of the reference managers are able to translate among one another’s formats. With greater or lesser ease, of course. A nice detail when using Google Scholar in a logged-in state is that you can set your preferences to provide a link for references in .bib format, suited for pasting into the BibTeX files that go along with LaTeX documents.

Presenting

Everyone has to make a slide deck at some point, even if it makes Edward Tufte kill a kitten. I’m not a big fan of slides, but maybe I’m just being old-fashioned because I grew up on chalkboard dust.

Regardless, there are only a couple of options for presentation software. Most people use PowerPoint. I don’t really like it. It’s not as horrible as some other Microsoft Office products (I’m looking at you, Entourage) but it’s not great. I suspect the Google Docs version of ppt is much nicer, if only because it’s probably a bit more stripped-down – but I don’t use that either. And no, OpenOffice is just not adequate. I’m sorry, I really want to go with the open source option, but its interface (last time I looked at it) was stuck in the mid-1990s and hurt my eyes.

Instead, I use Keynote, another Apple product. It’s slick, adequately functional, and pretty smart about a lot of little details. I get tired of the canned themes, but it’s easy enough to make your own. One of my favorite details about Keynote is that there’s an iPad (and iPhone!) version, so I can actually edit slides on my iPad, and present from it. That’s just lovely. Of course, the iOS version of Keynote is limited (this should be obvious, iOS is not OS X, just like an iPad is not a MacBook) but it’s quite functional for editing and presenting. I’ve never built an iOS Keynote presentation from scratch, but it can be done, and that’s nice flexibility to have.

Finally, when making your presentation, the last thing you want to have happen is your screen saver kicking in, or the power saving settings overtaking your display while you debate some point. I’ve seen this happen way too often, and it just doesn’t look very professional. Rather than editing my system preferences every time I get ready to make a presentation, I now rely on Caffeine. It’s a free Mac app (via the App Store, or just download and install as software the normal way) that, when activated, prevents your power saving settings from kicking in and doesn’t permit your screen saver to take over your display. You turn it on and off from the menu bar by clicking on the little coffee cup icon. Simply brilliant!