Tools of the Trade: Quantitative Analysis

Following up on my last post about the tools that I prefer for organizing and writing in academic work, today I’m going to review my preferred software for quantitative analysis. Yep, there’s enough that falls under “analysis” to merit two posts. This will be the easier of the two posts to write on analysis tools, because I find that qualitative analysis takes a much more complex assembly of technical tools to support the work.

All of these tools are cross-platform (except the SNA software), so although the view on my Mac OS X screen may look a little different from what you’d see on other platforms, the essential functionality is the same. Isn’t that nice? So let’s begin with the tool that makes the research world go ’round: Excel.

Yes, Excel is a Microsoft product, which I usually avoid. But it’s so functional that it’s hard to use anything else, and I have extensive experience doing some very fancy tricks with Excel. You know, the “power user” kind of stuff, like PivotTables in linked workbooks with embedded ODBC lookups (yep, fancy!). The simple fact of the matter is that a lot of science is done with Excel, so almost no one doing quantitative research can completely avoid it. That said, here is my advice for using a spreadsheet tool for research:

  1. Keep a running list of the manipulations you’ve done on your data. Embed explanations on your worksheets. It’s way too easy for a worksheet to become decontextualized, and then you have no idea how you got those results, why you have two sets of results, or which one is the right one. This is a pain to do, but trust me, keeping a record like this will save your hide at some point.
  2. Take the time to learn how to use named ranges and linked worksheets. This dramatically improves your ability to do data manipulation in a separate worksheet without touching the original copy, meaning you always have the initial version to return to. This is more important than I can possibly emphasize. Don’t mess with your raw data in Excel unless you have another (preferably uneditable) copy elsewhere!
  3. Customize your toolbars for maximum utility if you’re a frequent user. For example, I have added a button on the toolbar for “paste values” because this is a really useful function that doesn’t have an adequate keyboard shortcut, even though I’ve tried to program one. And for that matter, programming custom keyboard shortcuts for commonly used commands is also a really good idea if you use Excel often.
  4. Install the Analysis ToolPak for grown-up statistics. Use the Formula Viewer to understand what the heck is supposed to go into the formulae. I’ve found this helpful for data interpretation on more than one occasion.
  5. VLOOKUP. Learn it. Love it.
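If VLOOKUP still feels like magic, it can help to see its two modes spelled out. Below is a minimal Python sketch of what the function does, not a reimplementation of Excel: the `vlookup` helper and the sample data are hypothetical, text is compared case-sensitively (Excel’s matching is not), and `None` stands in for Excel’s `#N/A`.

```python
from bisect import bisect_right
from typing import Any, Optional, Sequence

Row = Sequence[Any]


def vlookup(value: Any, table: Sequence[Row], col_index: int,
            range_lookup: bool = True) -> Optional[Any]:
    """Rough Python analogue of Excel's VLOOKUP (hypothetical helper).

    col_index is 1-based, as in Excel. Returns None where Excel shows #N/A.
    Assumption: text is compared case-sensitively, unlike Excel.
    """
    if col_index < 1:
        raise ValueError("col_index must be 1 or greater")
    if not range_lookup:
        # Exact match (range_lookup FALSE): first row whose key equals value.
        for row in table:
            if row[0] == value:
                return row[col_index - 1]
        return None
    # Approximate match (the default): like Excel, this assumes the first
    # column is sorted ascending and returns the row with the largest key
    # that is <= value.
    keys = [row[0] for row in table]
    pos = bisect_right(keys, value)
    if pos == 0:
        return None  # value is smaller than every key in the table
    return table[pos - 1][col_index - 1]


# Hypothetical sample data for illustration.
participants = [
    ("P01", "control", 34),
    ("P02", "treatment", 29),
    ("P03", "control", 41),
]
grade_bands = [(0, "F"), (60, "D"), (70, "C"), (80, "B"), (90, "A")]

print(vlookup("P02", participants, 2, range_lookup=False))  # treatment
print(vlookup(85, grade_bands, 2))                          # B
```

The exact-match mode is the one you want for joining on IDs; the approximate mode is for banding numbers into ranges, and it silently returns nonsense if the first column isn’t sorted, which is exactly the kind of thing that should go in your running list of manipulations.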

R is my go-to tool for statistical analysis, including network analysis. If you don’t know R, it’s basically a robust, free alternative to SAS or SPSS, which come with very expensive, time-limited licenses. It can do just about anything you want, and its core-and-package structure lets you download and activate packages at will to do specialized kinds of analysis. R is well supported in the research community and you’re sure to find a package that does what you need. Like the other major statistical analysis tools, it has its own sort of syntax, but I suspect it’s no harder to learn than the other stuff. R is a great tool, and it hooks into other analysis tools very nicely.

One such tool is Taverna, a scientific workflow tool. I’ve used it for replicable, self-documenting, complex data retrieval, manipulation, and analysis routines. I’ve written papers about it and spent time with the myGrid team in the UK helping them evaluate usability. I’m definitely a fan of Taverna and I found it really useful for the kind of complex secondary data analysis that I worked on for free/libre open source software research. I’ll even be teaching a course this fall on eScience workflow tools, including Taverna.

Protege is an ontology editor. Ontologies aren’t exactly quantitative analysis, but they can be really useful in doing quantitative analysis of large data sets with semantic properties. If for any reason you need to build an ontology, Protege is a really nice tool.

Finally, the ultimate irony: buying proprietary software to run free software. I use VMware Fusion to run Windows XP so I can use Pajek, which is free for noncommercial use, for social network analysis. VMware Fusion is extremely satisfactory software for the purpose and doesn’t cost much; I have been very happy with it. Windows XP is, well, Windows.

Pajek is nothing but ugly, interface-wise, but don’t let that put you off because it does the job well and has a lot of really detailed options for SNA. It has the most insanely deep menus I’ve ever seen, but to be fair, there’s a lot of analytical complexity under the hood. It also does visualizations, but they aren’t the prettiest thing you’ve ever seen. There are a lot of tools that you can choose for SNA, and this software choice reflects the fact that what I usually need is statistics, not pretty pictures. There’s even a great book for learning how to use Pajek – it was worth every penny when I was learning SNA, because it not only shows you how to use the software, but explains the SNA concepts pretty effectively as well.
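Pajek reads networks from plain-text .net files, so if your data starts life in a spreadsheet or script, writing that file yourself is one way to skip some of those deep menus. Here’s a minimal Python sketch under stated assumptions: `to_pajek_net` is a hypothetical helper, it covers only the basic vertex and edge sections of the format (none of Pajek’s layout or color options), and it assumes vertex labels contain no double quotes.

```python
from typing import Iterable, Tuple

Edge = Tuple[str, str, float]


def to_pajek_net(edges: Iterable[Edge], directed: bool = False) -> str:
    """Serialize a weighted edge list as text in Pajek's .net format.

    Vertices are numbered from 1 in sorted label order. Assumption: labels
    contain no double quotes, since they are written inside "...".
    """
    edges = list(edges)
    labels = sorted({node for src, dst, _ in edges for node in (src, dst)})
    index = {label: i for i, label in enumerate(labels, start=1)}

    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    # *Arcs lists directed ties; *Edges lists undirected ones.
    lines.append("*Arcs" if directed else "*Edges")
    lines += [f"{index[src]} {index[dst]} {weight:g}" for src, dst, weight in edges]
    return "\n".join(lines) + "\n"


# Hypothetical collaboration network among three developers.
ties = [("alice", "bob", 3), ("bob", "carol", 1), ("alice", "carol", 2)]
net_text = to_pajek_net(ties)
print(net_text)
```

Saving that text with a .net extension gives you a file to open in Pajek; numbering vertices in sorted label order just keeps the output stable from run to run.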

Getting It Done: Tools for Organizing and Writing

Some people believe that I never sleep, but that’s really not true. I do sleep, at least sometimes, and I’m also fairly productive.

Achieving a relatively high level of productivity depends in part upon having good tools to support your work, and tools that work well for your working style. So this is the first of two posts on the subject of software that supports academic work. “What software should I use for X?” is a perennial question posed by PhD students everywhere, and software is now pretty essential to academic productivity. This post focuses on tools for organizing, writing, and presenting (I covered poster design previously); the follow-up post will describe my favorite research tools.

The big disclaimer: I use a Mac. If you don’t use a Mac, your mileage may vary, but some of these programs do have versions for Windows and other operating systems. I generally avoid Microsoft software in favor of Apple software (much cheaper and generally good design) and open source software (generally awesome, and free!).


Everyone has to stay organized somehow. Some of us make a lot of lists. I definitely make an excessive number of lists, to the point where I’ve made lists of lists. It eventually becomes unsupportable; at times, I spent more effort keeping the lists updated than doing the things on them. But there’s a reason to make lists: it gets all that stuff out of your head, leaving your brain free to think about more important stuff!

My main tool for keeping all my to-do items in order is OmniFocus, which is wonderful Mac-specific software from OmniGroup. I’m a big fan of OmniGroup software; they make very well designed and thoughtful tools. There are versions of OmniFocus for desktop, iPhone, and iPad, and I use them all. This is one of those tools that can be really useful for supporting GTD (David Allen’s Getting Things Done method), if that’s your thing. If you have your own system, you can still adapt OmniFocus to fit it. So now I get to have my to-do list synced across my digital devices at all times. And the OmniGroup ninjas (that’s actually what they call their tech support) are responsive and have a sense of humor. How much better can it get?

Another aspect of keeping your stuff together is keeping files synced, which becomes a problem the second you start using more than one machine. At this point, I work on my (dying) 15″ MacBook Pro, a beautiful, zippy, still-new 27″ iMac, plus my iPad, and sometimes my iPhone 4 when I have no other options. I use MobileMe to sync the Apple-specific stuff, like Contacts and Mail, but I use Dropbox (platform agnostic) for everything else.

Recent security hullaballoos aside, it’s a very usable solution, and that’s why so many people have adopted it. Although I already pay for MobileMe, it doesn’t behave the way I want, with the exception of the Apple-specific syncing, so now I pay for Dropbox storage as well. Without any additional effort or change to thoroughly-ingrained file management behaviors on my part, I can manage all my files locally in the same fashion that I always have, with the only change being that I use my Dropbox folder as my primary storage space instead of my Documents folder. And everything is then magically synced across machines.

Dropbox also gives me double-plus file backup: my files are now backed up three ways from Sunday, because they’re synced on every machine I use, they’re backed up on my Time Capsule (a simply brilliant piece of personal computing infrastructure), and they’re backed up to Dropbox in the cloud. That adds up to serious peace of mind when it comes to irreplaceable research data. Even better, there’s version control, so if I really screw something up and don’t have a Time Machine backup for whatever reason, there’s a Dropbox backup. On top of everything else, there’s nice file sharing with Dropbox, so that’s also been very handy for research collaboration, particularly when concurrent editing is not a concern.


Once you’ve got your stuff in order, you have to write about it; this is how we produce new knowledge (the part where I talk about producing the stuff to write about will be in the next post). Everyone has their preferences for organizing ideas to write, and for word processing. Some of us even eschew word processing altogether, and go for the gold with typesetting.

OmniOutliner is my favorite tool for organizing ideas. It’s also from the OmniGroup (obviously, I think?) and is a really simple but highly functional program for making, what else, outlines! I find it more useful than most other tools when it’s time to start organizing ideas for writing. The interface is simple enough to be non-distracting, and I like the ease of drag-and-drop. It doesn’t export to other formats as easily as I’d like, but cut-and-paste will always save the day.

When doing collaborative writing where concurrent editing may occur (e.g., last-minute papers with crazy late-night jam sessions), Google Docs is a winner. It’s browser-based, so it doesn’t matter what kind of operating system you’re using. The interface has really improved: there’s now embedded chat and commenting, and you can see other people’s cursor positions. Google seems to have taken the best features from EtherPad and integrated them with the existing Google Docs functionality for a hands-down winner. Sadly, one of the few things it doesn’t do to my satisfaction is support LaTeX, but that’s only an inconvenience and easy enough to work around.

Some of you don’t know what LaTeX is. That’s OK, you probably don’t need to know. But I’m going to tell you anyway. It’s a free/open source software document preparation system with structural markup, much like HTML, but for making beautifully typeset documents, and it too is platform agnostic. Note that document preparation is not the same as “word processing.” LaTeX is what I prefer to use to write my papers, largely because Word has a tendency to crash on me, is inordinately slow, and is badly behaved in innumerable other ways. And I hate that stupid ribbon. I only use Word when my collaborators are unable to use anything else. I won’t lie – there’s a definite learning curve with LaTeX. It takes a little work, but I’ve found it completely worthwhile.

LaTeX is also nice because having structural markup means you can use style sheets, so you can change the appearance of multiple documents, and link documents, with relative ease. You can use any text editor to write a .tex file, so you can have a completely minimalist interface or something with lots of distracting buttons all over, whatever you prefer. Another benefit is the easy availability of many packages to do just the thing you want, and it is the only system that I have yet encountered that does any justice to mathematical equations. Math rendered in LaTeX looks like math ought to look. One of the few things I think it does really poorly, however, is tables. You can make great-looking, tightly controlled tables in LaTeX, but it requires some patience. Even if you don’t want to get all control-freaky over your tables, you’re probably going to have to anyway.
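If your numbers already live in a script, one way to spend less of that patience is to generate the table markup instead of typing it by hand. Here’s a minimal Python sketch; `to_tabular` and `latex_escape` are hypothetical helpers (not part of any LaTeX package), and the output uses only the plain `tabular` environment and `\hline`, so it needs no extra packages.

```python
from typing import Any, Optional, Sequence

# Characters that LaTeX treats specially in running text.
_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(text: str) -> str:
    """Escape LaTeX special characters, one character at a time."""
    return "".join(_SPECIALS.get(ch, ch) for ch in text)


def to_tabular(header: Sequence[Any], rows: Sequence[Sequence[Any]],
               align: Optional[str] = None) -> str:
    """Build a plain LaTeX tabular environment from a header and rows."""
    ncols = len(header)
    if any(len(row) != ncols for row in rows):
        raise ValueError("every row needs as many cells as the header")
    align = align or "l" * ncols  # left-align every column by default

    def fmt(cells: Sequence[Any]) -> str:
        # Join escaped cells with & and end the row with \\
        return " & ".join(latex_escape(str(c)) for c in cells) + r" \\"

    lines = [rf"\begin{{tabular}}{{{align}}}", r"\hline", fmt(header), r"\hline"]
    lines += [fmt(row) for row in rows]
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


print(to_tabular(["Tool", "Cost"], [["R", "free"], ["SPSS", "licensed"]]))
```

Escaping character by character means a backslash is never escaped twice, and the row-length check catches the ragged rows that otherwise show up as cryptic compile errors.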

Working with LaTeX becomes a little easier with the use of macros and a nice editing environment. You can edit your .tex files in Emacs or vim (as I’ve done in the past), but I really like TeXShop for its easy, non-intrusive GUI. It comes with the MacTeX distro, so if you just download that nice big package, you’ll have all the pieces in one place. Another essential tool, for when an editor tells you to submit a final copy in Word after you’ve prepared the original submission in LaTeX, is latex2rtf. This tool lets you use your command-line (e.g., Terminal) interface to produce a Word-readable .rtf file out of your nice pretty LaTeX file. It won’t look as good as it once did, but all the stuff will be there in the right places, more or less. It’s the fastest way that I’ve yet found to convert a LaTeX file into Word, even if it does require a little post-hoc cleanup.

Reference Management

There are really only a few robust reference management software options out there. I’m sure Zotero has improved substantially since I last tried it, but it was just plain inadequate at the time, and I don’t have the time or energy to wait around for software to evolve. I am not about to spend a lot of time with Mendeley either, because it actually does way too much for what I want out of reference management software, and I prefer my tools as simple and reliable as possible.

I started off my academic career using EndNote, before I became a LaTeX convert. EndNote is nice enough, and has a bunch of good features, but I haven’t spent the additional $100 per update since EndNote X, largely because BibDesk is free (open source), works with LaTeX, and pretty much all of the reference managers are able to translate among one another’s formats, with greater or lesser ease, of course. A nice detail when using Google Scholar in a logged-in state is that you can set your preferences to provide a link for references in .bib format, suited for pasting into the BibTeX files that go along with LaTeX documents.


Everyone has to make a slide deck at some point, even if it makes Edward Tufte kill a kitten. I’m not a big fan of slides, but maybe I’m just being old-fashioned because I grew up on chalkboard dust.

Regardless, there are only a couple of options for presentation software. Most people use PowerPoint. I don’t really like it. It’s not as horrible as some other Microsoft Office products (I’m looking at you, Entourage), but it’s not great. I suspect the Google Docs presentation tool is much nicer, if only because it’s probably a bit more stripped-down, but I don’t use that either. And no, OpenOffice is just not adequate. I’m sorry, I really want to go with the open source option, but its interface (last time I looked at it) was stuck in the mid-1990s and hurt my eyes.

Instead, I use Keynote, another Apple product. It’s slick, adequately functional, and pretty smart about a lot of little details. I get tired of the canned themes, but it’s easy enough to make your own. One of my favorite details about Keynote is that there’s an iPad (and iPhone!) version, so I can actually edit slides on my iPad, and present from it. That’s just lovely. Of course, the iOS version of Keynote is limited (this should be obvious: iOS is not OS X, just as an iPad is not a MacBook), but it’s quite functional for editing and presenting. I’ve never built an iOS Keynote presentation from scratch, but it can be done, and that’s nice flexibility to have.

Finally, when making your presentation, the last thing you want is your screen saver kicking in, or the power saving settings blanking your display while you debate some point. I’ve seen this happen way too often, and it just doesn’t look very professional. Rather than edit your system preferences every time you get ready to make a presentation, I now rely on Caffeine. It’s a free Mac app (available through the App Store or as a regular download) that, when activated, keeps your power saving settings from kicking in and doesn’t permit your screen saver to take over your display. You turn it on and off from the menu bar by clicking on the little coffee cup icon. Simply brilliant!