programming

Graduate Education

Are you interested in the study of evolutionary biology? Are you excited about the promise that new genomic tools offer for identifying the factors that promote and maintain biological diversity around the planet? Do you envision solving great problems as a scientist?

Good. Now put down the pipettor, and start learning to program. More and more, the work being done in labs like mine (and many of the evolution labs here in the Department of Genetics) involves data sets that are orders of magnitude larger than anything available when I got my Ph.D. Not only could you collect all of the data from my dissertation in one day of a well-organized sequencing effort, you would have to fill up the array with dozens of other dissertations as well to make it cost-efficient, all at a total cost of only a few thousand dollars.

All of that data means that it isn’t enough to have an interesting idea. You have to be able to manipulate large amounts of data, whether you are doing genomics or ecology or developmental biology, and this means you (prospective graduate student) need to become very familiar with languages or programming environments like R, Python, or Perl. Twenty years ago you might have been asked to be proficient in German or Spanish to get a Ph.D. Now it would certainly be one of these languages. We spend more time in front of computers, and less time in the lab (fortunately, for a lab like mine, the amount of field work is about the same, though never as much as I’d like!).

Many fields are being transformed by computational approaches, and that is not a bad thing. It simply means more data. This is true in linguistics and literature as well as the sciences. So as you start to write your essays about what attracts you to our graduate program (I’m the graduate coordinator, and I expect to read about 40 of these essays in late December), think hard about the ways you can learn to work with data quickly and efficiently. Make the data work for you, not the other way around. I say this from the rough experience of juggling a number of large data sets, each with its own quirks. Now if only I could write a script that could write the papers for me as well...