Your toolbox

A common question from computational newcomers is, “Where should I start?”

Galaxy

If you have a question that is commonly encountered by other people doing biological research, your first step should be to take a look at Galaxy. Galaxy is many things, but from most users’ perspective it’s a web application for analyzing large amounts of biomedical data. There are public Galaxy instances, like Galaxy main, and private ones, like our own galaxy.fredhutch.org, which runs on the same infrastructure as the rest of the Hutch’s scientific computing services. Regardless of where the instance is, Galaxy makes available a large number of standard analysis programs (cufflinks, MACS, EMBOSS, etc.) wrapped in a much friendlier interface than the programs themselves are typically known for. In addition to the tools included with every Galaxy installation, there are repositories called tool sheds where tools developed by the community are shared. The main Galaxy Tool Shed is easy to search through, if a little slow, and contains 2490 tools at time of writing. Most tools can be easily installed onto a private Galaxy instance like ours in a few minutes by an administrator.

To get started with Galaxy, take a look at our Galaxy intro.

R

As noted in our R intro, if your data is in tabular format and you want to learn from it, the answer is almost certainly to use R. R is a fantastic tool to munge, summarize, extract, plot, statistically analyze, and otherwise learn from your data. R is also well-known for its ability to produce beautiful visualizations using packages like ggplot2. If data visualization is in your future, R might be a good bet.

To get started with R, take a look at our R intro.

Shell scripting

If you only want to run command line programs, a shell script is the ticket. Shell scripts are the command-line equivalent of Galaxy’s workflows, and behave almost exactly as if you were sitting at the keyboard typing the commands in manually. In fact, getting started with shell scripting is about that easy – open up an editor, type one command per line, and then have your computer run it. We’ll be covering shell scripting a bit later, but this short tutorial is a good place to start.

Python

If your data is not in tabular format, or if you want to do some analysis and run some external programs, Python is a gorgeous choice. Python is a solid and popular general-purpose language that many developers fall back to when there isn’t a compelling reason to use a more specialized tool. It is also arguably one of the best-documented modern languages; tutorials and references can all be found on docs.python.org. (Just make sure you’re reading the docs for the right version of python; there were major changes to the language between versions 2 and 3.)

Like R, Python has a central repository with thousands of packages you can “import” into your own programs: e.g., BioPython for biological data and analysis; pymc for Bayesian statistical modeling; or Sage for “elementary and advanced, pure and applied” mathematics.

xkcd comic about Python

Of course, there are plenty of more mundane, utilitarian packages that are nonetheless invaluable for day-to-day things.

What if I get stuck?

Everyone does! And we are here to help– come by for help during office hours. These projects all have active online communities. Biostars has a section devoted specifically to Galaxy users, while Stack Overflow is a great resource for questions about Python, R, shell scripts, or just about any other programming language.

Edit this page on GitHub

brought to you by fredhutch.io