Software Carpentry @ ASU

As I mentioned in a previous post, this weekend I participated in my first Software Carpentry bootcamp. I had the pleasure of meeting the instructors and all the other helpers involved on Friday over pizza and beer, and in particular spent considerable time nerding out with Naupaka over all things vim and tidy data.

On Day 1, the topic of the morning was R. From what I understand, it was only recently that Software Carpentry decided to add R to their curriculum. I learned plenty myself as I rushed about helping participants through small problems, and there is a lot of good material (and plenty of potential) here.

R is an analysis tool first and a programming language second. This makes it a pragmatic choice for researchers but still a difficult one to teach. We used RStudio as the environment for introducing things, and this may have taken the edge off for the command-line averse. I didn’t get a chance to follow along with the exercises too much myself, but I’m particularly interested in learning more about the use of factors. If anyone has any resources for understanding what factors are, why they are useful, etc., I’d love to hear about them.

That afternoon we switched topics to the shell, or more specifically bash. The session started well enough, but with participants ranging from zero experience to intermediate, it proved difficult to please everyone. Also, as a note to anyone who teaches a bootcamp session on the shell (or any topic, really): give the motivation behind everything you demonstrate, in both the broad and the immediate sense. Although it can be obvious to an experienced user why a given trick is useful, the novice is often left wondering what benefit it has over what they already do to get work done. This is especially true for the shell, since 90% of work done in the shell is moving around a file system, which most scientists already accomplish by clicking around a file browser. People have to see a clear advantage before they adopt new practices.

Day 2 (today) started off with git. The instructors did well to put extensive focus on giving motivation for git’s use, given the feedback from the previous day’s session on the shell. This was especially important for git, since for many scientists it is considered to be non-essential in the toolbox of software things. Software Carpentry hopes to change this.

The gist: git is great for more than just code. Though originally built to address the needs of the software community, as a version control system git works particularly well for paper writing, alone or with collaborators. And even though git works best with plain-text files, it is perfectly happy to store away anything you like and version control it. Further, pushing the materials used to build a manuscript to a public repository host like GitHub is a great way to address the well-acknowledged problem of reproducibility in science.

During the afternoon we turned our focus to SQL, using an SQLite database as a toy case to run queries against. I was running out of steam by this point, so I mainly focused on helping the folks on Windows get sqlite3 working under Git Bash (not hard, but for beginners navigating the filesystem is an expensive operation).
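
For anyone who has not seen this kind of exercise, here is a minimal sketch of the sort of toy queries involved. It uses Python's built-in sqlite3 module rather than the sqlite3 command-line tool we used in the session, and the table name, columns, and values are all made up for illustration; they are not the bootcamp's actual dataset.

```python
import sqlite3


def run_toy_queries():
    # An in-memory database, so nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical table and rows, invented for this example.
    cur.execute("CREATE TABLE survey (person TEXT, quantity TEXT, reading REAL)")
    cur.executemany(
        "INSERT INTO survey VALUES (?, ?, ?)",
        [
            ("dyer", "rad", 9.82),
            ("dyer", "sal", 0.13),
            ("lake", "sal", 0.05),
            ("lake", "rad", 1.46),
            ("roe", "sal", 0.21),
        ],
    )
    conn.commit()

    # Filter rows and sort them: salinity readings, largest first.
    salinity = cur.execute(
        "SELECT person, reading FROM survey "
        "WHERE quantity = 'sal' ORDER BY reading DESC"
    ).fetchall()

    # Aggregate: number of readings recorded by each person.
    counts = cur.execute(
        "SELECT person, COUNT(*) FROM survey GROUP BY person ORDER BY person"
    ).fetchall()

    conn.close()
    return salinity, counts


if __name__ == "__main__":
    salinity, counts = run_toy_queries()
    print(salinity)
    print(counts)
```

The same SELECT statements could be typed directly at the sqlite3 prompt; wrapping them in Python here is only a convenience for making the example self-contained.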

Overall, I’m happy I got to help put this on and I hope to be involved in a growing SC community here at ASU in the very near future. Assuming space is still available, I’m also hoping to attend Titus Brown’s instructor training in Davis this coming January.

Thanks to everyone involved! Looking forward to more!

— david
