It’s now been several weeks since I returned from my first SciPy, and I realized I had yet to actually write about it on my own blog. Well, late is better than never. If nothing else, the time delay has allowed the dust to settle for a more coherent post. :)
Short version: SciPy was perhaps the most fun I’ve had at a conference.
Why? Several reasons. First, the audience is very diverse: most are working in science, but many others are working in tech and software development. This is in contrast to many of the conferences I’ve attended, where the attendees are primarily scientists working in my field, very few of whom spend much of their time developing and maintaining software.
Second, the conference itself is different from most in terms of its basic structure, composed of:
- 2 days of tutorials
- 3 days of main conference
- 2 days of sprints
Since this was my first time attending, I decided to get the full SciPy experience and attend the whole week. I’m glad I did.
The two tutorials I most enjoyed were those on machine learning with scikit-learn and using bokeh to create interactive plots. Although I had played briefly with both packages before, these sessions offered a more thorough view of what can be done with them and, most importantly, how. In particular, for scikit-learn I was struck by how well thought-out its built-in parameter sweeps and cross-validation machinery are, and it’s given me ideas for how to build similar robustness machinery for other packages, e.g. mesa. I’m not entirely sure yet where I might make use of bokeh, since ultimately the plots I produce need to be exported to a printable form. However, for publishing data visualizations to the web it’s a great option.
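For anyone who hasn't seen it, here is a minimal sketch of the kind of parameter sweep with cross-validation I mean. It is written against scikit-learn's `model_selection` module; the dataset, the estimator, and the grid values are purely illustrative assumptions, not what the tutorial used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small built-in dataset, used here only as a stand-in.
X, y = load_iris(return_X_y=True)

# Hypothetical grid: every combination of C and kernel gets evaluated.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# cv=5 scores each combination with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_   # combination with the best mean CV score
best_score = search.best_score_     # that mean cross-validated accuracy
print(best_params, round(best_score, 3))
```

The appealing part is that the sweep and the cross-validation are one object with the same `fit` interface as any other estimator, so it can be dropped in wherever a plain model would go.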
The conference itself sported three simultaneous tracks of talks, a poster session, and daily lightning talks. A few of my favorites:
- Dask: Out-of-Core Numpy and Pandas through Task Scheduling: a talk on dask, which provides data structures that can do complex out-of-core operations with an API similar to numpy and pandas.
- HDF5 is Eating the World: a talk by Andrew Collette (author of h5py) on ways to wield HDF5 effectively, as well as exciting new developments in HDF5 itself.
- Deep Learning: Tips from the Road: a talk by Kyle Kastner on deep learning, its uses, and its limitations. This is a great, short intro to neural networks.
- Agent Based Modeling in Python with Mesa: an introduction to mesa, a new package for building agent-based simulations that fills a hole in the Python ecosystem.
- VisPy: Harnessing the GPU for Fast, High Level Visualization: a great talk by Luke Campagnola on the capabilities of VisPy to generate complex visualizations in real time using the GPU and OpenGL.
There are many more where that came from, including:
- Time Series Analysis for Network Security
- Statistical Thinking for Data Science
- xray: ND Labeled Arrays and Datasets
- RESTful HDF
- Teaching with IPython/Jupyter Notebooks and JupyterHub
- Accelerating Python with the Numba JIT Compiler
Each day of the main conference featured a keynote, and all three are worth watching:
- Data Science at the New York Times by Chris Wiggins
- My Data Journey with Python by Wes McKinney
- State of the Tools by Jake VanderPlas
Since it’s impossible to attend everything, I’m extremely thankful all of these talks can be viewed directly on YouTube. Check out the full playlist for plenty more.
The last two days featured sprints. For the uninitiated: these are long blocks of time during which you can work on issues in a codebase collaboratively and in person with others. At SciPy, however, they are mostly just a great way to get involved in new projects and work alongside some of the core developers. I’m particularly interested in machine learning applications to simulation work, so I split my time between working on scikit-learn and mesa. Although I only ended up submitting a couple of PRs to scikit-learn over the course of the weekend, I got a good sense of the structure of both packages and had memorable discussions with the developers. The only problem now is figuring out where I’m going to devote my (rather limited) time outside of graduate school to open source projects that aren’t mdanalysis-related. The upside is there are really no bad choices.
Besides all the new tools and technical developments SciPy made me aware of, what I value most from the conference are the people that I met and the connections I made with them. The scientific Python community includes some of the most intelligent and passionate people I’ve ever met. It was a pleasure to spend a week with a few of them.
— david