Deployment is Not a Solved Problem
From a developer’s perspective, configuration management (CM) is definitely not a solved problem in general. It’s solved for people using a PaaS provider like Heroku (until you hit a limit), and it’s solved for developers in shops where someone has written a custom tool to make it easy (Deployinator at Etsy sounds like an example).
As a developer, I want two things from CM:
I want to test something locally and then programmatically capture that setup in a way I can easily replicate.
Chef + Vagrant seems very close to solving this problem, but the reality is that adoption isn’t super high yet. In my experience, it’s also surprisingly hard to get developers to care about the advantages of writing, using, or modifying Chef cookbooks and roles instead of Fabric or Python scripts. In general though, I don’t see any gaps in Chef + Vagrant for solving this part of the problem. They’re both getting better, and I can see the day when the average new-hire experience at the average software company involves getting your laptop and running one command for an instant local mirror of the production setup.
git clone git://foo bar && cd bar && vagrant up
Boom. No wiki. No checklist. No crash errors because it’s been two months since the last time someone tried to set up a fresh machine.
I want to run one command to deploy and I want it to magically work and work very quickly.
This is the part that I see as unsolved in the general case (again, outside certain PaaS providers and proprietary scripts within companies). Deploying in production doesn’t just mean “make these 8 nodes have this configuration,” though that is the part I see as well solved by Chef Server, Puppet, and Cloud Foundry (I could be wrong here; I only have 2 days of experience). The magic deploy that you need to get a Python web app updated on AWS involves:
- Doing verification on the code to make sure I’m on a tagged, pushed version of the code in git and that I’ve passed the required tests
- Writing to a chat room to let people know a deploy is happening + random other one-off things unrelated to the actual deploy process
- Checking that all of the nodes I need actually exist, are bootstrapped and have basic monitoring (there are good tools for doing this part now)
- Gathering the data about nodes in your system that needs to be passed around, like URLs of memcached servers (tools like Noah are good here, I think)
- Pulling a test node out of the load balancer
- Configuring that node (Chef Solo is awesome for this now), running migrations, keeping things versioned for rollback, and all of the single-node deployment things that are well-addressed by tools
- Testing the health of that node, putting it back in the load balancer and making sure things don’t blow up
- Gradually repeating the out-of-load-balancer, configure, test, back-in-load-balancer steps for the rest of the nodes
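As a rough illustration, the pull-out, configure, test, put-back loop from the list above might be sketched in Python like this. Everything named here is hypothetical: `rolling_deploy`, `DeployError`, and the four callables are stand-ins for whatever load balancer API, Chef Solo run, and health check you actually use. It is a sketch of the control flow, not a real tool.

```python
from typing import Callable, Iterable, List


class DeployError(Exception):
    """Raised when a node fails its health check after being configured."""


def rolling_deploy(
    nodes: Iterable[str],
    remove_from_lb: Callable[[str], None],  # hypothetical load balancer hook
    configure: Callable[[str], None],       # hypothetical: e.g. a Chef Solo run plus migrations
    is_healthy: Callable[[str], bool],      # hypothetical health check
    add_to_lb: Callable[[str], None],       # hypothetical load balancer hook
) -> List[str]:
    """Update nodes one at a time and stop at the first unhealthy node.

    Returns the nodes that were updated and put back in the load balancer.
    """
    updated: List[str] = []
    for node in nodes:
        remove_from_lb(node)
        configure(node)
        if not is_healthy(node):
            # Leave the bad node out of rotation and abort the rest of the rollout.
            raise DeployError(
                f"{node} failed its health check after {len(updated)} node(s) were updated"
            )
        add_to_lb(node)
        updated.append(node)
    return updated


if __name__ == "__main__":
    # A set stands in for the load balancer's pool of active nodes.
    in_rotation = {"web1", "web2", "web3"}
    done = rolling_deploy(
        nodes=sorted(in_rotation),
        remove_from_lb=in_rotation.discard,
        configure=lambda node: None,
        is_healthy=lambda node: True,
        add_to_lb=in_rotation.add,
    )
    print(done)
```

Passing the steps in as callables keeps the loop itself testable without any real infrastructure. Aborting on the first failed health check is only one possible policy; a real tool would also need to decide how to roll back the nodes that were already updated.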
Then you’ve got lots of other details like creating dev/staging versions with different configs, pulling in live data for testing, versioning static media, rollback, forward-compatible schema migrations (and testing that), continuous integration, trade-offs in speed/flexibility in how much you put in your AMIs, monitoring, etc. Also, it should be fast or a developer will start avoiding extra deploys by batching them.
It seems like every production Python web app (and, I think, Ruby, Node.js, PHP, etc.) needs to do all of these boringish things, but everyone has had to cobble together their own solution. I’m absolutely for Unix-style single-purpose tools, but if there’s something out there that aims to generally solve the deploy part of the configuration management problem, I’m missing it.
Every company with a production web app has had to deal with these problems. In a startup, that usually means you start with all of the problems and only solve one at a time and only as you need to. Why are we all re-inventing this wheel and where are the open source projects trying to solve this problem?
Everyone should be able to:
foo deploy:live active up