PolicyStat's Dev Blog

Using Vim as Your Git Mergetool

Resolving merge conflicts has always been a pain point for our team and it’s mostly been a tooling problem.

Not-good-enough Solutions

Until now, we’ve been stuck making tradeoffs between good 3-way diff tools and good code editors.

On one side, tools like Meld are great at showing you the differences, and helping you resolve your text conflict. You lose all your nice editing shortcuts and syntax highlighting though, and you’re dropping out of your normal vim-based workflow.

On the other side, you can use vim or vimdiff, but vim isn’t a merge tool and vimdiff feels clunky for the 3-way merges that git gives you.

Inspiration: fugitive.vim and git mergetool

Today, I read a post by Flaviu Simihaian explaining how to resolve git merge conflicts with Vim. Eureka! You get to stay in your familiar editor while keeping all of the niceties of a 3-way diff!

A little nicer via git mergetool

I have one more step to add to make the normal workflow a little easier. Git actually gives you the ability to define an arbitrary merge tool command for use when resolving merge conflicts, so let’s use that plus the power of vim to change your merge workflow to a simple:

$ git merge
$ git mergetool

Configuring mergetool

We’re going to use the magic of Vim startup commands combined with git-mergetool to automate loading the appropriate diffs inside vim with fugitive.vim’s 3-way merging awesomeness.

Install fugitive.vim

The fugitive.vim plugin gives you access to git commands and information from inside vim. It has the handy ability to do a 3-way git-style diff.

Assuming you have pathogen.vim installed (you should), just run:

$ cd ~/.vim/bundle
$ git clone git://github.com/tpope/vim-fugitive.git

Configure gvim as your mergetool.

$ git config merge.tool gvim
$ git config mergetool.gvim.cmd 'gvim "+Gdiff" $MERGED'

Usage

Flaviu’s post has great instructions on usage, but the gist is to use ]c to navigate to conflicts and then :diffget //2 or :diffget //3 to choose the version to keep. :diffupdate fixes your whitespace issues and then :only lets you see what you’ve changed before you save.

Where Are the Magic Deployment Tools?

This is a cross-post (with edits) of my comment on a blog post by @lusis on The Configuration Management Divide.

Deployment is Not a Solved Problem

From a developer’s perspective, CM is definitely not a solved problem in general. It’s solved for people using a PaaS provider like heroku (until you hit a limit) and it’s solved for developers in shops where someone has written a custom tool to make it easy (sounds like Deployinator at Etsy is an example). 

As a developer, I want two things from CM:

I want to test something locally and then programmatically capture that setup in a way I can easily replicate.

Chef + Vagrant seems very close to solving this problem, but the reality is that adoption isn’t super high yet. It’s also surprisingly hard to get developers (in my experience) to care about the advantages of writing/using/modifying Chef cookbooks/roles instead of Fabric or python scripts. In general though, I don’t see any gaps in Chef + Vagrant for solving this part of the problem. They’re both getting better and I see the day when the average new-hire experience at the average software company involves getting your laptop and running one command for an instant local mirror of the production setup.

git clone git://foo bar && cd bar && vagrant up

Boom. No wiki. No checklist. No crash errors because it’s been two months since the last time someone tried to set up a fresh machine.

I want to run one command to deploy and I want it to magically work and work very quickly. 

This is the part that I see as unsolved in a general case (again, outside certain PaaS providers and proprietary scripts within companies). Deploying in production doesn’t just mean “make these 8 nodes have this configuration,” but that’s the problem I see that’s well-solved with Chef Server, Puppet, Cloud Foundry (could be wrong here, only have 2 days of experience). The magic deploy that you need to get a python web app updated using AWS involves the following (a rough sketch of the rolling-restart steps follows the list):

  1. Doing verification on the code to make sure I’m on a tagged, pushed version of the code in git and that I’ve passed the required tests
  2. Writing to a chat room to let people know a deploy is happening + random other one-off things unrelated to the actual deploy process
  3. Checking that all of the nodes I need actually exist, are bootstrapped and have basic monitoring (there are good tools for doing this part now)
  4. Gathering all of the data about nodes in your system to pass around like URLs to memcached servers (tools like Noah are good here I think)
  5. Pulling a test node out of the load balancer
  6. Configuring that node (chef solo is awesome for this now), running migrations, keeping things versioned for rollback, and all of the single-node deployment things that are well-addressed by tools
  7. Testing the health of that node, putting it back in the load balancer and making sure things don’t blow up
  8. Gradually repeating the out-of-loadbalancer, configure, test, back-in-load-balancer steps for the rest of the nodes
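
To make steps 5 through 8 concrete, here’s a rough, hypothetical sketch of that rolling-restart loop in Python using boto’s ELB bindings. The load balancer name, instance IDs, health-check URL and the configure() command are all made up for illustration; this is exactly the kind of glue everyone ends up hand-rolling, not a real tool.

# Hypothetical rolling-restart loop (steps 5-8 above); not a real deploy tool.
import subprocess
import time
import urllib2

import boto  # boto 2.x ELB bindings

LB_NAME = 'prod-web'  # hypothetical load balancer name
NODES = {             # hypothetical instance-id -> hostname map
    'i-aaaa1111': 'web1.example.com',
    'i-bbbb2222': 'web2.example.com',
}

def healthy(host):
    # Hit the app's login page and make sure it responds with a 200.
    try:
        return urllib2.urlopen('http://%s/login/' % host, timeout=10).code == 200
    except Exception:
        return False

def configure(host):
    # Placeholder for the real per-node work: chef-solo, migrations, rollback versioning.
    subprocess.check_call(['ssh', host, 'sudo chef-solo'])

elb = boto.connect_elb()
lb = elb.get_all_load_balancers(load_balancer_names=[LB_NAME])[0]

for instance_id, host in NODES.items():
    lb.deregister_instances([instance_id])  # 5. pull the node out of the LB
    configure(host)                         # 6. configure the node
    if not healthy(host):                   # 7. test before putting it back
        raise SystemExit('%s failed its health check; stopping the deploy' % host)
    lb.register_instances([instance_id])    # back into the load balancer
    time.sleep(30)                          # 8. let it settle, then repeat for the next node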

Then you’ve got lots of other details like creating dev/staging versions with different configs, pulling in live data for testing, versioning static media, rollback, forward-compatible schema migrations (and testing that), continuous integration, trade-offs in speed/flexibility in how much you put in your AMIs, monitoring, etc. Also, it should be fast or a developer will start avoiding extra deploys by batching them.

It seems like every production python web app (and I think Ruby, Node.js, PHP etc) needs to do all of these boringish things, but everyone has had to cobble together their own solution. I’m absolutely for unix-style single purpose tools, but if there’s something out there that aims to generally solve the deploy part of the configuration management problem, I’m missing it.

Everyone Deploys

Every company with a production web app has had to deal with these problems. In a startup, that usually means you start with all of the problems and only solve one at a time and only as you need to. Why are we all re-inventing this wheel and where are the open source projects trying to solve this problem?

Everyone should be able to:

foo deploy:live active up

Run MySQL From a RAM Disk in Ubuntu Linux

Here at PolicyStat, we heavily rely on unit tests, selenium tests and continuous integration to keep our quality up so that we can practice continuous delivery. That means we write a lot of tests and we run a lot of tests. A major source of frustration is the painfully bad MySQL performance during tests because of table creation overhead. One option we considered was to use the Memory storage engine in MySQL, but you lose some capabilities (blob and text columns) and you’re no longer testing your actual database.

Yesterday, I finally bit the bullet and figured out how to configure MySQL to run on a ramdisk.

Results

RAMdisks are fast, but we really only saw big performance improvements when compared to an ext4 filesystem on a conventional HDD. Compared to solid state disks with XFS or ext3, the RAMdisk only gave marginal performance improvements.

I ran our full nosedjango-based test suite of 1340 tests on our ec2-based Hudson slaves. One of the slaves had the RAMdisk fix and the other was just using standard ec2 ephemeral storage with an ext3 filesystem.

  • ext3: 4117 seconds
  • RAMdisk: 3650 seconds

The RAMdisk gave us an 11% performance improvement. Our test suite relies very heavily on fixtures, so this tells me that our test suite is basically CPU-bound. Results on ext4 would be more dramatic and so would results on I/O-bound test suites.

Caveats

These instructions worked on two different Ubuntu 10.04 machines, but I haven’t tried them on other distros/versions. If someone can try this on another version and let me know the results, I would love to update the instructions. Also, you should understand what a RAMdisk actually is before proceeding. The main points are that your data will be lost on restart and if you don’t actually have free RAM, you won’t get much of a benefit.

Instructions

After a lot of unfruitful googling, I found this stackoverflow answer with some simple instructions. They got me most of the way there, but didn’t quite work on my system.

The following instructions got a working MySQL instance running on a RAMdisk.

1 Stop MySQL

We’re going to be copying the raw mysql data files, and we need them in a consistent state.

$ sudo service mysql stop

2 Copy your MySQL data directory to the RAMdisk

By default, all of MySQL’s data is stored in /var/lib/mysql and that’s the folder that needs to be fast. Ubuntu has a RAMdisk located at /dev/shm by default, so we’re going to use that. We also want to preserve the permissions on the files so that MySQL can access them.

$ sudo cp -pRL /var/lib/mysql /dev/shm/mysql

3 Update your mysqld configuration

Now we need to tell mysqld to actually use our new data directory. The setting is “datadir”, located in /etc/mysql/my.cnf under the [mysqld] section. Change yours to:

# datadir = /var/lib/mysql
# Using a RAMdisk
datadir = /dev/shm/mysql

4 Update your apparmor profile

AppArmor is great for keeping programs isolated for security purposes, but it also means that seemingly-small changes can cause AppArmor to break your program. By default, the mysql-server install comes with an AppArmor profile that locks mysqld to a specific set of files. /dev/shm/mysql isn’t in the default profile (obviously), so we need to add it.

First, open /etc/apparmor.d/usr.sbin.mysqld with your favorite text editor:

$ sudo vim /etc/apparmor.d/usr.sbin.mysqld

Then add the following lines inside the “/usr/sbin/mysqld” section (between the braces):

/dev/shm/mysql/ r,
/dev/shm/mysql/** rwk,

5 Restart apparmor and MySQL

And if everything has gone well, we just need to restart our services and get on to our much-faster testing.

$ sudo service apparmor restart
$ sudo service mysql start

Results on ext4

If you’re using ext4 for your hard drive, you should see a *huge* performance improvement with a RAMdisk. Christian’s single selenium testcase run went from several painful minutes to 17 seconds. ext4 is more paranoid than ext3 about ensuring that changes are actually flushed to disk, so you see a very large performance hit doing database and table creation. That means the gains from a RAMdisk are more dramatic.

TODO: I have plans to create an upstart script to manage the process of copying data to the RAMdisk on every boot, but for now you’ll need to do that manually every reboot.

Troubleshooting

Not enough RAM (ibdata1 file is too big)

By default when using InnoDB, the /var/lib/mysql/ibdata1 file grows and grows, even after you delete all of your databases. In our case, Christian’s ibdata1 file was >300MB for no particular reason. This stackoverflow question explains how to shrink your ibdata1 file.

PolicyPad: Etherpad + Bring Your Own Editor

Real-time In-Browser Collaboration

Real-time collaborative document editing in a web browser is hard. Google docs can do it, EtherPad can do it, but can your app do it? PolicyPad is a tool allowing you to make it happen using your existing javascript-based editor. Thanks to the dedicated work of a small team of Rose-Hulman Institute of Technology students (primarily Jimmy Theis and Kevin Wells), we’ve put together a demo with github’d source demonstrating real-time collaboration using the standards-focused WYMeditor (What You Mean editor). 

PolicyPad is a library to help you use any existing javascript-based editor in front of EtherPad to add real-time collaboration. This potentially allows your existing application, with all of your custom editor add-ons and integration, to add the goodness of Etherpad without the need to implement a complex server-side solution for diffing, version control, retrieval, etc.

PolicyPad Architecture

[Diagram: PolicyPad architecture]

Future

Right now, the code is working well with WYMeditor, but we’d really like to add plugins for other editors, such as TinyMCE and CKEditor. We’d also like to fill the existing gaps where the default Etherpad editor provides functionality that we don’t yet mimic (real-time chat, stepping back through old versions, import/export).

Get Involved:

Today’s Amazon EC2 EBS Outage in a Graph

[Graph: Munin disk throughput during the EBS outage]

You can clearly see the sharp dropoffs in disk throughput as the EBS volume goes in and out of availability. This is a high-CPU medium instance in US East 1b with one 10GB EBS volume attached.

Good news though, I learned a new euphemism for “Stuff is Broke”: Increased Latency

Defaulting on a mortgage:

We are experiencing increased latency affecting several housing-related financial obligations.

The Vietnam war:

We are currently investigating increased latency surrounding our police action.

Chernobyl:

We can confirm the existence of increased latency surrounding the separation of nuclear fallout from the surrounding wildlife.

Joking aside, I feel like this outage illustrates the upside of the cloud, contrary to some other accounts I’m reading. From what I can tell, Amazon experienced an outage in 2 of 4 availability zones (with degraded service in the others, presumably) within 1 of 5 regions. Datacenter outages happen, and sometimes they cluster. The alternative scenario, where two of your co-location providers or two pieces of critical hardware go down, means you are 100% going to experience downtime. With EC2, moving your entire operations from affected datacenters x and y to unaffected a and b can literally be one command away. We’re all practicing infrastructure as code, right?

In our case, we have application servers load-balanced across both 1b and 1d with an RDS multi-AZ master in 1b. The DB automatically failed over last night, and the load balancer automatically took the degraded 1b instances out of rotation as soon as they stopped responding. Unfortunately, the working application servers had a little trouble switching connections to the new master database due to DNS caching (which we’ll be fixing). The takeaway, though: what could have been a 12-hour (and counting) outage was instead measured in minutes because of the tools AWS makes available at low cost.

Of course, I still had a monitoring alert hit my cell at 4am and not sleeping is kind of a bummer, but I’ll take Mean Time to Recover + distributed risk over Mean Time Between Failures + concentrated risk any day. 

PHP to Django: How We Did It

Only Took me Four Months

Back in November I wrote about our slow journey from PHP to Django. Our startup went from a 100% PHP application to 100% Django over 22 months, with the two sides coexisting in production. We slowly made the conversion while also improving our product with new features, bug fixes and polish all around. From the response to that post, it seems like this might be a problem that other people have had or are having and at least a few people wanted more details.

So here I am putting code where my mouth is.

[Screenshot: the first django-php-bridge commit]

This post is an introduction to the django-php-bridge project and a more technical discussion of how we made the transition.

Introducing Django-PHP-Bridge

The goal of django-php-bridge is to make it easier for PHP projects and Django projects to live side by side, passing authenticated users back and forth seamlessly.

I see four different cases where it makes sense to have Django and PHP projects living side by side:

  1. You want to convert your PHP app to Django, but you realize how *ahem* stupid it would be to live in a coding bubble for a few months to rewrite things. Your customers want you to innovate and your competitors won’t wait for you.
  2. You want to convert your Django app to PHP. I’m not sure why you would do this, but I know that someone somewhere has this need.
  3. You have legacy Django and/or PHP applications that you want to mash together for a better user experience.
  4. You have a real reason to build an application using two different technologies. 

We want to help make it easier to do all of those things by providing glue code, documentation and utilities. When we made the transition at PolicyStat, it was pretty rough finding any information, and I feel like we wrote a lot of code that other people had already written, code that has since been re-written many times.

Django <==> PHP Under the Hood

The integration has just a few major components:

  • Session serialization
  • Session storage
  • User schema

Session Serialization

The absolute core piece to the whole integration is django_php_bridge.backends.db, the Django session backend that speaks PHP’s serialization format. Armin Ronacher’s wonderful PHPSerialize library does the bulk of the heavy lifting here. Basically, we’re using the normal Django database-backed session backend, only we’re serializing all of the session data to the format that PHP expects. 
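
A minimal sketch of the idea (this isn’t the actual django_php_bridge code): subclass Django’s database-backed session store, swap the serialization for phpserialize, and point SESSION_ENGINE at the module containing the class.

# Sketch only -- the real backend lives in django_php_bridge.backends.db.
import phpserialize
from django.contrib.sessions.backends.db import SessionStore as DBSessionStore

class SessionStore(DBSessionStore):
    def encode(self, session_dict):
        # Write the session payload in PHP's serialize() format
        # instead of Django's default pickle + base64.
        return phpserialize.dumps(session_dict)

    def decode(self, session_data):
        # PHP hands back associative arrays, which loads() turns into a dict.
        return phpserialize.loads(session_data)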

Session Storage

On the other side of that integration, we need to make sure that the PHP side knows how to grab and create session data in the DB table. Luckily, PHP has what are basically pluggable session backends. I submitted a stripped down version of the backend that we use in contrib: https://github.com/winhamwr/django-php-bridge/blob/master/contrib/php/djangoSession.class.php

User Schema

Now that we have both sides speaking the same language with regards to sessions, we need to make them both understand users. The key here is that you have to agree on the schema for your user data. In this case, I think it makes the most sense to follow Django’s built-in User model along with their Profile system because not doing so shuts you out of a lot of re-usable Django applications. 
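
Concretely, “following Django’s Profile system” means keeping django.contrib.auth’s User table as-is and hanging your extra columns off a profile model. A sketch (the app name and legacy fields here are made up):

# Sketch: extra per-user fields from the legacy schema live on a profile
# model keyed one-to-one to Django's built-in User.
from django.contrib.auth.models import User
from django.db import models

class UserProfile(models.Model):
    user = models.OneToOneField(User)
    # Hypothetical fields carried over from the old PHP schema:
    job_title = models.CharField(max_length=255, blank=True)
    department = models.CharField(max_length=255, blank=True)

# ...and in settings.py:
# AUTH_PROFILE_MODULE = 'accounts.UserProfile'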

Once you’ve decided to use the Django schema for users, you have a conversion script to write to take your existing schema and map it to Django’s. One wrinkle here is that you will probably need to move around password hashes, which could be a major pain. Luckily, Django’s password documentation explains how the hash is stored, and it’s stored in a very flexible manner. So if you weren’t using a salt or maybe were using a different hash algorithm, Django has your back, and our example PHP user object has functions for hashing and verifying django-style passwords.
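
For illustration, a Django-style hash from that era is just “algorithm$salt$hexdigest” (sha1 by default), so checking one is a string split and a re-hash. A simplified sketch, not the bridge’s actual code:

# Simplified check of a Django-style "algo$salt$hash" password (pre-1.4 format).
import hashlib

def check_django_password(raw_password, encoded):
    algorithm, salt, expected = encoded.split('$')
    if algorithm != 'sha1':  # Django's default then; md5 and crypt also existed
        raise ValueError('unsupported algorithm: %s' % algorithm)
    return hashlib.sha1(salt + raw_password).hexdigest() == expected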

Once you have your User and Profile objects converted over, you’ll need to change your PHP application to respect that new schema. Depending on what PHP framework you’re using and on how your PHP code is structured, this could be as simple as changing one class, or it could be very difficult. Regardless, I’ve included a very simple, not awesome example of what this might look like in PHP culled from our actual codebase: https://github.com/winhamwr/django-php-bridge/blob/master/contrib/php/user.class.php

Future Direction

I would absolutely love to see some contributions with examples of how to use the serialization backend with specific PHP frameworks like CakePHP and Symfony. One of the strengths of PHP is the ability to rig things together in an ad-hoc way to build what you need. Unfortunately, that comes back to bite us when we’re trying to give instructions on how to do basically anything with a “PHP application” because that doesn’t really have much meaning from a code/structure/architecture perspective. The PHP frameworks generally solve that lack of structure and will hopefully allow us to build some PHP code that’s actually re-usable in the general case.

We’re no longer using any PHP at PolicyStat, and I don’t have much passion to write much more PHP in my spare time, but I’d very much like to help where I can with python code, documentation and the curation of PHP submissions. If you’ve already built this kind of Django to PHP integration using one of the PHP frameworks like CakePHP, I’d especially love to add your code to the project.

Pull Requests Wanted

I’d welcome any feedback as far as what other people solving this problem would like to see. What would be helpful? I’d also really like to accept contributions with documentation, utilities, examples and anything else that would make it easier for someone to build a Django project that lives side by side with a PHP project. Any ideas at all, feel free to ping me on twitter @weswinham and please fork django-php-bridge on github.

Using Django-mailer With Django-ses for Amazon SES Goodness

Amazon’s new Simple Email Service is pretty awesome.

Previously, there were basically two options:

  1. Pay out the wazoo per message
  2. Hassle with setting up a mail server and continuously wrangle with the email world to avoid spam filters

With SES, you avoid the hassle of deliverability and you get 2k messages per day for free with 1 penny per hundred messages over that.

At PolicyStat, we use django-mailer to queue up emails for sending so that we don’t need to make an SMTP connection inside the request/response cycle. It also gives us nice logging and do-not-send abilities and we wanted to keep those. We also really wanted to use SES.

Luckily, as is usually the case with the Python and Django community, in the ~2 weeks between the SES announcement and the time we wanted to implement, the wonderful community around Boto had already grown SES support and a gentleman named Harry Marr out of the UK had created a Django app for SES email called django-ses.
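
django-ses plugs in as a standard Django email backend, so the wiring is roughly a two-line settings change (a sketch; the AWS keys are placeholders, and boto can also pick them up from the environment):

# settings.py (sketch) -- route outgoing Django email through Amazon SES.
EMAIL_BACKEND = 'django_ses.SESBackend'

AWS_ACCESS_KEY_ID = 'your-access-key-id'
AWS_SECRET_ACCESS_KEY = 'your-secret-key'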

The integration was painless for us thanks to Django 1.2’s email backend support. The problem we ran into was that django-mailer was specifically designed for use with SMTP-based email sending. In the send_all() function used for sending all of the emails in your queue, there was code like:

except (socket_error, smtplib.SMTPSenderRefused,
        smtplib.SMTPRecipientsRefused,
        smtplib.SMTPAuthenticationError), err:
    mark_as_deferred(message, err)
    deferred += 1

This worked great for SMTP sending. If a message failed for any of the expected reasons, that message was deferred for later (in case of a temporary problem) and the function kept chugging through the rest of the queue. With django-ses and boto though, you don’t get SMTP errors. You get things that look like:

400 Bad Request
<ErrorResponse xmlns="http://ses.amazonaws.com/doc/2010-12-01/">
  <Error>
    <Type>Sender</Type>
    <Code>MessageRejected</Code>
    <Message>Address blacklisted.</Message>
  </Error>
  <RequestId>eb0e8eda-48c2-11e0-8b2e-91b9805ad73d</RequestId>
</ErrorResponse>

When this happens, the send_all function fails out, your message isn’t deferred and all emails after this one are effectively blocked. That’s bad.

Our fix, after experimenting with the bad idea of wrapping Boto exceptions in SMTP exceptions, was to simply use a bare except: statement for send_all(). Now SES errors don’t block our queue and we’re back to being happy. We’re using our django-ses compatible django-mailer fork in production right now.
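
Conceptually, the sending loop in the fork ends up looking something like this (a sketch of the approach, not the fork’s exact code; the message attributes and defer() helper are stand-ins):

# Sketch: defer on *any* failure so one bad message -- e.g. an SES
# "Address blacklisted" rejection surfacing as a boto error -- can't
# block the rest of the queue run.
from django.core import mail

def send_all(queued_messages):
    sent, deferred = 0, 0
    for message in queued_messages:
        try:
            mail.send_mail(message.subject, message.body,
                           message.from_address, [message.to_address])
            sent += 1
        except Exception, err:   # was: except (socket_error, smtplib...), err:
            message.defer()      # stand-in for mark_as_deferred(message, err)
            deferred += 1
    return sent, deferred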

Next step will be to find a way to automatically handle “Address blacklisted” messages, but that’s for another day. I’m just happy to stop our ghetto system of getting a pagerduty notification at 3am and manually rotating our SMTP user in production as we hit our quota. Thank you SES, boto, django-ses and django-mailer for taking back my sleep time.

Structure Aware Change Tracking

At PolicyStat, whenever we have written a chunk of code that seems like it might have widespread usefulness, we like to release it as open source. We have recently released HTML Tree Diff, a library for showing diffs between HTML documents in a structure-aware way. It is written in Python, and you can get the source code at GitHub, or install it from the Python Package Index.

We work with HTML documents every day, and we were disappointed that there was not an existing library to display “track-changes” style diffs between HTML documents. This code has been used in production since June 2009, and we’re excited to share it with the community.
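
Basic usage is meant to be a single function call. Something like the following sketch; the import path and signature here are from memory, so check the README if they’ve drifted:

# Sketch of intended usage; see the html-tree-diff README for the exact API.
from htmltreediff import diff

old_html = '<p>Jelly donuts arrive every <b>Monday</b>.</p>'
new_html = '<p>Jelly donuts arrive every <b>Friday</b>.</p>'

# Returns HTML with the changes wrapped in <ins>/<del> tags, ready to be
# styled as a "track changes" view.
print diff(old_html, new_html)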

Documents

Let’s say you have a document. A Very Important Document™. And some Very Important People are interested in what’s in the document. Now, as very important as these people are, they don’t have the time to read through the entire thing each time it gets updated, but they would like to know what exactly the changes are. That’s where diff comes in.

Let me show you an example:

[Screenshots: the old document and the new document, side by side]

Someone has changed the jelly donut schedule! It’s been the same for years! How will we remember the new one? Everybody panic!

[Image: PANIC]

Diffs

But wait, through the clever use of technology, you can calm the panic by showing them exactly what changed between the two documents:

[Screenshot: inline diff of the two document versions]

As you may know, this is a diff. It concisely shows what lines have changed between the successive versions. We can even fancy it up and use HTML styling to make it more readable:

[Screenshots: the diff source and the rendered, styled diff]

HTML Documents

This works pretty well for files that are just flat text, but what if our Very Important Document is in HTML format? It turns out that PolicyStat has exactly this situation. We have tens of thousands of Very Important Documents that are stored in HTML format, and have multiple versions.

So let’s look at what happens when we try the same thing on an HTML document:

[Screenshots: a line-based diff of the HTML source and its broken rendering]

Disaster! That’s not even valid HTML! The Very Important People are now Very Angry!

HTML Diffs

What do we do about this? It turns out that this is not a trivial problem to solve. You have to consider that HTML is not flat like a text file, but actually a tree structure.

So, to create a diff between two HTML documents, the diff algorithm needs to be aware of the tree structure. There has been some research in this area, but none of it that we found was implemented in a practical way, with real-world usefulness.

To solve this, I wrote a library that does structure aware diffs of HTML trees. Here’s some example output:

[Screenshots: the structure-aware diff HTML source and its rendering]

The day is saved! Everyone shows up on time for jelly donuts, and the donut rebellion of 2011 is quelled!

Pingdom and Intermittent Timeouts With Mod_wsgi

Server Errors, but Only Sometimes

We’ve been getting occasional reports from our users and staff that on rare occasions, a page doesn’t load. Of course, the descriptions varied from “server error” to “error page” to “page timeout” to “page hanging” but that’s the nature of verbal communication. English is hard, but debugging intermittent errors is probably harder. Luckily, we use Pingdom to monitor our site and keep us informed of any problems, so we do have some data to refer to every time I hear of anyone having trouble. We run a check every minute against our login page to make sure that the welcome message and login form display (among other checks). It’s been easy to just refer to that and be assured that we’ve actually had 100% uptime.

[Screenshot: Pingdom uptime report]

100% Is Only Sorta 100%

Looking back through the data again, I got a little bit suspicious. 100% seems too good to be true over 43,200 checks across the public internet, as you’d expect some packet somewhere to get dropped. After digging down to the detailed Pingdom report, I found out that Pingdom checks have another status beyond just “UP” or “DOWN.” If a check fails, another check follows it immediately to confirm, and the first failed check shows up in the detailed logs as “DOWN_UNCONFIRMED.” I comb through the logs and see that of the 43,200 checks in the last month, 47 of them have resulted in “DOWN_UNCONFIRMED” which is a bit more than 1 out of every 1000.

mod_wsgi, I’m Doing It Wrong

After combing through the various Nginx and Apache logs between all of our application servers, I find a rough pattern where around the time of a DOWN_UNCONFIRMED check I see one or more apache messages like this:

[Tue Dec 14 12:11:04 2010] [error] [client xxx.xxx.xxx.xxx] Premature end of script headers: deploy.wsgi

Ah ha!

We use django running on mod_wsgi inside Apache2 reverse proxied via Nginx, and deploy.wsgi is our Django wsgi file. Google helps me find a lot of different causes of the “Premature end of script headers” error message and I eliminate several of them. Finally I locate a post by the eminently helpful Graham Dumpleton on the mod_wsgi google group where he mentions that whenever a WSGIDaemonProcess hits the maximum-requests limit, mod_wsgi will actually kill any requests currently being processed if they don’t finish within 5 seconds. Our intermittent hangs instantly made sense because we had a relatively aggressive maximum-requests value of 500 (a remnant of formerly running mysql, memcached and apache2/mod_wsgi all on the same server).

That meant that once every 500 requests, any users loading slow pages (reports, uploads, complex searches, etc) would get their request chopped off. It also explained why the failures were mostly clustered around our prime usage times because not only are page loads slightly slower when we’re under load, but the wsgi daemon processes will be getting more requests and thus restarting more frequently.

Django + Apache2/mod_wsgi + Nginx Are Not Magic

I’m a huge fan of mod_wsgi and how easy it is to administrate, but this just goes to show that even subtle configuration decisions can have an impact. Graham Dumpleton can’t read my mind, unfortunately.

Our solution was to increase the maximum-requests value to 10k. This should cut down the number of dropped requests by 20x. We’ve got quite a bit of extra RAM, so this seems like an easy fix.

[Graph: Munin memory usage]

We’ll also be monitoring memory usage using Munin and if it appears that memory leaks are not actually a problem for our application, we’ll consider upping the maximum-requests value even further. Perhaps we can even eliminate maximum-requests entirely in favor of using our normal alerting and monitoring system to notify us in case mod_wsgi memory usage hits an unacceptable threshold.

It seems that maybe the mod_wsgi documentation could benefit from a quick blurb about this side-effect of maximum-requests, so I opened a mod_wsgi issue for a documentation enhancement.

PHP to Django: Changing the Engine While the Car Is Running

It’s pretty widely known that web frameworks like Django and Ruby on Rails are amazing. They give developers the ability to build applications quickly and reliably and there’s all sorts of hotness when it comes to tools and community and reusable applications. But what if your application isn’t using a web framework? Are you doomed to be outmaneuvered by your competitors?

It Takes a Choice

PolicyStat, like most ideas, was born out of a prototype. In our case, the prototype was written in PHP with the kind of architecture you’d expect from a prototype. There were 45 .php files in a folder with one file called class.main.php. No templates. No classes. No consistent database access. No MVC. No ORM. No tests.

[Screenshot: the first commit]

Once our prototype took off and we realized there was a strong market for policy management software done exactly the way we wanted to do it, we had a decision to make. Should we focus on improving our existing code base? Port things to one of the PHP web frameworks? Switch languages altogether? As someone who believes the Single Worst Mistake a software company can make is rewriting from scratch, it was still obvious that moving to Django was in the best interest of our company (and my sanity). We decided to have our cake and eat it too.

Integration Version 1

After a lot of hard thinking, we realized that if we could somehow get a Django project to share sessions with our existing PHP prototype, we could keep both codebases active at the same time while we slowly ported functionality from PHP over to Django. Our integration process went something like this:

  1. Create models.py files and break them up by functionality into separate apps using Django’s great legacy database support.
  2. Build a Django Session Engine using phpserialize to speak PHP’s serialization format.
  3. Duplicate the HTML templates spread out in various .php includes into Django’s superior Template Inheritance system.
  4. Separate PHP and Django based on URL paths via the Apache configuration (mod_php and mod_python at the time).

In the end, it only took a couple of weeks to get to a very basic level of integration. A vast majority of PolicyStat was still in PHP, but a couple of pages were served from Django. The key was that this change was seamless from a user’s perspective. We didn’t even need to explain to customers that anything was changing, because it didn’t matter from their end. We were able to continue delivering normal enhancements and bug fixes while doing the initial integration.

Toward 100% Django

Once you get the bare minimum level of integration, the hard, non-technical problem to solve is what you should do from there. On one end, you can shut down feature development and bug fixes for a few months while you attempt to port the remaining PHP portions bit by bit. On the other end, you can choose to be a dual-backend application forever and ever, amen.

We took the position that the PHP portions of our application were just pieces of code that needed refactoring. Once we made that decision our process naturally fell out; we already knew how to handle less-than-perfect code. When we added new features, they were done in Python and Django. When we needed to polish or otherwise fix functionality that lived in PHP, we looked at it just like you’d look at that super-ugly method you wrote a couple of years ago when you were in a hurry. If it was a very small change, we tended to just make the change in PHP. If it was a decent chunk of change, or the change would be easier in python, we first wrote unit tests on what we expected the behavior to be using the existing code as a guide, then we ported the code to Django and made the fix. During that time, we continued to deliver regular product updates.

22 Months Later

So finally, after 22 months of regular updates and massive product improvements (including a major architectural revamp to support multi-tenancy), we finally removed the last bit of PHP. It was a very, very, very happy day for the development team. To our customers though, it was just another update. That’s a good thing :)

[Screenshot: removing the last of the PHP]

Notes

  • Django and Ruby on Rails are the best examples, but there are plenty of other great frameworks. The major decision is between using a solid web framework and not using one.
  • Yes there are reasons not to move your app to a web framework, but if you’re basing your business around an application, there aren’t many situations where your organization wouldn’t stand to benefit.
  • If anyone is interested, I’d be happy to do a followup post with a more technical look at the django session engine we used and maybe what our apache vhost file looked like.

Edit: Finally finished the follow-up post with technical details on migrating from php to django.