Development Hacks to Prevent Mistakes

This post is cross-posted at Development Hacks to Prevent

Bugs are an inevitable part of software development. We do our best to
write higher quality software, but we never fully escape releasing bugs
into production. At Braintree, we deal with payments, so we’re extra
sensitive to bugs. Therefore, we’ve made some interesting choices on how
to fight back. In some cases, we’ve done some crazy hacks in the name of
code safety. I’ll cover the most interesting ones in this post.

Scoping queries to a merchant

Braintree allows merchants to accept payments online. Almost everything
we do is scoped to a merchant. Each merchant is distinct, and carries
all of its own data. For example, one merchant’s vaulted credit cards,
transaction history, and more should not be visible to other merchants.
We are very serious about this isolation, but it’s easy to introduce
bugs that would violate it. For example:

class CustomerController < ApplicationController  
  def show
    @customer = Customer.where(:token => params[:token]).first

This innocuous looking code has a big problem. If the token in the URL
is for the wrong merchant, the code will happily show it to the end
user. The simple fix is to use something like this instead (assuming
we’ve already looked up the user’s merchant):

@customer = @merchant.customers.where(:token => params[:token]).first

The problem is that this is an easy mistake to make. Since we know
people are fallible, and are going to make mistakes, we added a monkey
patch to prevent these unscoped finds:

>> Customer.where(:token => 'Paul')
RuntimeError: #finds must be scoped on Customer

>> merchant = Merchant.find(1)
>> merchant.customers.where(:token => 'Paul')
We whitelist models (such as Merchant and User), and if there are cases where we really need to perform an unscoped find, we also have a backdoor:

>> Customer.allow_unscoped_find.where(:token => 'Paul')

This explicitly calls out that we're intending to do an unscoped find,
rather than using one by accident.

The code looks like this:

Scoped find hook

The code above makes sure that we scope ActiveRecord methods to a
merchant, but it doesn't cover all cases. For example, it doesn't stop
code like this:

Customer.find_by_sql(["SELECT * FROM customers WHERE token = ?", "Paul"])  

We write a decent amount of custom SQL in our application for
performance, and we wanted to make sure these cases were safe as well.
Therefore, we have a similar hack called the scoped find hook that
checks ActiveRecord objects as they load and makes sure they all have
the correct merchant. Since our URLs are scoped by merchant
(/merchants/<merchant_id>/customers/<token>), we can check each object loaded from the database against the merchant specified in the URL:

>> RequestContext.with_merchant(Merchant.find(1)) { Customer.find(1) }
ScopedFindHook::ScopeError: Customer cannot return objects scoped by the incorrect merchant. Got 6, expected 1.  

In this case, RequestContext holds the merchant from the URL,
populated by the ApplicationController.

The code looks like this:

Recurring billing consistency

Braintree has a recurring billing system that takes care of automatic
billing on a schedule. While it seems simple at first, recurring billing
is actually quite complicated. For example, a subscription can be past
due for several months, accruing a balance. Then, when the credit card
is updated, or the merchant chooses to wipe the balance, we need to
reactivate the subscription and get it back into the monthly billing
cycle. This involves updating the balance, paid_through_date,
next_billing_date, billing_period and more.

These complex operations are thoroughly tested, but we've still had bugs
creep in. This is another case where we wanted runtime safety, so we
added a consistency check. When saving a Subscription, we run a series
of consistency checks, and if any of them fail, we alert and abort. Some
of these checks are:

  • Billing period end date is after the billing period start date
  • If the Subscription is past due, then the failure count is less than the maximum number of retries
  • The next_billing_date is one billing period greater than the last billing date

If a check fails, we can investigate the cause. Sometimes, our
consistency checks miss an edge case and we update them. Other times,
we've caught a legitimate bug. Then, we fix the bug, deploy, and let the
next run of recurring billing pick up the subscription.

The implementation is very simple:

Sanity Specs

Besides runtime checks, we also have a lot of development time checks.
We call these sanity specs, and the goal is to check for developer
mistakes during development. For example, we have a public_id column on
most of our tables that represents the externally facing identifier
(instead of using our internal database ids). Since we almost always look objects up by this id, we want to make sure it's always indexed. We
can codify this requirement into a spec that will fail if we forget the

require 'spec_helper'

describe 'Sanity Specs' do  
  describe "database tables" do
    they "have a unique index on the public_id column" do
      indexes = ActiveRecord::Base.connection.select_values("SELECT relname FROM pg_class WHERE relkind = 'i'")
      ActiveRecord::Base.connection.tables.each do |table|
        klass = ActiveRecord::Base.descendants.detect { |k| k.table_name == table }

        if klass.has_public_id?
            indexes.grep(/index_#{table}_on.*public_id/).size.should eql(1), "#{table} does not have an index on public_id"

Here are some other sanity specs we have:

  • Database tables have non-null created_at/updated_at columns
  • Spec files and app files are named consistently
  • Crontabs have a blank line at the end (to work around an old Ubuntu bug)
  • Every controller defines authorization_data (used for our authorization framework)
  • Migrations do not rename columns, since this cannot be done while the app is running *
  • Migrations do not drop columns unless they are marked in the app as deleted *

* See Ruby Conf Australia: High Availability at Braintree
for why these are dangerous operations

Our rough policy is that if you get bitten by something writing code,
and others are likely to fall into the same trap, we write a sanity spec
to prevent future problems.