4 minute read

This post is cross-posted at Development Hacks to Prevent Mistakes.

Bugs are an inevitable part of software development. We do our best to write higher quality software, but we never fully escape releasing bugs into production. At Braintree, we deal with payments, so we’re extra sensitive to bugs. Therefore, we’ve made some interesting choices on how to fight back. In some cases, we’ve done some crazy hacks in the name of code safety. I’ll cover the most interesting ones in this post.

Scoping queries to a merchant

Braintree allows merchants to accept payments online. Almost everything we do is scoped to a merchant. Each merchant is distinct, and carries all of its own data. For example, one merchant’s vaulted credit cards, transaction history, and more should not be visible to other merchants. We are very serious about this isolation, but it’s easy to introduce bugs that would violate it. For example:

class CustomerController < ApplicationController
  def show
    @customer = Customer.where(:token => params[:token]).first
  end
end

This innocuous looking code has a big problem. If the token in the URL is for the wrong merchant, the code will happily show it to the end user. The simple fix is to use something like this instead (assuming we’ve already looked up the user’s merchant):

@customer = @merchant.customers.where(:token => params[:token]).first

The problem is that this is an easy mistake to make. Since we know people are fallible, and are going to make mistakes, we added a monkey patch to prevent these unscoped finds:

>> Customer.where(:token => 'Paul')
RuntimeError: #finds must be scoped on Customer

>> merchant = Merchant.find(1)
>> merchant.customers.where(:token => 'Paul')

# We whitelist models (such as Merchant and User), and if there are cases where we really need to perform an unscoped find, we also have a backdoor:
>> Customer.allow_unscoped_find.where(:token => 'Paul')

This explicitly calls out that we’re intending to do an unscoped find, rather than using one by accident.

The code looks like this: scoped_find.rb

Scoped find hook

The code above makes sure that we scope ActiveRecord methods to a merchant, but it doesn’t cover all cases. For example, it doesn’t stop code like this:

Customer.find_by_sql(["SELECT * FROM customers WHERE token = ?", "Paul"])

We write a decent amount of custom SQL in our application for performance, and we wanted to make sure these cases were safe as well. Therefore, we have a similar hack called the scoped find hook that checks ActiveRecord objects as they load and makes sure they all have the correct merchant. Since our URLs are scoped by merchant (/merchants/<merchant_id>/customers/<token>), we can check each object loaded from the database against the merchant specified in the URL:

>> RequestContext.with_merchant(Merchant.find(1)) { Customer.find(1) }
ScopedFindHook::ScopeError: Customer cannot return objects scoped by the incorrect merchant. Got 6, expected 1.

In this case, RequestContext holds the merchant from the URL, populated by the ApplicationController.

The code looks like this: scoped_find_hook.rb

Recurring billing consistency

Braintree has a recurring billing system that takes care of automatic billing on a schedule. While it seems simple at first, recurring billing is actually quite complicated. For example, a subscription can be past due for several months, accruing a balance. Then, when the credit card is updated, or the merchant chooses to wipe the balance, we need to reactivate the subscription and get it back into the monthly billing cycle. This involves updating the balance, paid_through_date, next_billing_date, billing_period and more.

These complex operations are thoroughly tested, but we’ve still had bugs creep in. This is another case where we wanted runtime safety, so we added a consistency check. When saving a Subscription, we run a series of consistency checks, and if any of them fail, we alert and abort. Some of these checks are:

  • Billing period end date is after the billing period start date
  • If the Subscription is past due, then the failure count is less than the maximum number of retries
  • The next_billing_date is one billing period greater than the last billing date

If a check fails, we can investigate the cause. Sometimes, our consistency checks miss an edge case and we update them. Other times, we’ve caught a legitimate bug. Then, we fix the bug, deploy, and let the next run of recurring billing pick up the subscription.

The implementation is very simple: subscription.rb

Sanity Specs

Besides runtime checks, we also have a lot of development time checks. We call these sanity specs, and the goal is to check for developer mistakes during development. For example, we have a public_id column on most of our tables that represents the externally facing identifier (instead of using our internal database ids). Since we almost always look objects up by this id, we want to make sure it’s always indexed. We can codify this requirement into a spec that will fail if we forget the index:

require 'spec_helper'

describe 'Sanity Specs' do
  describe "database tables" do
    they "have a unique index on the public_id column" do
      indexes = ActiveRecord::Base.connection.select_values("SELECT relname FROM pg_class WHERE relkind = 'i'")
      ActiveRecord::Base.connection.tables.each do |table|
        klass = ActiveRecord::Base.descendants.detect { |k| k.table_name == table }

        if klass.has_public_id?
            indexes.grep(/index_#{table}_on.*public_id/).size.should eql(1), "#{table} does not have an index on public_id"
          end
        end
      end
    end
  end
end

Here are some other sanity specs we have:

  • Database tables have non-null created_at/updated_at columns
  • Spec files and app files are named consistently
  • Crontabs have a blank line at the end (to work around an old Ubuntu bug)
  • Every controller defines authorization_data (used for our authorization framework)
  • Migrations do not rename columns, since this cannot be done while the app is running *
  • Migrations do not drop columns unless they are marked in the app as deleted *

* See Ruby Conf Australia: High Availability at Braintree for why these are dangerous operations

Our rough policy is that if you get bitten by something writing code, and others are likely to fall into the same trap, we write a sanity spec to prevent future problems.

Updated: