Feb 26 2008
 

A short build time is a critical element of continuous integration. I've been involved in a number of build improvements on my current project (both local and on the build server), and I thought I would share some of them. Using the tricks below, we cut our build time in half (and have more to go). Obviously, every build and project is different, so many of these may not apply to other situations.

Profile the build

We profiled our build before we made any changes. We found the slowest tasks using rake --trace, which prints each task as it is invoked and executed, so the slow ones are easy to spot. We also watched the output of the "top" command while builds were running on the build server to reveal bottlenecks.
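
If you want actual numbers per step, a small timing wrapper is enough. This is only a sketch using Ruby's Benchmark module; the profile_steps helper and the step names are made up for illustration and are not from our build:

```ruby
require 'benchmark'

# Time each named step and return [name, seconds] pairs, slowest first.
def profile_steps(steps)
  timings = steps.map do |name, work|
    [name, Benchmark.realtime { work.call }]
  end
  timings.sort_by { |_, seconds| -seconds }
end

# Stand-ins for real build steps (assumptions for this example)
steps = {
  'unit tests'       => lambda { sleep 0.01 },
  'acceptance tests' => lambda { sleep 0.05 }
}

report = profile_steps(steps)
report.each { |name, seconds| puts format('%-18s %.2fs', name, seconds) }
```

Sorting slowest first keeps the biggest win at the top of the report.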

Turn down logging

We noticed while watching “top” that kjournald was constantly running. This indicates a lot of disk activity. Our app logs to files and syslog while running in production, but this is unnecessary during builds. We turned off nearly all logging in test mode.
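
As a rough illustration, here is the idea with Ruby's standard Logger. The build_logger helper and the 'test' environment name are assumptions for the example, not our project's code:

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: choose a log level from the environment name.
def build_logger(env, io)
  logger = Logger.new(io)
  # In test mode only errors and worse are written; the rest is dropped.
  logger.level = (env == 'test') ? Logger::ERROR : Logger::INFO
  logger
end

test_io  = StringIO.new
test_log = build_logger('test', test_io)

test_log.info('loaded fixture')   # dropped, never reaches the disk
test_log.error('boom')            # still written
```

Keeping errors on means a failing build still leaves something to read.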

Make sure the build machine is only performing builds

We discovered that our build machine had become a bit of a playground for trying new things. This is fine in moderation, but some of these projects were not cleaned up. There were a lot of unnecessary processes running on the build machine. We killed everything that was unnecessary, and increased the nice value of processes that were not critical. For example, cruisecontrol.rb runs the web interface as a separate process from the builders. We increased the nice value of the web interface so building our projects would take precedence.

Parallelization

If you have more than one processor (or core), and the machine is running a giant set of tests, there is a good chance that the other processor is doing very little. We discovered huge gains by running tests in parallel. Projects like deeptest and selenium grid make this easy. We took a simpler route and now run our functional and acceptance tests at the same time in different processes.
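
The simple route can be sketched with Process.spawn. The run_in_parallel helper is hypothetical, and the commands below are stand-ins; in a Rails project they might be something like ["rake", "test:functionals"], which is an assumption about your setup:

```ruby
require 'rbconfig'

# Start every command in its own process, then wait for all of them.
# Returns true only if every command exited successfully.
def run_in_parallel(commands)
  pids = commands.map { |cmd| Process.spawn(*cmd) }
  statuses = pids.map { |pid| Process.wait2(pid).last }
  statuses.all?(&:success?)
end

ruby = RbConfig.ruby   # path to the running Ruby interpreter
suites = [
  [ruby, '-e', 'sleep 0.2'],   # stand-in for the functional tests
  [ruby, '-e', 'sleep 0.2']    # stand-in for the acceptance tests
]

all_passed = run_in_parallel(suites)
```

All processes are started before any is waited on, so the total time is roughly that of the slowest suite rather than the sum.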

Build database from dumps

Our project is on release six, so we have six versions of the database to build. We used to run the DDL and DML for every release to build a version six database. Each time we release, we add a new version and the build gets a little longer. We started saving build time by dumping the database schema and data from the previous release. Now, when we want to build a version six database, we restore a version five dump and then build up from there.
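
To make the saving concrete, here is a toy planner. The build_steps function and its step descriptions are invented for this example; the real restore and DDL/DML commands depend on your database:

```ruby
# Hypothetical sketch: list the steps needed to build a database at the
# `target` release, given the releases we have dumps for.
def build_steps(target, dumped_releases)
  # Start from the newest dump that is older than the target, if any.
  base = dumped_releases.select { |release| release < target }.max
  steps = []
  steps << "restore dump of release #{base}" if base
  first = base ? base + 1 : 1
  (first..target).each { |release| steps << "run DDL and DML for release #{release}" }
  steps
end

without_dump = build_steps(6, [])    # six DDL/DML steps
with_dump    = build_steps(6, [5])   # one restore plus one DDL/DML step
```

With a release five dump the work no longer grows with the number of past releases.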

Trim selenium suite

Selenium is a great testing tool, but it runs slowly. Opening new browsers and clicking through the site is slow. We looked at our acceptance test suite more carefully and trimmed it. Some of the excess was duplication that was covered in another test (or easily added to another test). Other logic was better tested at the functional level and did not need an acceptance test. We reduced our suite to a few long passes through the application, rather than many smaller tests.

Better hardware

Hardware is relatively cheap compared with developer time. It is worth investing in great hardware. That said, it does not reduce the need for the above improvements. Unfortunately, in many organizations, getting new hardware can be slow. Rather than wait a month, we can work on the build and see results today. And once the new hardware arrives, it will make things even faster.

If you have other build improvement strategies, please let me know in the comments.

Feb 21 2008
 

We ran into an interesting gotcha on our project the other day. We use serialize on ActiveRecord to save Ruby objects to the database. This is described in Jay Fields Thoughts: Rails: ActiveRecord Serialize method.

Serialize uses YAML.dump and YAML.load to serialize/deserialize objects to strings. These methods only deal with the data of an object, not the methods. The objects we serialized used metaprogramming to dynamically define methods. When they were loaded from the database, they no longer had the new methods.

Here is a contrived example. The Foo class creates a foo method in the initialize:

class Foo
  def initialize
    class << self
      define_method :foo, lambda { 10 }
    end
  end
end
>> Foo.new.foo
=> 10

A dump of the Foo class has no knowledge of this foo method:

>> require 'yaml'
>> YAML.dump(Foo.new)
=> "--- !ruby/object:Foo {}\n\n"

Therefore, the loaded version of Foo will not have the foo method:

>> YAML.load(YAML.dump(Foo.new)).foo
NoMethodError: undefined method `foo' for #<Foo:0xb7985af0>
        from (irb):16

In our case, we changed our code to store only the data from the domain object in the database (in columns). We recreate the domain object from these columns when we need it.
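
Here is a sketch of that approach. The Settings class and its to_attributes and from_attributes methods are hypothetical names for illustration, not our real domain object:

```ruby
# Hypothetical domain object that defines one reader per attribute
# at construction time.
class Settings
  def initialize(values)
    @values = values
    values.each_key do |key|
      define_singleton_method(key) { @values[key] }
    end
  end

  # Plain data only, suitable for storing in ordinary columns.
  def to_attributes
    @values.dup
  end

  # Going back through the constructor re-runs the metaprogramming.
  def self.from_attributes(attrs)
    new(attrs)
  end
end

original = Settings.new(:color => 'red', :width => 10)
row      = original.to_attributes            # what would go in the columns
restored = Settings.from_attributes(row)
restored.color   # => "red"
```

Because the object is rebuilt through initialize, the dynamically defined methods come back with it.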

Feb 17 2008
 

On my current project, we noticed a common pattern when dealing with nil. We would often check an object to see if it was nil before calling a method on that object:

name = person ? person.name : nil

To reduce duplication, Patrick Farley and Ali Aghareza created a nil_or method which handles this. The above code becomes:

name = person.nil_or.name

The nil_or causes the expression to return nil if the target is nil. If not, the name method is called.

The code for nil_or looks like:

module ObjectExtension
  def nil_or
    return self unless self.nil?
    Class.new do
      def method_missing(sym, *args); nil; end
    end.new
  end
end
 
class Object
  include ObjectExtension
end

The nil_or method returns self if self is not nil. If self is nil, it creates a new Object which eats all method calls and returns nil.
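
A quick usage sketch follows. The nil_or code is repeated so the example runs on its own, and the Author struct is a hypothetical class for illustration:

```ruby
# Same nil_or as above, repeated so this example is self-contained.
module ObjectExtension
  def nil_or
    return self unless self.nil?
    Class.new do
      def method_missing(sym, *args); nil; end
    end.new
  end
end

class Object
  include ObjectExtension
end

Author = Struct.new(:name)   # hypothetical class for illustration

present = Author.new('Ali')
missing = nil

present.nil_or.name   # => "Ali"
missing.nil_or.name   # => nil

# Caveat: methods that every Object already has never reach
# method_missing, so this returns a string rather than nil.
missing.nil_or.to_s
```

One thing to keep in mind is that only methods the anonymous object does not already respond to are swallowed. On Ruby 2.3 and later, the safe navigation operator (person&.name) covers the same case without any extension.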

We use a fair amount of delegation on this project using Forwardable, so Michael Schubert and Toby Tripp created a delegator which has the same effect. For example, you can replace this delegation:

class Person
  extend Forwardable
  def_delegator :@job, :title, :job_title
end

with this one:

class Person
  extend Forwardable
  def_delegator_or_nil :@job, :title, :job_title
end

This delegation is equivalent to this code:

class Person
  def job_title
    @job ? @job.title : nil
  end
end

The code for def_delegator_or_nil looks like:

module ForwardableExtension
  def def_delegator_or_nil(accessor, method, new_method = method)
    accessor = accessor.id2name if accessor.kind_of?(Integer)
    method = method.id2name if method.kind_of?(Integer)
    new_method = new_method.id2name if new_method.kind_of?(Integer)
 
    module_eval(<<-EOS, "(__FORWARDABLE_EXTENSION__)", 1)
      def #{new_method}(*args, &block)
        begin
          if #{accessor}.nil?
            nil
          else
            #{accessor}.__send__(:#{method}, *args,&block)
          end
        rescue Exception
          $@.delete_if{|s| /^\\(__FORWARDABLE_EXTENSION__\\):/ =~ s} unless Forwardable::debug
          Kernel::raise
        end
      end
    EOS
  end
end
 
module Forwardable
  include ForwardableExtension
end
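
If you only delegate to instance variables, a simplified variant can be written with define_method instead of module_eval. To be clear, this is a swapped-in sketch and not the code above; the SimpleDelegatorOrNil module and the Job and Employee classes are hypothetical:

```ruby
# Simplified variant: handles instance variable accessors only,
# and does no backtrace cleanup.
module SimpleDelegatorOrNil
  def def_delegator_or_nil(accessor, method, new_method = method)
    define_method(new_method) do |*args, &block|
      target = instance_variable_get(accessor)
      target.nil? ? nil : target.__send__(method, *args, &block)
    end
  end
end

Job = Struct.new(:title)   # hypothetical class for illustration

class Employee
  extend SimpleDelegatorOrNil
  def_delegator_or_nil :@job, :title, :job_title

  def initialize(job = nil)
    @job = job
  end
end

Employee.new(Job.new('Editor')).job_title   # => "Editor"
Employee.new.job_title                      # => nil
```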

Feb 08 2008
 

On my current project, we wanted to write some code to load a specific user’s session data (not the current user). This turned out to be a little trickier than we thought.

We use active_record_store for our sessions, so session data is stored in a sessions table in the database. In theory, we should be able to read the session with code like:

>> CGI::Session::ActiveRecordStore::Session.find_by_id(1).data
ArgumentError: undefined class/module Foo
        from /usr/lib/ruby/gems/1.8/gems/actionpack-2.0.2/lib/action_controller/session/active_record_store.rb:84:in `load'
        from /usr/lib/ruby/gems/1.8/gems/actionpack-2.0.2/lib/action_controller/session/active_record_store.rb:84:in `unmarshal'
        from /usr/lib/ruby/gems/1.8/gems/actionpack-2.0.2/lib/action_controller/session/active_record_store.rb:122:in `data'
        from (irb):1

Unfortunately, if the session contains any custom classes, this code will fail. Behind the scenes, session data is stored as a Base64-encoded, Marshal-dumped string. If there are classes in the dump that Ruby does not know about yet, Marshal.load will fail.
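
The failure is easy to reproduce outside Rails. This sketch mimics the storage format with pack('m'), which is Ruby's built-in Base64 encoding; the Cart class is hypothetical, and removing the constant is just a way to simulate a process that has not loaded the class yet:

```ruby
class Cart   # hypothetical object kept in the session
  attr_reader :items
  def initialize(items)
    @items = items
  end
end

# Roughly how the data column is written and read back.
stored  = [Marshal.dump(:cart => Cart.new([1, 2]))].pack('m')
session = Marshal.load(stored.unpack('m').first)
session[:cart].items   # => [1, 2]

# Simulate a fresh process where Cart has not been loaded.
Object.send(:remove_const, :Cart)

error_message = nil
begin
  Marshal.load(stored.unpack('m').first)
rescue ArgumentError => e
  error_message = e.message   # mentions "undefined class/module Cart"
end
```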

If we manually load the class, it will work:

>> Foo
=> Foo
>> CGI::Session::ActiveRecordStore::Session.find_by_id(1).data
=> {:foo=>#<Foo:0xb7a5f3cc>, "flash"=>{}}

Our sessions contain a bunch of custom classes, and we did not want to manually load them. Since we knew Rails handled this properly, we dug into the depths of Rails and found this code in cgi_process.rb (Rails 2.0.2):

def stale_session_check!
  yield
rescue ArgumentError => argument_error
  if argument_error.message =~ %r{undefined class/module ([\w:]*\w)}
    begin
      # Note that the regexp does not allow $1 to end with a ':'
      $1.constantize
    rescue LoadError, NameError => const_error
      raise ActionController::SessionRestoreError, <<-end_msg
Session contains objects whose class definition isn\'t available.
Remember to require the classes for all objects kept in the session.
(Original exception: #{const_error.message} [#{const_error.class}])
end_msg
    end
 
    retry
  else
    raise
  end
end

This code assumes you pass in a block which loads the session. If an error is raised trying to load a class, it calls constantize on the class name, which forces Rails to find and load the class, and then retries the block. It does this repeatedly until all load errors have been resolved.
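
The same rescue and retry loop can be tried without Rails. In this sketch the DEFINERS hash is a made-up stand-in for Rails' class loading, and load_with_retry and Basket are hypothetical names:

```ruby
# Hypothetical stand-in for Rails' class loading:
# class name => code that defines the class.
DEFINERS = {
  'Basket' => lambda { Object.const_set(:Basket, Class.new) }
}

def load_with_retry
  yield
rescue ArgumentError => e
  if e.message =~ %r{undefined class/module ([\w:]*\w)} && DEFINERS[$1]
    DEFINERS[$1].call   # define the missing class, as constantize would trigger
    retry               # run the block again from the top
  else
    raise
  end
end

DEFINERS['Basket'].call
dump = Marshal.dump(Basket.new)
Object.send(:remove_const, :Basket)   # pretend the class was never loaded

restored = load_with_retry { Marshal.load(dump) }
restored.class.name   # => "Basket"
```

Each retry resolves one missing class, so a dump with several unknown classes just goes around the loop several times. Anything the sketch cannot define is re-raised instead of retried forever.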

We can use this method now to load our session:

>> ActionController::CgiRequest.new(CGI.new).instance_eval do
?>   stale_session_check! do
?>     CGI::Session::ActiveRecordStore::Session.find_by_id(1).data
>>   end
>> end
=> {:foo=>#<Foo:0xb79f46a8>, "flash"=>{}}

This is not the prettiest code, but it allows us to get at the logic Rails uses to load sessions.