Factory pattern with syntactic sugar

written by paul on May 7th, 2008 @ 08:53 PM

Dan Manges has a nice write-up on why the factory pattern is better than Rails fixtures: Rails: Fixin’ Fixtures with Factory. The factory is used to create valid objects for testing with default values for all of the fields. These objects can be used in tests without cluttering the test with attributes we do not care about.

On my current project, we like the factory pattern but favor a slightly better syntax. We wanted to replace:


Factory.create_paperboy

with


Paperboy.build!

We use the build method (for lack of a better name) to create test data. We are calling a class method on Paperboy in order to create an instance, which seems more consistent with object creation in ruby than calling a method on a Factory class:


Paperboy.new
Paperboy.create
Paperboy.build!

In addition to build! (which is like create!), we added a build method which creates the object without saving it. We put our code in a file called factory.rb (which we require in spec_helper.rb) that looks like:


module Factory

  def self.included(base)
    base.extend(self)
  end

  def build(params = {})
    raise "There are no default params for #{self.name}" unless self.respond_to?(self.name.underscore)
    new(self.send(self.name.underscore).merge(params))
  end

  def build!(params = {})
    obj = build(params)
    obj.save!
    obj
  end

  def customer
    {
      :first_name => "Joe",
      :last_name  => "Guy",
      :paperboy   => Paperboy.build,
    }
  end

  def newspaper
    {
      :customer => Customer.build,
      :headline => "Read all about it!",
      :paperboy => Paperboy.build,
    }
  end

  def paperboy
    {
      :first_name     => "Paper",
      :last_name      => "Boy",
      :delivery_route => "Main St Route" 
    }
  end  
end

ActiveRecord::Base.class_eval do
  include Factory
end

The build method uses the class name to find the default params, which are defined as a method per class. Then, it merges any user supplied params and creates the object. Now, when we see test code that looks like:


Paperboy.new :first_name => "some", :last_name => "person" 

we can replace it with:


Paperboy.build

or, if one of the fields is important for the test:


Paperboy.build :first_name => "joe" 

It is easy to swap the word “new” or “create” for “build” or “build!” and then delete the params that we do not care about.
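At its core, build is a hash merge over per-class defaults. Here is a minimal stand-alone sketch of that idea without Rails. The MiniFactory module, its DEFAULTS table, and this plain-ruby Paperboy class are hypothetical stand-ins for illustration, not the code from our project:

```ruby
# Stand-alone sketch of merging user params over default attributes.
# MiniFactory and DEFAULTS are hypothetical names used only for this example.
module MiniFactory
  DEFAULTS = {
    "Paperboy" => { :first_name => "Paper", :last_name => "Boy" }
  }

  # Look up the defaults by class name, then let the caller's params win.
  def build(params = {})
    defaults = DEFAULTS.fetch(name) { raise "There are no default params for #{name}" }
    new(defaults.merge(params))
  end
end

# A plain-ruby stand-in for an ActiveRecord model.
class Paperboy
  extend MiniFactory
  attr_reader :first_name, :last_name

  def initialize(attrs = {})
    @first_name = attrs[:first_name]
    @last_name  = attrs[:last_name]
  end
end

boy = Paperboy.build(:first_name => "joe")
puts boy.first_name  # the value passed by the test
puts boy.last_name   # the value from the defaults
```

Only the field the test cares about is passed in. Everything else falls back to the defaults.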

Dynamically generating FlexUnit test suite

written by paul on April 28th, 2008 @ 10:55 PM

I have been working with Flex recently, which is a framework for building Flash applications. The standard testing framework for Flex is FlexUnit. A good example of FlexUnit can be found at How to use FlexUnit with FlexBuilder 2.

One limitation of FlexUnit is that it does not have a way to dynamically build a test suite at runtime. All of the examples online have a manually created and maintained test suite that looks like:


package {
  import flexunit.framework.TestSuite;
  import com.foo.FooTest;
  import com.bar.BarTest;

  public class Suite {

    public static function createTestSuite() : TestSuite {

      var ts : TestSuite = new TestSuite();
      ts.addTestSuite(FooTest);
      ts.addTestSuite(BarTest);
      // Many more additions
      return ts;
    }
  }
}

We believe in constantly writing and refactoring tests, and we did not want to manually maintain this test suite. We first investigated reading the filesystem to get the list of available tests at runtime. However, since the tests run inside of flash (in a browser or standalone player), there is no access to the filesystem. Kent Spillner and I brainstormed and came up with a build time solution instead.

First, we wrote a ruby script that would read the filesystem and generate the test suite file. We called the script test_suite_generator.rb, and it looks something like:


#!/usr/bin/env ruby

require 'find'

search_path = File.dirname(__FILE__) + '/test/'

test_cases = []
Find.find search_path do |path|
  filename = File.basename(path)
  Find.prune if filename =~ /^\./
  # Strip the search path prefix and only the trailing .as extension, then turn slashes into dots.
  test_cases << path.sub(search_path, '').sub(/\.as$/, '').gsub('/', '.') if filename =~ /Test\.as$/
end

test_cases.sort!

File.open('test/Suite.as', 'w') do |file|
  file.puts <<EOF
// This file is generated by #{File.basename(__FILE__)}.
package {
  import flexunit.framework.TestSuite;
EOF
  test_cases.each {|tc| file.puts "  import #{tc};"}
  file.puts <<EOF

  public class Suite {

    public static function createTestSuite() : TestSuite {

      var ts : TestSuite = new TestSuite();
EOF
  test_cases.each {|tc| file.puts "      ts.addTestSuite(#{tc});"}
  file.puts <<EOF
      return ts;
    }
  }
}
EOF
end

This script uses Find to get all of the test files in the test folder (files that end in Test.as). For each file, it turns the path into a fully qualified class name (com/foo/HelloWorldTest.as -> com.foo.HelloWorldTest). Then, it writes a file called Suite.as into the test directory which imports each of these classes and then adds them to a TestSuite object.
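The path conversion is the only subtle part, so here it is in isolation. This is a small sketch, and the helper name test_class_name is made up for this example:

```ruby
# Convert a test file path into the class name used in Suite.as.
# test_class_name is a hypothetical helper, not part of the script above.
def test_class_name(path, search_path)
  path.sub(search_path, '')  # drop the leading test directory
      .sub(/\.as\z/, '')     # drop only the trailing extension
      .tr('/', '.')          # directories become package segments
end

puts test_class_name('test/com/foo/HelloWorldTest.as', 'test/')
```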

We use Flex Builder for flex development, which is based on eclipse. This allows us to add a custom builder which will run our ruby script.

In Project -> Properties -> Builders, choose New -> Program. Type TestSuiteGenerator for the name, and browse for the ruby script in the Location field.

Then, on the “Build Options” tab, select “During auto builds” and “Specify working set of relevant resources”.

Finally, click on “Specify Resources…” and choose the folder where your tests live. Make sure that you don’t select the folder that includes Suite.as, or the generation of the file will kick off the builder again. If your tests are nested inside a “com” folder, then choose “com.”

Now, whenever we add or rename a test, Flex Builder runs test_suite_generator.rb and regenerates the Suite.as file. We don’t have to manually maintain anything.

Flying towards a hub

written by paul on March 12th, 2008 @ 07:48 PM

Most people who work for ThoughtWorks in the US travel for a living, so we spend a fair amount of time talking about travel. Lately, we have noticed a pattern in cancellations and formed a theory: Flights towards an airline hub are less likely to be canceled.

Most of our team currently flies United between Chicago and Newark. One member flies the same route on Continental. Chicago is a United hub, while Newark is a Continental hub. When there is bad weather on Thursdays, Continental will often cancel all of its flights to Chicago. United, however, generally makes it out. Likewise, on Sundays, United is more likely to cancel all of its flights to Newark, while Continental does not.

Our reasoning is that airlines are afraid that their planes will be stranded in non-hub cities. Continental does not want to send its planes to Chicago if it fears that they will not make it back (many of the flights bounce back and forth between these two cities). If the planes are already out, the hubs want them back, so flights towards the hub will probably make it out.

Taking this theory into account, it is better to fly an airline that has a hub in your home city. Flights out are more likely to be canceled, but at least you spend the night at home. Returning flights are less likely to be canceled, so you have a better chance of getting home.

Alphabetize schema.rb columns

written by paul on March 12th, 2008 @ 07:39 PM

I wrote previously about automated testing of database rollback scripts in rails. After running the rollback scripts, we verify our database schema by comparing the schema.rb file from before the upgrade with the one after rollback. The problem is that the columns for each table in schema.rb seem to appear in the order that they were created, not in alphabetical order. So, if a column is dropped and then re-added, it will move down to the bottom of the list. For example, we start with a table like:


  create_table "foo", :force => true do |t|
    t.string "first" 
    t.string "second" 
  end

Now, version 6 drops the “first” column, and the rollback adds it back. Then, the schema.rb file will look like:


  create_table "foo", :force => true do |t|
    t.string "second" 
    t.string "first" 
  end

In order to compare schema.rb files before and after upgrades, we wanted to alphabetize the column list. We did this by monkey patching the columns method on ActiveRecord::Base.connection. We only want to change the columns method when dumping the schema. We do not want to change the running application in any way, so we only run our monkey patch when calling the db:schema:dump task. Our solution looks like:


task :'db:schema:dump' => :'db:alphabetize_columns'

task :'db:alphabetize_columns' do
  class << ActiveRecord::Base.connection
    alias_method :old_columns, :columns unless self.instance_methods.include?("old_columns")

    def columns(*args)
      old_columns(*args).sort_by(&:name)
    end
  end
end
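The same alias-and-wrap trick can be tried outside of Rails. In this sketch, FakeConnection and Column are hypothetical stand-ins for the real connection and column objects. Only the one object is patched, and other instances keep the original behavior:

```ruby
# Column and FakeConnection are made-up stand-ins for the ActiveRecord classes.
Column = Struct.new(:name)

class FakeConnection
  # Returns columns in creation order, like the real adapter appears to.
  def columns(table_name)
    [Column.new("second"), Column.new("first")]
  end
end

CONNECTION = FakeConnection.new

# Open the singleton class so only this one object is changed.
class << CONNECTION
  alias_method :old_columns, :columns

  def columns(*args)
    old_columns(*args).sort_by { |column| column.name }
  end
end

puts CONNECTION.columns("foo").map { |column| column.name }.inspect
puts FakeConnection.new.columns("foo").map { |column| column.name }.inspect
```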

Reducing build time

written by paul on February 25th, 2008 @ 08:58 PM

A short build time is a critical element of continuous integration. I’ve been involved in a number of build improvements on my current project (both local and on the build server), and I thought I would share some of them. Using the tricks below, we cut our build time in half (and have more to go). Obviously, every build and project are different, so many of these may not apply to other situations.

Profile the build

We profiled our build before we made any changes. We found the slowest tasks using rake --trace, which prints out task timing. Furthermore, we watched the output from the “top” command while the builds were running on the build server to reveal bottlenecks.

Turn down logging

We noticed while watching “top” that kjournald was constantly running. This indicates a lot of disk activity. Our app logs to files and syslog while running in production, but this is unnecessary during builds. We turned off nearly all logging in test mode.

Make sure the build machine is only performing builds

We discovered that our build machine had become a bit of a playground for trying new things. This is fine in moderation, but some of these projects were not cleaned up. There were a lot of unnecessary processes running on the build machine. We killed everything that was unnecessary, and increased the nice value of processes that were not critical. For example, cruisecontrol.rb runs the web interface as a separate process from the builders. We increased the nice value of the web interface so building our projects would take precedence.

Parallelization

If you have more than one processor (or core), and the machine is running a giant set of tests, there is a good chance that the other processor is doing very little. We discovered huge gains by running tests in parallel. Projects like deeptest and selenium grid make this easy. We went a simpler route and ran our functional and acceptance tests at the same time in different processes.
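A rough sketch of the "different processes" approach in plain ruby follows. The run_in_parallel helper is hypothetical, and the rake task names in the comment are placeholders rather than our real tasks:

```ruby
require 'rbconfig'

# Start every command at once, then wait for all of them to finish.
# Returns one true/false per command, in the same order.
def run_in_parallel(commands)
  pids = commands.map { |command| Process.spawn(*command) }
  pids.map { |pid| Process.wait2(pid).last.success? }
end

# In a build this might be (task names are placeholders):
#   run_in_parallel([%w[rake test:functionals], %w[rake test:acceptance]])
results = run_in_parallel([
  [RbConfig.ruby, '-e', 'exit 0'],
  [RbConfig.ruby, '-e', 'exit 1']
])
puts results.inspect
```

The build should fail if any of the results is false.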

Build database from dumps

Our project is on release six, so we have six versions of the database to build. We ran DDL and DML for every release to build a version six database. Each time we release, we add a new version and the build gets a little longer. We started saving build time by dumping the database schema and data from the previous release. Now, when we want to build a version six database, we restore a version five dump and then build up from there.

Trim selenium suite

Selenium is a great testing tool, but it runs slowly. Opening new browsers and clicking through the site is slow. We looked at our acceptance test suite more carefully and trimmed it. Some of the excess was duplication that was covered in another test (or easily added to another test). Other logic was better tested at the functional level and did not need an acceptance test. We reduced our suite to a few long passes through the application, rather than many smaller tests.

Better hardware

Hardware is relatively cheap compared with developer time. It is worth investing in great hardware. That said, it does not reduce the need for the above improvements. Unfortunately, in many organizations, getting new hardware can be slow. Rather than wait a month, we can work on the build and see results today. And once the new hardware arrives, it will make things even faster.

If you have other build improvement strategies, please let me know in the comments.

ActiveRecord serialize only saves data

written by paul on February 20th, 2008 @ 09:44 PM

We ran into an interesting gotcha on our project the other day. We use serialize on ActiveRecord to save ruby objects to the database. This is described in Jay Fields Thoughts: Rails: ActiveRecord Serialize method.

Serialize uses YAML.dump and YAML.load to serialize/deserialize objects to strings. These methods only deal with the data of an object, not the methods. The objects we serialized used metaprogramming to dynamically define methods. When they were loaded from the database, they no longer had the new methods.

Here is a contrived example. The Foo class creates a foo method in the initialize:


class Foo
  def initialize
    class << self
      define_method :foo, lambda { 10 }
    end
  end
end

>> Foo.new.foo
=> 10

A dump of the Foo class has no knowledge of this foo method:


>> require 'yaml'
>> YAML.dump(Foo.new)
=> "--- !ruby/object:Foo {}\n\n" 

Therefore, the loaded version of Foo will not have the foo method:


>> YAML.load(YAML.dump(Foo.new)).foo
NoMethodError: undefined method `foo' for #<Foo:0xb7985af0>
        from (irb):16

In our case, we changed our code to store only the data from the domain object in the database (in columns). We recreate the domain object from these columns when we need it.
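The idea is to persist a plain hash of data and rebuild the object through its constructor, so any dynamically defined methods are created again. This is only a sketch. The Temperature class and its to_row/from_row methods are invented for the example and are not our domain code:

```ruby
# Temperature, to_row and from_row are hypothetical names for illustration.
class Temperature
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
    # Defined per instance at construction time, like the foo method above.
    define_singleton_method(:doubled) { degrees * 2 }
  end

  # Only plain data goes to the database.
  def to_row
    { :degrees => degrees }
  end

  # Rebuilding through new re-runs initialize, so the dynamic method exists again.
  def self.from_row(row)
    new(row[:degrees])
  end
end

restored = Temperature.from_row(Temperature.new(5).to_row)
puts restored.doubled
```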

Handling nil in method calls

written by paul on February 17th, 2008 @ 12:14 PM

On my current project, we noticed a common pattern when dealing with nil. We would often check an object to see if it was nil before calling a method on that object:


name = person ? person.name : nil

To reduce duplication, Patrick Farley and Ali Aghareza created a nil_or method which handles this. The above code becomes:


name = person.nil_or.name

The nil_or causes the expression to return nil if the target is nil. If not, the name method is called.

The code for nil_or looks like:


module ObjectExtension
  def nil_or
    return self unless self.nil?
    Class.new do
      def method_missing(sym, *args); nil; end
    end.new
  end  
end

class Object
  include ObjectExtension
end

The nil_or method returns self if self is not nil. If self is nil, it returns an instance of an anonymous class whose method_missing eats all method calls and returns nil.

We use a fair amount of delegation on this project using forwardable, so Michael Schubert and Toby Tripp created a delegator which has the same effect. For example, you can replace this delegation:


class Person
  extend Forwardable
  def_delegator :@job, :title, :job_title
end

with this one:


class Person
  extend Forwardable
  def_delegator_or_nil :@job, :title, :job_title
end

This delegation is equivalent to this code:


class Person
  def job_title
    @job ? @job.title : nil
  end
end

The code for def_delegator_or_nil looks like:


module ForwardableExtension
  def def_delegator_or_nil(accessor, method, new_method = method)
    accessor = accessor.id2name if accessor.kind_of?(Integer)
    method = method.id2name if method.kind_of?(Integer)
    new_method = new_method.id2name if new_method.kind_of?(Integer)

    module_eval(<<-EOS, "(__FORWARDABLE_EXTENSION__)", 1)
      def #{new_method}(*args, &block)
        begin
          if #{accessor}.nil?
            nil
          else
            #{accessor}.__send__(:#{method}, *args,&block)
          end
        rescue Exception
          $@.delete_if{|s| /^\\(__FORWARDABLE_EXTENSION__\\):/ =~ s} unless Forwardable::debug
          Kernel::raise
        end
      end
    EOS
  end
end

module Forwardable
  include ForwardableExtension
end
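For comparison, here is a shorter sketch of the same behavior using define_method instead of module_eval. It is not the code we use. It assumes the accessor is always an instance variable name such as :@job, and the NilSafeDelegation module name is made up:

```ruby
# NilSafeDelegation is a hypothetical module; it only supports
# instance-variable accessors such as :@job.
module NilSafeDelegation
  def def_delegator_or_nil(accessor, method, new_method = method)
    define_method(new_method) do |*args, &block|
      target = instance_variable_get(accessor)
      target.nil? ? nil : target.__send__(method, *args, &block)
    end
  end
end

Job = Struct.new(:title)

class Person
  extend NilSafeDelegation
  def_delegator_or_nil :@job, :title, :job_title

  def initialize(job = nil)
    @job = job
  end
end

puts Person.new(Job.new("Developer")).job_title
puts Person.new.job_title.inspect
```

Unlike the module_eval version, this sketch does not clean up the backtrace when the delegated method raises.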

Loading rails sessions manually

written by paul on February 8th, 2008 @ 04:46 PM

On my current project, we wanted to write some code to load a specific user’s session data (not the current user). This turned out to be a little trickier than we thought.

We use active_record_store for our sessions, so session data is stored in a sessions table in the database. In theory, we should be able to read the session with code like:


>> CGI::Session::ActiveRecordStore::Session.find_by_id(1).data
ArgumentError: undefined class/module Foo
        from /usr/lib/ruby/gems/1.8/gems/actionpack-2.0.2/lib/action_controller/session/active_record_store.rb:84:in `load'
        from /usr/lib/ruby/gems/1.8/gems/actionpack-2.0.2/lib/action_controller/session/active_record_store.rb:84:in `unmarshal'
        from /usr/lib/ruby/gems/1.8/gems/actionpack-2.0.2/lib/action_controller/session/active_record_store.rb:122:in `data'
        from (irb):1

Unfortunately, if the session contains any custom classes, this code will fail. Behind the scenes, session data is stored as a Base64 encoded, Marshal dumped string. If there are classes in the dump that ruby does not know about yet, the Marshal.load will fail.

If we manually load the class, it will work:


>> Foo
=> Foo
>> CGI::Session::ActiveRecordStore::Session.find_by_id(1).data
=> {:foo=>#<Foo:0xb7a5f3cc>, "flash"=>{}}

Our sessions contain a bunch of custom classes, and we did not want to manually load them. Since we knew rails handled this properly, we dug into the depths of rails and found this code in cgi_process.rb (Rails 2.0.2):


def stale_session_check!
  yield
rescue ArgumentError => argument_error
  if argument_error.message =~ %r{undefined class/module ([\w:]*\w)}
    begin
      # Note that the regexp does not allow $1 to end with a ':'
      $1.constantize
    rescue LoadError, NameError => const_error
      raise ActionController::SessionRestoreError, <<-end_msg
Session contains objects whose class definition isn\'t available.
Remember to require the classes for all objects kept in the session.
(Original exception: #{const_error.message} [#{const_error.class}])
end_msg
    end

    retry
  else
    raise
  end
end

This code assumes you pass in a block which loads the session. If an error is raised trying to load a class, it calls constantize on the class name, which forces rails to find and load the class. It does this repeatedly until all load errors have been resolved.
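The rescue-and-retry loop can be reproduced without rails. This is a sketch under some assumptions: load_with_retry and LOADER are hypothetical names, and the loader here simply defines an empty class where rails would find and load the real one:

```ruby
# Retry Marshal.load until every missing class has been loaded.
# load_with_retry is a hypothetical helper modeled on stale_session_check!.
def load_with_retry(dump, loader)
  Marshal.load(dump)
rescue ArgumentError => error
  # Re-raise anything that is not a missing class.
  raise unless error.message =~ %r{undefined class/module ([\w:]*\w)}
  loader.call($1)
  retry
end

# Simulate a session written by a process that knew about Foo.
class Foo; end
DUMP = Marshal.dump({ :foo => Foo.new })
Object.send(:remove_const, :Foo)

# Stand-in for constantize: define the missing class on demand.
# A real loader must raise if it cannot load the class, or retry would loop forever.
LOADER = lambda { |class_name| Object.const_set(class_name, Class.new) }

session = load_with_retry(DUMP, LOADER)
puts session[:foo].class.name
```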

We can use this method now to load our session:


>> ActionController::CgiRequest.new(CGI.new).instance_eval do
?>   stale_session_check! do
?>     CGI::Session::ActiveRecordStore::Session.find_by_id(1).data
>>   end
>> end
=> {:foo=>#<Foo:0xb79f46a8>, "flash"=>{}}

This is not the prettiest code, but it allows us to get at the logic rails uses to load sessions.

Popup when leaving website

written by paul on January 29th, 2008 @ 07:45 PM

On my current project, we had a requirement to pop up a message to the user when they leave our site (e.g., close the browser window). We discovered a number of sites talking about a window.close event, but it is not supported in modern browsers (IE 6, Firefox 2, Safari 2). The closest event is an unload event which gets fired when the page is unloaded.

We tried using this event in a javascript file that is included on every page:


window.onunload = popup;

function popup() {
  alert('I see you are leaving the site');
}

Unfortunately, an unload event is fired when the user leaves the page for any reason (such as clicking on a link). We did not want to show the popup if the user clicked on a link which stayed on the site.

Our next idea was to add an onclick to every link which would turn off the popup. Then, if the user clicks a link, nothing happens. If they leave the site another way, they get the popup.

Toby Tripp had the great idea to add these onclicks dynamically in javascript. We use prototype, so the code looks like:


staying_in_site = false;

$$('a').each(function(link) {
  link.observe('click', function() {staying_in_site = true;});
});

window.onunload = popup;

function popup() {
  if(staying_in_site) {
    return;
  }
  alert('I see you are leaving the site');
}

This code creates a variable which determines whether or not to show the popup. Then, we find every link with $$('a'). This is a CSS selector which selects all of the a elements. We iterate over these links and add a listener which sets the staying_in_site flag to true. Now, our popup code checks this flag and knows whether the user is staying on the site.

It is important to include this javascript at the bottom of the file (and not in the head). The javascript must be executed after the page is written or the links will not exist yet.

This solution is not perfect, but it accomplishes most of what we need.

Update (1/31/08): Simon Stewart suggested a cleaner method in the comments:


staying_in_site = false;

Event.observe(document.body, 'click', function(event) {
  if (Event.element(event).tagName == 'A') {
    staying_in_site = true;
  }
});

window.onunload = popup;

function popup() {
  if(staying_in_site) {
    return;
  }
  alert('I see you are leaving the site');
}

Instead of attaching an event listener to every link, we can attach a click listener to the whole body. Now, any click will call our function, which checks to see if you clicked on a link. Since we are not iterating over existing links, this javascript does not have to be below the links in the page.

Remove files that are not in subversion

written by paul on January 22nd, 2008 @ 09:11 PM

One of the great features of version control is that I can easily revert back to a known good state. I can do this in Subversion with the following command:

% svn revert -R .

However, if I have new files that are not in Subversion, this command will not delete them. Here is a fun ruby one liner to remove those files:

svn st | ruby -ne 'File.delete($_[1..-1].strip) if $_.match(/^\?/)'

This command loops over svn status and deletes all files from lines that start with ?.
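To preview what would be deleted before running the one liner, the same parsing can be pulled into a small function. This is a sketch. The unversioned_paths name and the sample status text are made up for the example:

```ruby
# Return the paths from svn status lines that start with "?".
# unversioned_paths is a hypothetical helper name.
def unversioned_paths(status_output)
  status_output.lines
               .select { |line| line.start_with?('?') }
               .map { |line| line[1..-1].strip }
end

# Sample output in the usual "flag, spaces, path" layout of svn status.
sample = "M       app/models/paperboy.rb\n?       tmp/scratch.rb\n?       notes.txt\n"
puts unversioned_paths(sample).inspect
```

Piping real output in is then a matter of calling unversioned_paths(`svn st`).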

PostgreSQL allows duplicate nulls in unique columns

written by paul on January 11th, 2008 @ 11:31 AM

It seems strange, but duplicate null values do not violate unique constraints in PostgreSQL.

Inserting the same non-null value twice in a unique column fails as expected:

# create table test (
  a varchar unique
);

# insert into test values(1);
INSERT 0 1

# insert into test values(1);
ERROR:  duplicate key violates unique constraint "test_a_key" 

However, the same is not true for null:

test=# insert into test values(null);
INSERT 0 1

test=# insert into test values(null);
INSERT 0 1

# select * from test;
 a
---
 1


(3 rows)

I think this is misleading, but PostgreSQL says that it is following the SQL standard: Unique Constraints.

Update (1/16/08): Pramod Sadalage showed me that Oracle actually behaves just like PostgreSQL. I’m not sure why I was seeing different behavior, but I could not reproduce the problem.

Command line clipboard access

written by paul on January 11th, 2008 @ 11:16 AM

It is possible to control the clipboard (copy and paste) from the command line in linux and OSX. In linux, the command is xsel, and in OSX, pbcopy/pbpaste.

For example, someone IMed me and asked for a subversion URL. Normally, I would type “svn info” in a terminal, use the mouse to select the URL, and press Ctrl-Insert to copy. Instead, I can just run this command:

svn info | grep URL | xsel --clipboard

Then, I can Alt-Tab back to the IM window and press Ctrl-v. There is no wasted time reaching for the mouse.

The --clipboard argument is required. Without it, xsel acts on the currently selected text in the terminal rather than the clipboard.

In OSX, the equivalent command is:

svn info | grep URL | pbcopy

pbcopy/pbpaste come with OSX. In debian flavors of linux, the following command will install xsel:

sudo apt-get install xsel

xsel can also be used to paste the clipboard. For example:

% echo ls | xsel --clipboard

% xsel --clipboard
ls

% `xsel --clipboard`
bin    dev   initrd      lost+found    mnt   root  sys  var

Zero bug releases

written by paul on November 30th, 2007 @ 05:00 PM

One of the companies I worked for had a rule which dictated zero bugs in each release. Any bug that was found during the formal QA process had to be fixed before the release could go into production. This sounds like a good idea at first. However, it is fraught with problems:

  1. Some reported bugs have very low impact. For example, a label might be misaligned a few pixels. Or maybe a certain bug is only seen by internal users. Fixing bugs takes time away from new features, which may be more important. New features drive application development, and focusing on minor bugs slows down the project.
  2. The requirement to fix every bug led people to fear reporting new bugs. They knew that we would have to spend time fixing it before cutting the release. We did not want the person finding the bugs to decide whether or not to report the bug. All bugs should be reported, and the business should prioritize and decide which ones are worth fixing.

These problems stemmed from the fact that fixing bugs once the software reached formal QA was expensive. Each bug first had to be fixed by developers. Then, the tester had to verify the fix (possibly by pushing a new build into a signoff or local QA environment). Once the fix was verified, we had to release a new version of the software in order to promote it to the formal QA environment. Finally, the formal QA had to verify the fix. This entire process took at least half a day, and could take much longer.

Obviously, this process was a point of pain. The time and people involved meant that we should only fix bugs worth fixing, and the business sponsors had the final say.

Announcing pulse-0.2.0

written by paul on November 14th, 2007 @ 09:47 PM

I just released a new version of pulse. Pulse adds an action to your rails project that can be used for external health checking. The most common use is by a http proxy such as haproxy.

The main new feature of version 0.2.0 is that the pulse URL is now specified in routes.rb. For example, you might like /heartbeat better than /pulse. The URL is set up in config/routes.rb like:


map.pulse 'pulse_url'

For example, config/routes.rb might look like:


ActionController::Routing::Routes.draw do |map|
  map.connect ':controller/:action/:id'
  map.pulse 'pulse'
end

Pulse is now a rails plugin instead of a gem, so check out the new installation instructions.

Pulse no longer monkey patches rails in the manner described in Add routes with a rails plugin or gem. Now, it mixes a Pulse::Routes module into ActionController::Routing::RouteSet::Mapper. This allows the “map.pulse” syntax that is shown above.

The code (lib/routes.rb) looks like:


module Pulse
  module Routes
    def pulse(path)
      connect path, :controller => 'pulse', :action => 'pulse'
    end
  end
end

ActionController::Routing::RouteSet::Mapper.send :include, Pulse::Routes

Announcing new gem: pulse

written by paul on October 26th, 2007 @ 10:54 AM

I created a new rubygem called pulse. From the README:

Pulse adds an action to your rails project that can be used for external health checking. The most common use is by a http proxy such as haproxy. A proxy can be configured to hit your servers at a specified URL to see if the servers are healthy. By default, they use the “/” URL, but in many sites, this can have side effects like creating a session. Pulse adds a “/pulse” URL which has no session and no logging.

The gem adds a route using the code described here: Add routes with a rails plugin or gem

Check out the gem and let me know what you think.
