RailsConf Presentation

written by paul on June 27th, 2009 @ 03:17 PM

This post is late, but the slides from our RailsConf presentation are online:

Rails in the Large: How We’re Developing the Largest Rails Project in the World

Useful unix tricks - part 3

written by paul on February 20th, 2009 @ 02:04 PM

Here is part 3 of my Useful unix tricks series, following Useful unix tricks and Useful unix tricks – part 2.

!! is the previous command in the shell history

It is pretty common to want to rerun the previous command, possibly with something new on the beginning or end. !! is that command in the history. For example:


% tail foo                          
tail: cannot open `foo' for reading: Permission denied

% sudo !!                           
sudo tail foo
hello world

As you can see, I forgot to sudo the first command. Now, I want to rerun it with a sudo at the front, so I can just do “sudo !!” and press enter. The shell will print out the command it is running, followed by whatever it would print normally.

Tail multiple files at once

The tail command can take multiple files, and it will show the output of each one. You can combine this with the -f flag, and tail will intersperse the output of each file in real time. This is incredibly handy for looking at log files. For example, we can tail both the apache and rails logs to see the requests:


% tail -f log/production.log /var/log/apache2/access.log
==> log/production.log <==

Processing MephistoController#dispatch (for 127.0.0.1 at 2009-02-20 13:33:31) [GET]
  Parameters: {"action"=>"dispatch", "path"=>["2008", "7", "19", "capistrano-with-pairing-stations"], "controller"=>"mephisto"}
Completed in 784ms (View: 0, DB: 260) | 200 OK [http://www.pgrs.net/2008/7/19/capistrano-with-pairing-stations]

==> /var/log/apache2/access.log <==
127.0.0.1 - - [20/Feb/2009:13:33:31 -0600] "GET /2008/7/19/capistrano-with-pairing-stations HTTP/1.1" 200 16049 "http://www.pgrs.net/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.6) Gecko/2009011912 Firefox/3.0.6 Ubiquity/0.1.5" 

As you can see, tail prints ==> <== to show which file the output is for.

Use vim -b to show nonprintable characters

Sometimes a file will have nonprintable characters, such as windows line breaks. Most editors won’t show them, but you can use “vim -b” to see and edit them. The -b flag tells vim to use binary mode. For example, here is a file with windows line endings:


% cat foo.txt 
Hello
World

% vim -b foo.txt
Hello^M
World^M
^M

As you can see, the vim binary mode can see the line endings whereas cat cannot.

** is a recursive wildcard in zsh

You can use the recursive wildcard ** in zsh to do complex matching. For example, let’s say that you want to search all ruby files in the current project for the string RAILS_ENV. Normally, you would do something like:


% find . -name '*.rb' | xargs grep RAILS_ENV
./config/environment.rb:# ENV['RAILS_ENV'] ||= 'production'
./test/test_helper.rb:ENV["RAILS_ENV"] = "test" 

In zsh, you can accomplish the same with a much simpler command:


% grep RAILS_ENV **/*.rb
config/environment.rb:# ENV['RAILS_ENV'] ||= 'production'
test/test_helper.rb:ENV["RAILS_ENV"] = "test" 

The wildcard **/*.rb recursively matches any files that end in .rb, so there is no need for a find command.

If there are a lot of files, you will occasionally get the error:


% grep RAILS_ENV **/*.rb
zsh: argument list too long: grep

This means that the **/*.rb match returned too many arguments to handle. In this case, you can use echo and xargs to get the job done, which is still simpler than the find command:


% echo **/*.rb | xargs grep RAILS_ENV
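If you are scripting the search, Ruby's Dir.glob understands the same **/ pattern and never builds a command line, so the argument limit does not come into play. Here is a minimal sketch; grep_files is a name I made up for this example, and the base: option assumes Ruby 2.5 or newer:

```ruby
# Hypothetical helper: search files matching a recursive glob for a pattern.
# Returns "path:line_number:line" strings, similar to grep -n output.
def grep_files(pattern, glob, root = ".")
  Dir.glob(glob, base: root).sort.flat_map do |path|
    full_path = File.join(root, path)
    next [] unless File.file?(full_path)

    File.foreach(full_path).with_index(1)
        .select { |line, _number| line =~ pattern }
        .map { |line, number| "#{path}:#{number}:#{line.chomp}" }
  end
end

# Usage: puts grep_files(/RAILS_ENV/, "**/*.rb")
```

Unlike the echo and xargs version, this also copes with filenames that contain spaces, since no shell word splitting is involved.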

find -X will show bad filenames

It is pretty common to run find and then pass the arguments into xargs. However, if any filenames contain spaces or quotes, xargs will fail. You can use find -X to find any paths that will fail. find will warn on these paths and then skip them:


% find -X .
.
find: ./filename with spaces: illegal path

If you want to use xargs with these files, use the -print0 option to tell find to terminate each path with a NUL character instead of a newline, and xargs -0 to tell xargs to split its input on NUL instead of whitespace:


% find . -print0 | xargs -0 echo
. ./filename with spaces

cd - will return to the previous folder

Passing - into the cd command will return you to the last folder you were in:


/tmp% pwd
/tmp

/tmp% cd ~

~% cd -
/tmp

/tmp% 

Use ctrl+z and kill %1 to kill a process that will not die

Sometimes, you run a command and pressing ctrl+c will not kill it. When that happened, I used to open up another terminal window to kill -9 the process until someone showed me the following trick:


% sleep 1000
^Z
zsh: suspended  sleep 1000
% kill -9 %1
% 
[1]  + killed     sleep 1000

Pressing ctrl+z suspends the process and returns you to a terminal prompt. Then, kill -9 %1 sends signal 9 (SIGKILL) to job #1, which is our suspended process.

pwdx shows the working directory of a process

It can be really useful to see the working directory of a running process. For example, you can see which release a ruby process is running:


% sudo pwdx 23961
23961: /var/www/myapp/releases/20081231200733

Unfortunately, I haven’t found pwdx for the mac. If anyone knows how I can install it, please let me know.

Use sh -x to debug shell scripts

If you want to know what commands a shell script runs, run it with the -x flag. For example, say we have a shell script with two echos. Compare the output with and without the -x flag:


% sh foo.sh 
hello
world

% sh -x foo.sh
+ echo hello
hello
+ echo world
world

% zsh -x foo.sh
+foo.sh:1> echo hello
hello
+foo.sh:2> echo world
world

As you can see, the -x flag shows which command is being run. zsh takes it a step further and shows the script and line number as well.

sysctl replaces /proc on macs

The /proc filesystem is a great way to find out about a linux machine. For example, you can “cat /proc/cpuinfo” to find out how many processors are on the box. However, macs don’t have /proc. You can use sysctl instead. The -a flag prints out all keys and values:


% sysctl -a
kern.ostype = Darwin
kern.osrelease = 9.6.0
kern.osrevision = 199506
kern.version = Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386
...

You can also get a single value by passing its key. The -n flag prints only the value, without the key name. For example, this command will print out the number of cpu cores:


% sysctl -n hw.ncpu
2
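
From a Ruby script, the processor count is available without shelling out to either /proc or sysctl. A small sketch, assuming Ruby 2.2 or newer (where Etc.nprocessors was added):

```ruby
require "etc"

# Number of processors available, as reported by Ruby itself.
cpu_count = Etc.nprocessors
puts "CPUs: #{cpu_count}"
```

Note that this is the number available to the current process, so it may be lower than hw.ncpu if the process is restricted to a subset of the cores.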

Automerging now in rake_commit_tasks

written by paul on February 20th, 2009 @ 01:15 PM

The rake_commit_tasks plugin now supports automatically merging changes from branch to trunk. I describe the feature and the use case at Automatically merge changes from branch to trunk, although the merging code now uses “svn merge” instead of “svn diff” in order to keep svn mergeinfo.

Basically, if you branch to release code and then fix a bug on the branch, the change will automatically be merged over to the trunk when you run a “rake commit.” Just set PATH_TO_TRUNK_WORKING_COPY to the location of the trunk checkout in your Rakefile.

If you are curious, you can check out the commit at github.

Flight delays application overhaul

written by paul on January 10th, 2009 @ 05:43 PM

In Flight delay information for United flights, I talked about an application I wrote to show United flight delays over time. I have now completely rewritten that application to allow comparison of multiple flights on one graph.

People who travel for a living know that early morning flights tend to be less delayed than evening flights. In the morning, the planes are usually already at the airport, so there is no chance of an incoming flight delay. There are no lines of planes waiting to take off yet, so the time between leaving the gate and getting into the air tends to be a lot less.

The difference in these times can be dramatic. Here is a report comparing an early morning flight with an evening flight from Newark to Chicago (two heavily delayed airports): Flight Delays

The report shows a table of min, max, and median delays:

Flight                                           Min Delay  Median Delay  Max Delay
Flight 655 (EWR -> ORD) on Thursday at 07:38 PM        -14            40        239
Flight 635 (EWR -> ORD) on Thursday at 05:58 AM        -26             0        157
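
For anyone who wants to compute the same three columns from their own list of delays, here is a rough sketch. It is not the application's code; delay_summary is a name I made up, and the sample numbers are invented rather than taken from the report:

```ruby
# Summarize a non-empty list of delays (in minutes; negative means early).
def delay_summary(delays)
  sorted = delays.sort
  middle = sorted.size / 2
  # With an even number of values, take the mean of the two middle ones.
  median = sorted.size.odd? ? sorted[middle] : (sorted[middle - 1] + sorted[middle]) / 2.0
  { min: sorted.first, median: median, max: sorted.last }
end

# Made-up sample data, not taken from the real report.
sample = [-26, -5, 0, 12, 157]
p delay_summary(sample)
```

The median is a better summary than the mean here, because a single 239 minute delay would otherwise dominate the average.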


And here are the two graphs shown in the report above:

As you can see from the first graph, day by day, flight 655 (departing around 7:38 PM) is almost always more delayed than flight 635 (departing around 5:58 AM).

The second graph shows a histogram. You can see that flight 635 is clustered more heavily to the left (-40 to 20) which shows that it is generally between 40 minutes early and 20 minutes late. Flight 655 is much more spread out to the right, which shows that it has far more delays. On one day, it was over 220 minutes late!

Strange behavior with define_method and the wrong number of arguments

written by paul on December 31st, 2008 @ 02:15 PM

I noticed the other day that methods defined using define_method have very strange behavior when given the wrong number of arguments. For example, here is a class with a bunch of methods defined using define_method:


class Foo
  define_method :no_args do
    p "no args" 
  end

  define_method :one_arg do |one|
    p one
  end

  define_method :two_args do |one, two|
    p one
    p two
  end
end

Now, if we call no_args with an argument, it will silently ignore the argument:


>> Foo.new.no_args(1)
"no args" 
=> nil

However, if we have a method that expects one argument but receives either none or more than one, we get a warning:


>> Foo.new.one_arg
./foo.rb:6: warning: multiple values for a block parameter (0 for 1)
    from (irb):3
nil
=> nil

>> Foo.new.one_arg(1,2,3)
./foo.rb:6: warning: multiple values for a block parameter (3 for 1)
    from (irb):2
[1, 2, 3]
=> nil

In the second case, it took all three arguments and passed them as an array into the method expecting one argument.

It gets even stranger with a method that expects two arguments. Now, we actually get errors:


>> Foo.new.two_args
ArgumentError: wrong number of arguments (0 for 2)
    from (irb):2:in 'two_args'
    from (irb):2

>> Foo.new.two_args(1,2,3)
ArgumentError: wrong number of arguments (3 for 2)
    from ./foo.rb:10:in 'two_args'
    from (irb):3

I’m not sure why a one argument method gives a warning while a two argument method gives an error. Clearly, define_method is very different from using def.
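
The output above comes from the Ruby of the time (1.8). As far as I know, Ruby 1.9 and later check the arguments of a define_method method the same way def does, so all of the mismatched calls raise ArgumentError. Here is a quick sketch for checking this on your own interpreter; ArityDemo and call_result are names invented for this example:

```ruby
class ArityDemo
  define_method(:no_args) { "no args" }
  define_method(:one_arg) { |one| one }
  define_method(:two_args) { |one, two| [one, two] }
end

# Report the declared arity of each method.
[:no_args, :one_arg, :two_args].each do |name|
  puts "#{name}: arity #{ArityDemo.instance_method(name).arity}"
end

# Call a method and describe what happened instead of letting the error propagate.
def call_result(object, name, *args)
  object.public_send(name, *args).inspect
rescue ArgumentError => e
  "ArgumentError: #{e.message}"
end

demo = ArityDemo.new
puts call_result(demo, :no_args, 1)
puts call_result(demo, :one_arg)
puts call_result(demo, :one_arg, 1, 2, 3)
puts call_result(demo, :two_args, 1, 2)
```

If your interpreter prints anything other than an ArgumentError for the first three calls, you are seeing the older behavior described in this post.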

Mephisto with Phusion Passenger

written by paul on December 22nd, 2008 @ 11:43 AM

I recently upgraded my blogging software, Mephisto, from 0.7.3 to 0.8.1. One thing I noticed is that they moved the cached files from public to a cache subfolder containing the site. For example, on a new installation, the cached index page is in public/cache/unusedfornow.com/index.html.

Mephisto writes a cached page for every page visited. This means that any subsequent requests for this page can be served directly by apache from the cached file rather than going through the whole rails stack (all the way down to the database). This is much faster and uses less memory.

I run my blog in Apache with Phusion Passenger. The problem with this new cache location is that Passenger only looks in public for cached files. This means that the cached pages are ignored and every request is being served by Rails. After searching google and working some mod_rewrite magic, I came up with the following solution. Here is the Apache virtual host configuration for my blog:


<VirtualHost *:80>
    ServerName pgrs.net
    ServerAlias www.pgrs.net

    DocumentRoot /var/www/mephisto-0.8.1/public

    RailsAllowModRewrite on
    RewriteEngine On

    # Rewrite / to index.html
    RewriteRule ^/$ /index.html [QSA] 

    # Rewrite /some_page to /some_page.html
    RewriteRule ^([^.]+?)/?$ $1.html [QSA]

    # If cached file exists, serve it and stop processing
    RewriteCond %{DOCUMENT_ROOT}/cache/unusedfornow.com%{REQUEST_FILENAME} -f
    RewriteRule ^(.*)$ /cache/unusedfornow.com$1 [L]

    ErrorLog /var/log/apache2/pgrs-error.log
    CustomLog /var/log/apache2/pgrs-access.log combined
</VirtualHost>

The first 3 lines are standard Phusion Passenger configuration: Deploying a Ruby on Rails application. Then, I turn on mod_rewrite. The first two sets of mod_rewrite configuration cascade and turn the request into what the filename will look like. So / becomes /index.html, and /2008/10/29/deploying-trunk-or-tags-with-capistrano becomes /2008/10/29/deploying-trunk-or-tags-with-capistrano.html.

The final set checks if this file exists under /var/www/mephisto-0.8.1/public/cache/unusedfornow.com (the -f flag), and if it does, tells apache to serve this file. The [L] tells mod_rewrite that this is the last rule, so it should stop processing now. If the file does not exist, the request falls through mod_rewrite and Passenger picks it up and serves it through Rails.
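
To make the cascade easier to follow, here is the same lookup written out as plain Ruby. This is only an illustration of the logic, not how Apache evaluates the rules internally; cached_name and cached_file are hypothetical names:

```ruby
# Mirror of the first two rewrite rules: map a request path to the
# name the cached file would have.
def cached_name(request_path)
  return "/index.html" if request_path == "/"

  # Paths without a dot get ".html" appended (minus any trailing slash).
  match = request_path.match(%r{\A([^.]+?)/?\z})
  match ? "#{match[1]}.html" : request_path
end

# Mirror of the final RewriteCond: return the cached file to serve, or nil
# when the request should fall through to Rails.
def cached_file(request_path, cache_root)
  candidate = File.join(cache_root, cached_name(request_path))
  File.file?(candidate) ? candidate : nil
end

puts cached_name("/")
puts cached_name("/2008/10/29/deploying-trunk-or-tags-with-capistrano")
```

Here cache_root stands for the public/cache/unusedfornow.com folder from the configuration above.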

I verified that this works by looking at the response headers in Firefox (Tools -> Page Info -> Headers) of any given blog page. The first time, there is an “X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 2.0.3” header. Once I refresh, the X-Powered-By header is gone since the request never makes it to Passenger. Apache is once again doing the hard work, and Rails is only used when the request is new or dynamic (such as searching).

Deploying trunk or tags with capistrano

written by paul on October 28th, 2008 @ 08:34 PM

On my current project, we use capistrano for all of our deployments. In the simplest case, you tell capistrano the URL of your repository, and then you deploy by performing a checkout from this repository:


set :repository,  "http://www.example.com/svn/myproject/trunk" 

However, putting this line in the capistrano recipe only lets you deploy from trunk. We needed the ability to deploy either the trunk or a tag of our choice. We generally deploy the trunk to development servers and the latest tag to staging and production servers.

We started out with something more complicated, but with the help of Jamis Buck on the capistrano mailing list, we came up with the following solution:


set :repository_root, "http://www.example.com/svn/myproject" 
set(:tag) { Capistrano::CLI.ui.ask("Tag to deploy (or type 'trunk' to deploy from trunk): ") }
set(:repository) { (tag == "trunk") ? "#{repository_root}/trunk" : "#{repository_root}/tags/#{tag}" }

This deploy script will prompt the user to enter either a tag name or the word trunk. It will then use that variable to set the repository to the correct path. The output of a deploy will look like:

% cap deploy
  * executing `deploy'
...
  * executing `deploy:update'
 ** transaction: start
  * executing `deploy:update_code'
Tag to deploy (or type 'trunk' to deploy from trunk): trunk
  * executing "svn checkout -q  -r2210 http://www.example.com/svn/myproject/trunk /var/www/myproject/releases/20081029012754 && (echo 2210 > /var/www/myproject/releases/20081029012754/REVISION)" 
...

Capistrano evaluates variables lazily. It will only fetch the repository variable if it needs it, which will then fetch the tag variable, which will then prompt the user. Therefore, if you run a command that does not require the repository, it will not prompt. For example, running the following command will not prompt the user:


cap deploy:restart

Next, we created a convenience rake task to deploy the trunk without prompting:


namespace :deploy do
  task :trunk do
    sh "cap -s tag=trunk deploy" 
  end
end

This rake task sets the tag variable on the command line. Therefore, capistrano will not need to evaluate the set(:tag) command and will deploy the trunk without prompting.
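
If the lazy evaluation is hard to picture, here is a toy model of it in plain Ruby. It is not Capistrano's real implementation; LazySettings, build_settings, and the tag name 1.2.0 are all invented for this sketch, and the :tag block stands in for the interactive prompt:

```ruby
# Toy model of lazily evaluated settings; not Capistrano's real implementation.
class LazySettings
  def initialize
    @values = {}
    @blocks = {}
  end

  # Store either a plain value or a block to be evaluated on first use.
  def set(name, value = nil, &block)
    @values.delete(name)
    @blocks.delete(name)
    if block
      @blocks[name] = block
    else
      @values[name] = value
    end
  end

  # Evaluate the block the first time the setting is read, then remember the result.
  def fetch(name)
    unless @values.key?(name)
      block = @blocks.fetch(name) { raise KeyError, "no setting named #{name}" }
      @values[name] = instance_eval(&block)
    end
    @values[name]
  end
end

def build_settings(prompt_log)
  settings = LazySettings.new
  settings.set(:repository_root, "http://www.example.com/svn/myproject")
  settings.set(:tag) { prompt_log << :asked; "1.2.0" } # stands in for the prompt
  settings.set(:repository) do
    tag = fetch(:tag)
    tag == "trunk" ? "#{fetch(:repository_root)}/trunk" : "#{fetch(:repository_root)}/tags/#{tag}"
  end
  settings
end

log = []
prompted = build_settings(log)
puts prompted.fetch(:repository) # evaluates :repository, which evaluates :tag
puts log.size

preset_log = []
preset = build_settings(preset_log)
preset.set(:tag, "trunk")        # comparable to passing -s tag=trunk
puts preset.fetch(:repository)
puts preset_log.size
```

In the second half, the :tag block is replaced by a plain value before anything reads it, so the stand-in prompt never runs.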

Finding nonprintable characters with a test

written by paul on September 30th, 2008 @ 07:26 PM

Our current application includes a lot of static content created by content editors. They check in static HTML files, and we include these files in various parts of the application. The problem is that they sometimes copy and paste from applications such as Outlook or Word, which can introduce unprintable characters into the application. These characters show up strangely on the website.

After this happened a couple of times, we decided to write a test to ensure that we would always catch the unprintable characters:


class NonPrintableCharactersTest < Test::Unit::TestCase
  def test_for_non_printable_characters_in_content
    assert_equal "", `find #{RAILS_ROOT}/content -name '*.html' | xargs grep -n '[^[:space:][:print:]]'`
  end
end

We use find to get a list of all of the html files in the content folder. Then, we pipe this to grep, using the regular expression

'[^[:space:][:print:]]'
which matches anything except spaces or printable characters. The output of this test looks like:


Loaded suite test/non_printable_characters_test
Started
F
Finished in 0.86005 seconds.

  1) Failure:
test_for_non_printable_characters_in_content(NonPrintableCharactersTest) [test/non_printable_characters_test.rb:5]:
<""> expected but was
<"/some/path/to/content/tmp.html:48:character �</span></p>\n">.

1 tests, 1 assertions, 1 failures, 0 errors

The failure message shows the file and line with the character, so it is easy to fix.
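
If shelling out to find and grep is not an option (on Windows, for example), the same character class works in a Ruby regular expression. This is a sketch rather than what we run; non_printable_lines is a made-up name, and note that for UTF-8 strings Ruby treats [[:print:]] as Unicode-aware, so the results can differ from grep depending on its locale:

```ruby
NON_PRINTABLE = /[^[:space:][:print:]]/

# Return [line_number, line] pairs for lines containing non-printable characters.
def non_printable_lines(text)
  text.each_line.with_index(1).select do |line, _number|
    # Invalid byte sequences are replaced with NUL so that they are reported too.
    line.scrub("\u0000") =~ NON_PRINTABLE
  end.map { |line, number| [number, line.chomp] }
end

non_printable_lines("fine\nbad\u0001line\n").each do |number, line|
  puts "#{number}: #{line.inspect}"
end
```

A test could then read each html file, call this method, and assert that the result is empty.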

Testing page caching with SpiderTest

written by paul on September 12th, 2008 @ 03:10 PM

The website I’m currently working on is similar to an online brochure. The data on the site changes hourly, but every user sees the same thing. As a result, we decided to use page caching to dramatically speed up the site. Once a page is visited, the html is written out to disk and all subsequent requests are served by apache. The setup of this approach is detailed elsewhere (for example, Rails Envy: Ruby on Rails Caching Tutorial).

Setting up caching was easy, but we wanted to ensure that we did not make any mistakes. All pages should be cached, since any miss will result in a much higher load on our rails application. I’ve written previously about our internationalization test (Improved internationalization test) which spiders the site (using SpiderTest) looking for non localized text. Since we were already visiting every page, it seemed like a good place to add a check for page caching. Spidering the site again would make our test suite too long.

The consume_page method is called for every page that is visited by the spider. We expanded the implementation by adding a call to assert_page_is_cached:


def consume_page(html, url)
  html.gsub!("http://www.example.com", "")
  unless redirect?(html) || asset?(url)
    assert_page_has_been_moved_to_language_file(html, url)
    assert_page_is_cached(url)
  end
  super
end

def assert_page_is_cached(url)
  path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
  page = path.ends_with?(".html") ? path : "#{path}.html" 
  assert_true File.exists?(page), "Page NOT cached: #{url} (looking in #{page})" 
end

We also had to add new lines to our setup to turn on caching (since it is normally off in test mode):


def setup
  FileUtils.rm_rf ActionController::Base.page_cache_directory
  ActionController::Base.perform_caching = true
end

Since we run this test as its own suite, the test is totally isolated from other tests. There is no need to implement a teardown.

The full test, including the internationalization testing from before looks like:


require 'hpricot'

class InternationalizationText < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def setup
    FileUtils.rm_rf ActionController::Base.page_cache_directory
    ActionController::Base.perform_caching = true
    blank_out_localization
    blank_out_html_escape
  end

  def blank_out_localization
    GLoc::InstanceMethods.class_eval do
      alias :old_l :l
      def l(symbol, *arguments)
        "" 
      end
    end
  end

  def blank_out_html_escape
    ERB::Util.class_eval do
      alias :old_html_escape :html_escape
      def html_escape(s)
        "" 
      end

      alias :h :html_escape
    end
  end

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end

  def consume_page(html, url)
    html.gsub!("http://www.example.com", "")
    unless redirect?(html) || asset?(url)
      assert_page_has_been_moved_to_language_file(html, url)
      assert_page_is_cached(url)
    end
    super
  end

  def redirect?(html)
    html.include?("<body>You are being")
  end

  def asset?(url)
    File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
  end

  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
    assert_attribute_does_not_contain_words body, url, 'title'
    assert_attribute_does_not_contain_words body, url, 'alt'
  end

  def assert_attribute_does_not_contain_words body, url, attribute
    body.search("//*[@#{attribute}]") do |element|
      assert_does_not_contain_words element.get_attribute(attribute), url
    end
  end

  def assert_does_not_contain_words text, url
    match = text.match(/[A-Za-z]([A-Za-z]| )*/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end  

  def assert_page_is_cached(url)
    path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
    page = path.ends_with?(".html") ? path : "#{path}.html" 
    assert_true File.exists?(page), "Page NOT cached: #{url} (looking in #{page})" 
  end
end

Capistrano dry run

written by paul on September 5th, 2008 @ 07:35 PM

I submitted a patch to Capistrano to add a “--dry-run” option (or -n for short). This flag causes capistrano to print out all of the commands it will run without actually running them. It is an easy way to see what the cap task will do to your servers before you run it.

My patch was accepted and released as part of Capistrano 2.5.0. You can read more about the new features at:

http://capify.org/2008/8/29/capistrano-2-5-0

and see the details of my commit at github:

http://github.com/jamis/capistrano/commit/7279a3858e2bcebe84735223d5f8b4397c4ad85b

Improved internationalization test

written by paul on August 29th, 2008 @ 12:31 AM

I wrote previously about how we test the internationalization of our website in Testing internationalization language files. Basically, we generate a blank language file with all of the values for all of the labels set to blank. We switch the site to this language, and then we spider the site looking for text.

Over the past couple of months, we have improved our internationalization test and removed some of the existing limitations.

Manually marking nonlocalizable content

One of the limitations of the approach detailed in the previous article is that we had to manually mark content on the page that should not be internationalized by adding a class to the html:


<span class="nonlocalizable"><%= @building.address %></span>

The basis of our new test is the idea that all text on the page is one of two types:
  1. Labels and static text that live in the language files, which are inserted into the page using the GLoc method l()
  2. Text that the application produces, which should be html escaped using the h() method in the views or helpers

Therefore, if we intercept both of these types of text, we can find anything that is not localized or escaped.

Our new test setup looks like:


def setup
  blank_out_localization
  blank_out_html_escape
end

def blank_out_localization
  GLoc::InstanceMethods.class_eval do
    alias :old_l :l
    def l(symbol, *arguments)
      "" 
    end
  end
end

def blank_out_html_escape
  ERB::Util.class_eval do
    alias :old_html_escape :html_escape
    def html_escape(s)
      "" 
    end

    alias :h :html_escape
  end
end

We redefine the l() method to return an empty string, so anything that is localized will no longer show up on the page.

The h() or html_escape() methods are used to escape strings for the web (for example, converting ‘<’ into ’&lt;’). We also redefine these methods to return empty strings. Now, all text on the webpage should be blanked out.

We then spider the site as before, which walks every page and checks for non blank text.

It is possible to restore the l() and h() methods in the teardown:


def teardown
  restore_html_escape
  restore_localization
end

def restore_html_escape
  ERB::Util.class_eval do
    alias :html_escape :old_html_escape
    # h was aliased to the blanked-out html_escape, so point it back too
    alias :h :html_escape
  end
end

def restore_localization
  GLoc::InstanceMethods.class_eval do
    alias :l :old_l
  end
end

However, I think it is safer to run this test in its own test suite in a separate ruby process. That way, the l() and h() monkey patching cannot accidentally affect other tests:


namespace :test do
  Rake::TestTask.new(:'internationalization' => ["environment", "load_test_data"]) do |t|
    t.libs << "test" 
    t.pattern = "test/acceptance/internationalization_test.rb" 
    t.verbose = true
  end

  Rake::TestTask.new(:'acceptance' => ["environment", "load_test_data"]) do |t|
    t.libs << "test" 
    t.pattern = FileList["test/acceptance/**/*_test.rb"].exclude("test/acceptance/internationalization_test.rb")
    t.verbose = true
  end
end

Now, we no longer need to mark any content as nonlocalizable. If the test fails, we either forgot to add a label to the language file, or we forgot to escape the text in the page:


<%= l(:name_label) %>

or

<%= h(@building.address) %>
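
The blank-out idea can be tried outside of Rails. The sketch below is a stand-in, not our test: ViewHelpers, Page, and with_blanked_helpers are invented names, and l() and h() here are simplified imitations of the GLoc and ERB methods:

```ruby
# Simplified stand-ins for the GLoc l() and ERB h() helpers.
module ViewHelpers
  LABELS = { name_label: "Name" }.freeze

  def l(key)
    LABELS.fetch(key)
  end

  def h(text)
    text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
  end
end

class Page
  include ViewHelpers

  # "Contact us" is deliberately neither localized nor escaped.
  def render(address)
    "#{l(:name_label)}: #{h(address)} | Contact us"
  end
end

# Replace l() and h() with versions that return "", then restore them.
def with_blanked_helpers
  ViewHelpers.module_eval do
    alias_method :old_l, :l
    alias_method :old_h, :h
    define_method(:l) { |*_args| "" }
    define_method(:h) { |*_args| "" }
  end
  yield
ensure
  ViewHelpers.module_eval do
    alias_method :l, :old_l
    alias_method :h, :old_h
    remove_method :old_l, :old_h
  end
end

with_blanked_helpers do
  leftover = Page.new.render("1 <Main> St")[/[A-Za-z]([A-Za-z]| )*/]
  puts "Found text that was not in the language file: #{leftover.inspect}"
end
```

Everything that went through l() or h() disappears, so the only words left on the page are the ones that were hard-coded.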

Redirects

We noticed that Rails would send redirects as:


<html><body>You are being <a href="http://www.example.com/some/new/location">redirected</a>.</body></html>

The http://www.example.com URL was tripping up SpiderTest, so we removed that part of each URL. Furthermore, we skip our page checking on redirect pages and assets:


def consume_page(html, url)
  html.gsub!("http://www.example.com", "")
  unless redirect?(html) || asset?(url)
    assert_page_has_been_moved_to_language_file(html, url)
  end
  super
end

def redirect?(html)
  html.include?("<body>You are being")
end

def asset?(url)
  File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
end

Alt and title attributes

We discovered with the original test that we were not testing alt and title attributes on the page. For example, if you hover over a link, it will show the title. We also want these strings internationalized, so we added them to the test with the following code:


assert_attribute_does_not_contain_words body, url, 'title'
assert_attribute_does_not_contain_words body, url, 'alt'

def assert_attribute_does_not_contain_words body, url, attribute
  body.search("//*[@#{attribute}]") do |element|
    assert_does_not_contain_words element.get_attribute(attribute), url
  end
end

Better error messages

We noticed that if you accidentally forget to internationalize a string like “Please enter your username,” the test would fail with a message of “Found text that was not in the language file: Please.” We thought it would be better to show the full string, so we replaced the regex:


/\w+/

with


/[A-Za-z]([A-Za-z]| )*/

The second one matches a letter followed by any run of letters or spaces, so it will pick up the entire phrase.
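
The difference is easy to see in irb or a short script. This is just a demonstration on a sample string:

```ruby
text = "Please enter your username:"

first_word  = text[/\w+/]
full_phrase = text[/[A-Za-z]([A-Za-z]| )*/]

puts first_word.inspect  # "Please"
puts full_phrase.inspect # "Please enter your username"
```

The match stops at the colon, since punctuation is neither a letter nor a space.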

Final result

The final test looks like:


require 'hpricot'

class InternationalizationText < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def setup
    blank_out_localization
    blank_out_html_escape
  end

  def blank_out_localization
    GLoc::InstanceMethods.class_eval do
      alias :old_l :l
      def l(symbol, *arguments)
        "" 
      end
    end
  end

  def blank_out_html_escape
    ERB::Util.class_eval do
      alias :old_html_escape :html_escape
      def html_escape(s)
        "" 
      end

      alias :h :html_escape
    end
  end

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end

  def consume_page(html, url)
    html.gsub!("http://www.example.com", "")
    unless redirect?(html) || asset?(url)
      assert_page_has_been_moved_to_language_file(html, url)
    end
    super
  end

  def redirect?(html)
    html.include?("<body>You are being")
  end

  def asset?(url)
    File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
  end

  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
    assert_attribute_does_not_contain_words body, url, 'title'
    assert_attribute_does_not_contain_words body, url, 'alt'
  end

  def assert_attribute_does_not_contain_words body, url, attribute
    body.search("//*[@#{attribute}]") do |element|
      assert_does_not_contain_words element.get_attribute(attribute), url
    end
  end

  def assert_does_not_contain_words text, url
    match = text.match(/[A-Za-z]([A-Za-z]| )*/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end  

end

These modifications have improved the quality of the internationalization test, and this test has been very useful at catching text that we forget to internationalize.

Switching users during a capistrano deploy

written by paul on August 5th, 2008 @ 09:15 PM

We have a complicated deployment on my current project which includes running several commands as a different user from the main deployment user. Normally, this wouldn’t be a problem since the sudo method provides an option called ‘as’:


task :run_command_as_another_user do
  sudo "whoami", :as => "another_user" 
end

Unfortunately, we do not have sudo access to our servers. Instead of using sudo, we wrote a new method called with_user which will execute a block as a different user:


task :try_another_user do
  with_user("another_user", "another_password") do
    run "whoami" 
  end
end

The with_user method will set the user and password back to the original values once the block is complete. For example:


task :whoami do
  set :user, 'original'
  set :password, 'original'

  run 'whoami'
  with_user("someone_else", 'password') do
    run "whoami" 
  end
  run 'whoami'
end

The output of the previous task (stripped down) is:


 ** [out :: 127.0.0.1] original
 ** [out :: 127.0.0.1] someone_else
 ** [out :: 127.0.0.1] original

The implementation of with_user is:


def with_user(new_user, new_pass, &block)
  old_user, old_pass = user, password
  set :user, new_user
  set :password, new_pass
  close_sessions
  yield
ensure
  # Restore the original credentials even if the block raises
  set :user, old_user
  set :password, old_pass
  close_sessions
end

def close_sessions
  sessions.values.each { |session| session.close }
  sessions.clear
end

It saves the old user and password and then sets the new values. Then, it disconnects from all of the servers, which forces a reconnect on the next command with the new user and password. When the block is complete, it resets the user and password and disconnects again.

The repeated reconnecting is not the most efficient solution, but it is simple and works for us.

Capistrano with pairing stations

written by paul on July 19th, 2008 @ 02:31 PM

My last few projects have all been developed on mac minis. We have them set up as pairing stations, with two people to a machine. Every machine was cloned from the original (or an image), so each station is just like every other.

Over time, machines need maintenance. For example, one pair realizes that we need to upgrade a gem. In order to keep the machines in sync, that pair would either ssh to every pairing station and perform the upgrade, or else ask each pair to do it themselves.

Instead of this manual process, we came up with a capistrano task to launch a shell on every machine. Now, we can run commands on every pairing station at the same time.


task :pairing_stations do
  ENV["HOSTS"] = "10.1.0.12, 10.1.0.13" 
  shell
end

For example, we can check which version of rake is installed on our pairing stations and make sure that they are all up to date:


% cap pairing_stations
  * executing `pairing_stations'
  * executing `shell'
====================================================================
Welcome to the interactive Capistrano shell! This is an experimental
feature, and is liable to change in future releases. Type 'help' for
a summary of how to use the shell.
--------------------------------------------------------------------
cap> gem list | grep rake
[establishing connection(s) to 10.1.0.12, 10.1.0.13]
 ** [out :: 10.1.0.12] rake (0.7.3)
 ** [out :: 10.1.0.13] rake (0.8.1)
cap> gem update rake

Testing internationalization language files

written by paul on July 11th, 2008 @ 04:53 PM

Update (8/29/08): I wrote about a bunch of improvements to the test described below in Improved internationalization test.

My current project wants to localize our site for different languages. One way to do this is to use a plugin like GLoc. We can externalize all of our labels and messages into language files (en.yml, fr.yml, etc), and then switch out the language based upon the user’s preference.

We started a site-wide effort to pull out all of the content into en.yml. However, we had two big questions:
  1. How would we know that we did not miss any content?
  2. How would we ensure going forward that someone would not mistakenly put text in a view (or helper)?

We came up with the following solution: We create a new language file called blank.yml which has a value of blank for every key. When we switch to this language, all of the labels are blank. Therefore, we can crawl our site and look for any text that is not blank.

The language files are yaml, and look like:


help: Help
login: Login
username: "Please enter your username:" 
password: "Please enter your password:" 

Our rake task will take this file and generate a new file with the same keys and blank values:


task :'translate:blank' do
  en = YAML.load_file("#{RAILS_ROOT}/lang/en.yml")
  File.open("#{RAILS_ROOT}/lang/blank.yml", "w") do |blank|
    en.each { |key, value| blank.puts("#{key}: ") }
  end
end
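
To see what the generated file contains, here is a small standalone sketch that applies the same transformation to an in-memory copy of the sample keys. It does not touch RAILS_ROOT or the real language files, and how GLoc itself treats an empty value is not shown here. It only illustrates that every key survives and every value loads back as nil:

```ruby
require 'yaml'

# Sample keys copied from the en.yml excerpt above.
en = YAML.load(<<~YAML)
  help: Help
  login: Login
  username: "Please enter your username:"
YAML

# Same idea as the rake task: keep each key, drop the value.
blank = en.keys.map { |key| "#{key}: " }.join("\n") + "\n"

puts blank
p YAML.load(blank)  # => {"help"=>nil, "login"=>nil, "username"=>nil}
```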

We decided to use SpiderTest to crawl our site. SpiderTest parses the html and follows every link on every page. It runs as an integration test, so a simple test looks like:


class InternationalizationTest < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/')
  end
end

Since we want to do more than just test the validity of each link, we need a callback for each page. SpiderTest does not really provide one, but we looked at the source and noticed that it calls a consume_page method for every page. Since we are including SpiderIntegrator as a module in our test class, we can override the method, do what we want, and then call super:


def consume_page(html, url)
  assert_page_has_been_moved_to_language_file(html, url)
  super
end
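
The override works because of Ruby's method lookup: a method defined in a class is found before a method of the same name in an included module, and super then continues on to the module's version. Here is a minimal sketch of that mechanism. Crawler and PageCheck are hypothetical stand-ins for SpiderIntegrator and our test class, not the real SpiderTest code:

```ruby
# Hypothetical stand-in for the module that does the crawling.
module Crawler
  def consume_page(html, url)
    visited << url
  end

  def visited
    @visited ||= []
  end
end

# Hypothetical stand-in for the integration test class.
class PageCheck
  include Crawler

  def checked
    @checked ||= []
  end

  # Defined in the class, so it is found before Crawler#consume_page.
  def consume_page(html, url)
    checked << url  # our extra per-page work
    super           # bare super passes html and url on to Crawler#consume_page
  end
end

check = PageCheck.new
check.consume_page("<html></html>", "/about")
p check.checked  # => ["/about"]
p check.visited  # => ["/about"]
```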

The assert_page_has_been_moved_to_language_file method uses Hpricot to parse the html and check for text. Many of our pages are dynamic and contain text that cannot be localized. For example, we do not want to localize the address of a building. We decided to add a CSS class that represents text which cannot be localized. For example:


<span class="nonlocalizable"><%= @building.address %></span>

And our assert_page_has_been_moved_to_language_file method looks like:


def assert_page_has_been_moved_to_language_file(page_text, url)
  doc = Hpricot.parse(page_text)
  assert_does_not_contain_words doc.at("title").inner_text, url
  body = doc.at('body')
  (body.search(".nonlocalizable")).remove
  (body.search("//script[@type='text/javascript']")).remove
  assert_does_not_contain_words(body.inner_text, url)
end

def assert_does_not_contain_words text, url
  match = text.match(/\w+/)
  fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
end  

We test both the title and the body for text on the page using the \w+ regular expression. We have to strip out the script nodes because inner_text includes their javascript source, which is never displayed to the user.
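
Here is a quick standalone sketch of what the \w+ check does and does not flag. The first_word helper is hypothetical and exists only for this illustration. Note that digits count as word characters, so a hard-coded number in a view would also fail the test:

```ruby
# Hypothetical helper: returns the first run of word characters, or nil.
def first_word(text)
  match = text.match(/\w+/)
  match && match[0]
end

p first_word("  \n  : ")        # => nil (whitespace and punctuation only)
p first_word("  Welcome back")  # => "Welcome"
p first_word("(2009)")          # => "2009"
```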

Here is the final test, including a setup which switches the language to blank and a teardown that puts it back:


require 'hpricot'

class InternationalizationTest < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def setup
    GLoc.set_language :blank
  end

  def teardown
    GLoc.set_language :en
  end

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end

  def consume_page(html, url)
    assert_page_has_been_moved_to_language_file(html, url)
    super
  end

  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search(".nonlocalizable")).remove
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
  end

  def assert_does_not_contain_words text, url
    match = text.match(/\w+/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end  

end

A nice side effect of this test is that it will also check for broken links, since SpiderTest will raise an exception if it cannot follow a link. If you do not want SpiderTest to crawl certain pages, you can ignore them by passing the :ignore_urls option to the spider method:


spider(@response.body, '/', :verbose => true, :ignore_urls => [%r{/busted/.*}])
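
The pattern above is not anchored, so it matches /busted/ anywhere in a url. Here is a small sketch of which urls it would cover. The ignored? helper is hypothetical and only assumes that each pattern is tested against the url; it is not SpiderTest's actual implementation:

```ruby
IGNORE_URLS = [%r{/busted/.*}]

# Hypothetical helper: true when any pattern matches the url.
def ignored?(url, patterns = IGNORE_URLS)
  patterns.any? { |pattern| pattern =~ url }
end

p ignored?("/busted/page")       # => true
p ignored?("/archive/busted/1")  # => true (the pattern is not anchored)
p ignored?("/busted")            # => false (no trailing slash)
p ignored?("/about")             # => false
```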

Sendfile does not work on Live CDs

written by paul on June 19th, 2008 @ 02:34 PM

Kent Spillner and I were trying to set up an Ubuntu Live CD with a full rails stack running our current project. We were seeing weird issues with nginx, such as connections being unexpectedly terminated and file uploads not working.

We asked around and Chris Read pointed out that Live CDs use the UnionFS filesystem which does not support sendfile.

From the Versatility and Unix Semantics in Namespace Unification paper:

“...the sendfile system call requires a matching file structure and address space structure. As Unionfs presently has no address space structure, we cannot properly implement sendfile, which is required for loop device mounts and improves performance for Web and NFS servers.”

Once we turned off sendfile (http://wiki.codemongers.com/NginxHttpCoreModule#sendfile), everything worked perfectly.
