Improved internationalization test

I wrote previously about how we test the internationalization of our website in "Testing internationalization language files". In short, we generate a blank language file in which the value of every label is set to blank. We switch the site to this language, and then we spider the site looking for any text that remains.

Over the past couple of months, we have improved our internationalization test and removed some of the existing limitations.

Manually marking nonlocalizable content

One of the limitations of the approach detailed in the previous article is that we had to manually mark content on the page that should not be internationalized by adding a class to the html:

<%= @building.address %>

The basis of our new test is the idea that all text on the page is one of two types:

  1. Labels and static text that live in the language files, which are inserted into the page using the GLoc method l()
  2. Text that the application produces, which should be HTML-escaped using the h() method in the views or helpers

Therefore, if we intercept both of these types of text, we can find anything that is not localized or escaped.

Our new test setup looks like:

def setup
  blank_out_localization
  blank_out_html_escape
end

def blank_out_localization
  GLoc::InstanceMethods.class_eval do
    alias :old_l :l
    def l(symbol, *arguments)
      ""
    end
  end
end

def blank_out_html_escape
  ERB::Util.class_eval do
    alias :old_html_escape :html_escape
    def html_escape(s)
      ""
    end

    alias :h :html_escape
  end
end

We redefine the l() method to return an empty string, so anything that is localized will no longer show up on the page.

The h() or html_escape() methods are used to escape strings for the web (for example, converting ‘<’ into ‘&lt;’). We also redefine these methods to return empty strings. Now, all text on the webpage should be blanked out.
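One detail in the setup above is easy to miss: a Ruby alias is bound to the method body that exists at the moment the alias is made, so redefining html_escape does not change an h alias created earlier. The following is a minimal sketch of that behavior. FakeEscaper and FakeView are hypothetical stand-ins for ERB::Util and a view, not the real Rails classes.

```ruby
# Hypothetical stand-in for ERB::Util, with h aliased to html_escape.
module FakeEscaper
  def html_escape(s)
    s.to_s.gsub("<", "&lt;")
  end
  alias :h :html_escape
end

# Hypothetical stand-in for a view that mixes in the escaping helpers.
class FakeView
  include FakeEscaper
end

view = FakeView.new

# Blank out html_escape the same way the test setup does.
FakeEscaper.class_eval do
  alias :old_html_escape :html_escape
  def html_escape(s)
    ""
  end
end

# h still points at the original method body, so it still escapes.
stale_h = view.h("<b>")

# Re-aliasing h binds it to the new, blank html_escape.
FakeEscaper.class_eval do
  alias :h :html_escape
end
blank_h = view.h("<b>")

puts stale_h.inspect
puts blank_h.inspect
```

This is why the setup repeats alias :h :html_escape after the redefinition. Without it, text passed through h() would still appear on the page.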

We then spider the site as before: the spider walks every page and checks for non-blank text.

It is possible to restore the l() and h() methods in the teardown:

def teardown
  restore_html_escape
  restore_localization
end

def restore_html_escape
  ERB::Util.class_eval do
    alias :html_escape :old_html_escape
    # h was aliased to the blanked-out method, so point it back as well
    alias :h :html_escape
  end
end

def restore_localization
  GLoc::InstanceMethods.class_eval do
    alias :l :old_l
  end
end

However, I think it is safer to run this test in its own test suite in a separate Ruby process. That way, the l() and h() monkey patching cannot accidentally affect other tests:

namespace :test do
  Rake::TestTask.new(:'internationalization' => ["environment", "load_test_data"]) do |t|
    t.libs << "test"
    t.pattern = "test/acceptance/internationalization_test.rb"
    t.verbose = true
  end

  Rake::TestTask.new(:'acceptance' => ["environment", "load_test_data"]) do |t|
    t.libs << "test"
    t.pattern = FileList["test/acceptance/**/*_test.rb"].exclude("test/acceptance/internationalization_test.rb")
    t.verbose = true
  end
end

Now, we no longer need to mark any content as nonlocalizable. If the test fails, we either forgot to add a label to the language file, or we forgot to escape the text in the page:

<%= l(:name_label) %>

or

<%= h(@building.address) %>

Redirects

We noticed that Rails would send redirects as:

<html><body>You are being <a href="http://www.example.com/some/new/location">redirected</a>.</body></html>

The http://www.example.com URL was tripping up SpiderTest, so we removed that part of each URL. Furthermore, we skip our page checking on redirect pages and assets:

def consume_page(html, url)
  html.gsub!("http://www.example.com", "")
  unless redirect?(html) || asset?(url)
    assert_page_has_been_moved_to_language_file(html, url)
  end
  super
end

def redirect?(html)
  html.include?("<body>You are being")
end

def asset?(url)
  File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
end

Alt and title attributes

We discovered with the original test that we were not testing alt and title attributes on the page. For example, if you hover over a link, it will show the title. We also want these strings internationalized, so we added them to the test with the following code:

assert_attribute_does_not_contain_words body, url, 'title'
assert_attribute_does_not_contain_words body, url, 'alt'

def assert_attribute_does_not_contain_words body, url, attribute
  body.search("//*[@#{attribute}]") do |element|
    assert_does_not_contain_words element.get_attribute(attribute), url
  end
end
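To illustrate what this check catches, here is a minimal, self-contained sketch. It assumes double-quoted attributes and uses a regular expression as a simplified stand-in for the Hpricot XPath search. The helpers attribute_values and untranslated_values are hypothetical names, not part of the real test.

```ruby
# Hypothetical helper: collect the values of one attribute from raw HTML.
# A regex stands in for the Hpricot search used in the real test.
def attribute_values(html, attribute)
  html.scan(/\s#{Regexp.escape(attribute)}="([^"]*)"/).flatten
end

# Hypothetical helper: keep only values that still contain letters,
# meaning they were neither localized nor escaped.
def untranslated_values(values)
  values.select { |value| value =~ /[A-Za-z]/ }
end

html = '<a href="/buildings" title="View all buildings">' \
       '<img src="/logo.png" alt=""></a>'

bad_titles = untranslated_values(attribute_values(html, "title"))
bad_alts   = untranslated_values(attribute_values(html, "alt"))

puts bad_titles.inspect
puts bad_alts.inspect
```

Here the hard-coded title is flagged, while the blank alt attribute passes, as it would once the string goes through l().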

Better error messages

We noticed that if you accidentally forget to internationalize a string like “Please enter your username,” the test would fail with a message of “Found text that was not in the language file: Please.” We thought it would be better to show the full string, so we replaced the regex:

/\w+/

with

/[A-Za-z]([A-Za-z]| )*/

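The second one matches a letter followed by any run of letters or spaces, so it will pick up the entire phrase up to the first punctuation mark or digit. Note that, unlike \w, it does not match digits or underscores.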
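The difference between the two patterns can be seen in a short sketch. The sample strings are made up for illustration.

```ruby
text = "Please enter your username, then press Enter."

# The old pattern stops at the first non-word character (the space).
first_word = text.match(/\w+/)[0]

# The new pattern keeps going through letters and spaces, and stops at the comma.
phrase = text.match(/[A-Za-z]([A-Za-z]| )*/)[0]

# Digits are matched by \w but skipped by the new pattern.
old_digit_match = "2 items".match(/\w+/)[0]
new_digit_match = "2 items".match(/[A-Za-z]([A-Za-z]| )*/)[0]

puts first_word.inspect
puts phrase.inspect
puts old_digit_match.inspect
puts new_digit_match.inspect
```

A side effect is that purely numeric text no longer fails the test, which suits this use because numbers do not need translating.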

Final result

The final test looks like:

require 'hpricot'

class InternationalizationTest < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def setup
    blank_out_localization
    blank_out_html_escape
  end

  def blank_out_localization
    GLoc::InstanceMethods.class_eval do
      alias :old_l :l
      def l(symbol, *arguments)
        ""
      end
    end
  end

  def blank_out_html_escape
    ERB::Util.class_eval do
      alias :old_html_escape :html_escape
      def html_escape(s)
        ""
      end

      alias :h :html_escape
    end
  end

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end

  def consume_page(html, url)
    html.gsub!("http://www.example.com", "")
    unless redirect?(html) || asset?(url)
      assert_page_has_been_moved_to_language_file(html, url)
    end
    super
  end

  def redirect?(html)
    html.include?("<body>You are being")
  end

  def asset?(url)
    File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
  end

  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
    assert_attribute_does_not_contain_words body, url, 'title'
    assert_attribute_does_not_contain_words body, url, 'alt'
  end

  def assert_attribute_does_not_contain_words body, url, attribute
    body.search("//*[@#{attribute}]") do |element|
      assert_does_not_contain_words element.get_attribute(attribute), url
    end
  end

  def assert_does_not_contain_words text, url
    match = text.match(/[A-Za-z]([A-Za-z]| )*/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end

end

These modifications have improved the quality of the internationalization test, and this test has been very useful at catching text that we forget to internationalize.