Improved internationalization test

I wrote previously about how we test the internationalization of our
website in "Testing internationalization language files."

Basically, we generate a blank language file with all of the values for
all of the labels set to blank. We switch the site to this language, and
then we spider the site looking for text.

Over the past couple of months, we have improved our
internationalization test and removed some of the existing limitations.

Manually marking nonlocalizable content

One of the limitations of the approach detailed in the previous article
is that we had to manually mark content on the page that should not be
internationalized by adding a class to the HTML:

<%= @building.address %>

The basis of our new test is the idea that all text on the page is one
of two types:

  1. Labels and static text that live in the language files, which are
    inserted into the page using the GLoc method l()
  2. Text that the application produces, which should be HTML-escaped
    using the h() method in the views or helpers

Therefore, if we intercept both of these types of text, we can find
anything that is not localized or escaped.

Our new test setup looks like:

def setup
  blank_out_localization
  blank_out_html_escape
end

def blank_out_localization
  GLoc::InstanceMethods.class_eval do
    alias :old_l :l
    def l(symbol, *arguments)
      ""
    end
  end
end

def blank_out_html_escape
  ERB::Util.class_eval do
    alias :old_html_escape :html_escape
    def html_escape(s)
      ""
    end

    alias :h :html_escape
  end
end

We redefine the l() method to return an empty string, so anything that
is localized will no longer show up on the page.

The h() and html_escape() methods are used to escape strings for the web
(for example, converting '<' into '&lt;'). We also redefine these
methods to return empty strings. Now, all text on the webpage should be
blanked out.
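
To make the idea concrete, here is a minimal sketch that uses only Ruby's standard ERB library. BlankedView and visible_text are hypothetical names invented for this illustration; in the real test, GLoc supplies l() and ERB::Util supplies h(). The point is only that, once both methods return empty strings, any visible text left in the rendered page must have bypassed both of them.

```ruby
require 'erb'

# Hypothetical stand-in for a view whose l() and h() have been blanked out.
class BlankedView
  def l(_symbol, *_arguments)
    ""
  end

  def h(_string)
    ""
  end

  # Render an ERB template with this object's methods in scope.
  def render(template)
    ERB.new(template).result(binding)
  end
end

# Crude tag stripper, good enough for this illustration only.
def visible_text(html)
  html.gsub(/<[^>]*>/, "")
end

view = BlankedView.new

# Every piece of text goes through l() or h(), so nothing survives.
clean = view.render('<p><%= l(:name_label) %><%= h("12 Main St") %></p>')

# The hard-coded label "Name: " bypasses l() and is left on the page.
leaky = view.render('<p>Name: <%= h("12 Main St") %></p>')

puts visible_text(clean).inspect
puts visible_text(leaky).inspect
```

The first template renders to an empty string of visible text, while the second leaves the hard-coded label behind, which is exactly what the spider looks for.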

We then spider the site as before: the spider walks every page and
checks for non-blank text.

It is possible to restore the l() and h() methods in the teardown:

def teardown
  restore_html_escape
  restore_localization
end

def restore_html_escape
  ERB::Util.class_eval do
    alias :html_escape :old_html_escape
    # h was pointed at the blanked method in setup, so re-alias it too
    alias :h :html_escape
  end
end

def restore_localization
  GLoc::InstanceMethods.class_eval do
    alias :l :old_l
  end
end
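
One thing to watch with this alias-based approach is that running the blanking step twice (for example, once per test method) would overwrite the saved original with the blanked version. The following is a sketch of one way to guard against that, not the code we use. Labels, Page, blank_out and restore are hypothetical names; Labels stands in for GLoc::InstanceMethods.

```ruby
# Hypothetical stand-in for GLoc::InstanceMethods.
module Labels
  def l(symbol, *_arguments)
    symbol.to_s.upcase
  end
end

class Page
  include Labels
end

# Replace mod#name with a method returning "", saving the original once.
def blank_out(mod, name)
  saved = :"old_#{name}"
  mod.class_eval do
    # Only save the original the first time, so repeated calls are harmless.
    alias_method saved, name unless method_defined?(saved)
    define_method(name) { |*_args| "" }
  end
end

# Put the saved original back and drop the backup.
def restore(mod, name)
  saved = :"old_#{name}"
  mod.class_eval do
    if method_defined?(saved)
      alias_method name, saved
      remove_method saved
    end
  end
end

page = Page.new
blank_out(Labels, :l)
blank_out(Labels, :l)          # a second setup must not lose the original
puts page.l(:name_label).inspect
restore(Labels, :l)
puts page.l(:name_label).inspect
```

After the restore, l() behaves as it did before blanking, even though blank_out ran twice.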

However, I think it is safer to run this test in its own test suite in a
separate ruby process. That way, the l() and h() monkey patching cannot
accidentally affect other tests:

namespace :test do
  Rake::TestTask.new(:'internationalization' => ["environment", "load_test_data"]) do |t|
    t.libs << "test"
    t.pattern = "test/acceptance/internationalization_test.rb"
    t.verbose = true
  end

  Rake::TestTask.new(:'acceptance' => ["environment", "load_test_data"]) do |t|
    t.libs << "test"
    t.pattern = FileList["test/acceptance/**/*_test.rb"].exclude("test/acceptance/internationalization_test.rb")
    t.verbose = true
  end
end

Now, we no longer need to mark any content as nonlocalizable. If the
test fails, we either forgot to add a label to the language file, or we
forgot to escape the text in the page:

<%= l(:name_label) %>

or

<%= h(@building.address) %>

Redirects

We noticed that Rails would send redirects as:

<html><body>You are being <a href="http://www.example.com/some/new/location">redirected</a>.</body></html>

The absolute http://www.example.com prefix was tripping up SpiderTest,
so we strip it from each URL. Furthermore, we skip our page checking on
redirect pages and assets:

def consume_page(html, url)
  html.gsub!("http://www.example.com", "")
  unless redirect?(html) || asset?(url)
    assert_page_has_been_moved_to_language_file(html, url)
  end
  super
end

def redirect?(html)
  html.include?("<body>You are being")
end

def asset?(url)
  File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
end
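
The three checks can be tried outside Rails with a small standalone sketch. It assumes a temporary directory in place of the application's public folder, and the helper names strip_host and the public_dir parameter are made up for this illustration. It also uses File.file? rather than File.exist?, a choice made for this sketch so that a directory (such as the public folder itself for the URL "/") is not treated as an asset.

```ruby
require 'tmpdir'
require 'fileutils'

HOST = "http://www.example.com"

# Turn absolute links to the test host into relative ones.
def strip_host(html)
  html.gsub(HOST, "")
end

# Rails redirect bodies start with this fixed phrase.
def redirect?(html)
  html.include?("<body>You are being")
end

# A URL is an asset if it maps to a regular file under public_dir.
def asset?(url, public_dir)
  File.file?(File.join(public_dir, url))
end

redirect_body = '<html><body>You are being ' \
                '<a href="http://www.example.com/some/new/location">redirected</a>.' \
                '</body></html>'

puts strip_host(redirect_body)
puts redirect?(redirect_body)

Dir.mktmpdir do |public_dir|
  FileUtils.mkdir_p(File.join(public_dir, "stylesheets"))
  File.write(File.join(public_dir, "stylesheets", "app.css"), "body {}")

  puts asset?("/stylesheets/app.css", public_dir)
  puts asset?("/buildings/1", public_dir)
  puts asset?("/", public_dir)
end
```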

Alt and title attributes

We discovered with the original test that we were not testing alt and
title attributes on the page. For example, if you hover over a link, it
will show the title. We also want these strings internationalized, so we
added them to the test with the following code:

assert_attribute_does_not_contain_words body, url, 'title'
assert_attribute_does_not_contain_words body, url, 'alt'

def assert_attribute_does_not_contain_words body, url, attribute
  body.search("//*[@#{attribute}]") do |element|
    assert_does_not_contain_words element.get_attribute(attribute), url
  end
end
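
For readers without Hpricot installed, the same attribute scan can be sketched with REXML, which ships with Ruby (as a bundled gem on recent versions). This is a substitute for illustration, not what our test uses; attribute_values and untranslated_attributes are hypothetical helper names, and REXML needs well-formed markup, which real pages may not be.

```ruby
require 'rexml/document'

# Return the value of `attribute` for every element that carries it.
def attribute_values(markup, attribute)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//*[@#{attribute}]").map do |element|
    element.attributes[attribute]
  end
end

# Return only the attribute values that still contain letters.
def untranslated_attributes(markup, attribute)
  attribute_values(markup, attribute).select { |value| value =~ /[A-Za-z]/ }
end

SAMPLE_PAGE = '<body>' \
              '<a href="/help" title="Get help">?</a>' \
              '<img src="/logo.png" alt=""/>' \
              '</body>'

puts untranslated_attributes(SAMPLE_PAGE, 'title').inspect
puts untranslated_attributes(SAMPLE_PAGE, 'alt').inspect
```

In the sample, the hard-coded title is reported, while the blank alt attribute passes.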

Better error messages

We noticed that if you accidentally forget to internationalize a string
like “Please enter your username,” the test would fail with a message of
“Found text that was not in the language file: Please.” We thought it
would be better to show the full string, so we replaced the regex:

/\w+/

with

/[A-Za-z]([A-Za-z]| )*/

The second one matches a letter followed by any run of letters or
spaces, so it will pick up the entire phrase. Unlike \w, it does not
match digits or underscores.
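
A quick comparison of the two patterns, as a sketch with made-up sample strings (the constant and method names are introduced here for illustration only):

```ruby
OLD_PATTERN = /\w+/
NEW_PATTERN = /[A-Za-z]([A-Za-z]| )*/

# Return the first match of pattern in text, or nil if there is none.
def first_match(text, pattern)
  text[pattern]
end

sample = "  Please enter your username, then continue"

puts first_match(sample, OLD_PATTERN).inspect  # "Please"
puts first_match(sample, NEW_PATTERN).inspect  # "Please enter your username"

# The new pattern stops at punctuation and ignores digit-only text.
puts first_match("2008", OLD_PATTERN).inspect  # "2008"
puts first_match("2008", NEW_PATTERN).inspect  # nil
```

One side effect worth knowing about: text made up only of digits, such as a year, is flagged by the old pattern but not by the new one.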

Final result

The final test looks like:

require 'hpricot'

class InternationalizationTest < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def setup
    blank_out_localization
    blank_out_html_escape
  end

  def blank_out_localization
    GLoc::InstanceMethods.class_eval do
      alias :old_l :l
      def l(symbol, *arguments)
        ""
      end
    end
  end

  def blank_out_html_escape
    ERB::Util.class_eval do
      alias :old_html_escape :html_escape
      def html_escape(s)
        ""
      end

      alias :h :html_escape
    end
  end

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end

  def consume_page(html, url)
    html.gsub!("http://www.example.com", "")
    unless redirect?(html) || asset?(url)
      assert_page_has_been_moved_to_language_file(html, url)
    end
    super
  end

  def redirect?(html)
    html.include?("<body>You are being")
  end

  def asset?(url)
    File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
  end

  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
    assert_attribute_does_not_contain_words body, url, 'title'
    assert_attribute_does_not_contain_words body, url, 'alt'
  end

  def assert_attribute_does_not_contain_words body, url, attribute
    body.search("//*[@#{attribute}]") do |element|
      assert_does_not_contain_words element.get_attribute(attribute), url
    end
  end

  def assert_does_not_contain_words text, url
    match = text.match(/[A-Za-z]([A-Za-z]| )*/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end

end

These modifications have improved the quality of the
internationalization test, and this test has been very useful at
catching text that we forget to internationalize.

Paul Gross

I'm a lead software developer in Seattle working for Braintree Payments.