Testing internationalization language files

Update (8/29/08): I wrote about a bunch of improvements to the test
described below in "Improved internationalization test".

My current project wants to localize our site for different languages.
One way to do this is to use a plugin like GLoc. We can externalize
all of our labels and messages into language files (en.yml, fr.yml,
etc.), and then switch out the language based upon the user’s preference.

We started a site-wide effort to pull out all of the content into
en.yml. However, we had two big questions:

  1. How would we know that we did not miss any content?
  2. How would we ensure going forward that someone would not mistakenly
    put text in a view (or helper)?

We came up with the following solution: We create a new language file
called blank.yml which has a value of blank for every key. When we
switch to this language, all of the labels are blank. Therefore, we can
crawl our site and look for any text that is not blank.

The language files are yaml, and look like:

help: Help
login: Login
username: "Please enter your username:"
password: "Please enter your password:"

Our rake task will take this file and generate a new file with the same
keys and blank values:

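require 'yaml'

# Generate lang/blank.yml with the same keys as lang/en.yml but no values.
task :'translate:blank' do
  en = YAML.load_file("#{RAILS_ROOT}/lang/en.yml")
  File.open("#{RAILS_ROOT}/lang/blank.yml", "w") do |blank|
    # Write each key followed by an empty value.
    en.each_key { |key| blank.puts("#{key}: ") }
  end
end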
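
One caveat worth checking: in YAML, a bare `key:` line loads as nil rather than as an empty string. Whether GLoc treats nil the same as a blank label is not something established here, so the following is only a sketch of a variant that writes explicit empty strings. The helper name blank_language_yaml is hypothetical, and it assumes a flat file of simple keys like the sample above.

```ruby
require 'yaml'

# Hypothetical helper (not part of GLoc or the rake task above): build the
# text of a blank language file from the text of an existing one.
# Assumes a flat mapping of simple keys, as in the en.yml sample.
def blank_language_yaml(en_yaml)
  keys = YAML.load(en_yaml).keys
  # Quote the empty value so it loads as "" instead of nil.
  keys.map { |key| "#{key}: \"\"\n" }.join
end

en_yaml = <<~EN
  help: Help
  username: "Please enter your username:"
EN

blank = blank_language_yaml(en_yaml)
puts blank
p YAML.load(blank)
```

If nil values turn out to work fine with your localization plugin, the simpler rake task above is enough.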

Now, we decided to use SpiderTest to crawl our site. SpiderTest will
parse the HTML and follow every link on every page. SpiderTest runs as
an integration test, so a simple test will look like:

class InternationalizationTest < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/')
  end
end

Since we want to do more than just test the validity of each link, we
need a callback for each page. SpiderTest does not really provide one,
but we looked at the source and noticed that it calls a consume_page
method for every page. Since we are including SpiderIntegrator as a
module in our test class, we can override the method, do what we want,
and then call super:

def consume_page(html, url)
  assert_page_has_been_moved_to_language_file(html, url)
  super
end
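
The override works because Ruby looks up a method in the class before it looks in any included module, so super reaches the module's version. Here is a minimal, self-contained sketch of that lookup order. FakeSpider and PageChecker are hypothetical stand-ins, not SpiderTest's real code.

```ruby
# Hypothetical stand-in for a module like Caboose::SpiderIntegrator.
module FakeSpider
  def consume_page(html, url)
    visited << url
  end
end

# Hypothetical stand-in for the test class that includes the module.
class PageChecker
  include FakeSpider

  attr_reader :checked, :visited

  def initialize
    @checked = []
    @visited = []
  end

  # Found before FakeSpider#consume_page, so it runs first.
  def consume_page(html, url)
    checked << url
    super # bare super forwards html and url to the module's method
  end
end

checker = PageChecker.new
checker.consume_page("<html></html>", "/")
p checker.checked
p checker.visited
```

Both arrays end up containing the URL, which shows that our check ran and the module's original behavior was preserved.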

The assert_page_has_been_moved_to_language_file method uses Hpricot to
parse the HTML and check for text. Many of our pages are dynamic and
contain text that cannot be localized. For example, we do not want to
localize the address of a building. We decided to add a CSS class that
marks text which cannot be localized. For example:

<span class="nonlocalizable"><%= @building.address %></span>

And our assert_page_has_been_moved_to_language_file method looks
like:

def assert_page_has_been_moved_to_language_file(page_text, url)
  doc = Hpricot.parse(page_text)
  assert_does_not_contain_words doc.at("title").inner_text, url
  body = doc.at('body')
  (body.search(".nonlocalizable")).remove
  (body.search("//script[@type='text/javascript']")).remove
  assert_does_not_contain_words(body.inner_text, url)
end

def assert_does_not_contain_words text, url
  match = text.match(/\w+/)
  fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
end

We test both the title and the body for text on the page using the \w+
regular expression. We have to strip out the script nodes because the
inner_text method returns their JavaScript source, which is never shown
to the user.
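
To get a feel for what the \w+ check accepts and rejects, here is a small sketch that returns the offending word instead of failing. The helper name first_word is hypothetical. Note that \w also matches digits and underscores, so a number rendered on a page counts as text too.

```ruby
# Same match as assert_does_not_contain_words, but returning the first
# word found (or nil) instead of failing. The name first_word is
# hypothetical.
def first_word(text)
  match = text.match(/\w+/)
  match && match[0]
end

# Whitespace and punctuation pass; letters, digits and underscores do not.
samples = ["  \n : ", "---", "  Help", "Page 2 of 10", "42 / 7"]
samples.each do |sample|
  puts "#{sample.inspect} => #{first_word(sample).inspect}"
end
```

In practice this means dynamic numbers (counts, dates, prices) need the nonlocalizable class just like addresses do.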

Here is the final test, including a setup which switches the language to
blank and a teardown that puts it back:

require 'hpricot'

class InternationalizationTest < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def setup
    GLoc.set_language :blank
  end

  def teardown
    GLoc.set_language :en
  end

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end

  def consume_page(html, url)
    assert_page_has_been_moved_to_language_file(html, url)
    super
  end

  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search(".nonlocalizable")).remove
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
  end

  def assert_does_not_contain_words text, url
    match = text.match(/\w+/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end

end

A nice side effect of this test is that it will also check for broken
links, since SpiderTest will raise an exception if it cannot follow a
link. If you do not want SpiderTest to crawl certain pages, you can
ignore them by passing the :ignore_urls option to the spider method:

spider(@response.body, '/', :verbose => true, :ignore_urls => [%r{/busted/.*}])
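
How SpiderTest applies :ignore_urls internally is not covered here, so the following sketch only illustrates, in plain Ruby, which paths a pattern like the one above matches. The helper ignored? and the sample paths are hypothetical.

```ruby
# Hypothetical helper, not SpiderTest's implementation: true if any
# pattern matches the URL.
def ignored?(url, patterns)
  patterns.any? { |pattern| pattern.match(url) }
end

patterns = [%r{/busted/.*}]

["/busted/page", "/admin/busted/", "/busted", "/help"].each do |url|
  puts "#{url}: #{ignored?(url, patterns) ? 'ignored' : 'crawled'}"
end
```

Because the pattern is not anchored, it matches /busted/ anywhere in the path. If you only mean the top-level directory, a pattern such as %r{\A/busted/} is stricter.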
Paul Gross

I'm a lead software developer in Seattle working for Braintree Payments.
