
Update (8/29/08): I wrote about a number of improvements to the test described below in "Improved internationalization test".

On my current project, we want to localize our site for different languages. One way to do this is with a plugin like GLoc: we externalize all of our labels and messages into language files (en.yml, fr.yml, etc.) and then switch the language based on the user’s preference.

We started a site-wide effort to pull out all of the content into en.yml. However, we had two big questions:

  1. How would we know that we did not miss any content?
  2. How would we ensure, going forward, that no one mistakenly puts text in a view (or helper)?

We came up with the following solution: we create a new language file called blank.yml that has a blank value for every key. When we switch to this language, every label renders as blank. We can then crawl the site and flag any text that still appears, since it did not come from the language file.

The language files are YAML and look like this:

help: Help
login: Login
username: "Please enter your username:"
password: "Please enter your password:"

Our rake task will take this file and generate a new file with the same keys and blank values:

task :'translate:blank' do
  en = YAML.load_file("#{RAILS_ROOT}/lang/en.yml")
  File.open("#{RAILS_ROOT}/lang/blank.yml", "w") do |blank|
    en.each { |key, value| blank.puts("#{key}: ") }
  end
end
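
The blanking step can also be tried outside of Rails. The following is a minimal sketch, not the task we actually run: blank_translations is a hypothetical helper, and it writes an explicit empty string for each value. That is an assumption on my part, made because a YAML key with nothing after the colon loads as nil rather than as an empty string, and how GLoc treats nil values is not covered here.

```ruby
require 'yaml'

# Hypothetical helper (not part of the rake task above): returns a hash with
# the same keys as the English translations and an empty string for each value.
def blank_translations(en_yaml)
  en = YAML.load(en_yaml)
  en.each_with_object({}) { |(key, _value), blank| blank[key] = "" }
end

EN_YAML = <<~YAML
  help: Help
  login: Login
  username: "Please enter your username:"
YAML

blank = blank_translations(EN_YAML)
puts blank.to_yaml

# For comparison: a key with nothing after the colon loads as nil.
p YAML.load("help: ")
```

Either form leaves nothing to render for a label, but the explicit empty string keeps every value a String, which may matter if the plugin calls string methods on it.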

We decided to use SpiderTest to crawl our site. SpiderTest parses the HTML and follows every link on every page. It runs as an integration test, so a simple test looks like:

class InternationalizationTest < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/')
  end
end

Since we want to do more than just test the validity of each link, we need a callback for each page. SpiderTest does not really provide one, but we looked at the source and noticed that it calls a consume_page method for every page. Since we are including SpiderIntegrator as a module in our test class, we can override the method, do what we want, and then call super:

def consume_page(html, url)
  assert_page_has_been_moved_to_language_file(html, url)
  super
end
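
The override works because Ruby looks for a method in the class before it looks in any included module, and super then continues to the module's version. Here is a small stand-alone sketch of that lookup. FakeSpider and PageChecker are made-up names for illustration; FakeSpider stands in for Caboose::SpiderIntegrator and does none of the real crawling.

```ruby
# Stand-in for the spider module; the real one does far more than log.
module FakeSpider
  def consume_page(html, url)
    log << "spider consumed #{url}"
  end
end

class PageChecker
  include FakeSpider

  attr_reader :log

  def initialize
    @log = []
  end

  # Defined in the class, so Ruby finds it before the module's method.
  def consume_page(html, url)
    log << "checked #{url}"
    super # bare super passes html and url on to FakeSpider#consume_page
  end
end

checker = PageChecker.new
checker.consume_page("<html></html>", "/login")
puts checker.log
```

Note that a bare super (no parentheses) forwards the same arguments, which is why the override above does not need to repeat html and url.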

The assert_page_has_been_moved_to_language_file method uses Hpricot to parse the html and check for text. Many of our pages are dynamic and contain text that cannot be localized. For example, we do not want to localize the address of a building. We decided to add a CSS class that represents text which cannot be localized. For example:

<span class="nonlocalizable"><%= @building.address %></span>

And our assert_page_has_been_moved_to_language_file method looks like:

def assert_page_has_been_moved_to_language_file(page_text, url)
  doc = Hpricot.parse(page_text)
  assert_does_not_contain_words doc.at("title").inner_text, url
  body = doc.at('body')
  (body.search(".nonlocalizable")).remove
  (body.search("//script[@type='text/javascript']")).remove
  assert_does_not_contain_words(body.inner_text, url)
end

def assert_does_not_contain_words(text, url)
  match = text.match(/\w+/)
  fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
end

We test both the title and the body for text using the \w+ regular expression, which matches any run of letters, digits, or underscores. We have to strip out the script nodes because inner_text would otherwise include JavaScript source, which is never shown to the user.
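
To get a feel for what the regular expression lets through, here is a small sketch that runs it against a few strings. first_word is a hypothetical helper that mirrors the match in assert_does_not_contain_words without the failure; the sample strings are invented.

```ruby
# Hypothetical helper: returns the first run of word characters, or nil.
def first_word(text)
  match = text.match(/\w+/)
  match && match[0]
end

p first_word(" \n\t : | ")   # whitespace and punctuation only
p first_word("  Login  ")    # a leftover label
p first_word(" 2 / 10 ")     # digits count as word characters
p first_word("___")          # so do underscores
```

Only the first string passes the check. Since digits and underscores also match, a page that shows something like a count or a page number would presumably need that text wrapped in the nonlocalizable class as well.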

Here is the final test, including a setup which switches the language to blank and a teardown that puts it back:

require 'hpricot'

class InternationalizationTest < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def setup
    GLoc.set_language :blank
  end

  def teardown
    GLoc.set_language :en
  end

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end

  def consume_page(html, url)
    assert_page_has_been_moved_to_language_file(html, url)
    super
  end

  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search(".nonlocalizable")).remove
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
  end

  def assert_does_not_contain_words(text, url)
    match = text.match(/\w+/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end

end

A nice side effect of this test is that it will also check for broken links, since SpiderTest will raise an exception if it cannot follow a link. If you do not want SpiderTest to crawl certain pages, you can ignore them by passing the :ignore_urls option to the spider method:

spider(@response.body, '/', :verbose => true, :ignore_urls => [%r{/busted/.*}])
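
It is worth checking which paths a pattern like this actually covers. The sketch below only exercises the regular expression itself with plain Regexp#match; how SpiderTest applies :ignore_urls internally is not shown here, so treat ignored? as a hypothetical helper and the paths as made-up examples.

```ruby
IGNORED = [%r{/busted/.*}]

# Hypothetical helper: true when any pattern matches somewhere in the path.
def ignored?(path, patterns = IGNORED)
  patterns.any? { |pattern| pattern.match(path) }
end

["/busted/page", "/busted/", "/busted", "/admin/busted/page", "/about"].each do |path|
  puts "#{path}: #{ignored?(path)}"
end
```

The pattern is unanchored, so it also matches /admin/busted/page, and it requires the trailing slash, so /busted on its own is not matched. If that is not what you want, anchoring the pattern (for example with \A) would tighten it.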
