Testing page caching with SpiderTest

The website I’m currently working on is similar to an online brochure.
The data on the site changes hourly, but every user sees the same thing.
As a result, we decided to use page caching to dramatically speed up the
site. Once a page is visited, the html is written out to disk and all
subsequent requests are served by apache. The setup of this approach is
detailed elsewhere (for example, Rails Envy: Ruby on Rails Caching
Tutorial
).

Setting up caching was easy, but we wanted to ensure that we did not
make any mistakes. All pages should be cached, since any miss will
result in a much higher load on our rails application. I’ve written
previously about our internationalization test (Improved
internationalization
test
)
which spiders the site (using
SpiderTest) looking for non localized text. Since we were already visiting every
page, it seemed like a good place to add a check for page caching.
Spidering the site again would make our test suite too long.

The consume page method is called for every page that is visited by the
spider. We expanded the implementation by adding a call to
assert_page_is_cached:

def consume_page(html, url)  
  html.gsub!("http://www.example.com", "")
  unless redirect?(html) || asset?(url)
    assert_page_has_been_moved_to_language_file(html, url)
    assert_page_is_cached(url)
  super
end

def assert_page_is_cached(url)  
  path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
  page = path.ends_with?(".html") ? path : "#{path}.html"
  assert_true File.exists?(page), "Page NOT cached: #{url} (looking in #{page})"
end  

We also had to add new lines to our setup to turn on caching (since it
is normally off in test mode):

def setup  
  FileUtils.rm_rf ActionController::Base.page_cache_directory
  ActionController::Base.perform_caching = true
end  

Since we run this test as its own suite, the test is totally isolated
from other tests. There is no need to implement a teardown.

The full test, including the internationalization testing from before
looks like:

require 'hpricot'

class InternationalizationText < ActionController::IntegrationTest  
  include Caboose::SpiderIntegrator

  def setup
    FileUtils.rm_rf ActionController::Base.page_cache_directory
    ActionController::Base.perform_caching = true
    blank_out_localization
    blank_out_html_escape
  end

  def blank_out_localization
    GLoc::InstanceMethods.class_eval do
      alias :old_l :l
      def l(symbol, *arguments)
        ""
      end
    end
  end

  def blank_out_html_escape
    ERB::Util.class_eval do
      alias :old_html_escape :html_escape
      def html_escape(s)
        ""
      end

      alias :h :html_escape
    end
  end

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end

  def consume_page(html, url)
    html.gsub!("http://www.example.com", "")
    unless redirect?(html) || asset?(url)
      assert_page_has_been_moved_to_language_file(html, url)
      assert_page_is_cached(url)
    super
  end

  def redirect?(html)
    html.include?("<body>You are being")
  end

  def asset?(url)
    File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
  end

  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
    assert_attribute_does_not_contain_words body, url, 'title'
    assert_attribute_does_not_contain_words body, url, 'alt'
  end

  def assert_attribute_does_not_contain_words body, url, attribute
    body.search("//*[@#{attribute}]") do |element|
      assert_does_not_contain_words element.get_attribute(attribute), url
    end
  end

  def assert_does_not_contain_words text, url
    match = text.match(/[A-Za-z]([A-Za-z]| )*/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end

  def assert_page_is_cached(url)
    path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
    page = path.ends_with?(".html") ? path : "#{path}.html"
    assert_true File.exists?(page), "Page NOT cached: #{url} (looking in #{page})"
  end
end