select_or_label with custom form builder
In our web app, we have a common UI pattern: replacing select lists (e.g. drop-downs) with a label if there is only one item in the list. For example, when creating a subscription, the user must choose a plan. Normally, there is a list of plans to choose from. However, if there is only one plan that the user can choose, we show a label naming that plan instead of showing them a list of one.
We implemented this with a custom form builder and a method called select_or_label. It takes the same arguments as select, but only creates the select list if the list of choices has more than one element.
The view looks like:
<% form_for :subscription, :builder => SelectFormBuilder, :url => { :controller => :subscriptions, :action => :create } do |form| -%>
<%= form.label :plan %>
<%= form.select_or_label :plan_id, Plan.all.collect {|p| [p.name, p.id]} %>
<% end -%>
The form builder creates a span and a hidden_field if the size of the choices is 1:
class SelectFormBuilder < ActionView::Helpers::FormBuilder
include ActionView::Helpers::TagHelper
def select_or_label(method, choices, options = {})
if choices.size == 1
content_tag(:span, choices.first.first) +
hidden_field(method, :value => choices.first.last)
else
select(method, choices, options)
end
end
end
And the spec:
require File.dirname(__FILE__) + '/../spec_helper'
describe SelectFormBuilder, :type => :helper do
describe "select_or_label" do
before do
helper = Class.new { include ActionView::Helpers }.new
@builder = SelectFormBuilder.new(:subscription, Subscription.new, helper, {}, nil)
end
it "returns a span and hidden field if the size of the choices array is only one" do
html = @builder.select_or_label(:plan_id, [["name", "id"]])
html.should_not have_tag("select")
html.should have_tag("span", "name")
html.should have_tag("input[type=hidden][name=?][value=?]", "subscription[plan_id]", "id")
end
it "returns a select if the size of the choices array is greater than one" do
html = @builder.select_or_label(:plan_id, [["name", "id"], ["other name", "other_id"]])
html.should_not have_tag("input")
html.should have_tag("select[name=?]", "subscription[plan_id]") do
with_tag("option[value=?]", "id", "name")
with_tag("option[value=?]", "other_id", "other name")
end
end
end
end
Node.js, redis, and resque
Update (3/2/10): Updated code to work with version 0.1.30 of node.js
I’ve continued to play with node.js, and I’ve decided to do a follow-up spike to my previous one: Web proxy in node.js for high availability.
The previous spike used node to proxy requests directly to a web server. This spike uses node to put messages into a (redis) queue. Ruby background workers read from the queue, process the requests, and respond on a different queue. When node receives the response from the background worker, it sends the response back to the waiting user.
Just like my first spike, this type of architecture can be used for high availability web sites. Since all messages go into a queue and node holds the connections from the users, the site can be upgraded (including database migrations or infrastructure changes) as long as node and redis stay the same. Once the upgrade is finished, the workers can resume working from the queue. Users would see an extra long request, but as long as the upgrade was short (eg, less than a minute), the user should not know the site was down.
A queue has a lot of advantages over a straight proxy:
- Easy to scale up and down by adding or removing workers
- Can use priority queues to prioritize more important web requests
- Easy to monitor (eg, how many messages are in the queue, how fast are they being added)
Here is a very simple version of the code. First, the node webserver (using redis-node-client):
var sys = require('sys'),
http = require('http'),
redis = require("./redisclient");
var queuedRes = {}
var counter = 1;
http.createServer(function (req, res) {
pushOnQueue(req, res);
}).listen(8000);
function pushOnQueue(req, res) {
var requestNumber = counter++;
var message = JSON.stringify({
"class": "RequestProcessor",
"args": [ {"node_id": requestNumber, "url": req.url} ]
});
client.rpush('resque:queue:requests', message);
queuedRes[requestNumber] = res
}
var client = new redis.Client();
client.connect(function() {
popFromQueue();
});
function popFromQueue() {
client.lpop('responses', handleResponse);
}
function handleResponse(err, result) {
if (result == null) {
setTimeout(function() { popFromQueue(); }, 100);
} else {
var json = JSON.parse(result);
var requestNumber = json.node_id;
var body = unescape(json.body);
var res = queuedRes[requestNumber];
res.writeHeader(200, {'Content-Type': 'text/plain'});
res.write(body);
res.close();
delete queuedRes[requestNumber];
popFromQueue();
}
}
sys.puts('Server running at http://127.0.0.1:8000/');
Also available as a gist.
pushOnQueue() is called on incoming web requests. This creates a JSON message and pushes it on the resque:queue:requests queue. It also puts the res object into a hash so it can be retrieved again on the way back.
At the same time, a queue listener is set up using redis.Client(). On connect, popFromQueue() is called. This method pops messages from the responses queue and calls handleResponse(). If the pop did not find a message, it is scheduled to call again in 100 milliseconds. If it did find a message, the message is parsed with JSON, the requestNumber is pulled out, and the original res object is pulled out of the queuedRes hash. The res object is then sent the body of the message from the queue, which makes it back to the user.
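The round trip described above can be sketched without node or redis. What follows is a minimal in-memory illustration in Ruby, not the spike's actual code: plain arrays stand in for the two redis lists, and the names PendingRequests, enqueue, deliver, and run_fake_worker are all made up for this example.

```ruby
require 'json'

# In-memory stand-ins for the two redis lists (assumption: the real spike
# uses RPUSH/LPOP on 'resque:queue:requests' and 'responses').
class PendingRequests
  attr_reader :request_queue, :response_queue

  def initialize
    @request_queue = []
    @response_queue = []
    @waiting = {}   # request number => callback that completes the response
    @counter = 0
  end

  # Mirrors pushOnQueue(): remember who is waiting, then queue a message.
  def enqueue(url, &on_response)
    id = (@counter += 1)
    @waiting[id] = on_response
    @request_queue.push(JSON.generate("class" => "RequestProcessor",
                                      "args" => [{ "node_id" => id, "url" => url }]))
    id
  end

  # Mirrors handleResponse(): pop one response and hand it to its waiter.
  # Returns false when the queue is empty (the caller would retry later).
  def deliver
    raw = @response_queue.shift
    return false if raw.nil?
    message = JSON.parse(raw)
    callback = @waiting.delete(message["node_id"])
    callback.call(message["body"]) if callback
    true
  end
end

# A fake worker: answers every queued request in order.
def run_fake_worker(pending)
  while (raw = pending.request_queue.shift)
    args = JSON.parse(raw)["args"].first
    pending.response_queue.push(JSON.generate("node_id" => args["node_id"],
                                              "body" => "handled #{args['url']}"))
  end
end

pending = PendingRequests.new
results = {}
pending.enqueue("/things") { |body| results["/things"] = body }
pending.enqueue("/users")  { |body| results["/users"] = body }
run_fake_worker(pending)
nil while pending.deliver   # drain the response queue
puts results.inspect
```

The important piece is the @waiting hash: the id travels with the message in both directions, so the response can be matched to the connection that is still open.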
On the other side, I have a ruby worker using resque:
class RequestProcessor
@queue = :requests
APP = Rack::Builder.new do
use Rails::Rack::Static
use Rack::CommonLogger
run ActionController::Dispatcher.new
end
RACK_BASE_REQUEST = {
"PATH_INFO" => "/things",
"QUERY_STRING" => "",
"REQUEST_METHOD" => "GET",
"SERVER_NAME" => "localhost",
"SERVER_PORT" => "3000",
"rack.errors" => STDERR,
"rack.input" => StringIO.new(""),
"rack.multiprocess" => true,
"rack.multithread" => false,
"rack.run_once" => false,
"rack.url_scheme" => "http",
"rack.version" => [1, 0],
}
def self.perform(hash)
url = hash.delete("url")
request = RACK_BASE_REQUEST.clone
request["PATH_INFO"] = url
response = APP.call(request)
body = ""
response.last.each { |part| body << part }
hash["body"] = URI.escape(body)
cmd = "redis-cli rpush responses #{hash.to_json.inspect}"
system cmd
end
end
Also available as a gist.
The worker can be started with:
env QUEUE=requests INTERVAL=1 rake environment resque:work
This worker uses resque, which polls the queue and calls perform when a message is received. The perform method builds the Rack request and runs the URL from the message through Rails. It then pushes the response body onto the responses queue using redis-cli.
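The core of perform is just "build a Rack environment, call the app, join the body". Here is a stripped-down sketch of that step that runs without Rails, resque, or redis. HELLO_APP is a hypothetical stand-in for the Rails application, and process and BASE_ENV are names invented for the example.

```ruby
require 'stringio'

# A Rack application is any object that responds to call(env) and returns
# [status, headers, body], where body responds to each.
# HELLO_APP is a hypothetical stand-in for the Rails app in the worker.
HELLO_APP = lambda do |env|
  [200, { "Content-Type" => "text/plain" }, ["You asked for ", env["PATH_INFO"]]]
end

BASE_ENV = {
  "QUERY_STRING"    => "",
  "REQUEST_METHOD"  => "GET",
  "SERVER_NAME"     => "localhost",
  "SERVER_PORT"     => "3000",
  "rack.url_scheme" => "http"
}.freeze

# Builds a request env for the URL, calls the app, and joins the body parts.
def process(app, message)
  env = BASE_ENV.merge("PATH_INFO"   => message.fetch("url"),
                       "rack.input"  => StringIO.new(""),
                       "rack.errors" => $stderr)
  status, _headers, body_parts = app.call(env)
  body = +""
  body_parts.each { |part| body << part }
  { "node_id" => message["node_id"], "status" => status, "body" => body }
end

response = process(HELLO_APP, "node_id" => 7, "url" => "/things")
puts response.inspect
```

Building a fresh env (including a fresh rack.input) for each message avoids sharing mutable objects between requests, which a shallow clone of a constant hash would not.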
As before, this spike only works with GET requests and does not pass any headers through to keep the code simple. Comments and forks are welcome.
Web proxy in node.js for high availability
Update (3/8/10): Updated code to work with version 0.1.30 of node.js
I’ve been thinking about high availability websites lately. In particular, I want sites that can be upgraded (including database migrations or even infrastructure changes) without downtime.
I’ve also been playing with node.js lately, and I decided to spike out a web proxy that would sit between users and the actual website (e.g., a rails app). When performing upgrades, the proxy would hold users’ connections and wait. Once the upgrade was done, the proxy would forward requests as usual. Users would see an extra long request, but as long as the upgrade was short (e.g., less than a minute), the user should not know the site was down.
This type of proxy server seems like a good fit with node. Node’s event model means that there will be very little overhead when holding connections. There are no threads stacking up and waiting. Since everything is non-blocking, this server should scale well.
Here is a very simple version of the code:
var fs = require('fs'),
sys = require('sys'),
http = require('http');
http.createServer(function (req, res) {
checkBalanceFile(req, res);
}).listen(8000);
function checkBalanceFile(req, res) {
fs.stat("balance", function(err) {
if (err) {
setTimeout(function() {checkBalanceFile(req, res)}, 1000);
} else {
passThroughOriginalRequest(req, res);
}
});
}
function passThroughOriginalRequest(req, res) {
var request = http.createClient(2000, "localhost").request("GET", req.url, {});
request.addListener("response", function (response) {
res.writeHeader(response.statusCode, response.headers);
response.addListener("data", function (chunk) {
res.write(chunk);
});
response.addListener("end", function () {
res.close();
});
});
request.close();
}
sys.puts('Server running at http://127.0.0.1:8000/');
Here is a gist if anyone would like to fork.
Basically, I use http.createServer to create a web server on port 8000. On incoming requests, I call checkBalanceFile. This method will try to stat a local file called balance. If it finds it, it will call passThroughOriginalRequest, which forwards the request to another web server on port 2000. If the balance file does not exist, I use setTimeout to call checkBalanceFile again in one second.
With a proxy server like this, the main application can be upgraded by removing the balance file. While the file is missing, the node web server will hold all of the connections and check every second for the reappearance of the balance file. Once it comes back, all requests will be forwarded along and then streamed back to the user.
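The hold-and-retry idea is independent of node. As a rough sketch in Ruby, the helper below (wait_for_file is a made-up name, and the interval and timeout values are illustrative) polls for the balance file and gives up after a deadline. Unlike the node version, it blocks the calling thread while it waits, so it only shows the control flow, not the scalability.

```ruby
require 'tmpdir'

# Polls until the file exists, then returns true. Gives up and returns false
# after max_wait seconds. The interval and timeout values are illustrative.
def wait_for_file(path, interval: 1.0, max_wait: 60.0)
  deadline = Time.now + max_wait
  until File.exist?(path)
    return false if Time.now >= deadline
    sleep interval
  end
  true
end

Dir.mktmpdir do |dir|
  balance = File.join(dir, "balance")

  # Simulate an upgrade that finishes after a short delay.
  upgrade = Thread.new do
    sleep 0.2
    File.write(balance, "")
  end

  held = wait_for_file(balance, interval: 0.05, max_wait: 5)
  upgrade.join
  puts "forwarding request: #{held}"
end
```

The deadline is an addition over the spike: without one, a request would be held forever if the balance file never came back.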
Currently, this spike only works with GET requests and does not pass any headers through, since I wanted to keep the code simple.
rake_commit_tasks now supports git
The rake_commit_tasks plugin now has preliminary support for git. rake_commit_tasks is a rails plugin which contains a set of rake tasks for checking your project into source control (git or subversion).
The workflow for committing and pushing with git is slightly different from subversion. The current steps of “rake commit” with git are roughly:
- Resets soft back to origin/branch_name (git reset --soft origin/branch_name)
- Adds new files to git and removes deleted files (git add -A .)
- Prompts for a commit message
- Commits to git (git commit -m '...')
- Pulls changes from origin and does a rebase to keep a linear history (git pull --rebase)
- Runs the default rake task (rake default)
- Checks cruisecontrol.rb to see if the build is passing
- Pushes the commit to origin (git push origin branch_name)
The “git reset --soft” in #1 is used to collapse unpushed commits. Each time “rake commit” is run, any commits that have not been pushed are undone and the changes are put into the index. Then, the “git add -A .” adds the new changes. Now, the “git commit” command will create one commit with all of the unpushed changes.
This collapsing comes in handy when “rake commit” fails (for example, a broken test). Once the test is fixed, the fix should go into the same commit as the original work. Without the “git reset” command, there will be two commits (the original, and the one with the fix).
The “--rebase” flag is used in #5 when running “git pull” to keep a linear history without merge commits. If someone else has committed and pushed, a normal “git pull” will create a merge commit merging the other person’s work with your own. The “git pull --rebase” undoes the local commit, does a “git pull” and then replays the local commit on top. Merge commits are useful when there are multiple streams of work, such as a release branch. However, when everyone is working in master, they merely clutter the history.
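To make the order of operations concrete, here is a small Ruby sketch that only builds the list of shell commands for one run. It is not the plugin's implementation: git_commit_commands is a hypothetical helper, the interactive and cruisecontrol.rb steps are left out, and the branch name is assumed to be a simple name that needs no shell quoting.

```ruby
require 'shellwords'

# Returns the shell commands for one "rake commit" run, in order.
# Interactive and CI steps (prompting for the message, checking
# cruisecontrol.rb) are left out of this sketch.
def git_commit_commands(branch, message)
  [
    "git reset --soft origin/#{branch}",            # collapse unpushed commits into the index
    "git add -A .",                                 # stage new, changed, and deleted files
    "git commit -m #{Shellwords.escape(message)}",  # one commit with all unpushed work
    "git pull --rebase",                            # replay the commit on top of origin
    "rake default",                                 # run the build before pushing
    "git push origin #{branch}"
  ]
end

puts git_commit_commands("master", "Fix broken test")
```

Because the reset comes first, running the sequence again after a failed build produces the same single commit rather than a second "fix" commit.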
Comments and patches are welcome.
RailsConf Presentation
This post is late, but the slides from our RailsConf presentation are online:
Rails in the Large: How We’re Developing the Largest Rails Project in the World
Useful unix tricks - part 3
Here is part 3, following Useful unix tricks and Useful unix tricks – part 2.
!! is the previous command in the shell history
It is pretty common to want to rerun the previous command, possibly with something new on the beginning or end. !! is that command in the history. For example:
% tail foo
tail: cannot open `foo' for reading: Permission denied
% sudo !!
sudo tail foo
hello world
As you can see, I forgot to sudo the first command. Now, I want to rerun it with a sudo at the front, so I can just do “sudo !!” and press enter. The shell will print out the command it is running, followed by whatever it would print normally.
Tail multiple files at once
The tail command can take multiple files, and it will show the output of each one. You can combine this with the -f flag, and tail will intersperse the output of each file in real time. This is incredibly handy for looking at log files. For example, we can tail both the apache and rails logs to see the requests:
% tail -f log/production.log /var/log/apache2/access.log
==> log/production.log <==
Processing MephistoController#dispatch (for 127.0.0.1 at 2009-02-20 13:33:31) [GET]
Parameters: {"action"=>"dispatch", "path"=>["2008", "7", "19", "capistrano-with-pairing-stations"], "controller"=>"mephisto"}
Completed in 784ms (View: 0, DB: 260) | 200 OK [http://www.pgrs.net/2008/7/19/capistrano-with-pairing-stations]
==> /var/log/apache2/access.log <==
127.0.0.1 - - [20/Feb/2009:13:33:31 -0600] "GET /2008/7/19/capistrano-with-pairing-stations HTTP/1.1" 200 16049 "http://www.pgrs.net/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.6) Gecko/2009011912 Firefox/3.0.6 Ubiquity/0.1.5"
As you can see, tail prints a ==> filename <== header to show which file the output is for.
Use vim -b to show nonprintable characters
Sometimes a file will have nonprintable characters, such as windows line breaks. Most editors won’t show them, but you can use “vim -b” to see and edit them. The -b flag tells vim to use binary mode. For example, here is a file with windows line endings:
% cat foo.txt
Hello
World
% vim -b foo.txt
Hello^M
World^M
^M
As you can see, the vim binary mode can see the line endings whereas cat cannot.
** is a recursive wildcard in zsh
You can use the recursive wildcard ** in zsh to do complex matching. For example, let’s say that you want to search all ruby files in the current project for the string RAILS_ENV. Normally, you would do something like:
% find . -name '*.rb' | xargs grep RAILS_ENV
./config/environment.rb:# ENV['RAILS_ENV'] ||= 'production'
./test/test_helper.rb:ENV["RAILS_ENV"] = "test"
In zsh, you can accomplish the same with a much simpler command:
% grep RAILS_ENV **/*.rb
config/environment.rb:# ENV['RAILS_ENV'] ||= 'production'
test/test_helper.rb:ENV["RAILS_ENV"] = "test"
The wildcard **/*.rb recursively matches any files that end in .rb, so there is no need for a find command.
If there are a lot of files, you will occasionally get the error:
% grep RAILS_ENV **/*.rb
zsh: argument list too long: grep
This means that the **/*.rb match returned too many arguments to handle. In this case, you can use echo and xargs to get the job done, which is still simpler than the find command:
% echo **/*.rb | xargs grep RAILS_ENV
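The same recursive pattern is understood by Ruby's Dir.glob, which is handy when the search lives in a script rather than in the shell. A small sketch follows; the helper name grep_files is made up, and the files are created in a temporary directory just for the demonstration.

```ruby
require 'tmpdir'
require 'fileutils'

# Returns "path:line" strings for every line under root that contains needle,
# looking only at files matched by the recursive glob pattern.
def grep_files(root, pattern, needle)
  Dir.glob(File.join(root, pattern)).sort.flat_map do |path|
    next [] unless File.file?(path)
    File.readlines(path, chomp: true)
        .select { |line| line.include?(needle) }
        .map { |line| "#{path.delete_prefix(root + '/')}:#{line}" }
  end
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "config"))
  FileUtils.mkdir_p(File.join(root, "test"))
  File.write(File.join(root, "config", "environment.rb"), "# ENV['RAILS_ENV'] ||= 'production'\n")
  File.write(File.join(root, "test", "test_helper.rb"), "ENV[\"RAILS_ENV\"] = \"test\"\n")
  File.write(File.join(root, "README"), "RAILS_ENV is mentioned here too\n")

  puts grep_files(root, "**/*.rb", "RAILS_ENV")
end
```

Since the matching happens inside one process, there is no argument list to overflow.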
find -X will show bad filenames
It is pretty common to run find and then pass the results into xargs. However, if any filenames contain spaces or quotes, xargs will fail. You can use find -X to find any paths that will fail. find will warn on these paths and then skip them:
% find -X .
.
find: ./filename with spaces: illegal path
If you want to use xargs with these files, use the -print0 option to tell find to separate filenames with a NUL character instead of a newline, and xargs -0 to tell xargs to split on NUL instead of whitespace:
% find . -print0 | xargs -0 echo
. ./filename with spaces
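The difference between the two separators is easy to see with plain strings. This Ruby snippet only imitates the splitting; it does not run find or xargs, and the second filename is made up for the example.

```ruby
# Two filenames, one of which contains spaces.
names = ["./filename with spaces", "./plain"]

# Default find output is newline-separated, and default xargs splits on any
# whitespace, so the first name is broken into three arguments.
whitespace_split = names.join("\n").split

# With -print0 / -0 the separator is NUL, which cannot occur in a filename.
nul_split = names.join("\0").split("\0")

puts whitespace_split.inspect
puts nul_split.inspect
```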
cd - will return to the previous folder
Passing - into the cd command will return you to the last folder you were in:
/tmp% pwd
/tmp
/tmp% cd ~
~% cd -
/tmp
/tmp%
Use ctrl+z and kill %1 to kill a process that will not die
Sometimes, you run a command and pressing ctrl+c will not kill it. When that happened, I used to open up another terminal window and kill -9 the process, until someone showed me the following trick:
% sleep 1000
^Z
zsh: suspended sleep 1000
% kill -9 %1
%
[1] + killed sleep 1000
Pressing ctrl+z suspends the process and returns you to a terminal prompt. Then, kill -9 %1 sends the KILL signal (signal 9) to job #1, which is our suspended process.
pwdx shows the working directory of a process
It can be really useful to see the working directory of a running process. For example, you can see which release a ruby process is running:
% sudo pwdx 23961
23961: /var/www/myapp/releases/20081231200733
Unfortunately, I haven’t found pwdx for the mac. If anyone knows how I can install it, please let me know.
Use sh -x to debug shell scripts
If you want to know what commands a shell script runs, run it with the -x flag. For example, say we have a shell script with two echos. Compare the output with and without the -x flag:
% sh foo.sh
hello
world
% sh -x foo.sh
+ echo hello
hello
+ echo world
world
% zsh -x foo.sh
+foo.sh:1> echo hello
hello
+foo.sh:2> echo world
world
As you can see, the -x flag shows which command is being run. zsh takes it a step further and shows the script and line number as well.
sysctl replaces /proc on macs
The /proc filesystem is a great way to find out about a linux machine. For example, you can “cat /proc/cpuinfo” to find out how many processors are on the box. However, macs don’t have /proc. You can use sysctl instead. The -a flag prints out all keys and values:
% sysctl -a
kern.ostype = Darwin
kern.osrelease = 9.6.0
kern.osrevision = 199506
kern.version = Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386
...
You can also just get a single value with the -n flag. For example, this command will print out the number of cpu cores:
% sysctl -n hw.ncpu
2
Automerging now in rake_commit_tasks
The rake_commit_tasks plugin now supports automatically merging changes from branch to trunk. I describe the feature and the use case at Automatically merge changes from branch to trunk, although the merging code now uses “svn merge” instead of “svn diff” in order to keep svn mergeinfo.
Basically, if you branch to release code and then fix a bug on the branch, the change will automatically be merged over to the trunk when you run a “rake commit.” Just set PATH_TO_TRUNK_WORKING_COPY to the location of the trunk checkout in your Rakefile.
If you are curious, you can check out the commit at github.
Flight delays application overhaul
In Flight delay information for United flights, I talked about an application I wrote to show United flight delays over time. I have now completely rewritten that application to allow comparison of multiple flights on one graph.
People who travel for a living know that early morning flights tend to be less delayed than evening flights. In the morning, the planes are usually already at the airport, so there is no chance of an incoming flight delay. There are no lines of planes waiting to take off yet, so the time between leaving the gate and getting into the air tends to be a lot less.
The difference in these times can be dramatic. Here is a report comparing an early morning flight with an evening flight from Newark to Chicago (two heavily delayed airports): Flight Delays
The report shows a table of min, max, and median delays:
| Flight | Min Delay | Median Delay | Max Delay |
|---|---|---|---|
| Flight 655 (EWR -> ORD) on Thursday at 07:38 PM | -14 | 40 | 239 |
| Flight 635 (EWR -> ORD) on Thursday at 05:58 AM | -26 | 0 | 157 |
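For reference, the three summary columns are simple to compute from a list of per-day delays. The sketch below uses made-up numbers, not the real flight data, and the helper names median and delay_summary are invented for the example.

```ruby
# Median of a list of numbers: the middle value, or the mean of the two
# middle values when the list has an even length.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

def delay_summary(delays)
  { min: delays.min, median: median(delays), max: delays.max }
end

# Made-up delays in minutes (negative means early); NOT the real flight data.
sample_delays = [-5, 0, 12, 40, 95]
puts delay_summary(sample_delays).inspect
```

The median is a better headline number than the mean here, because a single very late day would otherwise dominate the result.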
And here are the two graphs shown in the report above:

As you can see from the first graph, day by day, flight 655 (departing around 7:38 PM) is almost always more delayed than flight 635 (departing around 5:58 AM).
The second graph shows a histogram. You can see that flight 635 is clustered more heavily to the left (-40 to 20) which shows that it is generally between 40 minutes early and 20 minutes late. Flight 655 is much more spread out to the right, which shows that it has far more delays. On one day, it was over 220 minutes late!
Strange behavior with define_method and the wrong number of arguments
I noticed the other day that methods defined using define_method have very strange behavior when given the wrong number of arguments. For example, here is a class with a bunch of methods defined using define_method:
class Foo
define_method :no_args do
p "no args"
end
define_method :one_arg do |one|
p one
end
define_method :two_args do |one, two|
p one
p two
end
end
Now, if we call no_args with an argument, it will silently ignore the argument:
>> Foo.new.no_args(1)
"no args"
=> nil
However, if we have a method that expects one argument but receives either none or more than one, we get a warning:
>> Foo.new.one_arg
./foo.rb:6: warning: multiple values for a block parameter (0 for 1)
from (irb):3
nil
=> nil
>> Foo.new.one_arg(1,2,3)
./foo.rb:6: warning: multiple values for a block parameter (3 for 1)
from (irb):2
[1, 2, 3]
=> nil
In the second case, it took all three arguments and passed them as an array into the method expecting one argument.
It gets even stranger with a method that expects two arguments. Now, we actually get errors:
>> Foo.new.two_args
ArgumentError: wrong number of arguments (0 for 2)
from (irb):2:in 'two_args'
from (irb):2
>> Foo.new.two_args(1,2,3)
ArgumentError: wrong number of arguments (3 for 2)
from ./foo.rb:10:in 'two_args'
from (irb):3
I’m not sure why a one argument method gives a warning while a two argument method gives an error. Clearly, define_method is very different from using def.
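For comparison, later Ruby versions (1.9 and up, to the best of my knowledge) check arguments for define_method blocks as strictly as for def, so all three cases raise ArgumentError. The sketch below can be run on a current Ruby to confirm; try_call is a helper invented for the example, and the messages it prints vary by version.

```ruby
class Foo
  define_method(:no_args) { "no args" }
  define_method(:one_arg) { |one| one }
  define_method(:two_args) { |one, two| [one, two] }
end

# Calls the method and reports either its result or the ArgumentError raised.
def try_call(object, name, *args)
  object.public_send(name, *args)
rescue ArgumentError => e
  "ArgumentError: #{e.message}"
end

foo = Foo.new
puts try_call(foo, :no_args, 1)        # too many arguments
puts try_call(foo, :one_arg)           # too few arguments
puts try_call(foo, :one_arg, 1, 2, 3)  # too many arguments
puts try_call(foo, :two_args, 1, 2).inspect
```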
Mephisto with Phusion Passenger
I recently upgraded my blogging software, Mephisto, from 0.7.3 to 0.8.1. One thing I noticed is that they moved the cached files from public to a cache subfolder containing the site. For example, on a new installation, the cached index page is in public/cache/unusedfornow.com/index.html.
Mephisto writes a cached page for every page visited. This means that any subsequent requests for this page can be served directly by apache from the cached file rather than going through the whole rails stack (all the way down to the database). This is much faster and uses less memory.
I run my blog in Apache with Phusion Passenger. The problem with this new cache location is that Passenger only looks in public for cached files. This means that the cached pages are ignored and every request is being served by Rails. After searching google and working some mod_rewrite magic, I came up with the following solution. Here is the Apache virtual host configuration for my blog:
<VirtualHost *:80>
ServerName pgrs.net
ServerAlias www.pgrs.net
DocumentRoot /var/www/mephisto-0.8.1/public
RailsAllowModRewrite on
RewriteEngine On
# Rewrite / to index.html
RewriteRule ^/$ /index.html [QSA]
# Rewrite /some_page to /some_page.html
RewriteRule ^([^.]+?)/?$ $1.html [QSA]
# If cached file exists, serve it and stop processing
RewriteCond %{DOCUMENT_ROOT}/cache/unusedfornow.com%{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ /cache/unusedfornow.com$1 [L]
ErrorLog /var/log/apache2/pgrs-error.log
CustomLog /var/log/apache2/pgrs-access.log combined
</VirtualHost>
The first 3 lines are standard Phusion Passenger configuration: Deploying a Ruby on Rails application. Then, I turn on mod_rewrite. The first two sets of mod_rewrite configuration cascade and turn the request into what the filename will look like. So / becomes /index.html, and /2008/10/29/deploying-trunk-or-tags-with-capistrano becomes /2008/10/29/deploying-trunk-or-tags-with-capistrano.html.
The final set checks if this file exists under /var/www/mephisto-0.8.1/public/cache/unusedfornow.com (the -f flag), and if it does, tells apache to serve this file. The [L] tells mod_rewrite that this is the last rule, so it should stop processing now. If the file does not exist, the request falls through mod_rewrite and Passenger picks it up and serves it through Rails.
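The mapping from request path to cached file can be checked outside Apache. This Ruby sketch mimics the two rewrites and builds the path that the RewriteCond tests. cache_candidate is a made-up helper, and the stylesheet path is a hypothetical example of a request that already has an extension.

```ruby
# Applies the same two rewrites as the RewriteRules above, then builds the
# path that the RewriteCond checks with -f.
def cache_candidate(document_root, site, request_path)
  path = request_path == "/" ? "/index.html" : request_path
  # Paths without a dot get ".html" appended (a trailing slash is dropped).
  path = path.sub(%r{\A([^.]+?)/?\z}, '\1.html')
  File.join(document_root, "cache", site, path)
end

root = "/var/www/mephisto-0.8.1/public"
site = "unusedfornow.com"
puts cache_candidate(root, site, "/")
puts cache_candidate(root, site, "/2008/10/29/deploying-trunk-or-tags-with-capistrano")
puts cache_candidate(root, site, "/stylesheets/main.css")
```

Note that /index.html contains a dot, so the second rule leaves it alone after the first rule has produced it.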
I verified that this works by looking at the response headers in Firefox (Tools -> Page Info -> Headers) of any given blog page. The first time, there is a “X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 2.0.3” header. Once I refresh, the X-Powered-By header is gone since the request never makes it to Passenger. Apache is once again doing the hard work, and Rails is only used when the request is new or dynamic (such as searching).
Deploying trunk or tags with capistrano
On my current project, we use capistrano for all of our deployments. In the simplest case, you tell capistrano the URL of your repository, and then you deploy by performing a checkout from this repository:
set :repository, "http://www.example.com/svn/myproject/trunk"
However, putting this line in the capistrano recipe only lets you deploy from trunk. We needed the ability to deploy either the trunk or a tag of our choice. We generally deploy the trunk to development servers and the latest tag to staging and production servers.
We started out with something more complicated, but with the help of Jamis Buck on the capistrano mailing list, we came up with the following solution:
set :repository_root, "http://www.example.com/svn/myproject"
set(:tag) { Capistrano::CLI.ui.ask("Tag to deploy (or type 'trunk' to deploy from trunk): ") }
set(:repository) { (tag == "trunk") ? "#{repository_root}/trunk" : "#{repository_root}/tags/#{tag}" }
This deploy script will prompt the user to enter either a tag name or the word trunk. It will then use that variable to set the repository to the correct path. The output of a deploy will look like:
% cap deploy
* executing `deploy'
...
* executing `deploy:update'
** transaction: start
* executing `deploy:update_code'
Tag to deploy (or type 'trunk' to deploy from trunk): trunk
* executing "svn checkout -q -r2210 http://www.example.com/svn/myproject/trunk /var/www/myproject/releases/20081029012754 && (echo 2210 > /var/www/myproject/releases/20081029012754/REVISION)"
...
Capistrano evaluates variables lazily. It will only fetch the repository variable if it needs it, which will then fetch the tag variable, which will then prompt the user. Therefore, if you run a command that does not require the repository, it will not prompt. For example, running the following command will not prompt the user:
cap deploy:restart
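The lazy evaluation can be illustrated with a few lines of plain Ruby. This is a toy model, not Capistrano's code: LazyConfig and build_config are invented names, the block for :tag stands in for the interactive prompt, and "1.2.0" is a made-up tag.

```ruby
# A tiny stand-in for Capistrano-style variables: a value may be given
# directly or as a block that runs (once) the first time it is fetched.
class LazyConfig
  def initialize
    @values = {}
    @blocks = {}
  end

  def set(name, value = nil, &block)
    if block
      @blocks[name] = block
      @values.delete(name)
    else
      @values[name] = value
      @blocks.delete(name)
    end
  end

  def fetch(name)
    return @values[name] if @values.key?(name)
    raise KeyError, "unknown variable #{name}" unless @blocks.key?(name)
    @values[name] = @blocks[name].call
  end
end

def build_config(prompt_log)
  config = LazyConfig.new
  config.set(:repository_root, "http://www.example.com/svn/myproject")
  # Stands in for the interactive prompt; "1.2.0" is a made-up tag name.
  config.set(:tag) { prompt_log << :asked; "1.2.0" }
  config.set(:repository) do
    root = config.fetch(:repository_root)
    tag = config.fetch(:tag)
    tag == "trunk" ? "#{root}/trunk" : "#{root}/tags/#{tag}"
  end
  config
end

asked = []
config = build_config(asked)
puts "prompted before fetch? #{!asked.empty?}"
puts config.fetch(:repository)
puts "prompted after fetch? #{!asked.empty?}"

# Setting the tag up front means the prompt block never runs.
preset_asked = []
preset = build_config(preset_asked)
preset.set(:tag, "trunk")
puts preset.fetch(:repository)
puts "prompted with preset tag? #{!preset_asked.empty?}"
```

Nothing runs the :tag block until something fetches :repository, and a directly assigned value replaces the block entirely.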
Next, we created a convenience rake task to deploy the trunk without prompting:
namespace :deploy do
task :trunk do
sh "cap -s tag=trunk deploy"
end
end
This rake task sets the tag variable on the command line. Therefore, capistrano will not need to evaluate the set(:tag) command and will deploy the trunk without prompting.
Finding nonprintable characters with a test
Our current application includes a lot of static content created by content editors. They check in static HTML files, and we include these files in various parts of the application. The problem is that they sometimes copy and paste from applications such as Outlook or Word, which can introduce unprintable characters into the application. These characters show up strangely on the website.
After this happened a couple of times, we decided to write a test to ensure that we would always catch the unprintable characters:
class NonPrintableCharactersTest < Test::Unit::TestCase
def test_for_non_printable_characters_in_content
assert_equal "", `find #{RAILS_ROOT}/content -name '*.html' | xargs grep -n '[^[:space:][:print:]]'`
end
end
We use find to get a list of all of the html files in the content folder. Then, we pipe this to grep, using the regular expression '[^[:space:][:print:]]', which matches anything except whitespace or printable characters. The output of this test looks like:
Loaded suite test/non_printable_characters_test
Started
F
Finished in 0.86005 seconds.
1) Failure:
test_for_non_printable_characters_in_content(NonPrintableCharactersTest) [test/non_printable_characters_test.rb:5]:
<""> expected but was
<"/some/path/to/content/tmp.html:48:character �</span></p>\n">.
1 tests, 1 assertions, 1 failures, 0 errors
The failure message shows the file and line with the character, so it is easy to fix.
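The same character class works in Ruby's own regular expressions, which may be useful where shelling out to find and grep is not an option. The following is a sketch rather than a drop-in replacement for the test above: non_printable_lines is a made-up helper, and because Ruby's POSIX classes are Unicode-aware, results for non-ASCII text may differ from grep's depending on the locale.

```ruby
NON_PRINTABLE = /[^[:space:][:print:]]/

# Returns [line_number, line] pairs for lines containing a character that is
# neither whitespace nor printable.
def non_printable_lines(text)
  text.each_line.with_index(1)
      .select { |line, _number| line =~ NON_PRINTABLE }
      .map { |line, number| [number, line.chomp] }
end

clean = "<p>Hello</p>\n<p>World</p>\n"
dirty = "<p>Hello</p>\n<p>bad\u0007character</p>\n"   # contains a BEL control character

puts non_printable_lines(clean).inspect
puts non_printable_lines(dirty).inspect
```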
Testing page caching with SpiderTest
The website I’m currently working on is similar to an online brochure. The data on the site changes hourly, but every user sees the same thing. As a result, we decided to use page caching to dramatically speed up the site. Once a page is visited, the html is written out to disk and all subsequent requests are served by apache. The setup of this approach is detailed elsewhere (for example, Rails Envy: Ruby on Rails Caching Tutorial).
Setting up caching was easy, but we wanted to ensure that we did not make any mistakes. All pages should be cached, since any miss will result in a much higher load on our rails application. I’ve written previously about our internationalization test (Improved internationalization test) which spiders the site (using SpiderTest) looking for non localized text. Since we were already visiting every page, it seemed like a good place to add a check for page caching. Spidering the site again would make our test suite too long.
The consume_page method is called for every page that is visited by the spider. We expanded the implementation by adding a call to assert_page_is_cached:
def consume_page(html, url)
html.gsub!("http://www.example.com", "")
unless redirect?(html) || asset?(url)
assert_page_has_been_moved_to_language_file(html, url)
assert_page_is_cached(url)
end
super
end
def assert_page_is_cached(url)
path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
page = path.ends_with?(".html") ? path : "#{path}.html"
assert_true File.exists?(page), "Page NOT cached: #{url} (looking in #{page})"
end
We also had to add new lines to our setup to turn on caching (since it is normally off in test mode):
def setup
FileUtils.rm_rf ActionController::Base.page_cache_directory
ActionController::Base.perform_caching = true
end
Since we run this test as its own suite, the test is totally isolated from other tests. There is no need to implement a teardown.
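The check itself boils down to "map the URL to a file under the cache directory and see whether it exists". Here is a Rails-free sketch of that idea. The helpers cached_file_for and page_cached? are hypothetical and simplified (no query strings), the mapping of / to index.html is an assumption about how page caching names the root page, and the URLs are made up.

```ruby
require 'tmpdir'
require 'fileutils'

# Maps a URL path to the file that page caching would write for it.
# This simplified helper is hypothetical and ignores query strings.
def cached_file_for(cache_dir, url)
  path = File.join(cache_dir, url == "/" ? "index" : url.chomp("/"))
  path.end_with?(".html") ? path : "#{path}.html"
end

def page_cached?(cache_dir, url)
  File.exist?(cached_file_for(cache_dir, url))
end

Dir.mktmpdir do |cache_dir|
  FileUtils.mkdir_p(File.join(cache_dir, "buildings"))
  File.write(File.join(cache_dir, "buildings", "42.html"), "<html></html>")

  puts page_cached?(cache_dir, "/buildings/42")   # file was written above
  puts page_cached?(cache_dir, "/buildings/43")   # no file for this URL
end
```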
The full test, including the internationalization testing from before, looks like:
require 'hpricot'
class InternationalizationText < ActionController::IntegrationTest
include Caboose::SpiderIntegrator
def setup
FileUtils.rm_rf ActionController::Base.page_cache_directory
ActionController::Base.perform_caching = true
blank_out_localization
blank_out_html_escape
end
def blank_out_localization
GLoc::InstanceMethods.class_eval do
alias :old_l :l
def l(symbol, *arguments)
""
end
end
end
def blank_out_html_escape
ERB::Util.class_eval do
alias :old_html_escape :html_escape
def html_escape(s)
""
end
alias :h :html_escape
end
end
def test_all_text_has_been_moved_to_language_file
get '/'
assert_response :success
spider(@response.body, '/', :verbose => true)
end
def consume_page(html, url)
html.gsub!("http://www.example.com", "")
unless redirect?(html) || asset?(url)
assert_page_has_been_moved_to_language_file(html, url)
assert_page_is_cached(url)
end
super
end
def redirect?(html)
html.include?("<body>You are being")
end
def asset?(url)
File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
end
def assert_page_has_been_moved_to_language_file(page_text, url)
doc = Hpricot.parse(page_text)
assert_does_not_contain_words doc.at("title").inner_text, url
body = doc.at('body')
(body.search("//script[@type='text/javascript']")).remove
assert_does_not_contain_words(body.inner_text, url)
assert_attribute_does_not_contain_words body, url, 'title'
assert_attribute_does_not_contain_words body, url, 'alt'
end
def assert_attribute_does_not_contain_words body, url, attribute
body.search("//*[@#{attribute}]") do |element|
assert_does_not_contain_words element.get_attribute(attribute), url
end
end
def assert_does_not_contain_words text, url
match = text.match(/[A-Za-z]([A-Za-z]| )*/)
fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
end
def assert_page_is_cached(url)
path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
page = path.ends_with?(".html") ? path : "#{path}.html"
assert_true File.exists?(page), "Page NOT cached: #{url} (looking in #{page})"
end
end
Capistrano dry run
I submitted a patch to Capistrano to add a “--dry-run” option (or -n for short). This flag causes Capistrano to print all of the commands it would run without actually running them. It is an easy way to see what a cap task will do to your servers before you run it.
My patch was accepted and released as part of Capistrano 2.5.0. You can read more about the new features at:
http://capify.org/2008/8/29/capistrano-2-5-0
and see the details of my commit on GitHub:
http://github.com/capistrano/capistrano/commit/7279a3858e2bcebe84735223d5f8b4397c4ad85b
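The general idea behind a dry-run flag can be sketched in a few lines of Ruby. This is only an illustration and not Capistrano's actual implementation: the CommandRunner class, its log, and the example commands are all hypothetical names made up for this sketch.

```ruby
# Hypothetical runner illustrating the dry-run idea; NOT Capistrano's code.
class CommandRunner
  attr_reader :log

  def initialize(dry_run)
    @dry_run = dry_run
    @log = []
  end

  # Records every command. In dry-run mode the command is only printed;
  # otherwise it is handed to the shell.
  def run(command)
    @log << command
    if @dry_run
      puts "[dry run] #{command}"
      nil
    else
      system(command)
    end
  end
end

runner = CommandRunner.new(true)
runner.run("rm -rf /tmp/releases/old")
runner.run("ln -s /tmp/releases/new /tmp/current")
```

Because the flag is checked in one place, every task that goes through the runner gets dry-run behavior for free. With Capistrano itself, the equivalent is passing -n on the command line, for example cap -n deploy if you have a task named deploy.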
Improved internationalization test
I wrote previously about how we test the internationalization of our website in Testing internationalization language files. Basically, we generate a blank language file with all of the values for all of the labels set to blank. We switch the site to this language, and then we spider the site looking for text.
Over the past couple of months, we have improved our internationalization test and removed some of the existing limitations.
Manually marking nonlocalizable content
One of the limitations of the approach detailed in the previous article is that we had to manually mark content on the page that should not be internationalized by adding a class to the HTML:
<span class="nonlocalizable"><%= @building.address %></span>
The basis of our new test is the idea that all text on the page is one of two types:
- Labels and static text that live in the language files, which are inserted into the page using the GLoc method l()
- Text that the application produces, which should be HTML-escaped using the h() method in the views or helpers
Therefore, if we intercept both of these types of text, we can find anything that is not localized or escaped.
Our new test setup looks like:
def setup
  blank_out_localization
  blank_out_html_escape
end
def blank_out_localization
  GLoc::InstanceMethods.class_eval do
    alias :old_l :l
    def l(symbol, *arguments)
      ""
    end
  end
end
def blank_out_html_escape
  ERB::Util.class_eval do
    alias :old_html_escape :html_escape
    def html_escape(s)
      ""
    end
    alias :h :html_escape
  end
end
We redefine the l() method to return an empty string, so anything that is localized will no longer show up on the page.
The h() and html_escape() methods are used to escape strings for the web (for example, converting ‘<’ into ‘&lt;’). We also redefine these methods to return empty strings. Now, all properly localized or escaped text on the page is blanked out, so anything still visible is a mistake.
We then spider the site as before, which walks every page and checks for non-blank text.
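To see the effect in isolation, here is a small standalone sketch that does not need Rails or GLoc. Everything in it is a stand-in invented for the example: ViewHelpers, BlankedHelpers, Page, and the template are hypothetical, and it blanks the helpers by extending a single object rather than reopening the library modules as our test does.

```ruby
require "erb"

# Hypothetical stand-ins for GLoc's l() and the h() view helper.
module ViewHelpers
  LABELS = { :name_label => "Name" }

  def l(key)
    LABELS.fetch(key)
  end

  def h(text)
    ERB::Util.html_escape(text)
  end
end

# Same interface, but every localized or escaped string becomes blank.
module BlankedHelpers
  def l(key)
    ""
  end

  def h(text)
    ""
  end
end

class Page
  include ViewHelpers

  # "Click here" is hard-coded on purpose: it is the mistake we want to catch.
  TEMPLATE = ERB.new('<p><%= l(:name_label) %>: <%= h(@address) %> Click here</p>')

  def initialize(address)
    @address = address
  end

  def render
    TEMPLATE.result(binding)
  end
end

normal  = Page.new("1 Main St").render
blanked = Page.new("1 Main St").extend(BlankedHelpers).render

# Strip the tags and look for any remaining words.
leftover = blanked.gsub(/<[^>]+>/, "")[/[A-Za-z]([A-Za-z]| )*/]
puts normal
puts "Not localized or escaped: #{leftover.inspect}"
```

The label and the address both disappear from the blanked page, and the only words left are the ones that bypassed l() and h().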
It is possible to restore the l() and h() methods in the teardown:
def teardown
  restore_html_escape
  restore_localization
end
def restore_html_escape
  ERB::Util.class_eval do
    alias :html_escape :old_html_escape
  end
end
def restore_localization
  GLoc::InstanceMethods.class_eval do
    alias :l :old_l
  end
end
However, I think it is safer to run this test in its own test suite in a separate ruby process. That way, the l() and h() monkey patching cannot accidentally affect other tests:
namespace :test do
  Rake::TestTask.new(:'internationalization' => ["environment", "load_test_data"]) do |t|
    t.libs << "test"
    t.pattern = "test/acceptance/internationalization_test.rb"
    t.verbose = true
  end
  Rake::TestTask.new(:'acceptance' => ["environment", "load_test_data"]) do |t|
    t.libs << "test"
    t.pattern = FileList["test/acceptance/**/*_test.rb"].exclude("test/acceptance/internationalization_test.rb")
    t.verbose = true
  end
end
Now, we no longer need to mark any content as nonlocalizable. If the test fails, we either forgot to add a label to the language file, or we forgot to escape the text in the page:
<%= l(:name_label) %>
or
<%= h(@building.address) %>
Redirects
We noticed that Rails would send redirects as:
<html><body>You are being <a href="http://www.example.com/some/new/location">redirected</a>.</body></html>
The http://www.example.com URL was tripping up SpiderTest, so we removed that part of each URL. Furthermore, we skip our page checking on redirect pages and assets:
def consume_page(html, url)
  html.gsub!("http://www.example.com", "")
  # Only check real pages; redirects and static assets are skipped.
  unless redirect?(html) || asset?(url)
    assert_page_has_been_moved_to_language_file(html, url)
  end
  super
end
def redirect?(html)
  html.include?("<body>You are being")
end
def asset?(url)
  File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
end
Alt and title attributes
We discovered with the original test that we were not testing alt and title attributes on the page. For example, if you hover over a link, it will show the title. We also want these strings internationalized, so we added them to the test with the following code:
assert_attribute_does_not_contain_words body, url, 'title'
assert_attribute_does_not_contain_words body, url, 'alt'
def assert_attribute_does_not_contain_words body, url, attribute
  body.search("//*[@#{attribute}]") do |element|
    assert_does_not_contain_words element.get_attribute(attribute), url
  end
end
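If you want to try the attribute check without Hpricot, the following sketch shows the same idea in plain Ruby. It is a simplified stand-in, not what our test uses: it pulls double-quoted attribute values out with a regular expression instead of parsing the HTML, and the helper names attribute_values and untranslated_attributes are made up for the example.

```ruby
# Simplified stand-in for the Hpricot search: collect the values of one
# attribute with a regex. Only handles double-quoted attribute values.
def attribute_values(html, attribute)
  html.scan(/\s#{Regexp.escape(attribute)}="([^"]*)"/).flatten
end

# Returns the title/alt values that still contain letters, i.e. text that
# was neither localized nor escaped once l() and h() are blanked out.
def untranslated_attributes(html, attributes = %w[title alt])
  attributes.flat_map { |name| attribute_values(html, name) }.
    select { |value| value =~ /[A-Za-z]/ }
end

html = '<a href="/plans" title="View plans"><img src="/logo.png" alt=""></a>'
offenders = untranslated_attributes(html)
puts offenders.inspect
```

Here the title was hard-coded, so it is reported, while the empty alt passes. A real parser is more robust than a regex for anything beyond a quick experiment.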
Better error messages
We noticed that if you accidentally forget to internationalize a string like “Please enter your username,” the test would fail with a message of “Found text that was not in the language file: Please.” We thought it would be better to show the full string, so we replaced the regex:
/\w+/
with
/[A-Za-z]([A-Za-z]| )*/
The second one matches a letter followed by any run of letters or spaces, so it picks up the entire phrase, stopping at the first punctuation mark or digit.
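The difference between the two patterns is easy to check in irb. The sample string below is just an illustration:

```ruby
text = "Please enter your username, then press Enter"

old_pattern = /\w+/
new_pattern = /[A-Za-z]([A-Za-z]| )*/

# The old pattern stops at the first space.
old_match = text.match(old_pattern)[0]

# The new pattern keeps going through spaces and stops at the comma.
new_match = text.match(new_pattern)[0]

# Text with no letters at all does not match the new pattern.
no_match = "  42 : ".match(new_pattern)

puts old_match.inspect
puts new_match.inspect
puts no_match.inspect
```

One side effect worth knowing: \w also matches digits and underscores, so the old pattern would flag a bare number on the page, while the new pattern only reacts to letters.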
Final result
The final test looks like:
require 'hpricot'
class InternationalizationText < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator
  def setup
    blank_out_localization
    blank_out_html_escape
  end
  def blank_out_localization
    GLoc::InstanceMethods.class_eval do
      alias :old_l :l
      def l(symbol, *arguments)
        ""
      end
    end
  end
  def blank_out_html_escape
    ERB::Util.class_eval do
      alias :old_html_escape :html_escape
      def html_escape(s)
        ""
      end
      alias :h :html_escape
    end
  end
  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end
  def consume_page(html, url)
    html.gsub!("http://www.example.com", "")
    # Only check real pages; redirects and static assets are skipped.
    unless redirect?(html) || asset?(url)
      assert_page_has_been_moved_to_language_file(html, url)
    end
    super
  end
  def redirect?(html)
    html.include?("<body>You are being")
  end
  def asset?(url)
    File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
  end
  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
    assert_attribute_does_not_contain_words body, url, 'title'
    assert_attribute_does_not_contain_words body, url, 'alt'
  end
  def assert_attribute_does_not_contain_words body, url, attribute
    body.search("//*[@#{attribute}]") do |element|
      assert_does_not_contain_words element.get_attribute(attribute), url
    end
  end
  def assert_does_not_contain_words text, url
    match = text.match(/[A-Za-z]([A-Za-z]| )*/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end
end
These modifications have improved the quality of the internationalization test, which has been very useful for catching text that we forget to internationalize.