This post is late, but the slides from our RailsConf presentation are online:
Rails in the Large: How We’re Developing the Largest Rails Project in the World
Update (8/16/11): Check out Useful unix tricks – part 4
Here is part 3 of Useful unix tricks and Useful unix tricks – part 2.
It is pretty common to want to rerun the previous command, possibly with something new on the beginning or end. !! expands to the previous command in the history. For example:
% tail foo
tail: cannot open `foo' for reading: Permission denied
% sudo !!
sudo tail foo
hello world
As you can see, I forgot to sudo the first command. Now, I want to rerun it with a sudo at the front, so I can just do “sudo !!” and press enter. The shell will print out the command it is running, followed by whatever it would print normally.
The tail command can take multiple files, and it will show the output of each one. You can combine this with the -f flag, and tail will intersperse the output of each file in real time. This is incredibly handy for looking at log files. For example, we can tail both the apache and rails logs to see the requests:
% tail -f log/production.log /var/log/apache2/access.log
==> log/production.log <==
Processing MephistoController#dispatch (for 127.0.0.1 at 2009-02-20 13:33:31) [GET]
Parameters: {"action"=>"dispatch", "path"=>["2008", "7", "19", "capistrano-with-pairing-stations"], "controller"=>"mephisto"}
Completed in 784ms (View: 0, DB: 260) | 200 OK [http://www.pgrs.net/2008/7/19/capistrano-with-pairing-stations]
==> /var/log/apache2/access.log <==
127.0.0.1 - - [20/Feb/2009:13:33:31 -0600] "GET /2008/7/19/capistrano-with-pairing-stations HTTP/1.1" 200 16049 "http://www.pgrs.net/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.6) Gecko/2009011912 Firefox/3.0.6 Ubiquity/0.1.5"
As you can see, tail prints a ==> filename <== header to show which file the output that follows comes from.
Sometimes a file will have nonprintable characters, such as Windows line breaks. Most editors won’t show them, but you can use “vim -b” to see and edit them. The -b flag tells vim to use binary mode. For example, here is a file with Windows line endings:
% cat foo.txt
Hello
World

% vim -b foo.txt
Hello^M
World^M
^M
As you can see, the vim binary mode can see the line endings whereas cat cannot.
You can use the recursive wildcard ** in zsh to do complex matching. For example, let’s say that you want to search all ruby files in the current project for the string RAILS_ENV. Normally, you would do something like:
% find . -name '*.rb' | xargs grep RAILS_ENV
./config/environment.rb:# ENV['RAILS_ENV'] ||= 'production'
./test/test_helper.rb:ENV["RAILS_ENV"] = "test"
In zsh, you can accomplish the same with a much simpler command:
% grep RAILS_ENV **/*.rb
config/environment.rb:# ENV['RAILS_ENV'] ||= 'production'
test/test_helper.rb:ENV["RAILS_ENV"] = "test"
The wildcard **/*.rb recursively matches any files that end in .rb, so there is no need for a find command.
If there are a lot of files, you will occasionally get the error:
% grep RAILS_ENV **/*.rb
zsh: argument list too long: grep
This means that the **/*.rb match returned too many arguments to handle. In this case, you can use echo and xargs to get the job done, which is still simpler than the find command:
% echo **/*.rb | xargs grep RAILS_ENV
It is pretty common to run find and then pass the results into xargs. However, if any filenames contain spaces or quotes, xargs will fail. You can use find -X to find any paths that will fail. find will warn on these paths and then skip them:
% find -X .
.
find: ./filename with spaces: illegal path
If you want to use xargs with these files, use the -print0 option to tell find to separate paths with a NUL character instead of a newline, and xargs -0 to tell xargs to split on NUL instead of whitespace:
% find . -print0 | xargs -0 echo
. ./filename with spaces
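To see why the NUL separator matters, here is a small Ruby sketch of the two splitting rules. It only mimics how the list of paths gets tokenized (whitespace versus NUL); it is not how find or xargs are implemented, and the file names and method names are made up for illustration.

```ruby
# Hypothetical file names, one of which contains spaces.
PATHS = [".", "./filename with spaces", "./plain.txt"].freeze

# Plain xargs splits its input on whitespace (it also treats quotes
# specially, which this sketch ignores).
def split_like_xargs(input)
  input.split
end

# xargs -0 splits only on the NUL character.
def split_like_xargs0(input)
  input.split("\0")
end

newline_list = PATHS.join("\n") + "\n"          # roughly what `find .` prints
nul_list     = PATHS.map { |p| p + "\0" }.join  # roughly what `find . -print0` prints

puts "xargs:    #{split_like_xargs(newline_list).inspect}"
puts "xargs -0: #{split_like_xargs0(nul_list).inspect}"
```

With whitespace splitting, the name with spaces is broken into several arguments; with NUL splitting, each path stays in one piece.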
Passing - to the cd command will return you to the last directory you were in:
/tmp% pwd
/tmp
/tmp% cd ~
~% cd -
/tmp
/tmp%
Sometimes, you run a command and pressing ctrl+c will not kill it. When that happened, I used to open up another terminal window to kill -9 the process until someone showed me the following trick:
% sleep 1000
^Z
zsh: suspended sleep 1000
% kill -9 %1
%
[1] + killed sleep 1000
Pressing ctrl+z suspends the process and returns you to the shell prompt. Then, kill -9 %1 sends signal 9 (SIGKILL) to job #1, which is our suspended process.
It can be really useful to see the working directory of a running process. For example, you can see which release a ruby process is running:
% sudo pwdx 23961
23961: /var/www/myapp/releases/20081231200733
Unfortunately, I haven’t found pwdx for the mac. If anyone knows how I can install it, please let me know.
If you want to know what commands a shell script runs, run it with the -x flag. For example, say we have a shell script with two echos. Compare the output with and without the -x flag:
% sh foo.sh
hello
world
% sh -x foo.sh
+ echo hello
hello
+ echo world
world
% zsh -x foo.sh
+foo.sh:1> echo hello
hello
+foo.sh:2> echo world
world
As you can see, the -x flag shows each command as it is run. zsh takes it a step further and shows the script and line number as well.
The /proc filesystem is a great way to find out about a linux machine. For example, you can “cat /proc/cpuinfo” to find out how many processors are on the box. However, macs don’t have /proc. You can use sysctl instead. The -a flag prints out all keys and values:
% sysctl -a
kern.ostype = Darwin
kern.osrelease = 9.6.0
kern.osrevision = 199506
kern.version = Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386
...
You can also ask for a single key by name. The -n flag prints just the value without the key name. For example, this command will print out the number of cpu cores:
% sysctl -n hw.ncpu
2
The rake_commit_tasks plugin now supports automatically merging changes from branch to trunk. I describe the feature and the use case at Automatically merge changes from branch to trunk, although the merging code now uses “svn merge” instead of “svn diff” in order to keep svn mergeinfo.
Basically, if you branch to release code and then fix a bug on the branch, the change will automatically be merged over to the trunk when you run a “rake commit.” Just set PATH_TO_TRUNK_WORKING_COPY to the location of the trunk checkout in your Rakefile.
If you are curious, you can check out the commit at github.
In Flight delay information for United flights, I talked about an application I wrote to show United flight delays over time. I have now completely rewritten that application to allow comparison of multiple flights on one graph.
People who travel for a living know that early morning flights tend to be less delayed than evening flights. In the morning, the planes are usually already at the airport, so there is no chance of an incoming flight delay. There are no lines of planes waiting to take off yet, so the time between leaving the gate and getting into the air tends to be much shorter.
The difference in these times can be dramatic. Here is a report comparing an early morning flight with an evening flight from Newark to Chicago (two heavily delayed airports): Flight Delays
The report shows a table of min, max, and median delays:
| Flight | Min Delay | Median Delay | Max Delay |
|---|---|---|---|
| Flight 655 (EWR -> ORD) on Thursday at 07:38 PM | -14 | 40 | 239 |
| Flight 635 (EWR -> ORD) on Thursday at 05:58 AM | -26 | 0 | 157 |
And here are the two graphs shown in the report above:


As you can see from the first graph, day by day, flight 655 (departing around 7:38 PM) is almost always more delayed than flight 635 (departing around 5:58 AM).
The second graph shows a histogram. You can see that flight 635 is clustered more heavily to the left (-40 to 20) which shows that it is generally between 40 minutes early and 20 minutes late. Flight 655 is much more spread out to the right, which shows that it has far more delays. On one day, it was over 220 minutes late!
I noticed the other day that methods defined using define_method have very strange behavior when given the wrong number of arguments. For example, here is a class with a bunch of methods defined using define_method:
class Foo
  define_method :no_args do
    p "no args"
  end

  define_method :one_arg do |one|
    p one
  end

  define_method :two_args do |one, two|
    p one
    p two
  end
end
Now, if we call no_args with an argument, it will silently ignore the argument:
>> Foo.new.no_args(1)
"no args"
=> nil
However, if we have a method that expects one argument but receives either none or more than one, we get a warning:
>> Foo.new.one_arg
./foo.rb:6: warning: multiple values for a block parameter (0 for 1)
        from (irb):3
nil
=> nil
>> Foo.new.one_arg(1,2,3)
./foo.rb:6: warning: multiple values for a block parameter (3 for 1)
        from (irb):2
[1, 2, 3]
=> nil
In the second case, it took all three arguments and passed them as an array into the method expecting one argument.
It gets even stranger with a method that expects two arguments. Now, we actually get errors:
>> Foo.new.two_args
ArgumentError: wrong number of arguments (0 for 2)
        from (irb):2:in 'two_args'
        from (irb):2
>> Foo.new.two_args(1,2,3)
ArgumentError: wrong number of arguments (3 for 2)
        from ./foo.rb:10:in 'two_args'
        from (irb):3
I’m not sure why a one argument method gives a warning while a two argument method gives an error. Clearly, define_method is very different from using def.
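The output above comes from the Ruby of the time. As far as I know, Ruby 1.9 and later check the arguments of define_method methods as strictly as def methods, so every mismatch raises ArgumentError. Here is a small sketch for checking your own interpreter. The ArityDemo class mirrors Foo, and try_call is a made-up helper, not part of Ruby:

```ruby
# ArityDemo mirrors the Foo class above; try_call is a hypothetical helper.
class ArityDemo
  define_method(:no_args)  { "no args" }
  define_method(:one_arg)  { |one| one }
  define_method(:two_args) { |one, two| [one, two] }
end

# Call a method and report either its return value or the ArgumentError message.
def try_call(object, name, *args)
  "#{name}(#{args.join(', ')}) => #{object.public_send(name, *args).inspect}"
rescue ArgumentError => e
  "#{name}(#{args.join(', ')}) raised ArgumentError: #{e.message}"
end

demo = ArityDemo.new
puts try_call(demo, :no_args, 1)
puts try_call(demo, :one_arg)
puts try_call(demo, :one_arg, 1, 2, 3)
puts try_call(demo, :two_args)
puts try_call(demo, :two_args, 1, 2)
```

Running this on the interpreter you deploy with shows whether a wrong argument count is ignored, warned about, or rejected.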
I recently upgraded my blogging software, Mephisto, from 0.7.3 to 0.8.1. One thing I noticed is that they moved the cached files from public to a cache subfolder containing the site. For example, on a new installation, the cached index page is in public/cache/unusedfornow.com/index.html.
Mephisto writes a cached page for every page visited. This means that any subsequent requests for this page can be served directly by apache from the cached file rather than going through the whole rails stack (all the way down to the database). This is much faster and uses less memory.
I run my blog in Apache with Phusion Passenger. The problem with this new cache location is that Passenger only looks in public for cached files. This means that the cached pages are ignored and every request is being served by Rails. After searching google and working some mod_rewrite magic, I came up with the following solution. Here is the Apache virtual host configuration for my blog:
<VirtualHost *:80>
  ServerName pgrs.net
  ServerAlias www.pgrs.net
  DocumentRoot /var/www/mephisto-0.8.1/public

  RailsAllowModRewrite on
  RewriteEngine On

  # Rewrite / to index.html
  RewriteRule ^/$ /index.html [QSA]

  # Rewrite /some_page to /some_page.html
  RewriteRule ^([^.]+?)/?$ $1.html [QSA]

  # If cached file exists, serve it and stop processing
  RewriteCond %{DOCUMENT_ROOT}/cache/unusedfornow.com%{REQUEST_FILENAME} -f
  RewriteRule ^(.*)$ /cache/unusedfornow.com$1 [L]

  ErrorLog /var/log/apache2/pgrs-error.log
  CustomLog /var/log/apache2/pgrs-access.log combined
</VirtualHost>
The first 3 lines are standard Phusion Passenger configuration: Deploying a Ruby on Rails application. Then, I turn on mod_rewrite. The first two sets of mod_rewrite configuration cascade and turn the request into what the filename will look like. So / becomes /index.html, and /2008/10/29/deploying-trunk-or-tags-with-capistrano becomes /2008/10/29/deploying-trunk-or-tags-with-capistrano.html.
The final set checks if this file exists under /var/www/mephisto-0.8.1/public/cache/unusedfornow.com (the -f flag), and if it does, tells apache to serve this file. The [L] tells mod_rewrite that this is the last rule, so it should stop processing now. If the file does not exist, the request falls through mod_rewrite and Passenger picks it up and serves it through Rails.
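The way the three rules cascade can be modeled in a few lines of Ruby. This is a simplified sketch of the decision, not how mod_rewrite works internally; the method names are made up, and the document root and cache prefix are copied from the configuration above.

```ruby
# Assumed values taken from the virtual host configuration above.
DOCUMENT_ROOT = "/var/www/mephisto-0.8.1/public"
CACHE_PREFIX  = "/cache/unusedfornow.com"

# First two rules: "/" becomes "/index.html", and any path without a dot
# gets ".html" appended (dropping one trailing slash).
def cache_candidate(request_path)
  return "/index.html" if request_path == "/"
  return request_path if request_path.include?(".")
  request_path.chomp("/") + ".html"
end

# Final rule: serve the cached file if it exists on disk,
# otherwise let the request fall through to Passenger.
def resolve(request_path, document_root = DOCUMENT_ROOT)
  candidate = CACHE_PREFIX + cache_candidate(request_path)
  File.file?(File.join(document_root, candidate)) ? candidate : :passenger
end

puts cache_candidate("/2008/10/29/deploying-trunk-or-tags-with-capistrano")
puts resolve("/").inspect
```

The second call depends on what is actually on disk, which is the point of the -f check: the same URL is served by Apache once the cached file exists and by Rails until then.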
I verified that this works by looking at the response headers in Firefox (Tools -> Page Info -> Headers) of any given blog page. The first time, there is a “X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 2.0.3” header. Once I refresh, the X-Powered-By header is gone since the request never makes it to Passenger. Apache is once again doing the hard work, and Rails is only used when the request is new or dynamic (such as searching).
On my current project, we use capistrano for all of our deployments. In the simplest case, you tell capistrano the URL of your repository, and then you deploy by performing a checkout from this repository:
set :repository, "http://www.example.com/svn/myproject/trunk"
However, putting this line in the capistrano recipe only lets you deploy from trunk. We needed the ability to deploy either the trunk or a tag of our choice. We generally deploy the trunk to development servers and the latest tag to staging and production servers.
We started out with something more complicated, but with the help of Jamis Buck on the capistrano mailing list, we came up with the following solution:
set :repository_root, "http://www.example.com/svn/myproject"
set(:tag) { Capistrano::CLI.ui.ask("Tag to deploy (or type 'trunk' to deploy from trunk): ") }
set(:repository) { (tag == "trunk") ? "#{repository_root}/trunk" : "#{repository_root}/tags/#{tag}" }
This deploy script will prompt the user to enter either a tag name or the word trunk. It will then use that variable to set the repository to the correct path. The output of a deploy will look like:
% cap deploy
  * executing `deploy'
  ...
  * executing `deploy:update'
 ** transaction: start
  * executing `deploy:update_code'
Tag to deploy (or type 'trunk' to deploy from trunk): trunk
  * executing "svn checkout -q -r2210 http://www.example.com/svn/myproject/trunk /var/www/myproject/releases/20081029012754 && (echo 2210 > /var/www/myproject/releases/20081029012754/REVISION)"
  ...
Capistrano evaluates variables lazily. It will only fetch the repository variable if it needs it, which will then fetch the tag variable, which will then prompt the user. Therefore, if you run a command that does not require the repository, it will not prompt. For example, running the following command will not prompt the user:
cap deploy:restart
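The lazy evaluation described above can be sketched in plain Ruby. This is not Capistrano's actual implementation; LazySettings and its set and fetch methods are hypothetical names, used only to show that a block runs the first time its value is needed and never before.

```ruby
# LazySettings is a hypothetical stand-in for Capistrano's variable store.
class LazySettings
  def initialize
    @values = {}
  end

  # Store either a plain value or a block to be evaluated on first use.
  def set(name, value = nil, &block)
    @values[name] = block || value
  end

  # Evaluate a stored block the first time it is fetched and remember the result.
  def fetch(name)
    value = @values.fetch(name)
    value = @values[name] = instance_exec(&value) if value.is_a?(Proc)
    value
  end
end

settings = LazySettings.new
settings.set(:repository_root, "http://www.example.com/svn/myproject")
settings.set(:tag) { puts "(this is where the prompt would appear)"; "trunk" }
settings.set(:repository) do
  tag = fetch(:tag)
  root = fetch(:repository_root)
  tag == "trunk" ? "#{root}/trunk" : "#{root}/tags/#{tag}"
end

# Nothing has been evaluated yet; fetching :repository triggers :tag.
puts settings.fetch(:repository)
```

A task that never fetches :repository never triggers the :tag block, which is why a command like deploy:restart does not prompt.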
Next, we created a convenience rake task to deploy the trunk without prompting:
namespace :deploy do
  task :trunk do
    sh "cap -s tag=trunk deploy"
  end
end
This rake task sets the tag variable on the command line. Therefore, capistrano will not need to evaluate the set(:tag) command and will deploy the trunk without prompting.
Our current application includes a lot of static content created by content editors. They check in static HTML files, and we include these files in various parts of the application. The problem is that they sometimes copy and paste from applications such as Outlook or Word, which can introduce unprintable characters into the application. These characters show up strangely on the website.
After this happened a couple of times, we decided to write a test to ensure that we would always catch the unprintable characters:
class NonPrintableCharactersTest < Test::Unit::TestCase
  def test_for_non_printable_characters_in_content
    assert_equal "", `find #{RAILS_ROOT}/content -name '*.html' | xargs grep -n '[^[:space:][:print:]]'`
  end
end
We use find to get a list of all of the html files in the content folder. Then, we pipe this to grep, using the regular expression
'[^[:space:][:print:]]'
which matches anything except spaces or printable characters. The output of this test looks like:
Loaded suite test/non_printable_characters_test
Started
F
Finished in 0.86005 seconds.

  1) Failure:
test_for_non_printable_characters_in_content(NonPrintableCharactersTest) [test/non_printable_characters_test.rb:5]:
<""> expected but was
<"/some/path/to/content/tmp.html:48:character �</span></p>\n">.

1 tests, 1 assertions, 1 failures, 0 errors
The failure message shows the file and line with the character, so it is easy to fix.
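The same character class also works in a Ruby regular expression, which makes it easy to try out on a string. This is a sketch rather than a replacement for the grep-based test: the method name and sample text are made up, and Ruby's classes are Unicode-aware, so a correctly encoded curly quote counts as printable here, while what grep flags depends on the locale.

```ruby
# Same character class as the grep expression in the test above.
NON_PRINTABLE = /[^[:space:][:print:]]/

# Return [line_number, line] pairs for every line containing a match.
def non_printable_lines(text)
  text.each_line.with_index(1)
      .select { |line, _number| line =~ NON_PRINTABLE }
      .map { |line, number| [number, line.chomp] }
end

# Hypothetical content: the second line contains a stray control character (BEL).
sample = "<p>fine</p>\n<p>bad\u0007character</p>\n"
non_printable_lines(sample).each do |number, line|
  puts "#{number}:#{line.inspect}"
end
```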
The website I’m currently working on is similar to an online brochure. The data on the site changes hourly, but every user sees the same thing. As a result, we decided to use page caching to dramatically speed up the site. Once a page is visited, the html is written out to disk and all subsequent requests are served by apache. The setup of this approach is detailed elsewhere (for example, Rails Envy: Ruby on Rails Caching Tutorial).
Setting up caching was easy, but we wanted to ensure that we did not make any mistakes. All pages should be cached, since any miss will result in a much higher load on our rails application. I’ve written previously about our internationalization test (Improved internationalization test) which spiders the site (using SpiderTest) looking for non localized text. Since we were already visiting every page, it seemed like a good place to add a check for page caching. Spidering the site again would make our test suite too long.
The consume_page method is called for every page that is visited by the spider. We expanded the implementation by adding a call to assert_page_is_cached:
def consume_page(html, url)
  html.gsub!("http://www.example.com", "")
  # Redirects and static assets are neither localized nor page cached
  unless redirect?(html) || asset?(url)
    assert_page_has_been_moved_to_language_file(html, url)
    assert_page_is_cached(url)
  end
  super
end

def assert_page_is_cached(url)
  path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
  page = path.ends_with?(".html") ? path : "#{path}.html"
  assert_true File.exist?(page), "Page NOT cached: #{url} (looking in #{page})"
end
We also had to add new lines to our setup to turn on caching (since it is normally off in test mode):
def setup
  FileUtils.rm_rf ActionController::Base.page_cache_directory
  ActionController::Base.perform_caching = true
end
Since we run this test as its own suite, the test is totally isolated from other tests. There is no need to implement a teardown.
The full test, including the internationalization testing from before, looks like:
require 'hpricot'

class InternationalizationText < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator

  def setup
    FileUtils.rm_rf ActionController::Base.page_cache_directory
    ActionController::Base.perform_caching = true
    blank_out_localization
    blank_out_html_escape
  end

  # Make localized strings render as "" so only hardcoded text remains
  def blank_out_localization
    GLoc::InstanceMethods.class_eval do
      alias :old_l :l
      def l(symbol, *arguments)
        ""
      end
    end
  end

  # Make escaped (dynamic) output render as "" as well
  def blank_out_html_escape
    ERB::Util.class_eval do
      alias :old_html_escape :html_escape
      def html_escape(s)
        ""
      end
      alias :h :html_escape
    end
  end

  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end

  def consume_page(html, url)
    html.gsub!("http://www.example.com", "")
    # Redirects and static assets are neither localized nor page cached
    unless redirect?(html) || asset?(url)
      assert_page_has_been_moved_to_language_file(html, url)
      assert_page_is_cached(url)
    end
    super
  end

  def redirect?(html)
    html.include?("<body>You are being")
  end

  def asset?(url)
    File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
  end

  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
    assert_attribute_does_not_contain_words body, url, 'title'
    assert_attribute_does_not_contain_words body, url, 'alt'
  end

  def assert_attribute_does_not_contain_words body, url, attribute
    body.search("//*[@#{attribute}]") do |element|
      assert_does_not_contain_words element.get_attribute(attribute), url
    end
  end

  def assert_does_not_contain_words text, url
    match = text.match(/[A-Za-z]([A-Za-z]| )*/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end

  def assert_page_is_cached(url)
    path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
    page = path.ends_with?(".html") ? path : "#{path}.html"
    assert_true File.exist?(page), "Page NOT cached: #{url} (looking in #{page})"
  end
end
I submitted a patch to Capistrano to add a “--dry-run” option (or -n for short). This flag causes capistrano to print out all of the commands it will run without actually running them. It is an easy way to see what the cap task will do to your servers before you run it.
My patch was accepted and released as part of Capistrano 2.5.0. You can read more about the new features at:
http://capify.org/2008/8/29/capistrano-2-5-0
and see the details of my commit at github:
http://github.com/capistrano/capistrano/commit/7279a3858e2bcebe84735223d5f8b4397c4ad85b