Feb 20 2009
 

Update (8/16/11): Check out Useful unix tricks – part 4

Here is part 3, following Useful unix tricks and Useful unix tricks – part 2.

!! is the previous command in the shell history

It is pretty common to want to rerun the previous command, possibly with something new on the beginning or end. !! is that command in the history. For example:

% tail foo
tail: cannot open `foo' for reading: Permission denied

% sudo !!
sudo tail foo
hello world

As you can see, I forgot to sudo the first command. Now, I want to rerun it with a sudo at the front, so I can just do “sudo !!” and press enter. The shell will print out the command it is running, followed by whatever it would print normally.

Tail multiple files at once

The tail command can take multiple files, and it will show the output of each one. You can combine this with the -f flag, and tail will intersperse the output of each file in real time. This is incredibly handy for looking at log files. For example, we can tail both the apache and rails logs to see the requests:

% tail -f log/production.log /var/log/apache2/access.log
==> log/production.log <==

Processing MephistoController#dispatch (for 127.0.0.1 at 2009-02-20 13:33:31) [GET]
  Parameters: {"action"=>"dispatch", "path"=>["2008", "7", "19", "capistrano-with-pairing-stations"], "controller"=>"mephisto"}
Completed in 784ms (View: 0, DB: 260) | 200 OK [http://www.pgrs.net/2008/7/19/capistrano-with-pairing-stations]

==> /var/log/apache2/access.log <==
127.0.0.1 - - [20/Feb/2009:13:33:31 -0600] "GET /2008/7/19/capistrano-with-pairing-stations HTTP/1.1" 200 16049 "http://www.pgrs.net/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.6) Gecko/2009011912 Firefox/3.0.6 Ubiquity/0.1.5"

As you can see, tail prints ==> <== to show which file the output is for.

Use vim -b to show nonprintable characters

Sometimes a file will have nonprintable characters, such as windows line breaks. Most editors won’t show them, but you can use “vim -b” to see and edit them. The -b flag tells vim to use binary mode. For example, here is a file with windows line endings:

% cat foo.txt
Hello
World

% vim -b foo.txt
Hello^M
World^M
^M

As you can see, the vim binary mode can see the line endings whereas cat cannot.

** is a recursive wildcard in zsh

You can use the recursive wildcard ** in zsh to do complex matching. For example, let’s say that you want to search all ruby files in the current project for the string RAILS_ENV. Normally, you would do something like:

% find . -name '*.rb' | xargs grep RAILS_ENV
./config/environment.rb:# ENV['RAILS_ENV'] ||= 'production'
./test/test_helper.rb:ENV["RAILS_ENV"] = "test"

In zsh, you can accomplish the same with a much simpler command:

% grep RAILS_ENV **/*.rb
config/environment.rb:# ENV['RAILS_ENV'] ||= 'production'
test/test_helper.rb:ENV["RAILS_ENV"] = "test"

The wildcard **/*.rb recursively matches any files that end in .rb, so there is no need for a find command.

If there are a lot of files, you will occasionally get the error:

% grep RAILS_ENV **/*.rb
zsh: argument list too long: grep

This means that the **/*.rb match returned too many arguments to handle. In this case, you can use echo and xargs to get the job done, which is still simpler than the find command:

% echo **/*.rb | xargs grep RAILS_ENV

find -X will show bad filenames

It is pretty common to run find and then pass the results into xargs. However, if any filenames contain spaces or quotes, xargs will fail. You can use find -X (a flag of the BSD find that ships with the mac; GNU find does not have it) to find any paths that will fail. find will warn on these paths and then skip them:

% find -X .
.
find: ./filename with spaces: illegal path

If you want to use xargs with these files, use the -print0 option to tell find to separate filenames with a NUL character instead of a newline, and xargs -0 to tell xargs to split on NUL instead of whitespace:

% find . -print0 | xargs -0 echo
. ./filename with spaces

cd - will return to the previous folder

Passing - into the cd command will return you to the last folder you were in:

/tmp% pwd
/tmp

/tmp% cd ~

~% cd -
/tmp

/tmp%

Use ctrl+z and kill %1 to kill a process that will not die

Sometimes, you run a command and pressing ctrl+c will not kill it. When that happened, I used to open up another terminal window to kill -9 the process until someone showed me the following trick:

% sleep 1000
^Z
zsh: suspended  sleep 1000
% kill -9 %1
%
[1]  + killed     sleep 1000

Pressing ctrl+z suspends the process and returns you to a terminal prompt. Then, kill -9 %1 sends signal 9 (SIGKILL) to job #1, which is our suspended process.

pwdx shows the working directory of a process

It can be really useful to see the working directory of a running process. For example, you can see which release a ruby process is running:

% sudo pwdx 23961
23961: /var/www/myapp/releases/20081231200733

Unfortunately, I haven’t found pwdx for the mac. If anyone knows how I can install it, please let me know.

Use sh -x to debug shell scripts

If you want to know what commands a shell script runs, run it with the -x flag. For example, say we have a shell script with two echos. Compare the output with and without the -x flag:

% sh foo.sh
hello
world

% sh -x foo.sh
+ echo hello
hello
+ echo world
world

% zsh -x foo.sh
+foo.sh:1> echo hello
hello
+foo.sh:2> echo world
world

As you can see, the -x flag shows which command is being run. zsh takes it a step further and shows the script and line number as well.

sysctl replaces /proc on macs

The /proc filesystem is a great way to find out about a linux machine. For example, you can “cat /proc/cpuinfo” to find out how many processors are on the box. However, macs don’t have /proc. You can use sysctl instead. The -a flag prints out all keys and values:

% sysctl -a
kern.ostype = Darwin
kern.osrelease = 9.6.0
kern.osrevision = 199506
kern.version = Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386
...

You can also just get a single value with the -n flag. For example, this command will print out the number of cpu cores:

% sysctl -n hw.ncpu
2

Feb 20 2009
 

The rake_commit_tasks plugin now supports automatically merging changes from branch to trunk. I describe the feature and the use case at Automatically merge changes from branch to trunk, although the merging code now uses “svn merge” instead of “svn diff” in order to keep svn mergeinfo.

Basically, if you branch to release code and then fix a bug on the branch, the change will automatically be merged over to the trunk when you run a “rake commit.” Just set PATH_TO_TRUNK_WORKING_COPY to the location of the trunk checkout in your Rakefile.

If you are curious, you can check out the commit at github.

Jan 10 2009
 

In Flight delay information for United flights, I talked about an application I wrote to show United flight delays over time. I have now completely rewritten that application to allow comparison of multiple flights on one graph.

People who travel for a living know that early morning flights tend to be less delayed than evening flights. In the morning, the planes are usually already at the airport, so there is no chance of an incoming flight delay. There are no lines of planes waiting to take off yet, so the time between leaving the gate and getting into the air tends to be a lot shorter.

The difference in these times can be dramatic. Here is a report comparing an early morning flight with an evening flight from Newark to Chicago (two heavily delayed airports): Flight Delays

The report shows a table of min, median, and max delays (in minutes):

Flight                                            Min Delay  Median Delay  Max Delay
Flight 655 (EWR -> ORD) on Thursday at 07:38 PM         -14            40        239
Flight 635 (EWR -> ORD) on Thursday at 05:58 AM         -26             0        157



And here are the two graphs shown in the report above:


As you can see from the first graph, day by day, flight 655 (departing around 7:38 PM) is almost always more delayed than flight 635 (departing around 5:58 AM).

The second graph shows a histogram. You can see that flight 635 is clustered more heavily to the left (-40 to 20) which shows that it is generally between 40 minutes early and 20 minutes late. Flight 655 is much more spread out to the right, which shows that it has far more delays. On one day, it was over 220 minutes late!

Dec 31 2008
 

I noticed the other day that methods defined using define_method have very strange behavior when given the wrong number of arguments. For example, here is a class with a bunch of methods defined using define_method:

class Foo
  define_method :no_args do
    p "no args"
  end
 
  define_method :one_arg do |one|
    p one
  end
 
  define_method :two_args do |one, two|
    p one
    p two
  end
end

Now, if we call no_args with an argument, it will silently ignore the argument:

>> Foo.new.no_args(1)
"no args"
=> nil

However, if we have a method that expects one argument but receives either none or more than one, we get a warning:

>> Foo.new.one_arg
./foo.rb:6: warning: multiple values for a block parameter (0 for 1)
    from (irb):3
nil
=> nil
 
>> Foo.new.one_arg(1,2,3)
./foo.rb:6: warning: multiple values for a block parameter (3 for 1)
    from (irb):2
[1, 2, 3]
=> nil

In the second case, it took all three arguments and passed them as an array into the method expecting one argument.

It gets even stranger with a method that expects two arguments. Now, we actually get errors:

>> Foo.new.two_args
ArgumentError: wrong number of arguments (0 for 2)
    from (irb):2:in 'two_args'
    from (irb):2
 
>> Foo.new.two_args(1,2,3)
ArgumentError: wrong number of arguments (3 for 2)
    from ./foo.rb:10:in 'two_args'
    from (irb):3

I’m not sure why a one argument method gives a warning while a two argument method gives an error. Clearly, define_method is very different from using def.
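One plausible explanation, which is an assumption and not something confirmed in this post, is that the Ruby version used here invoked define_method blocks with block-style argument handling. Blocks are lenient about argument counts, while methods and lambdas are strict. The sketch below shows that difference on a current Ruby, where define_method enforces the count the same way def does. The names Example, LENIENT, STRICT and outcome are made up for illustration.

```ruby
LENIENT = proc   { |one| one }  # block-style argument handling
STRICT  = lambda { |one| one }  # method-style argument handling

class Example
  def with_def(one)
    one
  end

  define_method(:with_define_method) { |one| one }
end

# Run the block and report either its result or the exception class name.
def outcome
  yield
rescue ArgumentError => error
  error.class.name
end

p outcome { LENIENT.call }                    # missing argument becomes nil
p outcome { LENIENT.call(1, 2, 3) }           # extra arguments are dropped
p outcome { STRICT.call }                     # raises ArgumentError
p outcome { Example.new.with_def }            # raises ArgumentError
p outcome { Example.new.with_define_method }  # raises ArgumentError on current Ruby
```

The warnings and the [1, 2, 3] array shown in the irb sessions above will not reproduce on a current Ruby, so treat the sketch as a comparison point rather than a replay of those sessions.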

Dec 22 2008
 

I recently upgraded my blogging software, Mephisto, from 0.7.3 to 0.8.1. One thing I noticed is that they moved the cached files from public to a cache subfolder containing the site. For example, on a new installation, the cached index page is in public/cache/unusedfornow.com/index.html.

Mephisto writes a cached page for every page visited. This means that any subsequent requests for this page can be served directly by apache from the cached file rather than going through the whole rails stack (all the way down to the database). This is much faster and uses less memory.

I run my blog in Apache with Phusion Passenger. The problem with this new cache location is that Passenger only looks in public for cached files. This means that the cached pages are ignored and every request is being served by Rails. After searching google and working some mod_rewrite magic, I came up with the following solution. Here is the Apache virtual host configuration for my blog:

<VirtualHost *:80>
    ServerName pgrs.net
    ServerAlias www.pgrs.net
 
    DocumentRoot /var/www/mephisto-0.8.1/public
 
    RailsAllowModRewrite on
    RewriteEngine On
 
    # Rewrite / to index.html
    RewriteRule ^/$ /index.html [QSA]
 
    # Rewrite /some_page to /some_page.html
    RewriteRule ^([^.]+?)/?$ $1.html [QSA]
 
    # If cached file exists, serve it and stop processing
    RewriteCond %{DOCUMENT_ROOT}/cache/unusedfornow.com%{REQUEST_FILENAME} -f
    RewriteRule ^(.*)$ /cache/unusedfornow.com$1 [L]
 
    ErrorLog /var/log/apache2/pgrs-error.log
    CustomLog /var/log/apache2/pgrs-access.log combined
</VirtualHost>

The first 3 lines are standard Phusion Passenger configuration: Deploying a Ruby on Rails application. Then, I turn on mod_rewrite. The first two sets of mod_rewrite configuration cascade and turn the request into what the filename will look like. So / becomes /index.html, and /2008/10/29/deploying-trunk-or-tags-with-capistrano becomes /2008/10/29/deploying-trunk-or-tags-with-capistrano.html.

The final set checks if this file exists under /var/www/mephisto-0.8.1/public/cache/unusedfornow.com (the -f flag), and if it does, tells apache to serve this file. The [L] tells mod_rewrite that this is the last rule, so it should stop processing now. If the file does not exist, the request falls through mod_rewrite and Passenger picks it up and serves it through Rails.
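To make the cascade easier to follow, here is a rough Ruby sketch of the same decision. It is only an approximation of what mod_rewrite does, not a replacement for it, and the helper names rewritten_path and cached_file_for are hypothetical. The paths are copied from the virtual host above.

```ruby
# Hypothetical helper names; the paths mirror the virtual host shown above.
DOCUMENT_ROOT = "/var/www/mephisto-0.8.1/public"
CACHE_PREFIX  = "/cache/unusedfornow.com"

# Mirrors the first two RewriteRules: "/" becomes "/index.html", and a path
# without a dot gets ".html" appended (dropping one trailing slash).
def rewritten_path(request_path)
  return "/index.html" if request_path == "/"
  request_path.sub(%r{\A([^.]+?)/?\z}, '\1.html')
end

# Mirrors the RewriteCond/RewriteRule pair: return the cached file if it
# exists, or nil to signal that the request falls through to Rails.
def cached_file_for(request_path, document_root = DOCUMENT_ROOT)
  candidate = File.join(document_root, CACHE_PREFIX, rewritten_path(request_path))
  File.file?(candidate) ? candidate : nil
end

puts rewritten_path("/")
puts rewritten_path("/2008/10/29/deploying-trunk-or-tags-with-capistrano")
puts rewritten_path("/stylesheets/main.css")  # contains a dot, so it is left alone
```

A nil result from cached_file_for corresponds to the case where no rule with [L] fires and Passenger handles the request.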

I verified that this works by looking at the response headers in Firefox (Tools -> Page Info -> Headers) of any given blog page. The first time, there is an “X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 2.0.3” header. Once I refresh, the X-Powered-By header is gone since the request never makes it to Passenger. Apache is once again doing the hard work, and Rails is only used when the request is new or dynamic (such as searching).

Oct 29 2008
 

On my current project, we use capistrano for all of our deployments. In the simplest case, you tell capistrano the URL of your repository, and then you deploy by performing a checkout from this repository:

set :repository,  "http://www.example.com/svn/myproject/trunk"

However, putting this line in the capistrano recipe only lets you deploy from trunk. We needed the ability to deploy either the trunk or a tag of our choice. We generally deploy the trunk to development servers and the latest tag to staging and production servers.

We started out with something more complicated, but with the help of Jamis Buck on the capistrano mailing list, we came up with the following solution:

set :repository_root, "http://www.example.com/svn/myproject"
set(:tag) { Capistrano::CLI.ui.ask("Tag to deploy (or type 'trunk' to deploy from trunk): ") }
set(:repository) { (tag == "trunk") ? "#{repository_root}/trunk" : "#{repository_root}/tags/#{tag}" }

This deploy script will prompt the user to enter either a tag name or the word trunk. It will then use that variable to set the repository to the correct path. The output of a deploy will look like:

% cap deploy
  * executing `deploy'
...
  * executing `deploy:update'
 ** transaction: start
  * executing `deploy:update_code'
Tag to deploy (or type 'trunk' to deploy from trunk): trunk
  * executing "svn checkout -q  -r2210 http://www.example.com/svn/myproject/trunk /var/www/myproject/releases/20081029012754 && (echo 2210 > /var/www/myproject/releases/20081029012754/REVISION)"
...

Capistrano evaluates variables lazily. It will only fetch the repository variable if it needs it, which will then fetch the tag variable, which will then prompt the user. Therefore, if you run a command that does not require the repository, it will not prompt. For example, running the following command will not prompt the user:

cap deploy:restart

Next, we created a convenience rake task to deploy the trunk without prompting:

namespace :deploy do
  task :trunk do
    sh "cap -s tag=trunk deploy"
  end
end

This rake task sets the tag variable on the command line. Therefore, capistrano will not need to evaluate the set(:tag) command and will deploy the trunk without prompting.
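The lazy evaluation described above can be illustrated in plain Ruby. This is a minimal sketch, not Capistrano's actual implementation: LazyConfig and build_deploy_config are hypothetical names, and the ask argument stands in for the interactive prompt.

```ruby
# Hypothetical stand-in for Capistrano's variable store.
class LazyConfig
  def initialize
    @definitions = {}
    @values = {}
  end

  # Store either a plain value or a block to be evaluated on first fetch.
  def set(name, value = nil, &block)
    @values.delete(name)
    @definitions[name] = block || proc { value }
  end

  # Evaluate the definition the first time the variable is needed, then cache it.
  def fetch(name)
    @values.fetch(name) { @values[name] = instance_exec(&@definitions.fetch(name)) }
  end
end

# Builds a config shaped like the recipe above; `ask` replaces the prompt.
def build_deploy_config(ask)
  config = LazyConfig.new
  config.set(:repository_root, "http://www.example.com/svn/myproject")
  config.set(:tag) { ask.call }
  config.set(:repository) do
    tag = fetch(:tag)
    root = fetch(:repository_root)
    tag == "trunk" ? "#{root}/trunk" : "#{root}/tags/#{tag}"
  end
  config
end

prompts = 0
config = build_deploy_config(-> { prompts += 1; "trunk" })
puts prompts                    # the prompt has not run yet
puts config.fetch(:repository)  # fetching :repository pulls in :tag
puts prompts                    # the prompt ran exactly once
```

Calling config.set(:tag, "trunk") before the first fetch replaces the block with a plain value, so the prompt never runs. That is the same effect the rake task gets from passing -s tag=trunk.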

Oct 01 2008
 

Our current application includes a lot of static content created by content editors. They check in static HTML files, and we include these files in various parts of the application. The problem is that they sometimes copy and paste from applications such as Outlook or Word, which can introduce unprintable characters into the application. These characters show up strangely on the website.

After this happened a couple of times, we decided to write a test to ensure that we would always catch the unprintable characters:

class NonPrintableCharactersTest < Test::Unit::TestCase
  def test_for_non_printable_characters_in_content
    assert_equal "", `find #{RAILS_ROOT}/content -name '*.html' | xargs grep -n '[^[:space:][:print:]]'`
  end
end

We use find to get a list of all of the html files in the content folder. Then, we pipe this to grep, using the regular expression

'[^[:space:][:print:]]'

which matches anything except spaces or printable characters. The output of this test looks like:

Loaded suite test/non_printable_characters_test
Started
F
Finished in 0.86005 seconds.
 
  1) Failure:
test_for_non_printable_characters_in_content(NonPrintableCharactersTest) [test/non_printable_characters_test.rb:5]:
<""> expected but was
<"/some/path/to/content/tmp.html:48:character �</span></p>\n">.
 
1 tests, 1 assertions, 1 failures, 0 errors

The failure message shows the file and line with the character, so it is easy to fix.
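If shelling out to find and grep is not an option, a similar check can be sketched in plain Ruby. This is an approximation under stated assumptions: it works on raw bytes and only accepts ASCII whitespace and printable ASCII, so it also flags legitimate non-ASCII text such as UTF-8 accented characters, which may be stricter than grep depending on the locale. The method name lines_with_non_printable_bytes is made up for illustration.

```ruby
# Acceptable bytes: ASCII whitespace (tab through carriage return) and
# printable ASCII (space through tilde).
def acceptable_byte?(byte)
  (0x09..0x0D).cover?(byte) || (0x20..0x7E).cover?(byte)
end

# Returns the 1-based numbers of the lines that contain any other byte.
def lines_with_non_printable_bytes(text)
  offenders = []
  text.b.each_line.with_index(1) do |line, number|
    offenders << number unless line.each_byte.all? { |byte| acceptable_byte?(byte) }
  end
  offenders
end

sample = "<p>fine</p>\n<p>pasted\u00A0from Word</p>\n<p>also fine</p>\n"
p lines_with_non_printable_bytes(sample)  # line 2 contains a non-breaking space
```

In a test, the method could be called on File.binread of each content file and asserted to return an empty array.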

Sep 12 2008
 

The website I’m currently working on is similar to an online brochure. The data on the site changes hourly, but every user sees the same thing. As a result, we decided to use page caching to dramatically speed up the site. Once a page is visited, the html is written out to disk and all subsequent requests are served by apache. The setup of this approach is detailed elsewhere (for example, Rails Envy: Ruby on Rails Caching Tutorial).

Setting up caching was easy, but we wanted to ensure that we did not make any mistakes. All pages should be cached, since any miss will result in a much higher load on our rails application. I’ve written previously about our internationalization test (Improved internationalization test) which spiders the site (using SpiderTest) looking for non localized text. Since we were already visiting every page, it seemed like a good place to add a check for page caching. Spidering the site again would make our test suite too long.

The consume_page method is called for every page that is visited by the spider. We expanded the implementation by adding a call to assert_page_is_cached:

def consume_page(html, url)
  html.gsub!("http://www.example.com", "")
  unless redirect?(html) || asset?(url)
    assert_page_has_been_moved_to_language_file(html, url)
    assert_page_is_cached(url)
  end
  super
end
 
def assert_page_is_cached(url)
  path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
  page = path.ends_with?(".html") ? path : "#{path}.html"
  assert_true File.exist?(page), "Page NOT cached: #{url} (looking in #{page})"
end

We also had to add new lines to our setup to turn on caching (since it is normally off in test mode):

def setup
  FileUtils.rm_rf ActionController::Base.page_cache_directory
  ActionController::Base.perform_caching = true
end

Since we run this test as its own suite, the test is totally isolated from other tests. There is no need to implement a teardown.

The full test, including the internationalization testing from before, looks like:

require 'hpricot'
 
class InternationalizationText < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator
 
  def setup
    FileUtils.rm_rf ActionController::Base.page_cache_directory
    ActionController::Base.perform_caching = true
    blank_out_localization
    blank_out_html_escape
  end
 
  def blank_out_localization
    GLoc::InstanceMethods.class_eval do
      alias :old_l :l
      def l(symbol, *arguments)
        ""
      end
    end
  end
 
  def blank_out_html_escape
    ERB::Util.class_eval do
      alias :old_html_escape :html_escape
      def html_escape(s)
        ""
      end
 
      alias :h :html_escape
    end
  end
 
  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end
 
  def consume_page(html, url)
    html.gsub!("http://www.example.com", "")
    unless redirect?(html) || asset?(url)
      assert_page_has_been_moved_to_language_file(html, url)
      assert_page_is_cached(url)
    end
    super
  end
 
  def redirect?(html)
    html.include?("<body>You are being")
  end
 
  def asset?(url)
    File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
  end
 
  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    assert_does_not_contain_words doc.at("title").inner_text, url
    body = doc.at('body')
    (body.search("//script[@type='text/javascript']")).remove
    assert_does_not_contain_words(body.inner_text, url)
    assert_attribute_does_not_contain_words body, url, 'title'
    assert_attribute_does_not_contain_words body, url, 'alt'
  end
 
  def assert_attribute_does_not_contain_words body, url, attribute
    body.search("//*[@#{attribute}]") do |element|
      assert_does_not_contain_words element.get_attribute(attribute), url
    end
  end
 
  def assert_does_not_contain_words text, url
    match = text.match(/[A-Za-z]([A-Za-z]| )*/)
    fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end
 
  def assert_page_is_cached(url)
    path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
    page = path.ends_with?(".html") ? path : "#{path}.html"
    assert_true File.exist?(page), "Page NOT cached: #{url} (looking in #{page})"
  end
end

Sep 06 2008
 

I submitted a patch to Capistrano to add a “--dry-run” option (or -n for short). This flag causes capistrano to print out all of the commands it will run without actually running them. It is an easy way to see what the cap task will do to your servers before you run it.

My patch was accepted and released as part of Capistrano 2.5.0. You can read more about the new features at:

http://capify.org/2008/8/29/capistrano-2-5-0

and see the details of my commit at github:

http://github.com/capistrano/capistrano/commit/7279a3858e2bcebe84735223d5f8b4397c4ad85b