Aug 16, 2011

Here is part 4 of my useful unix tricks series:

Use mount -o bind to remount a filesystem

When you mount a filesystem in linux, anything that was in the existing directory is now hidden and out of reach. For example:

% touch /mnt/foo
% ls /mnt
./  ../  foo

% sudo mount /dev/sdb1 /mnt
% ls /mnt
./  ../

If you want access to those files again, you can remount / using mount -o bind:

% sudo mkdir /mnt2
% sudo mount -o bind / /mnt2
% ls /mnt2/mnt
./  ../  foo

This command remounts / under /mnt2, so you can access any file through /mnt2.
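
Bind mounts also work for arbitrary directories, not just /. For example (hypothetical paths), you can expose a directory at a second location:

% sudo mount -o bind /var/log /srv/chroot/var/log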

less -R shows colors for control characters

Programs like Rails write log output with terminal coloring information. If you open these files in less, you’ll see something like:

% less development.log

ESC[4;36;1mSQL (0.2ms)ESC[0m   ESC[0;1mSET client_min_messages TO 'panic'ESC[0m
ESC[4;35;1mSQL (0.1ms)ESC[0m   ESC[0mSET client_min_messages TO 'notice'ESC[0m

If you add the -R flag to less, it tells less to output raw control characters, which will color your terminal:

% less -R development.log
SQL (0.2ms)   SET client_min_messages TO 'panic'
SQL (0.1ms)   SET client_min_messages TO 'notice'

Pressing alt+. will put in the previous command’s last argument

Pressing alt+. (the alt key and period) will drop the last argument of the previous command into the terminal. For example:

% mv foo.txt bar.txt
% cat <press alt and .>

# this becomes
% cat bar.txt

This is especially useful for combo commands, such as unstaging and checking out a file in git:

% git st
# On branch master
# Changes to be committed:
#   (use "git reset HEAD ..." to unstage)
#
#       modified:   foo
#

% git reset HEAD foo
Unstaged changes after reset:
M       foo

% git co <alt+.>

This is part of the emacs key bindings, and you can read more about them on stack overflow: bash – first argument of the previous command
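
If you prefer history expansion to key bindings, !$ expands to the same thing: the last argument of the previous command.

% mv foo.txt bar.txt
% cat !$
cat bar.txt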

Show git branch info in the prompt with coloring

I do a lot of development with git, and I have a lot of different git repositories. It’s handy to show git info in the prompt when I cd into a repo directory. At a minimum, I like seeing the current branch I have checked out.

For example, my prompt looks like:

host:~% cd repo
host:~/repo (master)%
host:~/repo (master)% git co -b some_branch
Switched to a new branch 'some_branch'
host:~/repo (some_branch)%

In zsh, getting a prompt like this is as simple as adding this to your .zshrc (or .zshenv) file:

setopt prompt_subst # needed so the $(...) in PROMPT is re-evaluated for each prompt

git_prompt_info() {
  ref=$(/usr/local/bin/git symbolic-ref HEAD 2> /dev/null) || return
  echo " (${ref#refs/heads/})"
}

export PROMPT='%m:%~$(git_prompt_info)%# '

I like colors in my prompt as well, so my .zshenv looks like:

autoload -U colors
colors
setopt prompt_subst # needed so the $(...) in PROMPT is re-evaluated for each prompt

git_prompt_info() {
  ref=$(/usr/local/bin/git symbolic-ref HEAD 2> /dev/null) || return
  echo " (${ref#refs/heads/})"
}

export PROMPT='%{$fg_bold[green]%}%m:%{$fg_bold[blue]%}%~%{$fg_bold[green]%}$(git_prompt_info)%{$reset_color%}%# '

After I load colors, I can use %{$fg_bold[color_name]%} to change the following text to that color.
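
One possible extension (my own sketch, not part of the setup above) is to append an asterisk to the branch name when the working tree is dirty. git diff --quiet exits nonzero when there are unstaged changes:

git_prompt_info() {
  ref=$(/usr/local/bin/git symbolic-ref HEAD 2> /dev/null) || return
  dirty=""
  # mark the prompt when there are unstaged changes
  /usr/local/bin/git diff --quiet 2> /dev/null || dirty="*"
  echo " (${ref#refs/heads/}${dirty})"
}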

Use tmux instead of screen

For those that aren’t familiar with screen or tmux, here is the description from the tmux website:

tmux is a terminal multiplexer: it enables a number of terminals (or windows), each running a separate program, to be created, accessed, and controlled from a single screen. tmux may be detached from a screen and continue running in the background, then later reattached.

Being able to detach and reattach is really nice when working on remote servers. You can ssh to a server, start tmux, run a long running process and then detach. Hours later, you can ssh to the server again and reattach to the tmux window. The program continued running after you detached, and you can see the output as if you never left.
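
A minimal sketch of that workflow (the session name is arbitrary):

% ssh server
% tmux new -s work
% ./long_running_process
<press ctrl+b, then d, to detach, then log out>

# hours later
% ssh server
% tmux attach -t work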

tmux is also great for working with others remotely. Everyone can ssh into one server and connect to the same tmux. Then, everyone can see the same file, commands, etc. This is often how we work together on deploys at Braintree Payments.

If you currently use screen, consider switching to tmux. They are very similar (both support everything I mentioned above), but tmux adds a few nice features:

  1. tmux allows you to split the screen vertically so you can have two windows side by side (see the commands after this list). This is great for working on servers. You can tail the log in one split and work in the other, so you can keep an eye on the logs. Screen lets you split horizontally, but I find vertical splits more useful on widescreen monitors since there is not as much vertical real estate.
  2. tmux adds a customizable status bar to the bottom of the terminal with information like which server you are on, the current time, tabs, etc.
  3. If different people all connect to the same tmux, it will adjust the viewable area to be the smallest window size. Larger monitors will see a grayed out background filling in the extra space. In contrast, screen might put output where the smaller windows cannot see it.
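
For reference, splits can be created with these commands (or with the default ctrl+b key bindings):

% tmux split-window -h    # panes side by side; ctrl+b then % also works
% tmux split-window -v    # panes stacked; ctrl+b then " also works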

column -t will reformat input into columns

Use column -t when printing tabular output where the columns do not align. For example:

% print "one two\nthree four"
one two
three four

% print "one two\nthree four" | column -t
one    two
three  four

I use this all the time with the program pg_lsclusters, which prints out postgresql cluster information:

% pg_lsclusters
Version Cluster   Port Status Owner    Data directory                     Log file
8.4     backup_main 5433 down   postgres /var/lib/postgresql/8.4/backup_main /var/log/postgresql/postgresql-8.4-backup_main.log
8.4     main      5432 online postgres /var/lib/postgresql/8.4/main       /var/log/postgresql/postgresql-8.4-main.log

% pg_lsclusters | column -t
Version  Cluster      Port  Status  Owner     Data                                 directory                                           Log  file
8.4      backup_main  5433  down    postgres  /var/lib/postgresql/8.4/backup_main  /var/log/postgresql/postgresql-8.4-backup_main.log
8.4      main         5432  online  postgres  /var/lib/postgresql/8.4/main         /var/log/postgresql/postgresql-8.4-main.log
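
column can also split on other delimiters with the -s flag, which is handy for comma separated input:

% print "one,two\nthree,four" | column -s, -t
one    two
three  four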

ctrl+p and ctrl+n cycle through terminal history

If you want to run a previous command in terminal, it’s common to press the up and down arrows to cycle through old commands. You can also use ctrl+p for previous and ctrl+n for next. I prefer these keys since it leaves my hands near the home row rather than having to reach for the arrow keys.
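
These come from the emacs key bindings, so if they do not work in your shell, make sure emacs mode is enabled:

% bindkey -e    # zsh
% set -o emacs  # bash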

Jul 15, 2011

I used to host my blog with Mephisto. Mephisto is blogging software written in Ruby on Rails, and I really enjoyed it. Unfortunately, the Mephisto project has been abandoned: http://techno-weenie.net/2010/6/23/you-can-let-go-now

While it was still running fine for me, I wanted to make changes to my blog, and I figured it was time to move to software that is still active.

After searching around, I decided to move to WordPress. My main reasons were:

  1. Existing script to migrate all of my posts and comments from Mephisto.
  2. Lots and lots of useful plugins.
  3. Widely used and actively developed.

I used a project on github called mephisto-to-wxr to export my Mephisto content as a WordPress export file. The script was a little outdated, so I forked it and got it working with my setup. You can check out my fork at: https://github.com/pgr0ss/mephisto-to-wxr

The syntax highlighting plugin I used with Mephisto required tags around code that looked like:

<pre><code class="ruby">
</code></pre>

In WordPress, I’m using a plugin called WP-Syntax which requires tags like:

<pre lang="ruby">
</pre>

To get the required output, I ran the WordPress export (which is xml) through the following sed command:

sed 's/<pre><code class=/<pre lang=/g' pgrs.net.wxr | sed 's/<\/code><\/pre>/<\/pre>/g' > pgrs.net.syntax_fix.wxr
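
As a quick sanity check that the substitution caught everything, count any leftover tags (this should print 0):

% grep -c '<pre><code' pgrs.net.syntax_fix.wxr
0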

For those interested, here is my current setup in WordPress:

PgEast Talk

Mar 15, 2011

I’m giving a talk at the PgEast conference next week in NYC titled Migrating from MySQL to PostgreSQL.

This talk will cover how Braintree migrated our credit card payment gateway, a Ruby on Rails application with millions of rows of data, from MySQL to PostgreSQL. The talk will be in three main parts:

  1. Why we decided to switch and the benefits of PostgreSQL over MySQL
  2. The tools and process we used to cut over with only a couple of minutes of downtime
  3. The aftermath of the switch, rewriting parts of our code, and optimizing our application for PostgreSQL
Mar 28, 2010

In our web app, we have a common UI pattern: replacing select lists (eg. drop downs) with a label if there is only one item in the list. For example, when creating a subscription, the user must choose a plan. Normally, there is a list of plans to choose from. However, if there is only one plan that the user can choose, we show a label specifying the one plan they’re getting instead of showing them a list of one.

We implemented this with a custom form builder and a method called select_or_label. It takes the same arguments as select, but only creates the select list if the list of choices has more than one element.

The view looks like:

<% form_for :subscription, :builder => SelectFormBuilder, :url => { :controller => :subscriptions, :action => :create } do |form| -%>
  <%= form.label :plan %>
  <%= form.select_or_label :plan_id, Plan.all.collect {|p| [p.name, p.id]} %>
<% end -%>

The form builder creates a span and a hidden_field if the size of the choices is 1:

class SelectFormBuilder < ActionView::Helpers::FormBuilder
  include ActionView::Helpers::TagHelper
 
  def select_or_label(method, choices, options = {})
    if choices.size == 1
      content_tag(:span, choices.first.first) +
        hidden_field(method, :value => choices.first.last)
    else
      select(method, choices, options)
    end
  end
end

And the spec:

require File.dirname(__FILE__) + '/../spec_helper'
 
describe SelectFormBuilder, :type => :helper do
  describe "select_or_label" do
    before do
      helper = Class.new { include ActionView::Helpers }.new
      @builder = SelectFormBuilder.new(:subscription, Subscription.new, helper, {}, nil)
    end
 
    it "returns a span and hidden field if the size of the choices array is only one" do
      html = @builder.select_or_label(:plan_id, [["name", "id"]])
      html.should_not have_tag("select")
 
      html.should have_tag("span", "name")
      html.should have_tag("input[type=hidden][name=?][value=?]", "subscription[plan_id]", "id")
    end
 
    it "returns a select if the size of the choices array is greater than one" do
      html = @builder.select_or_label(:plan_id, [["name", "id"], ["other name", "other_id"]])
      html.should_not have_tag("input")
 
      html.should have_tag("select[name=?]", "subscription[plan_id]") do
        with_tag("option[value=?]", "id", "name")
        with_tag("option[value=?]", "other_id", "other name")
      end
    end
  end
end
Feb 28, 2010

Update (3/2/10): Updated code to work with version 0.1.30 of node.js

I’ve continued to play with node.js, and I’ve decided to do a follow up spike to my previous one: Web proxy in node.js for high availability

The previous spike used node to proxy requests directly to a web server. This spike uses node to put messages into a (redis) queue. Ruby background workers read from the queue, process the requests, and respond on a different queue. When node receives the response from the background worker, it sends the response back to the waiting user.

Just like my first spike, this type of architecture can be used for high availability web sites. Since all messages go into a queue and node holds the connections from the users, the site can be upgraded (including database migrations or infrastructure changes) as long as node and redis stay the same. Once the upgrade is finished, the workers can resume working from the queue. Users would see an extra long request, but as long as the upgrade was short (eg, less than a minute), the user should not know the site was down.

A queue has a lot of advantages over a straight proxy:

  1. Easy to scale up and down by adding or removing workers
  2. Can use priority queues to prioritize more important web requests
  3. Easy to monitor (eg, how many messages are in the queue, how fast are they being added; see the example after this list)
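
For example, once the code below is running, checking the queue depth is a one-liner with redis-cli (the queue name comes from the node code):

% redis-cli llen resque:queue:requests
(integer) 3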

Here is a very simple version of the code. First, the node webserver (using redis-node-client):

var sys = require('sys'),
   http = require('http'),
  redis = require("./redisclient");
 
var queuedRes = {}
var counter = 1;
 
http.createServer(function (req, res) {
  pushOnQueue(req, res);
}).listen(8000);
 
function pushOnQueue(req, res) {
  requestNumber = counter++;
 
  message = JSON.stringify({
    "class": "RequestProcessor",
    "args": [ {"node_id": requestNumber, "url": req.url} ]
  });
 
  client.rpush('resque:queue:requests', message);
  queuedRes[requestNumber] = res
}
 
var client = new redis.Client();
client.connect(function() {
  popFromQueue();
});
 
function popFromQueue() {
  client.lpop('responses', handleResponse);
}
 
function handleResponse(err, result) {
  if (result == null) {
    setTimeout(function() { popFromQueue(); }, 100);
  } else {
    json = JSON.parse(result);
    requestNumber = json.node_id;
    body = unescape(json.body);
    res = queuedRes[requestNumber];
    res.writeHeader(200, {'Content-Type': 'text/plain'});
    res.write(body);
    res.close();
    delete queuedRes[requestNumber];
    popFromQueue();
  }
}
 
sys.puts('Server running at http://127.0.0.1:8000/');

Also available as a gist.

pushOnQueue() is called on incoming web requests. This creates a JSON message and pushes it on the resque:queue:requests queue. It also puts the res object into a hash so it can be retrieved again on the way back.

At the same time, a queue listener is set up using redis.Client(). On connect, popFromQueue() is called. This method pops messages from the responses queue and calls handleResponse(). If the pop did not find a message, it is scheduled to run again in 100 milliseconds. If it did find a message, the message is parsed as JSON, the requestNumber is pulled out, and the original res object is pulled out of the queuedRes hash. The res object is then sent the body of the message from the queue, which makes it back to the user.

On the other side, I have a ruby worker using resque:

class RequestProcessor
  @queue = :requests
 
  APP = Rack::Builder.new do
    use Rails::Rack::Static
    use Rack::CommonLogger
    run ActionController::Dispatcher.new
  end
 
  RACK_BASE_REQUEST = {
    "PATH_INFO" =&gt; "/things",
    "QUERY_STRING" =&gt; "",
    "REQUEST_METHOD" =&gt; "GET",
    "SERVER_NAME" =&gt; "localhost",
    "SERVER_PORT" =&gt; "3000",
    "rack.errors" =&gt; STDERR,
    "rack.input" =&gt; StringIO.new(""),
    "rack.multiprocess" =&gt; true,
    "rack.multithread" =&gt; false,
    "rack.run_once" =&gt; false,
    "rack.url_scheme" =&gt; "http",
    "rack.version" =&gt; [1, 0],
  }
 
  def self.perform(hash)
    url = hash.delete("url")
 
    request = RACK_BASE_REQUEST.clone
    request["PATH_INFO"] = url
    response = APP.call(request)
 
    body = ""
    response.last.each { |part| body << part }
 
    hash["body"] = URI.escape(body)
    cmd = "redis-cli rpush responses #{hash.to_json.inspect}"
    system cmd
  end
end

Also available as a gist.

The worker can be started with:

env QUEUE=requests INTERVAL=1 rake environment resque:work

This worker uses resque, which polls the queue and calls perform when a message is received. The perform method builds the Rack request and runs the URL from the message through Rails. It then pushes the response body onto the responses queue using redis-cli.

As before, this spike only works with GET requests and does not pass any headers through to keep the code simple. Comments and forks are welcome.

Feb 1, 2010

Update (3/8/10): Updated code to work with version 0.1.30 of node.js

I’ve been thinking about high availability websites lately. In particular, I want sites that can be upgraded (including database migrations or even infrastructure changes) without downtime.

I’ve also been playing with node.js lately, and I decided to spike out a web proxy that would sit between users and the actual website (eg, a rails app). When performing upgrades, the proxy would hold users’ connections and wait. Once the upgrade was done, the proxy would forward requests as usual. Users would see an extra long request, but as long as the upgrade was short (eg, less than a minute), the user should not know the site was down.

This type of proxy server seems like a good fit with node. Node’s event model means that there will be very little overhead when holding connections. There are no threads stacking up and waiting. Since everything is non-blocking, this server should scale well.

Here is a very simple version of the code:

var fs = require('fs'),
   sys = require('sys'),
  http = require('http');
 
http.createServer(function (req, res) {
  checkBalanceFile(req, res);
}).listen(8000);
 
function checkBalanceFile(req, res) {
  fs.stat("balance", function(err) {
    if (err) {
      setTimeout(function() {checkBalanceFile(req, res)}, 1000);
    } else {
      passThroughOriginalRequest(req, res);
    }
  });
}
 
function passThroughOriginalRequest(req, res) {
  var request = http.createClient(2000, "localhost").request("GET", req.url, {});
  request.addListener("response", function (response) {
    res.writeHeader(response.statusCode, response.headers);
    response.addListener("data", function (chunk) {
      res.write(chunk);
    });
    response.addListener("end", function () {
      res.close();
    });
  });
  request.close();
}
 
sys.puts('Server running at http://127.0.0.1:8000/');

Here is a gist if anyone would like to fork.

Basically, I use http.createServer to create a web server on port 8000. On incoming requests, I call checkBalanceFile. This method will try to stat a local file called balance. If it finds it, it will call passThroughOriginalRequest, which forwards the request to another web server on port 2000. If the balance file does not exist, I use setTimeout to call checkBalanceFile again in one second.

With a proxy server like this, the main application can be upgraded by removing the balance file. While the file is missing, the node web server will hold all of the connections and check every second for the reappearance of the balance file. Once it comes back, all requests will be forwarded along and then streamed back to the user.
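
An upgrade then looks something like this (the deploy command here is hypothetical):

% rm balance       # node starts holding new requests
% cap deploy       # perform the upgrade
% touch balance    # node forwards the held requests again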

Currently, this spike only works with GET requests and does not pass any headers through, since I wanted to keep the code simple.

Jan 15, 2010

The rake_commit_tasks plugin now has preliminary support for git. rake_commit_tasks is a rails plugin which contains a set of rake tasks for checking your project into source control (git or subversion).

The workflow for committing and pushing with git is slightly different from subversion. The current steps of “rake commit” with git are roughly:

  1. Resets soft back to origin/branch_name (git reset --soft origin/branch_name)
  2. Adds new files to git and removes deleted files (git add -A .)
  3. Prompts for a commit message
  4. Commits to git (git commit -m '...')
  5. Pulls changes from origin and does a rebase to keep a linear history (git pull --rebase)
  6. Runs the default rake task (rake default)
  7. Checks cruisecontrol.rb to see if the build is passing
  8. Pushes the commit to origin (git push origin branch_name)

The “git reset --soft” in #1 is used to collapse unpushed commits. Each time “rake commit” is run, any commits that have not been pushed are undone and the changes are put into the index. Then, the “git add -A .” adds the new changes. Now, the “git commit” command will create one commit with all of the unpushed changes.

This collapsing comes in handy when “rake commit” fails (for example, a broken test). Once the test is fixed, the fix should go into the same commit as the original work. Without the “git reset” command, there will be two commits (the original, and the one with the fix).

The “--rebase” flag is used in #5 when running “git pull” to keep a linear history without merge commits. If someone else has committed and pushed, a normal “git pull” will create a merge commit merging the other person’s work with your own. The “git pull --rebase” undoes the local commit, does a “git pull” and then replays the local commit on top. Merge commits are useful when there are multiple streams of work, such as a release branch. However, when everyone is working in master, they merely clutter the history.
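
To see the difference, compare the two (output abbreviated; the messages are roughly what git prints):

% git pull
Merge made by recursive.

% git pull --rebase
First, rewinding head to replay your work on top of it...
Applying: my local commit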

Comments and patches are welcome.

Feb 20, 2009

Update (8/16/11): Check out Useful unix tricks – part 4

Here is part 3, following Useful unix tricks and Useful unix tricks – part 2.

!! is the previous command in the shell history

It is pretty common to want to rerun the previous command, possibly with something new at the beginning or end. !! refers to that command in the history. For example:

% tail foo
tail: cannot open `foo' for reading: Permission denied

% sudo !!
sudo tail foo
hello world

As you can see, I forgot to sudo the first command. Now, I want to rerun it with a sudo at the front, so I can just do “sudo !!” and press enter. The shell will print out the command it is running, followed by whatever it would print normally.

Tail multiple files at once

The tail command can take multiple files, and it will show the output of each one. You can combine this with the -f flag, and tail will intersperse the output of each file in real time. This is incredibly handy for looking at log files. For example, we can tail both the apache and rails logs to see the requests:

% tail -f log/production.log /var/log/apache2/access.log

==> log/production.log <==

Processing MephistoController#dispatch (for 127.0.0.1 at 2009-02-20 13:33:31) [GET]
  Parameters: {"action"=>"dispatch", "path"=>["2008", "7", "19", "capistrano-with-pairing-stations"], "controller"=>"mephisto"}
Completed in 784ms (View: 0, DB: 260) | 200 OK [http://www.pgrs.net/2008/7/19/capistrano-with-pairing-stations]

==> /var/log/apache2/access.log <==
127.0.0.1 - - [20/Feb/2009:13:33:31 -0600] "GET /2008/7/19/capistrano-with-pairing-stations HTTP/1.1" 200 16049 "http://www.pgrs.net/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.6) Gecko/2009011912 Firefox/3.0.6 Ubiquity/0.1.5"

As you can see, tail prints ==> <== to show which file the output is for.

Use vim -b to show nonprintable characters

Sometimes a file will have nonprintable characters, such as Windows line breaks. Most editors won’t show them, but you can use “vim -b” to see and edit them. The -b flag tells vim to use binary mode. For example, here is a file with Windows line endings:

% cat foo.txt
Hello
World

% vim -b foo.txt
Hello^M
World^M
^M

As you can see, the vim binary mode can see the line endings whereas cat cannot.
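
Once you can see the line endings, removing them is a single substitution inside vim (\r matches the carriage return):

:%s/\r$//
:wq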

** is a recursive wildcard in zsh

You can use the recursive wildcard ** in zsh to do complex matching. For example, let’s say that you want to search all ruby files in the current project for the string RAILS_ENV. Normally, you would do something like:

% find . -name '*.rb' | xargs grep RAILS_ENV
./config/environment.rb:# ENV['RAILS_ENV'] ||= 'production'
./test/test_helper.rb:ENV["RAILS_ENV"] = "test"

In zsh, you can accomplish the same with a much simpler command:

% grep RAILS_ENV **/*.rb
config/environment.rb:# ENV['RAILS_ENV'] ||= 'production'
test/test_helper.rb:ENV["RAILS_ENV"] = "test"

The wildcard **/*.rb recursively matches any files that end in .rb, so there is no need for a find command.

If there are a lot of files, you will occasionally get the error:

% grep RAILS_ENV **/*.rb
zsh: argument list too long: grep

This means that the **/*.rb glob expanded to more arguments than the system allows for a single command. In this case, you can use echo and xargs to get the job done, which is still simpler than the find command:

% echo **/*.rb | xargs grep RAILS_ENV

find -X will show bad filenames

It is pretty common to run find and then pass the results into xargs. However, if any filenames contain spaces or quotes, xargs will fail. You can use find -X to find any paths that will fail. find will warn on these paths and then skip them:

% find -X .
.
find: ./filename with spaces: illegal path

If you want to use xargs with these files, use the -print0 option to tell find to separate paths with a NUL character instead of a newline, and xargs -0 to tell xargs to split on NUL instead of whitespace:

% find . -print0 | xargs -0 echo
. ./filename with spaces

cd - will return to the previous folder

Passing - into the cd command will return you to the last folder you were in:

/tmp% pwd
/tmp

/tmp% cd ~

~% cd -
/tmp

/tmp%

Use ctrl+z and kill %1 to kill a process that will not die

Sometimes, you run a command and pressing ctrl+c will not kill it. When that happens, I used to open up another terminal window to kill -9 the process, until someone showed me the following trick:

% sleep 1000
^Z
zsh: suspended  sleep 1000
% kill -9 %1
%
[1]  + killed     sleep 1000

Pressing ctrl+z suspends the process and returns you to a terminal prompt. Then, kill -9 %1 sends the kill -9 signal to job #1, which is our suspended process.
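
ctrl+z is also useful when you want the process back rather than dead: jobs lists the suspended jobs, and fg resumes one in the foreground:

% sleep 1000
^Z
zsh: suspended  sleep 1000
% jobs
[1]  + suspended  sleep 1000
% fg %1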

pwdx shows the working directory of a process

It can be really useful to see the working directory of a running process. For example, you can see which release a ruby process is running:

% sudo pwdx 23961
23961: /var/www/myapp/releases/20081231200733

Unfortunately, I haven’t found pwdx for the mac. If anyone knows how I can install it, please let me know.
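
One workaround on the mac is lsof, which can report a process's current working directory (sample output, abbreviated):

% sudo lsof -a -p 23961 -d cwd
COMMAND   PID USER  FD   TYPE DEVICE SIZE/OFF NODE NAME
ruby    23961 paul  cwd    DIR    1,2      646  123 /var/www/myapp/releases/20081231200733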

Use sh -x to debug shell scripts

If you want to know what commands a shell script runs, run it with the -x flag. For example, say we have a shell script with two echos. Compare the output with and without the -x flag:

% sh foo.sh
hello
world

% sh -x foo.sh
+ echo hello
hello
+ echo world
world

% zsh -x foo.sh
+foo.sh:1> echo hello
hello
+foo.sh:2> echo world
world

As you can see, the -x flag shows each command as it is run. zsh takes it a step further and shows the script and line number as well.
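
You can also turn tracing on and off inside the script itself with set -x and set +x, which helps narrow in on one section:

#!/bin/sh
echo before    # not traced
set -x         # start tracing here
echo hello
set +x         # stop tracing
echo after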

sysctl replaces /proc on macs

The /proc filesystem is a great way to find out about a linux machine. For example, you can “cat /proc/cpuinfo” to find out how many processors are on the box. However, macs don’t have /proc. You can use sysctl instead. The -a flag prints out all keys and values:

% sysctl -a
kern.ostype = Darwin
kern.osrelease = 9.6.0
kern.osrevision = 199506
kern.version = Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386
...

You can also just get a single value with the -n flag. For example, this command will print out the number of cpu cores:

% sysctl -n hw.ncpu
2
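
The linux equivalent using /proc:

% grep -c ^processor /proc/cpuinfo
2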