Finding nonprintable characters with a test

written by paul on September 30th, 2008 @ 07:26 PM

Our current application includes a lot of static content created by content editors. They check in static HTML files, and we include these files in various parts of the application. The problem is that they sometimes copy and paste from applications such as Outlook or Word, which can introduce nonprintable characters into the content. These characters render as stray or garbled symbols on the website.

After this happened a couple of times, we decided to write a test that fails whenever a content file contains a nonprintable character:


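require File.dirname(__FILE__) + '/test_helper' # assumes the standard Rails test/test_helper.rb, which loads the environment (and RAILS_ROOT)

class NonPrintableCharactersTest < Test::Unit::TestCase
  def test_for_non_printable_characters_in_content
    assert_equal "", `find #{RAILS_ROOT}/content -name '*.html' | xargs grep -n '[^[:space:][:print:]]'`
  end
end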

We use find to list all of the HTML files in the content folder. Then we pipe that list through xargs to grep -n, which prefixes each matching line with its line number, using the bracket expression

'[^[:space:][:print:]]'

which matches any character that is neither whitespace nor printable. When a file contains such a character, the test fails with output like this:


Loaded suite test/non_printable_characters_test
Started
F
Finished in 0.86005 seconds.

  1) Failure:
test_for_non_printable_characters_in_content(NonPrintableCharactersTest) [test/non_printable_characters_test.rb:5]:
<""> expected but was
<"/some/path/to/content/tmp.html:48:character �</span></p>\n">.

1 tests, 1 assertions, 1 failures, 0 errors

The failure message shows the file and line with the character, so it is easy to fix.
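
If shelling out is not an option (for example, on a machine without find and grep), the same check can be sketched in plain Ruby. This is a minimal sketch, not the test shown above: non_printable_offenders is a hypothetical helper name, and it assumes Ruby 2.4 or later and content encoded as UTF-8. It also reports bytes that are not valid UTF-8, which the grep version may or may not do depending on the locale.

```ruby
# Matches any character that is neither whitespace nor printable,
# the same POSIX bracket expression the grep command uses.
NON_PRINTABLE = /[^[:space:][:print:]]/

# Hypothetical helper (not from the original post): returns an array of
# "path:line_number" strings, one per offending line under content_dir.
def non_printable_offenders(content_dir)
  offenders = []
  # "**" makes the glob recurse into subdirectories, like find does.
  Dir.glob(File.join(content_dir, "**", "*.html")).sort.each do |path|
    File.foreach(path, encoding: "UTF-8").with_index(1) do |line, number|
      # Assumption: content is UTF-8. Bytes that are not valid UTF-8 are
      # reported too; checking validity first also keeps the regex from
      # raising on a malformed string.
      bad = !line.valid_encoding? || line.match?(NON_PRINTABLE)
      offenders << "#{path}:#{number}" if bad
    end
  end
  offenders
end
```

In the test, the assertion would then become assert_equal [], non_printable_offenders("#{RAILS_ROOT}/content"), and a failure lists each offending file and line number.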

Comments

  • Thom Parkin on 01 Oct 15:38

    Simply BRILLIANT!! This illustrates the 'magic' of Ruby that makes it so much fun.
  • Mark on 02 Oct 12:36

    @Thom: This is actually a shell command wrapped in a Ruby test. It can be adapted to any scripting language.
