18 Nov 04

Checking Up on the Competition

A simple HTML screen scraper is one of an automator’s best friends. You hook it up to a web page you want to monitor and walk away, knowing you’ll get notified if the web site crashes or gets sick.

The book includes a Unix shell script called checkurl.sh that scrapes a web page and sends out email if the page isn’t found or shouts bad words such as "Error" or "Exception". Steve Kellock contributed a Ruby screen scraper that I whittled into the shape of the original shell script for better portability:

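  require 'open-uri'
  require 'net/smtp'

  if ARGV.empty?
    puts "usage: check_url.rb <url>"
    exit 1
  end

  url = ARGV[0]
  smtp_host = "your.smtp.host"
  to = "3035551212@mobile.att.net"
  from = "monitor@acme.com"
  subject = "Uh oh!"
  message = ""

  # Fetch the page; flag it if it mentions an error or can't be fetched at all.
  begin
    # URI.open needs Ruby 2.5 or later; older Rubies use open(url) instead.
    page = URI.open(url).read
    if page =~ /Error|Exception/
      message = "Error or Exception"
    end
  rescue StandardError => e
    message = "Unavailable - #{e.message}"
  end

  # A healthy page needs no email.
  exit if message.empty?

  # The squiggly heredoc (Ruby 2.3+) strips the leading indentation.
  mail = <<~MAIL
    Subject: #{subject}

    Sadly, #{url} isn't feeling well right now.

    Diagnosis: #{message}
  MAIL

  Net::SMTP.start(smtp_host) do |smtp|
    smtp.send_mail(mail, from, to)
  end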

To run it, just provide a URL you’d like checked. For example:

  ruby check_url.rb http://demoserver:8080/killerapp

Then schedule it to check the web page at a regular interval using cron, at, or your favorite scheduler.

You can extend this simple technique to monitor all kinds of information across the web. Steve uses it to get a leg up on the competition. He sells a Windows-based desktop swimming organizer, and he wants to know every day whether any of his competitors has released a new version of a similar program. Steve is a one-man show, but his Ruby scripts let him focus his time on the important stuff. One of them visits the download page of each competitor’s site, scrapes the version numbers, compiles a list, and sends Steve the results at midnight every night. Steve sums it up with this great quote:

"Life is better with automation. It increases the number of plates you can keep spinning by tenfold."
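
Steve's version-watching script isn't shown here, so what follows is only a rough sketch of how the scrape-and-compile step might look, not his actual code. The product names, the sample HTML, the helper names `extract_versions` and `compile_report`, and the version pattern are all made up for illustration; real download pages would likely need a pattern tuned to each site.

```ruby
# Hypothetical sketch: every name, page, and pattern here is an assumption.

# Matches strings such as "v2.1", "Version 3.0.4", or "version 10.2".
VERSION_PATTERN = /\bv(?:ersion)?\s*(\d+(?:\.\d+)+)/i

# Returns the unique version numbers found in a page, in order of appearance.
def extract_versions(html)
  html.scan(VERSION_PATTERN).flatten.uniq
end

# Builds a plain-text report from a hash of { product name => page HTML }.
def compile_report(pages)
  pages.map do |name, html|
    versions = extract_versions(html)
    found = versions.empty? ? "no version found" : versions.join(", ")
    "#{name}: #{found}"
  end.join("\n")
end

# Made-up sample pages standing in for competitors' download pages.
pages = {
  "SwimTracker" => "<p>Download SwimTracker v2.1 today</p>",
  "PoolPal"     => "<h1>PoolPal</h1><p>Version 3.0.4 released</p>"
}

puts compile_report(pages)
# SwimTracker: 2.1
# PoolPal: 3.0.4
```

In a real run, the HTML would presumably be fetched with open-uri and the report mailed with Net::SMTP, the same way the check_url.rb script does it. Keeping the scraping in small functions that take a string makes the pattern easy to try out against saved copies of a page before pointing it at a live site.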