Excessive Ferret Disk Usage

Posted on May 15, 2008

I’m a big fan of Ferret as a low-headache, low-barrier-to-entry search/indexing utility for Rails apps.

Recently, though, one of my servers kept seeing its disk usage spike to 100% inexplicably. I checked the usual suspects, log/ and tmp/, and found nothing.

Clueless, I reached out for a lil help from the pros. Hat tip Lourens for suggesting Ferret as a possible culprit.

When I nuked the index/ dir for the site UmmYeah, disk usage dropped from 100% to 33%! Woot.

... currently rebuilding the Ferret index. Let’s hope, when rebuilt from scratch, the bad boy is much more compact.

7 Mongrels Serve 920,000 Pageviews in a Day

Posted on March 23, 2008

[Image: Summary (Incredimazing)]

The site is Incredimazing (one of my pet projects) and the main page which attracted the web’s attention was Earth at Night (cool, eh).

This time the traffic came from Fark, reddit and various other sources.

This averages out to about 10.6 requests / second, but the traffic was very bursty—probably 600-700k of the pageviews came within a span of 4-5 hours when first hitting fark/reddit/etc.
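For the curious, here is the back-of-the-envelope math as a quick Ruby sketch. The burst numbers just plug in the rough 600-700k and 4-5 hour estimates above, and the per-mongrel split assumes traffic was spread evenly across all 7 mongrels, which is a simplification.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60
MONGRELS = 7

# Average rate over the whole day.
average_rps = 920_000.0 / SECONDS_PER_DAY

# Burst window: the slowest case is 600k pageviews over 5 hours,
# the fastest case is 700k pageviews over 4 hours.
burst_low_rps  = 600_000.0 / (5 * 3600)
burst_high_rps = 700_000.0 / (4 * 3600)

puts format('average: %.1f req/s', average_rps)
puts format('burst:   %.1f to %.1f req/s', burst_low_rps, burst_high_rps)
puts format('per mongrel during burst: %.1f to %.1f req/s',
            burst_low_rps / MONGRELS, burst_high_rps / MONGRELS)
```

So the day as a whole averaged about 10.6 req/s, while the burst was more like 33 to 49 req/s, or roughly 5 to 7 req/s per mongrel.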

Note: the complete setup that runs incredimazing + a bevy of other sites runs about $400. If my math from this post were to hold true, it would mean hardware costs for a site such as this could be as little as 0.5% (half a percent). But… but… rails doesn’t scale! :P

Btw – this post is an update to this article which covered the initial digging.

Backup_fu Tip: Clear out RAILS_ROOT/tmp/backup Periodically

Posted on March 16, 2008

Just a tip, because I’ve finally bumped into an issue with this on one of my servers.

Backup_fu first dumps files to RAILS_ROOT/tmp/backup. It’s nice to have a local copy of the backup, but backup_fu never cleans this directory out, so old archives pile up.

If you are looking for a solution that cleans up old backups for you, I believe there are some out there; Google around.

What I will probably do going forward is create a reminder for myself via Backpack or iCal each month to A) check on the backups in Amazon S3, and B) clean out tmp/backup in the rails apps, especially ones where the backups push 1, 2, 3 gigs a pop.
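If you would rather script the cleanup than rely on a reminder, something like the following might do. This is only a sketch: prune_old_backups is a made-up helper (it is not part of backup_fu), and the 30-day cutoff is an arbitrary choice. It deletes files for good, so make sure the S3 copies are healthy first.

```ruby
# Hypothetical helper, NOT part of backup_fu: delete regular files in
# +dir+ that were last modified more than +max_age_days+ days ago.
# Returns the list of deleted paths.
def prune_old_backups(dir, max_age_days, now = Time.now)
  cutoff = now - (max_age_days * 24 * 60 * 60)
  old = Dir.glob(File.join(dir, '*')).select do |path|
    File.file?(path) && File.mtime(path) < cutoff
  end
  old.each { |path| File.delete(path) }
  old
end

# Example (from a rake task or script/runner), keeping 30 days of dumps:
#   prune_old_backups(File.join(RAILS_ROOT, 'tmp', 'backup'), 30)
```

Going by modification time keeps the helper independent of how the archives are named.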

How to Handle Facebook App Uninstalls with RFacebook

Posted on February 20, 2008

I posted the following here—but thought I’d solicit other feedback via blog as well.

As of this writing, the rfacebook plugin does not seem to provide a way to handle uninstalls (easily).

Is there a method I’m not aware of? Or does one handle it pretty much how I’ve done below?

In this example, to handle uninstalls in your rails app we will use the post-remove URL of ‘foo.yourapp.com/uninstalled’.

In your application.rb, you probably have something like:

  before_filter :require_facebook_install
  before_filter :require_facebook_login

Change your application.rb to something like:

  before_filter :require_facebook_install, :except => [:uninstalled]
  before_filter :require_facebook_login, :except => [:uninstalled]

  # Before Filter on *only* the 'uninstalled' method
  before_filter :verify_uninstall_signature, :only => [:uninstalled]

  # Note: it's important this method is *above* the 'protected' definition, since it needs to be called directly
  def uninstalled
    @fb_uid = params[:fb_sig_user]
    # From here on it will be app specific -- given the facebook uid, destroy the user, like...
    @user = User.find_by_fb_uid(@fb_uid)
    @user.destroy if @user
    render :nothing => true; return
  end

  protected
   ...

Next, in your ‘protected’ section, add the following method which roughly corresponds to the PHP / pseudocode:

  def verify_uninstall_signature
    signature = ''
    keys = params.keys.sort
    keys.each do |key|
      next if key == 'fb_sig'
      next unless key.include?('fb_sig')
      key_name = key.gsub('fb_sig_', '')
      signature += key_name
      signature += '='
      signature += params[key]
    end
    signature += FACEBOOK['secret']
    calculated_sig = Digest::MD5.hexdigest(signature)
    #logger.info "\nUNINSTALL :: Signature (fb_sig param from facebook) :: #{params[:fb_sig]}" 
    #logger.info "\nUNINSTALL :: Signature String (pre-hash) :: #{signature}" 
    #logger.info "\nUNINSTALL :: MD5 Hashed Sig :: #{calculated_sig}" 
    if calculated_sig != params[:fb_sig]
      #logger.warn "\n\nUNINSTALL :: WARNING :: expected signatures did not match\n\n" 
      return false
    else
      #logger.warn "\n\nUNINSTALL :: SUCCESS!! Signatures matched.\n" 
    end
    return true
  end
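To sanity-check the signature logic outside of Rails (in irb, say), here is a stand-alone sketch of the same idea. The function names and the sample values are made up for illustration. Note one small difference from the filter above: it only looks at keys that start with 'fb_sig_', rather than any key that merely contains 'fb_sig'.

```ruby
require 'digest/md5'

# Stand-alone version of the signature calculation, no Rails needed.
# +params+ is a plain Hash with string keys; +secret+ is the app secret.
def facebook_signature(params, secret)
  # Keep only the fb_sig_* keys (this skips 'fb_sig' itself), sorted by name.
  sig_keys = params.keys.select { |key| key.start_with?('fb_sig_') }.sort
  # Build "name=value" pairs with the 'fb_sig_' prefix stripped, no separators.
  payload = sig_keys.map { |key| "#{key.sub('fb_sig_', '')}=#{params[key]}" }.join
  Digest::MD5.hexdigest(payload + secret)
end

def valid_facebook_signature?(params, secret)
  facebook_signature(params, secret) == params['fb_sig']
end

# Made-up values, for illustration only.
secret = 'not-a-real-secret'
params = { 'fb_sig_user' => '42', 'fb_sig_added' => '0', 'action' => 'uninstalled' }
params['fb_sig'] = facebook_signature(params, secret)

puts valid_facebook_signature?(params, secret)                               # true
puts valid_facebook_signature?(params.merge('fb_sig_user' => '43'), secret)  # false
```

The second call fails because tampering with any fb_sig_* value changes the hash, which is the whole point of the check.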

This might be handy to add to ‘rfacebook’, but it sounds like the state of that project is in flux.

Also, you might have to add this entry to your config/routes.rb file:

  map.connect 'uninstalled', :controller => 'application', :action => 'uninstalled'

Happy uninstalling! :)

A few updates to backup_fu

Posted on February 18, 2008

I’d like to give a special thanks to those who commented on the release of backup_fu—especially for all of the positive feedback.

In this release, the following features were added:

- PostgreSQL dump support
- Ability to supply your own custom MySQL dump options (config key 'mysqldump_options')
- Backup_fu now checks your ENV variables for AMAZON_ACCESS_KEY_ID and AMAZON_SECRET_ACCESS_KEY;  if found, it uses these instead of requiring them in config/backup_fu.yml.

Since this is a minor release, there’s probably no need to update if you’ve already got v1.0 working like a charm.

For those who haven’t played with backup_fu, click here for the full scoop.

Thanks also to Toby Hede and Ralph Churchill for submitting patches.

Toby’s patch added support for user-supplied mysqldump options; Ralph’s added PostgreSQL dump support.

Also hat tip to nick for suggesting the ENV variable support.

Monkeypatching ActiveRecord so 'logger' Works Seamlessly

Posted on February 18, 2008

Ever tried calling “logger.info” from an ActiveRecord object, only to have rails lay the smack-down on ya?

Use ‘logger’ seamlessly throughout your app with this monkey patch:

module ActiveRecordExtensions

  def logger
    RAILS_DEFAULT_LOGGER
  end

end

class ActiveRecord::Base
  include ActiveRecordExtensions
end

Throw that in environment.rb or wherever you keep your monkey patches, and let the good times roll.
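If you want to see the pattern in isolation, here is a plain-Ruby sketch with no Rails involved. FakeBase stands in for ActiveRecord::Base and APP_LOGGER stands in for RAILS_DEFAULT_LOGGER; both names (and the Pic model) are made up for this example.

```ruby
require 'logger'
require 'stringio'

LOG_OUTPUT = StringIO.new
APP_LOGGER = Logger.new(LOG_OUTPUT)   # stand-in for RAILS_DEFAULT_LOGGER

module LoggerExtensions
  def logger
    APP_LOGGER
  end
end

# Stand-in for ActiveRecord::Base.
class FakeBase
  include LoggerExtensions
end

class Pic < FakeBase
  def resize!
    # Instance methods can now call logger directly.
    logger.info 'resizing pic'
  end
end

Pic.new.resize!
puts LOG_OUTPUT.string
```

One thing to keep in mind: an included module sits behind the class itself in Ruby's method lookup, so this adds an instance-level logger where none exists, but it will not override one that the class already defines directly.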

Tinkering with Git + Quick Facebook App Template

Posted on February 17, 2008

I just got my new GitHub account – yay!

Time to tinker. One thing I’ve done with rails is create a “QuickStart Rails” template that I have checked into svn. It’s got several useful plugins that I use in just about all my rails apps, plus a working authentication system including all the boring, trivial details that come with that like ‘Forgot Password?’ functionality, etc.

Not all apps get the QuickStart rails treatment, but if they involve user authentication, I usually bootstrap the app using QS.

Recently I’ve been working on a Facebook app; following its development, I’ve made a Quick FB rails application template.

Again, this does all of the mundane things that all Facebook apps will require, like an Invitation page:

[Image: Facebook | Maven (Development)]


If you’re interested, you can browse the git repository online.

To bootstrap a new Quick FB app (“fb_foo”) using git, do a:

git clone git://github.com/sbraford/quick-fb.git fb_foo

This will give you a complete Rails 2.0 application structure, plus:

  • rfacebook, will_paginate plugins already added
  • application.rb configured for you
  • User, Friendship models already created—as users add the application, they are automatically associated as Friends within your app
  • Invitation widget (as shown above) already included

See config/environment.rb for the list of TODOs once you have cloned the Quick FB git repo:

TODO after cloning the quick-fb git repo:

  1. Create your app's database.  Modify 'config/database.yml'.

  2. Open 'config/facebook.yml' and modify the following:

      key: YOUR_API_KEY_HERE
      secret: YOUR_API_SECRET_HERE
      canvas_path: /yourAppName/
      callback_path: /relative_root/
      sever_host: foo.com
      app_name: Foo App

  3. Modify the relative URL root below.

  4. Open up 'app/helpers/application_helper.rb' and modify the invitation text.

  5. Generate a new secret app key with:

        rake secret

  Copy & paste this into the session config section above. (this might not be necessary for fb apps, but just in case)

Note: to port this over to SVN, I believe you can simply do a:

  rm -rf .git/
  rm .gitignore  # and any other .gitignore files

... and you should be good to go. (copy to an existing svn repo and svn add/import as usual)

backup_fu Makes Amazon S3 Backups Redonkulous

Posted on February 07, 2008

Redonkulously easy, that is.

Your rails app has been humming along nicely for several months, then suddenly it hits you: wait, are we backing up our database / uploaded files yet?

Well, of course you are. But for those who prefer to live on the edge, launching an app without a well-articulated backup strategy, I give you backup_fu.

It’s a rails plugin, invoked via rake tasks. After installing the plugin and modifying a few lines in a config file, you can have your application backing up both its database and static files to Amazon S3 within minutes.

Update: backup_fu now supports PostgreSQL. See this post for the 411.

Installation

Grab the only dependency, the aws-s3 gem (if not installed already):

sudo gem install aws-s3

And the plugin:

ruby script/plugin install http://backup-fu.googlecode.com/svn/backup_fu/

Configuration

Generate the default config/backup_fu.yml file with:

rake backup_fu:setup

You’ll need to modify at least these four lines in config/backup_fu.yml:

# The app_name is used as the backup filename prefix
app_name: replace_me
# Note: please create this bucket (whatever yours may be) externally first:
s3_bucket: some-s3-bucket
aws_access_key_id: --replace me with your AWS access key id--
aws_secret_access_key: --replace me with your AWS secret access key--

If you’re on OS X, the excellent S3 Browser will have you creating S3 buckets, browsing them, and uploading/downloading to them within minutes.

Jets3t is a Java-based S3 browser and should do the trick on win32/etc systems.

Basic Usage

To dump your database (to RAILS_ROOT/tmp/backup):

rake backup_fu:dump

In production environments, of course, you may have to do something like:

RAILS_ENV=production rake backup_fu:dump

To dump your database, then send the tar/gzipped copy to Amazon S3, it’s as simple as:

rake backup

This will place a file named something like ‘foo_app_2008-02-07_db.tar.gz’ into your Amazon S3 bucket (as specified by s3_bucket from your config file).

Configuration Options

If you bump into any snafus, the first thing you should do is enable verbosity via the config file:

verbose: true

See vendor/plugins/backup_fu/config/backup_fu.yml.advanced_example for the list of advanced configuration options. ( view online )

The most common issue that one might run into is ‘mysqldump’ not being in the user path.

To solve this do a ‘locate mysqldump’ or otherwise find its absolute path, and specify it explicitly in config/backup_fu.yml:

mysqldump_path: /usr/local/mysql/bin/mysqldump

Also see the README for more on debugging snafus and advanced configuration options.

Static File Backups

Backing up your database is great and all, but what if users upload files too?

Let’s say you’ve got a directory RAILS_ROOT/public/static where all of these files reside.

In our fictional example, this directory is really a symlink to /apps/foo/static. So we specify this as the ‘static_paths’ key (again in config/backup_fu.yml):

static_paths: "/apps/foo/static" 

Let’s say we also wanted to back up “/apps/foo/user_images”.

Multiple target static directories can be delimited via whitespace:

static_paths: "/apps/foo/static /apps/foo/user_images" 

First let’s try dumping these static directories (with full contents) into a tar/gzipped archive in RAILS_ROOT/tmp/backup:

rake backup_fu:static:dump

If that worked, let’s go for the full enchilada (dumping + uploading to S3) with:

rake backup_fu:static:backup

And for backing up both your database + static files (they will get uploaded as separate, distinct archives) in one rake command:

rake backup_fu:all

Phew.

Notes

While I had trouble with aws-s3 in its early days, I’ve since used aws-s3 (and backup_fu) to send a 4GB file to Amazon S3. VPSes or systems with much less memory might have issues backing up such huge files, though.

See the README for cronjob examples.

Pluginizing this code (which existed for a while in my apps, though not as a plugin) was inspired by Scott Patten who has written a similar kind of plugin.

Backup_fu does not erase old backup archives for you. This is left as an exercise for the reader. :) But seriously, it’s probably a good idea to check on your backups every month or so and do some pruning then, by hand if necessary.

How to Install ImageMagick from Source on OS X

Posted on January 31, 2008

First grab the source:

wget ftp://ftp.imagemagick.org/pub/ImageMagick/ImageMagick.tar.gz

Unarchive it:

tar xvzf ImageMagick.tar.gz

The old ./configure / make / sudo make install ritual:

cd ImageMagick-6.3.8
# Or whichever the current version is, of course.
./configure
make
sudo make install

You should be good to go. Lately I’ve been having luck with MiniMagick (all I need to do is crop/resize for this particular project).

Type this to make sure you can use ImageMagick from the command line at least:

convert -version

I love that ruby (and many scripting languages) makes it so easy to “shell out” to other programs (as minimagick does). It really makes ruby performance alarmists look bad when shelling out to time-tested, battle-hardened C-based programs is so easy, and works so well. (I’ve had success shelling out to the following in many apps: curl, imagemagick, wget, etc.)

How to Use MiniMagick in your Rails App

Grab the gem:

sudo gem install mini_magick

Drop this in your config/environment.rb:

require 'rubygems'
gem 'mini_magick'
require 'mini_magick'

Example usage:

class Pic < ActiveRecord::Base

  # Where size is a string like '90x90', '300x200', etc
  def create_perfect_thumbnail(size)

    image = MiniMagick::Image.from_file(self.pic_path)
    height, width = image['height'].to_f, image['width'].to_f

    # FIRST SHAVE off some of the image to make it square
    if width < height
      shave = ((height - width)/2).round
      image.shave("0x#{shave}")
    else
      shave = ((width - height)/2).round
      image.shave("#{shave}x0")
    end

    image.resize(size)
    image.write(self.pic_path(size))

    # I had issues on my linux box with the pic not being readable by the web server,
    #   following the resize.  Set permissions o+r to fix this.    
    if RAILS_ENV == 'production' # Set permissions to o+r
      `chmod o+r #{self.pic_path(size)}`
    end

  end

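  def pic_path(size = nil)
    # Just an example -- I normally group pics by user_id under a public static dir.
    # With no size given, this returns the path of the original upload
    # (create_perfect_thumbnail calls it that way to load the source image).
    filename = size ? "#{size}_#{self.original_filename}" : self.original_filename
    File.join(RAILS_ROOT, 'public', 'static', filename)
  end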

end

That will shave off some of the pic, making a munged square from the original, before proceeding to make square thumbnails from that.
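The shave arithmetic is easy to check on its own. Here is a small sketch that pulls just that calculation out into a function (square_shave_geometry is a made-up name, not a MiniMagick method). ImageMagick's shave geometry is WxH, where W pixels come off both the left and right edges and H pixels come off both the top and bottom.

```ruby
# Returns the geometry string to hand to shave so that a
# width x height image ends up (roughly) square.
def square_shave_geometry(width, height)
  if width < height
    # Portrait: trim the top and bottom.
    "0x#{((height - width) / 2.0).round}"
  else
    # Landscape (or already square): trim the left and right.
    "#{((width - height) / 2.0).round}x0"
  end
end

puts square_shave_geometry(300, 400)  # 0x50
puts square_shave_geometry(640, 480)  # 80x0
puts square_shave_geometry(200, 200)  # 0x0
```

So a 640x480 original loses 80 pixels from each side, leaving a 480x480 square to resize.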

Ruby gets a Reddit

Posted on January 27, 2008

Reddit and programming.reddit.com are two of my favorite social news sites.

Now, courtesy of James Golick – they just opened ruby.reddit.com – sweet!

VPSLink Outage + Some Raw Server Benchmarks

Posted on January 27, 2008

Back up after a brief (20+ hour) outage which was triggered by upgrading the VPS where this blog is hosted. More details here for the curious.

After coming back online at full CPU availability, I ran this benchmark script on the Link 6 vps:

==============================================================
BYTE UNIX Benchmarks (Version 4.1-wht.1)
System -- Linux videolockr.com 2.6.18-ovz028stab039.1-smp #1 SMP Tue Jul 24 12:12:48 MSD 2007 i686 i686 i386 GNU/Linux
/dev/simfs            79691776   4390104  75301672   6% /

Start Benchmark Run: Sun Jan 27 10:21:44 PST 2008
 10:21:44 up 16:57,  1 user,  load average: 0.29, 0.06, 0.04

End Benchmark Run: Sun Jan 27 10:40:59 PST 2008
 10:40:59 up 17:16,  1 user,  load average: 16.21, 6.57, 3.22

                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Dhrystone 2 using register variables        376783.7  4076800.9      108.2
Double-Precision Whetstone                      83.1      949.2      114.2
Execl Throughput                               188.3     1613.2       85.7
File Copy 1024 bufsize 2000 maxblocks         2672.0   129719.0      485.5
File Copy 256 bufsize 500 maxblocks           1077.0    30195.0      280.4
File Read 4096 bufsize 8000 maxblocks        15382.0   490094.0      318.6
Pipe Throughput                             111814.6   688167.0       61.5
Pipe-based Context Switching                 15448.6   108170.3       70.0
Process Creation                               569.3     7265.2      127.6
Shell Scripts (8 concurrent)                    44.8      209.5       46.8
System Call Overhead                        114433.5   443236.3       38.7
                                                                 =========
     FINAL SCORE                                                     114.8

The VPS comes in at $129.95, and of course one could have their own dedicated server for that amount. VPSes are truly the crack rock of the hosting industry: the first few hits are cheap, but the addiction grows progressively more expensive as time passes. :)

What I like about VPSes: a new one can usually be provisioned within as little as 2 hours. Also, a while back I had a string of hardware failures on dedicated boxes. So far my VPSes are batting 1.000 when it comes to hardware success.

For comparison, here is a benchmark from a HiVelocity Small Business Server (both are Fedora Core, btw) which costs $83 / month.

==============================================================
BYTE UNIX Benchmarks (Version 4.1-wht.1)
System -- Linux server.bradley.org 2.6.19-1.2288.fc5 #1 Sat Feb 10 14:52:17 EST 2007 i686 i686 i386 GNU/Linux
/dev/hda6              2030736    209076   1716840  11% /

Start Benchmark Run: Sun Mar 18 18:09:05 EDT 2007
 18:09:05 up  5:05,  1 user,  load average: 0.15, 0.03, 0.01

End Benchmark Run: Sun Mar 18 18:20:39 EDT 2007
 18:20:39 up  5:16,  1 user,  load average: 13.30, 5.84, 2.69

                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Dhrystone 2 using register variables        376783.7  3266804.6       86.7
Double-Precision Whetstone                      83.1      757.4       91.1
Execl Throughput                               188.3      999.2       53.1
File Copy 1024 bufsize 2000 maxblocks         2672.0    53928.0      201.8
File Copy 256 bufsize 500 maxblocks           1077.0    17294.0      160.6
File Read 4096 bufsize 8000 maxblocks        15382.0   248599.0      161.6
Pipe-based Context Switching                 15448.6    87610.7       56.7
Pipe Throughput                             111814.6   400521.9       35.8
Process Creation                               569.3     4119.4       72.4
Shell Scripts (8 concurrent)                    44.8      210.6       47.0
System Call Overhead                        114433.5   495582.9       43.3
                                                                 =========
     FINAL SCORE                                                      78.1

I believe the server is comparable to something like this, which is now $99 / mo.
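A note on reading these tables: the figures are consistent with each INDEX being RESULT / BASELINE * 10, and with the FINAL SCORE being the geometric mean of the indexes. I have not checked this against the benchmark's source; it is simply what the numbers work out to. A quick sketch using the VPS run:

```ruby
# [baseline, result] pairs copied from the VPS run above.
VPS_RUN = {
  'Dhrystone 2'             => [376783.7, 4076800.9],
  'Whetstone'               => [83.1, 949.2],
  'Execl Throughput'        => [188.3, 1613.2],
  'File Copy 1024'          => [2672.0, 129719.0],
  'File Copy 256'           => [1077.0, 30195.0],
  'File Read 4096'          => [15382.0, 490094.0],
  'Pipe Throughput'         => [111814.6, 688167.0],
  'Pipe Context Switching'  => [15448.6, 108170.3],
  'Process Creation'        => [569.3, 7265.2],
  'Shell Scripts (8)'       => [44.8, 209.5],
  'System Call Overhead'    => [114433.5, 443236.3]
}

# Index for one test: how many times faster than the baseline, times 10.
def index_for(baseline, result)
  result / baseline * 10
end

# Geometric mean of all the indexes, computed via logarithms.
def final_score(run)
  indexes = run.values.map { |baseline, result| index_for(baseline, result) }
  log_sum = indexes.inject(0.0) { |sum, index| sum + Math.log(index) }
  Math.exp(log_sum / indexes.size)
end

puts format('Dhrystone index: %.1f', index_for(376783.7, 4076800.9))
puts format('Final score:     %.1f', final_score(VPS_RUN))
```

The result should land close to the 114.8 reported above. Because it is a geometric mean, the big file copy numbers do not dominate the score the way they would in a plain average.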

Monitor MySQL with God on Your Side

Posted on January 26, 2008

god is a great ruby-based alternative to monit and other process monitoring tools. While I have had success with monit, I found its configuration syntax tedious. Monit’s configuration syntax makes it painful to do things like decrease/increase the number of mongrels you are monitoring, etc.

Note: the following assumes we’re on some kind of Fedora Core (or similar) system. Replace “/etc/init.d/mysqld stop|start|restart” with however this is done in your distro.

Configuration is where god shines. Let’s get started:

sudo gem install god

In this tutorial, I will only be covering how to monitor your MySQL process using god. The main god website has an excellent tutorial on how to monitor your mongrels.

God works by monitoring pid files. It has other functionality as well, but for MySQL 5.x monitoring, all we need is the location of the MySQL PID file. You can find this on your system with:

locate .pid | grep mysql

On my system, the MySQL pid file was located at:

/var/run/mysqld/mysqld.pid

Next we need to know how to stop/start/restart MySQL. On most systems, this can simply be done with:

cd /etc/init.d
sudo ./mysqld stop|start|restart

It’s a good idea to make sure these commands work by hand, of course, before assuming god should use them to manage MySQL.

Assuming your information matches the above, the following god config file should do the trick:

# God config file.
#
# Documentation: http://god.rubyforge.org/
#
# run with:  god -c /root/monitor.god
#

God.watch do |w|
  w.name = 'mysql-process'
  w.group = 'mysql'
  w.interval = 30.seconds # default      
  w.start = "cd /etc/init.d && ./mysqld start" 
  w.stop = "cd /etc/init.d && ./mysqld stop"
  w.restart = "cd /etc/init.d && ./mysqld restart" 
  w.start_grace = 10.seconds
  w.restart_grace = 10.seconds
  w.pid_file = '/var/run/mysqld/mysqld.pid'
  w.behavior(:clean_pid_file)

  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.interval = 5.seconds
      c.running = false
    end
  end

  # lifecycle
  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 5.minute
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 2.hours
    end
  end
end

Place this into a file located at ‘/root/monitor.god’ (for the examples below to work).

In order to test god, kick it into non-daemonized mode with:

sudo god -c /root/monitor.god -D

You should see some output like:

I, [2008-01-26 00:30:05 #1841]  INFO -- : Started on drbunix:///tmp/god.17165.sock
I, [2008-01-26 00:30:05 #1841]  INFO -- : mysql-process move 'unmonitored' to 'up'
I, [2008-01-26 00:30:06 #1841]  INFO -- : mysql-process [ok] process is running (ProcessRunning)

In another terminal, stop MySQL by hand:

cd /etc/init.d
sudo ./mysqld stop

This may not replicate exactly what happens when MySQL goes down in the wild, but at least you can test god’s basic functionality.

You should see the command line output of god indicate that it is restarting MySQL:

I, [2008-01-26 00:46:01 #18173]  INFO -- : mysql-process [trigger] mysql-process God::Conditions::ProcessRunning: no such pid file: /var/run/mysqld/mysqld.pid (ProcessRunning)
I, [2008-01-26 00:46:01 #18173]  INFO -- : mysql-process move 'up' to 'start'
I, [2008-01-26 00:46:01 #18173]  INFO -- : mysql-process before_start: no pid file to delete (CleanPidFile)
I, [2008-01-26 00:46:01 #18173]  INFO -- : mysql-process start: cd /etc/init.d && ./mysqld start
I, [2008-01-26 00:46:12 #18173]  INFO -- : mysql-process [ok] process is running (ProcessRunning)

Next up, you can daemonize god with:

sudo god -c /root/monitor.god

Now check if MySQL is up:

sudo god status mysql

Tail god’s MySQL log with:

sudo god log mysql

To add this as a @reboot cronjob, so that god always starts on reboot:

# First su to root:
su
# Edit root's crontab:
crontab -e

Then paste this entry into root’s crontab and save:

@reboot /usr/bin/god -c /root/monitor.god

Of course, while god is running, execute ‘ps aux | grep god’ as a sanity check that god lives at the same path (/usr/bin/god) on your system.

BTW – I get this warning when starting god, but everything still appears to work fine for me:

***********************************************************************
*
* Event conditions are not available for your installation of god.
* You may still use and write custom conditions using the poll system
*
***********************************************************************

My next question: what program does one use to monitor god itself?

def send Considered Harmful (FAILSAFE error)

Posted on January 19, 2008

If your rails app mysteriously breaks just after adding some new functionality, with an error message like:

Processing MessagingController#inbox (for 192.168.1.100 at 2008-01-19 11:02:45) [GET]
  Session ID: 17db90c03c03ebfee0c3ae4198900706
  Parameters: {"action"=>"inbox", "controller"=>"messaging"}
/!\ FAILSAFE /!\  Sat Jan 19 11:02:45 MST 2008
  Status: 500 Internal Server Error
  wrong number of arguments (1 for 0)

Chances are you just added a method “send” or similar to your rails app:

class MessagingController < AuthenticatedController
  def send
  end
end

This would, of course, override ruby’s default “send” behavior (an integral part of ruby, and rails).
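You can reproduce the problem in plain Ruby, no Rails required. In this sketch the Messenger class is made up; it just shadows send the same way the controller above does.

```ruby
class Messenger
  # Shadows Object#send, just like the controller action above.
  def send
    :sent
  end

  def inbox
    :inbox
  end
end

m = Messenger.new

begin
  # Framework code dispatches like this, passing the method name along.
  m.send(:inbox)
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end

# __send__ is an alias that is much less likely to be overridden.
puts m.__send__(:inbox)
```

The fix in a real app is simply to rename the action (to something like deliver, for instance) so that Ruby's own send is left alone.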

Hat tip, Justin Ball – thx!

How to Add MySQL Full Text Indexes in Rails

Posted on January 10, 2008

Rails by default uses the InnoDB engine, but MySQL full-text indexing is only supported by the MyISAM table type.

So, first we’ll have to convert our target table over to MyISAM. Next we add the full-text index on several columns.

class FullTextSearch < ActiveRecord::Migration
  def self.up
    execute 'ALTER TABLE torrents ENGINE = MyISAM'
    execute 'CREATE FULLTEXT INDEX ft_idx_torrents ON torrents(name,filename,description)'
  end

  def self.down
    execute 'ALTER TABLE torrents DROP INDEX ft_idx_torrents'
    execute 'ALTER TABLE torrents ENGINE = InnoDB'
  end
end

This is the actual migration used in The Hydra Project to add full-text searching support of torrents.
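Once the index exists, you query it with MySQL's MATCH ... AGAINST syntax. Below is a minimal sketch of building the finder conditions for that. The fulltext_conditions helper and the Torrent model name are assumptions made for illustration; this is not the actual search code from the project. MySQL expects the column list inside MATCH() to be the same as the one in the index definition.

```ruby
# Same columns, same order, as in the CREATE FULLTEXT INDEX statement.
FULLTEXT_COLUMNS = %w(name filename description)

# Builds a conditions array of the kind Rails finders accept.
# The query string is passed as a bind value rather than interpolated.
def fulltext_conditions(query)
  ["MATCH(#{FULLTEXT_COLUMNS.join(',')}) AGAINST (?)", query]
end

p fulltext_conditions('ubuntu iso')

# In a Rails 2.0 app this might be used like so (Torrent model assumed):
#   Torrent.find(:all, :conditions => fulltext_conditions(params[:q]))
```

Keeping the column list in one constant makes it harder for the query and the index to drift apart.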

How to Convert MySQL Tables to MyISAM or InnoDB

Posted on January 10, 2008

If you have an InnoDB table that you’d like to convert to MyISAM:

class ConvertToMyIsam < ActiveRecord::Migration
  def self.up
    execute 'ALTER TABLE torrents ENGINE = MyISAM'
  end

  def self.down
    execute 'ALTER TABLE torrents ENGINE = InnoDB'
  end
end

To go the other way (MyISAM to InnoDB), just swap the bodies of the up and down methods.