Skip to content
jashkenas edited this page Sep 13, 2010 · 26 revisions

CloudCrowd is available as a Ruby Gem:

sudo gem install cloud-crowd

All of the CloudCrowd commands can be used through the crowd executable, which is installed alongside the gem. Try typing crowd --help to get a brief overview of the available options. For every machine that you’d like to make part of a CloudCrowd cluster, you’ll need to install a CloudCrowd configuration folder, which includes YAML configuration files for your workers and central database, and all of your custom actions. To get started, use crowd to unpack an example configuration folder:

crowd install path/to/config

Inside the configuration folder, you’ll find config.yml, config.ru, database.yml, and the actions folder. More information about all of these can be found in The Configuration Folder as well as by reading the files. At this point, you’ll want to edit config.yml and database.yml to add your AWS credentials, turn authentication on or off, and set up the backing database. CloudCrowd uses ActiveRecord on the central server, so any ActiveRecord compatible database will do nicely. Once your database has been created on your central server (which in this case is your laptop, if you’re just giving it a spin) and configured in database.yml, you can load in the schema:

crowd load_schema

With the database ready, let’s start a single instance of the central server. In production, you can use the config.ru rackup file to launch the desired number of server instances, and hide them behind a load balancer as you would any other Rack application.

crowd server

Visit the URL that you configured for the central server (localhost:9173 by default), and you should see the Operations Center showing an empty queue with no active jobs, and no running worker daemons. As you launch workers and start jobs, they’ll show up on the screen. Here’s what the Operations Center looks like in the middle of running a series of quick jobs:

Let’s start 5 worker daemons. CloudCrowd uses the Daemons gem internally to manage the workers, and the crowd executable responds to all the same commands as the Daemons gem (start, stop, run, status, restart).

crowd workers start -n 5

You should see the workers come online in the Operations Center. They’re identified by their process ID and the hostname of the server they’re running on, should you have the need to go investigate a particular worker. You can click on a worker to see the work unit that it’s currently processing, and the stage of computation (splitting, processing, merging) that it’s currently performing. Let’s try an example job that uses the built-in word_count action, which simply shells out to the UNIX wc utility. Here’s a link to a job that uses word_count to count all of the words in the most famous Shakespeare plays, in parallel. You can run the example as a vanilla Ruby script, but let’s load it inside of a CloudCrowd console, to keep an eye on things.

crowd console

irb(main)> load '~/Downloads/shakespeare_word_count.rb'
=> []

# At this point, looking at the Operations Center, you'll see the number of 
# work units in the queue spike. The next time they check in, the worker daemons
# will pick them up, and they should be rapidly dispatched.

irb(main)> CloudCrowd::Job.last.display_status
=> "succeeded"

irb(main)> CloudCrowd::Job.last.time_taken
=> 5.7215

irb(main)> JSON.parse CloudCrowd::Job.last.outputs
=> [532047]

# And there's the merged result for the total word count.

Clone this wiki locally