Null Disquisition

Python, AWS, Grad School, and your face

Archive for the ‘Amazon Web Services’ tag

MPI running on Amazon EC2

Amazon Web Services

For my Master’s thesis, I’m going to be running a lot of MPI code, and naturally I need a place to run it. Let me first say that my university has an excellent high-performance computing center, run by one of my committee chairs, that is more than capable of serving my needs – and yet, I am unfulfilled. Our scheduling system has a “backfill” queue that is always available for running small jobs (like the ones I usually run), but for my thesis I want to test the massive scalability of an algorithm (Replica Exchange). When I say massive, I mean massive: think 1000 compute nodes or more.

Big ideas, people.

In order to satisfy my need for a massively parallel platform, I looked no further than Amazon EC2. As should be apparent from many of my previous posts, I have been doing a lot of work with Amazon’s cloud services, for both school and work.

A few weeks ago, I started an MIT-licensed open source project on GitHub aptly named EC2MPI. Today I made a major step forward with this project, which is what prompted this post: I finally have everything configured properly and got my first no-hassle MPI cluster up and running.

EC2MPI is written in Python and presents an interactive prompt to the user. You select the architecture (i386 or x64) and the number of instances, and it also supports user-defined SSH keypairs (not AWS keypairs) for cluster security. The instances are spawned, and EC2MPI sets up the SSH keys as well as the MPI configuration. It is so freaking sweet.

I wanted to share some issues I’ve had so far while developing this and how I solved them.

Intra-EC2 communication – Each instance needs to be able to talk to every other one for point-to-point as well as collective communication, and the MPI launcher typically starts processes on remote nodes over SSH, so every node needs passwordless SSH access to every other node. My solution was to let the user generate SSH keypairs, which are stored in a private S3 bucket owned by the user. The user-data script sent to the instances downloads and installs the keys at startup.
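A minimal sketch of the key-installation half of such a user-data script follows. It is not the EC2MPI code: it assumes the keypair has already been downloaded from the S3 bucket into a local directory (the download tool is left open), and the file names `cluster_key`/`cluster_key.pub`, the helper name and the `10.*` internal-address pattern are all hypothetical.

```bash
#!/bin/bash
# Sketch of the key-installation step of a user-data script (hypothetical,
# not the EC2MPI code). Assumes $src_dir already holds cluster_key and
# cluster_key.pub, downloaded from the user's private S3 bucket.

install_cluster_keys() {
    local src_dir=$1
    local ssh_dir=${2:-$HOME/.ssh}

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"

    # Every node gets the same private key, so any node can log into any other.
    cp "$src_dir/cluster_key" "$ssh_dir/id_rsa"
    chmod 600 "$ssh_dir/id_rsa"

    # Authorize the matching public key on this node.
    cat "$src_dir/cluster_key.pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"

    # The MPI launcher must log in without prompting, so skip the host-key
    # question, but only for internal addresses (assumed to be 10.*).
    cat >> "$ssh_dir/config" <<'EOF'
Host 10.*
    StrictHostKeyChecking no
EOF
    chmod 600 "$ssh_dir/config"
}
```

Sharing one private key across every node keeps setup to a single download per instance; the trade-off is that anyone who obtains the key can reach every node, which is why it belongs in a private bucket.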

Shared storage among instances – To run MPI code, the nodes in the cluster need access to a shared storage volume that holds the binaries compiled with MPI. Since EC2 has no shared storage (for now), I had to find an alternative. The solution I settled on was s3fs: a FUSE-based filesystem that lets you mount an S3 bucket as a volume. Reading and writing to the shared volume is pretty slow (unless the file is cached), so for certain kinds of code this might not be ideal. However, I believe it is the best solution for now. I imagine one day Amazon will add a feature that allows Elastic Block Store volumes to act as shared volumes.
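Mounting the bucket on each node could look roughly like this. The bucket name, mount point and credentials path are placeholders, the option names (`passwd_file`, `use_cache`) come from the current s3fs-fuse documentation rather than the 2009 release, and `mount_shared_volume` is a hypothetical helper. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/bin/bash
# Sketch: mount an S3 bucket as the cluster's shared volume with s3fs.
# Bucket, mount point and credentials path are placeholders; option names
# follow the current s3fs-fuse docs. DRY_RUN=1 prints commands instead.

run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

mount_shared_volume() {
    local bucket=$1 mountpoint=$2 passwd_file=$3

    # s3fs reads credentials as ACCESS_KEY_ID:SECRET_ACCESS_KEY; keep the
    # file private (created under umask 077, then forced to 600).
    ( umask 077
      printf '%s:%s\n' "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" > "$passwd_file" )
    chmod 600 "$passwd_file"

    run mkdir -p "$mountpoint"
    # use_cache keeps local copies of files read from S3, so repeated reads
    # of the same binary are largely served from local disk.
    run s3fs "$bucket" "$mountpoint" \
        -o passwd_file="$passwd_file" \
        -o use_cache=/tmp/s3fs-cache
}
```

For example, `mount_shared_volume my-cluster-bucket /mnt/shared /etc/passwd-s3fs` (all three names hypothetical) would be run once on every node.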

Starting up and tearing down clusters – I used Amazon SimpleDB to keep metadata about the cluster: how many instances are in it, internal/external IP addresses, etc. This is also how I designate the master node and the worker nodes, and it will let me add features such as adding and removing instances from a cluster without having to tear the whole thing down. I also did all startup configuration with a user-data script, so the script does not have to log into each instance at startup. Because each instance configures itself, cluster startup scales well.
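Turning that metadata into the host list MPI needs might look like the sketch below. It assumes the SimpleDB query results have already been written to a text file with one `role internal-ip` pair per line (a hypothetical format, and `make_hostfile` is a hypothetical helper), and it emits Open MPI's hostfile syntax; MPICH's machinefile format differs.

```bash
#!/bin/bash
# Sketch: turn cluster metadata into an MPI hostfile.
# Hypothetical input format (a stand-in for the SimpleDB query result):
#   master 10.0.0.5
#   worker 10.0.0.6
# Output uses Open MPI's hostfile syntax: "<host> slots=<n>".

make_hostfile() {
    local meta=$1 slots=${2:-1}

    # List the master first, so that with default mapping rank 0 normally
    # starts on the master node; then every worker in stored order.
    awk -v s="$slots" '$1 == "master" { print $2 " slots=" s }' "$meta"
    awk -v s="$slots" '$1 == "worker" { print $2 " slots=" s }' "$meta"
}
```

Usage would be along the lines of `make_hostfile cluster.txt 1 > hosts` followed by `mpirun --hostfile hosts -np 4 ./a.out`. Adding or removing a node then only means updating the metadata and regenerating the file.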

Check back soon for some benchmarks and more detailed write-ups as the project progresses. First, I need to get my maximum number of instances increased (right now I can do 20 max). Fast times ahead, friends.

-David

Written by david

May 30th, 2009 at 7:40 pm

Managing multiple AWS accounts

On my personal computer, I have three sets of X.509 certificates and private keys. This makes using the EC2 API tools quite the hassle. Echoes of EC2_CERT and EC2_PRIVATE_KEY haunt my dreams.

So, as you do with these sorts of things, I wrote a bash script to work some magic.

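    #!/bin/bash
    # Switch the EC2 API tools between accounts. Source this file rather
    # than executing it, so the exports (and "return") affect your shell.
    account=$1
    if [ -z "$account" ]; then
        echo "Choose Account:"
        read -r account
    fi
    # README lines look like "<suffix> <account>"; match the account column only
    base=$(awk -v a="$account" 'tolower($2) == tolower(a) { print $1; exit }' ~/.ec2/README)
    if [ -z "$base" ]; then
        echo "Sorry, that account does not exist"
        return 1
    fi
    # Use $HOME: "~" is not expanded inside double quotes
    export EC2_CERT="$HOME/.ec2/cert-$base.pem"
    export EC2_PRIVATE_KEY="$HOME/.ec2/pk-$base.pem"
    echo "EC2 environment updated"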

This requires that you keep your private keys and certificates in ~/.ec2, named cert-{something}.pem and pk-{something}.pem. You also need a README file in ~/.ec2 that maps each {something} to an account name, like this:

    something account1
    something-else account2

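I set up an alias that sources the script (it has to be sourced rather than run, or the exported variables disappear when it exits), so I just run “ec2-account personal” to switch to my personal credentials, and “ec2-account work” to switch to my work account.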

-David

Written by david

May 11th, 2009 at 10:41 pm

Funded!

Amazon issued me 300 dollars in EC2 credits to support my Master’s project. Very exciting.

If you’re a university researcher, student, or professor, visit http://aws.amazon.com/education for more information. One of my professors talked to me about giving a seminar on cloud computing in the fall. I believe these types of grants are issued for that sort of thing as well.

Totally putting this on my CV.

Written by david

May 5th, 2009 at 8:00 pm

Serve gzipped content from Amazon S3

Set the “Content-encoding” header to “gzip”. Really, it’s that easy.

Kthxbye.

Well, since you came all this way, I’ll give a little more detail. First, make a file.

Now gzip it.

Upload it.

Find a utility that can modify object headers (metadata) on S3: S3Hub (OS X), Cloudberry S3 Explorer (Windows), or any of the various third-party libraries.

Set the Content-type header to whatever the appropriate content type is: text/plain, text/css, text/javascript, image/jpeg, etc.

Set the Content-encoding to gzip.

Pat yourself on the back.
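Scripted, the steps above might look like this. Assumptions: the upload uses today's AWS CLI (`aws s3 cp` with its `--content-type` and `--content-encoding` options), which postdates this post and stands in for the GUI tools mentioned above; the bucket, key and `upload_gzipped` helper are placeholders; `DRY_RUN=1` prints the upload command instead of running it.

```bash
#!/bin/bash
# Sketch: gzip a file and upload it with the headers set at upload time.
# Assumes today's AWS CLI; bucket and key are placeholders.
# DRY_RUN=1 prints the upload command instead of running it.

run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

upload_gzipped() {
    local src=$1 bucket=$2 key=$3 content_type=${4:-text/plain}

    # -9: best compression, -n: don't store the original name/timestamp,
    # -c: write to stdout so the original file is kept.
    gzip -9 -n -c "$src" > "$src.gz"

    # The key can be anything (gziptest, gziptest.txt, gziptest.txt.gz);
    # the two headers are what tell the browser how to handle the bytes.
    run aws s3 cp "$src.gz" "s3://$bucket/$key" \
        --content-type "$content_type" \
        --content-encoding gzip
}
```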

Here are three versions of a text file I made and gzipped. Note that with appropriate headers, file extensions don’t mean squat.

  1. http://mumrah-dot-net.s3.amazonaws.com/gziptest.txt.gz
  2. http://mumrah-dot-net.s3.amazonaws.com/gziptest.txt
  3. http://mumrah-dot-net.s3.amazonaws.com/gziptest

Go ahead and download one – you’ll see that the file is actually gzipped and your browser is decompressing it on the fly. This is the same effect produced by mod_deflate in Apache.
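To confirm the headers without a browser, you can look at the raw response headers instead. The sketch below reads them on stdin (for example from `curl -sI`) and succeeds only if `Content-Encoding` is `gzip`; `is_gzip_encoded` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: succeed only if the HTTP response headers on stdin include
# "Content-Encoding: gzip". Header names are case-insensitive and lines
# usually end in CRLF, so strip \r and match without case.

is_gzip_encoded() {
    tr -d '\r' | grep -qi '^content-encoding:[[:space:]]*gzip[[:space:]]*$'
}
```

For example: `curl -sI http://mumrah-dot-net.s3.amazonaws.com/gziptest | is_gzip_encoded && echo gzipped`.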

-David

Written by david

May 5th, 2009 at 12:15 am

Updates, Upgrades, and Migrates

New server, new WordPress install. I must say, the export/import feature in WordPress is very slick. I’ve been using it since well before v1.0, and it has come a long way.

The motivation for the upgrade came with a server migration I’m in the middle of. I’m in the process of starting up a consulting company for Amazon Web Services, and decided it would be rather obscene if I didn’t at least host my blog on EC2. So here we are – in the cloud. It’s kinda cold, and wet.

A web server on EC2, you ask? But what about the htdocs and virtual host files? We need persistent storage! I created two EBS volumes (both formatted with XFS): one for the MySQL data stores and Apache config, and another for /home. I decided to put all of the htdocs in /home (along with users’ public_html) instead of the traditional /var/www. It was easier than creating a volume for /var as well.
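The volume setup could be scripted roughly as follows. This is a sketch under stated assumptions, not the exact commands used here: the device names (/dev/sdf, /dev/sdg), the /vol mount point, the bind-mount approach for MySQL and Apache, the Debian default paths, and the helper names are all hypothetical. Existing data (for example /var/lib/mysql, with MySQL stopped) has to be copied onto the volume before the bind mounts go in, and mounting a volume on /home hides whatever was there before.

```bash
#!/bin/bash
# Sketch of the two-volume layout described above (hypothetical device
# names and paths; Debian defaults for MySQL and Apache are assumed).
# Set DRY_RUN=1 to print the commands and fstab lines instead of applying them.

run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

add_fstab_line() {
    if [ -n "$DRY_RUN" ]; then echo "fstab: $1"; else echo "$1" >> /etc/fstab; fi
}

# Format a fresh EBS volume with XFS and have it mounted at boot.
# mkfs.xfs erases the volume, so run this only on first use.
setup_xfs_volume() {
    local dev=$1 mnt=$2
    run mkfs.xfs "$dev"
    run mkdir -p "$mnt"
    add_fstab_line "$dev $mnt xfs noatime 0 0"
    run mount "$mnt"
}

# Bind-mount a directory that lives on the volume over its usual location.
bind_into_place() {
    local src=$1 dst=$2
    run mkdir -p "$src" "$dst"
    add_fstab_line "$src $dst none bind 0 0"
    run mount "$dst"
}

# Hypothetical layout:
#   setup_xfs_volume /dev/sdf /vol
#   bind_into_place /vol/mysql /var/lib/mysql
#   bind_into_place /vol/apache2 /etc/apache2
#   setup_xfs_volume /dev/sdg /home
```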

So we have a full LAMP stack running on a small EC2 instance, costing us the same as our machine at ServerBeach. The main difference is that we now have a development environment within AWS, which makes things much easier to test and deploy.

Here’s a sad-face icon I made for my Growl notification when (if) my instance goes down.

-David

Written by david

April 23rd, 2009 at 12:33 pm

Computation even grad students can afford

This morning I wanted to mess around with some MPI (Message Passing Interface) code. We have a high-performance server farm here at FSU (aptly named the HPC), and I know I could have used that to test some MPI code – but using provided facilities is so boring. Plus, I wanted to try installing MrBayes in multiprocessor mode and distributing it over a few machines. I don’t think the admins at the HPC would like me doing that (if it’s even possible).

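Accidental Tutorial

First, download your X.509 certificate and private key from Amazon, then move them into ~/.ec2 and lock down their permissions:
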
    mkdir -p ~/.ec2
    mv pk-somereallylonghash.pem ~/.ec2
    mv cert-somereallylonghash.pem ~/.ec2
    chmod 600 ~/.ec2/*.pem

It’s good practice to store things like keys and certificates with 600 permissions (readable and writable by you only). In fact, ssh refuses to use a private key that other users can read. Next, head back to Amazon and get your Access Key ID and super secret Secret Access Key. They will look something like:

    SOM3ALLC4P5L3TTERSANDNUMBERS
    Som3mixeD(4sl3tteR$||um8er$and$ymb0ls

Now, I wouldn’t recommend saving these on your hard disk. I keep them in an encrypted email. It would also be a good idea to print them out and keep them somewhere safe (it’s a pain in the ass to get new credentials and go and change everything).

At this point we have everything we need to spawn instances, bundle a new image, store things on S3, deliver content through CloudFront (which is awesome, btw), and anything else AWS has to offer. To get your spawn on, we need a tool of some kind (because no one wants to use the stupid command line tools). Elasticfox has pretty much taken its place as the tool for interfacing with EC2 services. It is incredibly useful: good design, open source, constant updates, and a very fast turnaround between Amazon releasing a new feature and Elasticfox supporting it. My guess is that Amazon has committed several developers to it.

One more annoying step before we can launch an instance and log in: we need a keypair. Elasticfox is nice enough to do this for you. Simply click on the “Keypairs” tab and generate a new keypair. Name it something meaningful, not “keypair”; maybe something like “yourname-laptop” or “yourname-work”. You can of course share a keypair across multiple computers, but you’re on your own for that. Elasticfox will prompt you to download the keypair. Obey the machine: download it and save it to your ssh folder (usually ~/.ssh).

    mv david-laptop ~/.ssh
    chmod 600 ~/.ssh/david-laptop

Alright! Now we can get to it! Go to the “Images” tab in Elasticfox and pick your favorite Linux distro (Alestic has a nice Debian 5.0 base image: ami-67fe190e). Click the green power button to launch, make sure your keypair is selected, and click “OK”. A few seconds later, the instance should appear in your list under the “Instances” tab, and shortly after that its state should change to “running”. Once it is ready, double-click on it and copy the public DNS name, then open up a terminal and ssh into it (using our keypair, of course):

    ssh root@ec2-123-456-77-22.compute-1.amazonaws.com -i ~/.ssh/david-laptop

Replace my keypair name with yours, of course. That’s it, you should be connected.
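If typing `-i` every time gets old, the key can live in your ssh config instead. This is a minimal sketch; the `Host` pattern (which covers the us-east-1 public DNS names), the `root` user, and the `add_ec2_ssh_config` helper name are assumptions, so adjust them to your region and AMI.

```bash
#!/bin/bash
# Sketch: record the EC2 login details in ~/.ssh/config so "-i" isn't needed.
# The Host pattern (us-east-1 public DNS names), the root user and the key
# path are assumptions; adjust them for your region and AMI.

add_ec2_ssh_config() {
    local key=$1 config=${2:-$HOME/.ssh/config}

    mkdir -p "$(dirname "$config")"
    cat >> "$config" <<EOF
Host *.compute-1.amazonaws.com
    User root
    IdentityFile $key
EOF
    chmod 600 "$config"
}
```

After `add_ec2_ssh_config ~/.ssh/david-laptop`, a plain `ssh ec2-123-456-77-22.compute-1.amazonaws.com` picks up the user and key automatically.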

Sorry about that. I didn’t mean for this to turn into a tutorial. Stupid blog. Back to my story.

After I got the AWS stuff set up, I spawned four base Debian 5.0 instances, installed MPI on them, and installed the multiprocessor (MPI) version of MrBayes. After a little tinkering with MPICH settings (I will use OpenMPI next time), I got it running: 4 processors spread across 4 instances. So hot. I think I’ll be changing my thesis to a project where I can use AWS. It’s just so sexy.

Oh, and it cost me $0.12

Written by david

March 26th, 2009 at 12:30 pm