Null Disquisition

Python, AWS, Grad School, and your face

Archive for March, 2009

Time Machine In Your Pocket – Part 2


After a little tinkering here and there, I’ve finally settled on a good solution for my portable backup drive (an 8GB USB thumb drive). As outlined in my previous post, I wanted a portable backup solution that could do incremental backups (like Apple’s Time Machine does). I looked, of course, to the wonderful Unix utility rsync. Here’s my latest version.


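#!/bin/bash -x

DEST="/Volumes/PNY8GB/Backups"
LATEST="Latest"
EXCLUDES_FILE="$HOME/.rsyncexcludes"
FILES_FROM="$HOME/.rsyncfiles"
RSYNC="/usr/bin/rsync --max-size 10m"   # left unquoted below so the options split

# Make sure user is root
if (( $(id -u) != 0 )); then
    echo "Sorry, must be root. Exiting..." >&2
    exit 1
fi

# Make sure backup device is attached
if [ ! -d "$DEST" ]; then
    echo "Please mount the backup drive!" >&2
    exit 1
fi

# Run rsync: each snapshot is named by timestamp, and files unchanged since
# the previous snapshot are hard-linked to it via --link-dest
DATE=$(date "+%Y-%m-%d-%H%M%S")
STATS=$($RSYNC -r -a -x -S -R --stats --delete --link-dest="$DEST/$LATEST" \
    --exclude-from "$EXCLUDES_FILE" --files-from "$FILES_FROM" "$@" "$HOME" \
    "$DEST/$DATE") || { echo "rsync failed; leaving '$LATEST' alone" >&2; exit 1; }

# Pull the transferred-file count out of --stats (prints nothing when it is 0)
n=$(echo "$STATS" | sed -n 's/Number of files transferred: \([^0]\)/\1/p')

# Update 'Latest' link (-f replaces the old link, -n keeps ln from following it)
ln -sfn "$DEST/$DATE" "$DEST/$LATEST"

# Send a growl notification if anything was copied
if [ -n "$n" ]; then
    /usr/local/bin/growlnotify -m "rsync complete,
number of files: $n"
fi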
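The two bookkeeping steps at the end of the script (counting transferred files and repointing the Latest link) can be pulled out into small functions, which makes them easy to test without the thumb drive attached. This is a sketch, not part of my script: the names `files_transferred` and `update_latest` are made up, and the second stats wording it accepts (“Number of regular files transferred”, with commas in big numbers) is what I understand newer rsync releases print, so check it against your own `rsync --stats` output.

```bash
#!/bin/bash
# Sketch only: hypothetical helpers mirroring the end of the backup script.

# Read `rsync --stats` output on stdin and print the number of files
# transferred. Accepts "Number of files transferred: N" and the assumed
# newer wording "Number of regular files transferred: N,NNN".
files_transferred() {
    sed -n -E 's/^Number of (regular )?files transferred: ([0-9,]+).*/\2/p' | tr -d ','
}

# Point DEST/Latest at DEST/SNAPSHOT.
# -f removes an existing link; -n stops ln from following the old link
# into the previous snapshot and creating the new link inside it.
update_latest() {
    local dest=$1 snapshot=$2
    ln -sfn "$dest/$snapshot" "$dest/Latest"
}
```

In the script you would pipe rsync’s output through `files_transferred` instead of the inline sed, and call `update_latest "$DEST" "$DATE"` in place of the link-update lines. Note that this version prints `0` rather than nothing when no files were copied, so the growl test becomes `[ "${n:-0}" -gt 0 ]`.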


By using --exclude-from and --files-from, you get finer-grained control over what gets backed up. My Code folder is ~1GB, and my School folder is about 3GB. When I exclude all of my compiled code, data files, images, .git and .svn folders, and various annoying swap files, my base backup footprint is less than 500MB (for both Code and School).

Here’s my excludes file; it’s just one exclude pattern per line:

*.sql
*.bak
*.swp
.svn
*.pyc
*.log
*.tar.gz
*.dvi
*.o
*.out
*.d
*.tmp
.git
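Before trusting a filter list with your backups, it helps to know what it actually catches. Every pattern in my list is a plain name or wildcard with no slash, and rsync matches those against the final name of each file or directory; because an excluded directory’s contents are skipped entirely, that works out to testing each component of a path. Below is a rough bash approximation of just that rule. The `is_excluded` helper is hypothetical and ignores anchored patterns, `**`, and include rules; for the real answer, run rsync with `-n` (`--dry-run`) and look at what it would copy.

```bash
#!/bin/bash
# Rough approximation (hypothetical helper, not rsync itself) of how rsync
# applies slash-free exclude patterns such as "*.pyc" or ".git".
# Usage: is_excluded PATTERNS_FILE RELATIVE_PATH
# Returns 0 if some path component matches some pattern, 1 otherwise.
is_excluded() {
    local patterns_file=$1 path=$2 pat part
    local -a parts
    # Split the path on "/" into its components.
    IFS='/' read -r -a parts <<< "$path"
    while IFS= read -r pat || [ -n "$pat" ]; do
        # rsync ignores blank lines and lines starting with '#' or ';'.
        case $pat in
            '' | '#'* | ';'*) continue ;;
        esac
        for part in "${parts[@]}"; do
            # Unquoted $pat on the right of == is matched as a glob.
            if [[ $part == $pat ]]; then
                return 0
            fi
        done
    done < "$patterns_file"
    return 1
}
```

For example, with the list above saved as ~/.rsyncexcludes, `is_excluded ~/.rsyncexcludes Code/proj/.git/config` succeeds while `is_excluded ~/.rsyncexcludes Code/proj/main.py` fails (the `proj` folder is just an illustration).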


Similarly, the files-from file is one path per line (remember the trailing slash!). An important note from the rsync manual: when you specify --files-from, -a no longer implies -r. So make sure to add -r to your argument list.
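For the two folders I mentioned above, the files-from list is just two lines. The rsync manual says the names read from this file are relative to the source directory, which is $HOME in my script. Here’s a small sketch that writes it; the `write_files_from` helper is hypothetical, and you’d swap in your own folder names.

```bash
#!/bin/bash
# Hypothetical helper: write a files-from list for the Code and School
# folders. Entries are relative to the rsync source directory ($HOME in
# the backup script), one per line, with a trailing slash on directories.
# Usage: write_files_from [OUTPUT_FILE]   (defaults to ~/.rsyncfiles)
write_files_from() {
    local out=${1:-$HOME/.rsyncfiles}
    cat > "$out" <<'EOF'
Code/
School/
EOF
}
```

Since -a doesn’t imply -r here, a quick trial run would look something like `rsync -a -r -n --files-from="$HOME/.rsyncfiles" "$HOME" /Volumes/PNY8GB/Backups/test`, where `-n` makes it a dry run and the `test` destination is just an example.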

And yet again, I leave the scheduling to you.

-David

Written by david

March 28th, 2009 at 4:46 pm

Posted in Mac


Computation even grad students can afford


This morning I wanted to mess around with some MPI (Message Passing Interface) code. We have a high-performance server farm here at FSU (aptly named the HPC), and I know I could have used it to test some MPI code, but using provided facilities is so boring. Plus, I wanted to try installing MrBayes in multiprocessor mode and distributing it over a few machines. I don’t think the admins at the HPC would like me doing that (if it’s even possible).

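First, download your X.509 private key (pk-*.pem) and certificate (cert-*.pem) from the AWS website, then move them into ~/.ec2 and lock them down: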


mkdir -p ~/.ec2
mv pk-somereallylonghash.pem ~/.ec2
mv cert-somereallylonghash.pem ~/.ec2
chmod 600 ~/.ec2/*.pem

It’s good practice to give files like keys and certificates mode 600 (readable and writable only by you). In fact, ssh will refuse to use a private key that other users can read. Next, head back to Amazon and get your Access Key ID and Secret Access Key (the super secret password). They will look something like:

SOM3ALLC4P5L3TTERSANDNUMBERS
Som3mixeD(4sl3tteR$||um8er$and$ymb0ls

Now, I wouldn’t recommend saving these on your hard disk. I keep them in an encrypted email. It would also be a good idea to print them out and keep them somewhere safe (it’s a pain in the ass to get a new password and go change everything). At this point we have everything we need to spawn instances, bundle a new image, store things on S3, deliver content through CloudFront (which is awesome, btw), and anything else AWS has to offer.

To get your spawn on, we need a tool of some kind (because no one wants to use the stupid command line tools). Elasticfox has pretty much taken its place as the tool for interfacing with EC2 services. It is incredibly useful: good design, open source, constant updates, and a very fast turnaround between Amazon releasing a new feature and Elasticfox supporting it. I think Amazon has committed several developers to it, but that’s just a guess.

One more annoying step before we can launch an instance and log in: we need a keypair. Elasticfox is nice enough to do this for you. Simply click on the “Keypairs” tab and generate a new keypair. Name it something meaningful, not “keypair”; maybe something like “yourname-laptop” or “yourname-work”. You can of course share a keypair across multiple computers, but you’re on your own for that. Elasticfox will prompt you to download the keypair. Obey the machine: download it and save it to your ssh folder (usually ~/.ssh).

mv david-laptop ~/.ssh
chmod 600 ~/.ssh/david-laptop
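Since ssh insists on tight permissions, it’s worth double-checking all the key material in one go. A minimal sketch, assuming the file locations used above: `loose_keys` (a hypothetical helper) prints any of the given files whose mode is not exactly 600, using only POSIX `find` options.

```bash
#!/bin/bash
# Hypothetical helper: print each given file whose permission bits are not
# exactly 600 (read/write for the owner only). -prune keeps find from
# descending if a directory is passed; "-perm 600" without a prefix means
# "exactly 600", so "! -perm 600" catches anything looser or stricter.
loose_keys() {
    find "$@" -prune -type f ! -perm 600 -print
}
```

Running `loose_keys ~/.ec2/*.pem ~/.ssh/david-laptop` should print nothing; anything it does print needs a `chmod 600`.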

Alright! Now we can get to it! Go to the “Images” tab in Elasticfox and pick your favorite Linux distro (Alestic has a nice Debian 5.0 base image: ami-67fe190e). Click the green power button to launch, make sure your keypair is selected, and click “OK”. The instance should then show up in your list under the “Instances” tab, first as “pending” and then as “running”. Once it is running, double-click on it and copy the public DNS name, then open a terminal and ssh into it (using our keypair, of course):

ssh -i ~/.ssh/david-laptop root@ec2-123-456-77-22.compute-1.amazonaws.com

Replace my keypair name with yours, of course. That’s it; you should be connected.

Sorry about that. I didn’t mean for this to turn into a tutorial. Stupid blog. Back to my story.

After I got the AWS stuff set up, I spawned 4 base Debian 5.0 instances, set them up with MPI, and installed the multiprocessor version of MrBayes. After a little tinkering with MPICH settings (I will use Open MPI next time), I got it running: 4 processors spread across 4 instances. So hot. I think I’ll be changing my thesis to a project where I can use AWS. It’s just so sexy.

Oh, and it cost me $0.12.

Written by david

March 26th, 2009 at 12:30 pm