Null Disquisition

Python, AWS, Grad School, and your face

Archive for the ‘School’ Category

First (real) MPI run on EC2

without comments

After a few days of tinkering with EC2MPI, I spent some time polishing up a stat mech MPI simulation. The code in question is a 2d Ising model simulation using Replica Exchange. Right now it stands at around 400 lines of C++ using STL vectors (which I love). Once I know it works (or at least works well enough) I might post it up here, but for now I’m just trying to generate pretty hysteresis plots and observe the critical behavior of a 2d Ising model system. Here’s a picture with points on it.

EvM.png

Energy per spin plotted against magnetization
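For anyone who hasn't met Replica Exchange before, here is a rough Python sketch of the idea on a small 2d Ising lattice. It is not the C++ code described above: the lattice size, the temperatures, the coupling J = 1, and every function name here are assumptions picked for illustration. Each replica runs ordinary Metropolis sweeps at its own inverse temperature, and neighbouring replicas occasionally trade configurations.

```python
import math
import random


def lattice_energy(spins):
    """Total energy of a 2d Ising lattice with J = 1 and periodic boundaries."""
    n = len(spins)
    energy = 0
    for i in range(n):
        for j in range(n):
            # Count each bond once: right and down neighbours only.
            energy -= spins[i][j] * (spins[i][(j + 1) % n] + spins[(i + 1) % n][j])
    return energy


def observables(spins):
    """Energy per spin and magnetization per spin (the two axes of the plot)."""
    n_sites = len(spins) ** 2
    return lattice_energy(spins) / n_sites, sum(map(sum, spins)) / n_sites


def metropolis_sweep(spins, beta, rng):
    """Attempt one single-spin Metropolis flip per site at inverse temperature beta."""
    n = len(spins)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        neighbours = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
                      + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
        delta_e = 2 * spins[i][j] * neighbours  # energy change if this spin flips
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            spins[i][j] = -spins[i][j]


def swap_probability(beta_a, energy_a, beta_b, energy_b):
    """Acceptance probability for exchanging the configurations of two replicas."""
    exponent = (beta_a - beta_b) * (energy_a - energy_b)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def replica_exchange(n=8, betas=(0.2, 0.4, 0.6), sweeps=50, seed=1):
    """Run one replica per inverse temperature and attempt neighbour swaps."""
    rng = random.Random(seed)
    replicas = [[[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
                for _ in betas]
    for _ in range(sweeps):
        for spins, beta in zip(replicas, betas):
            metropolis_sweep(spins, beta, rng)
        for k in range(len(betas) - 1):
            e_a, e_b = lattice_energy(replicas[k]), lattice_energy(replicas[k + 1])
            if rng.random() < swap_probability(betas[k], e_a, betas[k + 1], e_b):
                replicas[k], replicas[k + 1] = replicas[k + 1], replicas[k]
    return replicas
```

In an MPI version each replica would presumably live on its own rank and only the energies (and swap decisions) would need to travel between ranks, which is what makes the method attractive for scaling tests.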

I leave the interpretation to you. The best part of this is that I can do these MPI runs without burning a hole in my lap (the MacBook gets rather warm). -David

Written by david

June 8th, 2009 at 9:22 pm

Posted in Amazon Web Services, School


MPI running on Amazon EC2

without comments

Amazon Web Services

For my Master’s thesis, I’m going to be running a lot of MPI code, and naturally I need a place to run it. Let me first say that my university has an excellent high-performance computing center, run by one of my committee chairs, that is more than capable of serving my needs – and yet, I am unfulfilled. With our scheduling system, there is a “backfill” that is always available for running small jobs (like the ones I run), but for my thesis, I want to test the massive scalability of an algorithm (Replica Exchange). When I say massive, I mean massive – think 1000 compute nodes or more.

Big ideas, people.

In order to satisfy my need for a massively parallel platform, I looked no further than Amazon EC2. As should be apparent from many of my previous posts, I have been doing a lot of work with Amazon’s cloud services, both for school and for work.

A few weeks ago, I started an MIT-licensed open source project on GitHub aptly named EC2MPI. Today I made a major step forward with this project which was the motivation for this post. I finally have everything configured properly and got my first no-hassle MPI cluster up and running.

The script I wrote (EC2MPI) is written in Python and presents an interactive prompt to the user. You select the architecture (i386 or x64) and the number of instances, and there is also support for user-defined SSH keypairs (not AWS keypairs) for cluster security. The instances are spawned, and EC2MPI sets up the SSH keys as well as the MPI configuration. It is so freaking sweet.

I wanted to share some issues I’ve had so far while developing this and how I solved them.

Intra-EC2 communication – I needed the instances to be able to talk to one another for point-to-point as well as collective communication. My solution was to let the user generate SSH keypairs, which are stored in a private S3 bucket (owned by the user). The user-data script sent to the instances takes care of downloading and installing the keys upon startup.

Shared storage among instances – In order to run MPI code, the nodes in the cluster need access to a shared storage volume containing the compiled MPI binaries. Since EC2 has no shared storage (for now), I had to find an alternate solution. I settled on s3fs: a FUSE-based filesystem which allows you to mount an S3 bucket as a volume. Reading and writing to the shared volume is pretty slow (unless it’s cached), so for certain kinds of code this might not be ideal. However, I believe it is the best solution for now. I imagine one day Amazon will add a feature to Elastic Block Store volumes that allows them to act as shared volumes.

Starting up and tearing down clusters – I used Amazon SimpleDB to keep metadata about the cluster: how many instances are in the cluster, internal/external IP addresses, etc. This is also how I define the master node and worker nodes. This will allow me to add features such as adding and removing instances from a cluster without having to tear the whole thing down. I also did all startup configuration with a user-data script, so EC2MPI does not have to log into each instance upon startup. This allows cluster startup to scale well.
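To make the metadata idea concrete, here is a small sketch of how per-instance records could be turned into an MPI hostfile. This is not taken from EC2MPI: the record fields (`instance_id`, `role`, `internal_ip`, `external_ip`), the function name, and the addresses are hypothetical, and the output assumes an Open MPI-style `host slots=N` hostfile, since the post does not say which MPI implementation is in use.

```python
def build_hostfile(nodes, slots_per_node=1):
    """Render an Open MPI-style hostfile from cluster records, master first."""
    # Sorting key: the master sorts before workers (False < True), then by IP
    # as a plain string, which is good enough for a sketch.
    ordered = sorted(nodes, key=lambda node: (node["role"] != "master",
                                              node["internal_ip"]))
    lines = [f"{node['internal_ip']} slots={slots_per_node}" for node in ordered]
    return "\n".join(lines) + "\n"


# Hypothetical records, shaped like what a metadata store might hold.
cluster = [
    {"instance_id": "i-00000002", "role": "worker",
     "internal_ip": "10.0.0.12", "external_ip": "203.0.113.12"},
    {"instance_id": "i-00000001", "role": "master",
     "internal_ip": "10.0.0.10", "external_ip": "203.0.113.10"},
    {"instance_id": "i-00000003", "role": "worker",
     "internal_ip": "10.0.0.11", "external_ip": "203.0.113.11"},
]

print(build_hostfile(cluster, slots_per_node=2), end="")
```

Internal addresses are used because traffic between the instances stays inside EC2. Adding or removing a node then amounts to updating the records and regenerating the file.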

Check back soon for some benchmarks and more detailed write-ups as the project progresses. First, I need to get my maximum number of instances increased (right now I can do 20 max). Fast times ahead, friends.

-David

Written by david

May 30th, 2009 at 7:40 pm

Funded!

without comments

Amazon issued me 300 dollars in EC2 credits to support my Master’s project. Very exciting.

If you’re a university researcher, student, or professor, visit http://aws.amazon.com/education for more information. One of my professors talked to me about giving a seminar on cloud computing in the fall. I believe these types of grants are issued for that sort of thing as well.

Totally putting this on my CV.

Written by david

May 5th, 2009 at 8:00 pm

Getting to it

without comments

Dis·qui·si·tion, n. – A formal discourse on a subject, often in writing.

Started seriously getting the ball rolling on my thesis this week, outlines and everything. I found a really great app for writing called Scrivener (non-free, OS X only). Notice that I said writing, not publishing. For my purposes, it does rather poorly as a publishing platform, but I have that end of things worked out rather well (LaTeX represent!). 1000 words in the first day. Granted, they are the easy words (background and lit review), but hopefully I can keep up a moderate pace so I can defend this summer. Once I start finishing sections and moving my draft into LaTeX, I’ll start publishing them somewhere here. Probably not in this blog, but maybe a directory for a LaTeX2HTML dump.

In other news, we got a Nikon D80. All the recent pics in my Flickr stream are taken with it (minimal post-processing).

It’s nice to be writing again.

Written by david

February 19th, 2009 at 2:30 pm

Posted in School, php


Pretty pictures

without comments

Just felt like sharing, b/c the pictures are so pretty. Here are some movies I made for an assignment in my biophysics class. It’s a bunch of particles in a box (as you might have guessed). It roughly represents the behavior of a noble gas in a confined space. My prof provided the template for the code, but we had to implement the hard parts.

Higher temperature

Lower temperature

Neat!

Written by david

January 28th, 2009 at 4:14 pm

Posted in School


The Official FSU Re-fill cup

without comments

Busy times. Been busting ass to keep up with everything this semester – teaching assistantship, classes, thesis, not to mention my new(ish) job at Loud3r. Had a few interesting things happen lately that I felt warranted an update. To enumerate: I hacked some kid’s Facebook account, wrote an HTML scraper to get the latest Naruto Shippüden episodes from Dattebayo, got my code working for my thesis, and did my final edits on my first real academic paper.

Facebook failure:

I was sitting in the class I TA for (along with my cohort Billzebub), sniffing the wifi traffic (like you do), and took a look at the pcap dump that was captured. Amongst the garbage were some request/response headers for Facebook. Being the curious little monkey I am, I fire up Firefox and copy/paste all the cookie information into my session (using the Web Developer 2 extension). I head over to facebook.com and lo and behold, I am Matt Whatshisname. Full access too, not just a temporary hiccup in the login system. After resisting messing with stuff and/or snooping, I clear my cookies and sit back in awe. Awe at how ridiculous it is that the Facebook login system is so exposed and broken.

I tried to replicate the cookie spoof for some pics for this post, but apparently one or more of the cookies are time-sensitive.

Edit: Hack successfully reproduced! Epic fail!!

http://skitch.com/mumrah/7gcg/his-name-is-robert-paulson

http://skitch.com/mumrah/7gcj/full-access

More Naruto Shenanigans:

No need to waste time on how I did it; here’s a link that pretty much explains it all. Naruto Shippüden XML feed.

Thesis:

After going back and forth with my professor for weeks and not getting anywhere, I sit him down, start at the beginning, and force him to work through all the details with me. 4 hours later we have some functioning code. Obligatory photos to follow. The above picture demonstrates the orthogonality and normalization of the eigenvectors (meaning we finally have the partition function correct as well as the normalization criterion). The following three pictures are just the first three eigenvectors.

Looking forward to the break.

-David

Written by david

December 1st, 2008 at 3:25 pm

Posted in General, School


1d Fokker-Planck equation

with 2 comments

As promised, I bring pretty pictures. The past few days I’ve been working on a solution to the 1d diffusion equation with a drift term, better known as the Fokker-Planck equation.

The 1d Fokker-Planck equation
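For reference, the standard one-dimensional form, for a probability density p(x, t) with drift μ(x, t) and diffusion coefficient D(x, t), is written below. The symbols are the usual textbook ones and may not match the notation in the figure.

```latex
\frac{\partial p(x,t)}{\partial t}
  = -\frac{\partial}{\partial x}\bigl[\mu(x,t)\,p(x,t)\bigr]
  + \frac{\partial^{2}}{\partial x^{2}}\bigl[D(x,t)\,p(x,t)\bigr]
```

With constant drift and diffusion this reduces to the ordinary diffusion equation plus a first-derivative drift term.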

Sexy, I know. Anyhow, I finally worked out the Python code to get it rolling (literally!). The test system I did has periodic boundary conditions and an initial condition of a sharply-peaked Gaussian (a = 20). I’ll spare the details and jump to the fun part.

Here’s the Python code that made it happen (scipy and matplotlib required).
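As a stand-in for readers who just want to see the method, here is a minimal sketch of one way to integrate the equation numerically. It is not the code linked above: it uses plain Python instead of scipy and matplotlib, a simple explicit finite-difference scheme, and constant drift and diffusion. The grid size, domain length, time step, and the reading of a = 20 as the coefficient in exp(-a x^2) are all assumptions.

```python
import math


def gaussian_initial_condition(xs, a=20.0, center=0.0):
    """Sharply peaked Gaussian exp(-a (x - center)^2), normalised on the grid."""
    dx = xs[1] - xs[0]
    raw = [math.exp(-a * (x - center) ** 2) for x in xs]
    norm = sum(raw) * dx
    return [value / norm for value in raw]


def step(p, drift, diffusion, dx, dt):
    """Advance p by one explicit time step with periodic boundaries."""
    n = len(p)
    c = drift * dt / (2.0 * dx)    # centred first-derivative coefficient
    r = diffusion * dt / dx ** 2   # second-derivative coefficient
    new_p = []
    for i in range(n):
        left, right = p[i - 1], p[(i + 1) % n]  # p[-1] wraps to the last point
        new_p.append(p[i] - c * (right - left)
                     + r * (right - 2.0 * p[i] + left))
    return new_p


def solve(n_points=100, length=4.0, drift=1.0, diffusion=0.05,
          dt=0.004, n_steps=250, a=20.0):
    """Evolve the Gaussian on [-length/2, length/2) and return (xs, p)."""
    dx = length / n_points
    xs = [-length / 2.0 + i * dx for i in range(n_points)]
    p = gaussian_initial_condition(xs, a=a)
    for _ in range(n_steps):
        p = step(p, drift, diffusion, dx, dt)
    return xs, p


if __name__ == "__main__":
    xs, p = solve()
    dx = xs[1] - xs[0]
    print("total probability:", sum(p) * dx)
    print("peak location:", xs[p.index(max(p))])
```

The defaults keep diffusion * dt / dx^2 well below 1/2, which this explicit scheme needs to stay stable. The peak drifts in the direction of the drift term while it spreads out, and the periodic boundaries let it roll off one edge and back in on the other.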

-David

Written by david

October 10th, 2008 at 4:45 am

Posted in School, python


Afternoon decafe

without comments

I can’t believe I’m actually doing work this afternoon instead of playing Spore. Oh well, I’m already up to Civilization Phase (only took 12 hours). The wife is at Starbucks off campus studying with her cohorts (with shitty T-Mobile wifi), so I went to the Starbucks on campus (with awesome free campus Wifi).

Earlier this week, I promised that I was going to keep everyone updated with how my research is going and what I’m doing. So, here we are.

As previously mentioned, this first paper I’m writing is a “Why and When” sort of paper. We talk about three popular methods for doing MCMC simulations and which is best under what circumstances. The three methods are traditional Molecular Dynamics, Multiensemble, and Hamiltonian Replica Exchange Molecular Dynamics. The method I’m researching is the last one, hREMD (love the title). You can tell it’s a recent method cause the name is really long and convoluted. I’m pretty sure all the good names in MCMC were taken by the mid 90s. I won’t go into full detail on each method here (trying not to lose anyone’s interest), just know they exist.

The basic gist of the paper follows. You can break down the computational resources of a simulation into two parts: the equilibration phase and the production phase. When doing MCMC, you must let your system fully equilibrate before you can start sampling data. As an example of why, suppose you stick a particle in a 2D box and let it move around a tiny bit each iteration. For the next several iterations, the position of the particle is going to be correlated with the starting point. This is called configuration bias (or startup bias, among other names). The following figure is the autocorrelation of some MC time series data (the thickness is from the error bars).

A visual analogy follows: Suppose you take a swatch of color, magenta. If you break it down into 3 color channels (red, green, blue), the corresponding hex code would be something around #A03. Call this our starting point.

Iteration 1 (#AA0033)

Now let’s make up an update rule for our Markov chain. Each iteration, we pick a color channel and shift it by some amount x, where x is a random integer in [-1, 1]. After 100 iterations, we have moved around in the 3d color-space (where each channel is a dimension) and have ended up at #953.

Iteration 100 (#995533)

Hmm. Not much has changed. Let’s look at 10000 iterations.

Iteration 10000 (#222255)

Ok, that’s better. What I’m trying to demonstrate is that when you do a stochastic simulation like this, the starting point is going to bias what the system does for the first several iterations. You need to let the system run for a very long time in order for your current state to have no “history” of the first state. The plot above shows the correlation of the system as it goes through time. Notice that at the beginning the correlation is very high (in fact, at time 0 it is at its maximum: a normalized autocorrelation is exactly 1 there, since the system is perfectly correlated with itself). The reasoning for this is the same as for why Iteration 1 and Iteration 100 of our color simulation are very similar.
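For the curious, a normalized autocorrelation like the one plotted can be estimated from a time series in a few lines. This is a sketch rather than the analysis behind the figure: the function names and the toy correlated series (each value is 0.9 times the previous one plus Gaussian noise) are assumptions made up for the example.

```python
import random


def autocorrelation(series, max_lag):
    """Normalised autocorrelation of a time series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [value - mean for value in series]
    variance = sum(v * v for v in centered) / n
    result = []
    for lag in range(max_lag + 1):
        # Average product of the series with a copy of itself shifted by lag.
        cov = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / (n - lag)
        result.append(cov / variance)
    return result


def correlated_series(length, memory=0.9, seed=0):
    """Toy series where each value remembers a fraction of the previous one."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(length - 1):
        values.append(memory * values[-1] + rng.gauss(0.0, 1.0))
    return values


if __name__ == "__main__":
    acf = autocorrelation(correlated_series(5000), max_lag=50)
    for lag in (0, 1, 10, 50):
        print(lag, round(acf[lag], 3))
```

The value at lag 0 is 1 by construction, and for a series like this one it decays toward zero as the lag grows. The number of steps that decay takes is a rough measure of how long the chain remembers where it has been.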

That said, my paper talks about how long it takes each of the 3 different methods to reach equilibrium – when the current state has lost all memory of the original state. I think I’ll make a little demo of the color thing.
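In the meantime, a quick Python sketch of the color chain might look like the following. The update rule is the one described above; treating each channel as an 8-bit value, clamping it to the range 0..255, and the function names are my assumptions, since the description leaves those details open.

```python
import random


def step_color(color, rng):
    """Shift one randomly chosen channel by -1, 0, or +1, clamped to 0..255."""
    channels = list(color)
    index = rng.randrange(3)  # pick red, green, or blue
    channels[index] = min(255, max(0, channels[index] + rng.randint(-1, 1)))
    return tuple(channels)


def to_hex(color):
    """Format an (r, g, b) tuple as a hex code such as #AA0033."""
    return "#{:02X}{:02X}{:02X}".format(*color)


def color_walk(start, iterations, seed=0):
    """Run the chain for a number of iterations and return the final color."""
    rng = random.Random(seed)
    color = start
    for _ in range(iterations):
        color = step_color(color, rng)
    return color


start = (0xAA, 0x00, 0x33)
for iterations in (1, 100, 10000):
    print(iterations, to_hex(color_walk(start, iterations)))
```

Because only one channel moves per iteration, and by at most one unit, the color after 100 iterations is typically only a handful of units from where it started. Only over thousands of iterations does the chain wander far enough to forget the starting swatch.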

-David

Written by david

September 7th, 2008 at 2:26 pm

Posted in School


My first real paper

without comments

My prof is putting me down as primary author on a paper we’re working on. Or rather, I’m putting my professor down as a corresponding author on a paper I’m writing. heh. The topic is comparing the efficiency/effectiveness of Replica Exchange Molecular Dynamics against Multiensemble methods (obligatory wiki links), and when it’s best to use each method. The funny thing about statistical mechanics (and a lot of science in general) is that the concepts are fundamentally simple, but the literature is so obfuscated with jargon and assumptions that hardly anyone can understand it. I mean, shit, I hardly follow half of what I read – and now I’m supposed to be writing it.

My generation of grad students is coming from the first batch of kids who grew up with the internet, and really the first generation of Wikipedia. As such, I’m going to try a new type of research dogma that attempts to make my research available (and accessible) to anyone. This type of transparent research is becoming more common, and I hope to see more of it.

Here’s a quick rundown of my goals for this experiment:

  • Provide all of my publications and projects freely (source too)
  • Keep the language deflated, no jargon
  • Document my methods, keep the research process transparent
  • Contribute info (not necessarily new research) back into Wikipedia
  • Not get caught by my committee for giving away research ^_^

Hopefully, by the time I finish my thesis I will have enough content here for anyone (idiots excluded) to somewhat understand what it’s all about. Hope you’re all ready – hope I’m ready.

Edit: Loving my macbook.

Written by david

September 4th, 2008 at 9:33 pm

Posted in School


Thesis and First iMpressions

without comments

This semester is all about writing. My professor wants to get two papers out this fall. Luckily, however, these papers will be chapters 2 and 3 of my thesis. Huzzah.

I got a MacBook Pro last week (from an undisclosed source), so now I fit in with the other grad students. The adjustment to OS X has been relatively painless (coming from Ubuntu). My first inclination was to ditch OS X completely and load Linux, but I’ve been persuaded by the Apple Demons (evil and benign) to give Mac a chance. I’ve had to do a lot of customization to get the terminal anywhere near the functionality of gnome-terminal. In fact, I ditched the default terminal for a project called iTerm. Indeed, I will miss gnome-terminal.

The multi touch is incredible. The hardware is incredible.

I’ve quit my crappy job (see Access Nightmares), and taken an awesome job (see How Happy). This is going to be a busy few months. Here we go.

-David

Written by david

September 4th, 2008 at 2:17 pm

Posted in School
