Null Disquisition

Python, AWS, Grad School, and your face

Making Python’s pickle safe(r)

Everyone loves pickle. I mean, what’s not to love? Super fast object serialization (via cPickle). However, there are some legitimate concerns regarding the security of pickle, specifically the load/loads functions. The basic problem is that if you unpickle untrusted data, you are liable to create objects that can do nasty things (like make system calls). Python even gives us a nice warning right in the docs:

Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
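
To make that warning concrete, here is a minimal, harmless sketch of the problem. It is written for Python 3 (the code later in this post uses Python 2's cPickle), and the Sneaky class is a hypothetical name made up for illustration. It uses os.getcwd as a stand-in for something nastier:

```python
import os
import pickle

class Sneaky(object):
    """Hypothetical class whose pickled form is really a function call."""
    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling os.getcwd().
        # A hostile payload could name os.system (or similar) here instead.
        return (os.getcwd, ())

payload = pickle.dumps(Sneaky())

# Unpickling calls the function; we get its return value, not a Sneaky.
result = pickle.loads(payload)
print(result)
```

The point is that loads ran a function chosen by whoever built the payload, which is exactly why untrusted pickles are dangerous.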


Now, there are plenty of things you can do to improve the security of the unpickling process. Python lets you subclass pickle.Unpickler to get finer-grained control over what gets unpickled. This is a fine approach (a nice example here) and will work for most people, but I will give my own take on the issue.
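
For reference, the subclassing approach might look roughly like the sketch below. This is Python 3 and is not the linked example; the names RestrictedUnpickler, restricted_loads, and SAFE_BUILTINS are hypothetical, and the whitelist is just an assumption about what an application might want to allow. It relies on overriding find_class, which the unpickler calls whenever a pickle asks for a global:

```python
import builtins
import io
import pickle

# Hypothetical whitelist: the only globals a pickle may reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Allow a handful of harmless builtins and refuse everything else.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Like pickle.loads, but only whitelisted globals may be loaded."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers, strings, and numbers need no globals, so they load fine.
print(restricted_loads(pickle.dumps({"foo": "bar", "the answer": 42})))
```

A pickle that references anything outside the whitelist (os.system, for instance) fails with UnpicklingError instead of being called.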

For most of the applications I write that use pickle, I’m just looking for a way to store arbitrary Python data as a string. One example might be storing small data objects on S3, or perhaps implementing user sessions for a webapp. Either way, I should be able to trust my own data for unpickling, but it’s always best to be double-extra-sure when dealing with anything that can blindly execute arbitrary bits of code (think of the evil eval function).

So, for my case, I simply want to verify that the pickled data I stored is coming back to me unmodified. My solution: sign the pickled data. Using the same signing method as AWS, I present the following:

import hmac
import hashlib
import base64
from cPickle import dumps
# The unsigned pickled data
string_to_sign = dumps({'foo': "bar", 'spam': "eggs", 'the answer': 42})
# The signature object
signature = hmac.HMAC(key="my application's super secret key",
    msg=string_to_sign, digestmod=hashlib.sha256)
# The signed string: store this
signed_string = string_to_sign + base64.encodestring(signature.digest())
Now you have your pickled data as the first part of the string, with the last 45 characters being the signature (44 base64 characters for the 32-byte SHA-256 digest, plus the trailing newline that encodestring appends). The key for HMAC signing is specific to your application, so if someone gets access to your pickled data and tries to mess with it and re-sign it without that key, the signature won’t match. Here’s the unpickling process:
import hmac
import hashlib
import base64
from cPickle import loads
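# Break up the signed string into message and signature
signature = signed_string[-45:]
message = signed_string[:-45]
# Calculate the signature of the message
msg_sig = hmac.HMAC(key="my application's super secret key",
    msg=message, digestmod=hashlib.sha256)
# Check that it matches the given signature. Use an explicit check rather
# than assert, which is skipped when Python runs with -O.
if base64.encodestring(msg_sig.digest()) != signature:
    raise ValueError("signature mismatch: refusing to unpickle")
# Only now is it safe to unpickle
data = loads(message)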
-David

Written by david

September 9th, 2009 at 12:24 pm

Posted in programming, python