Python Persistence

tl;dr #

If compatibility across different Python versions matters, use cPickle with a binary protocol and the fast option.

Otherwise, just use marshal.

Performance #

A benchmark of marshal, pickle, json, and capnp on Python 2.7, run on my machine:

import timeit

md = "marshal.dumps({'people': [{'name': 'Alice'}]})"
im = 'import marshal'

pd = "pickle.dumps({'people': [{'name': 'Alice'}]})"

ip = 'import cPickle as pickle'

jd = "json.dumps({'people': [{'name': 'Alice'}]})"
ij = 'import json'

cd = "bm_capnp.AddressBook.new_message(people=[{'name': 'Alice'}])"
ic = 'import capnp; import bm_capnp'

def benchmark(d, i):
    return timeit.timeit(d, i, number=10000)

# Benchmarking
m = benchmark(md, im)
p = benchmark(pd, ip)
j = benchmark(jd, ij)
c = benchmark(cd, ic)

# Print the results (total seconds for 10000 runs each)
output = "{name}: {time}".format

print(output(name='marshal', time=m))
print(output(name='(c)pickle', time=p))
print(output(name='json', time=j))
print(output(name="Cap'n Proto", time=c))

Cap'n Proto requires an additional schema file:


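# A real schema file also needs a unique file ID as its first
# statement; one can be generated with `capnp id`.

struct Person {
  name @0 :Text;
}

struct AddressBook {
  people @0 :List(Person);
}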

Results (total seconds for 10000 runs):

marshal: 0.08457612991333008
(c)pickle: 0.31447696685791016
json: 0.7639560699462891
Cap'n Proto: 0.4193081855773926

So marshal seems the fastest on my machine.

However, with a binary protocol and the fast option, cPickle is nearly as fast as marshal.

>>> md = 'marshal.dumps([1, 2, 3])'
>>> im = 'import marshal'
>>> pd = 'fp.dump([1, 2, 3])'
>>> ip = "import cPickle; fp = cPickle.Pickler(open('/tmp/1', 'wb'), 2); fp.fast = 1"
>>> timeit.timeit(md, im, number=1000)
>>> timeit.timeit(pd, ip, number=1000)
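The binary protocol and fast option used above can also be checked for a correct round trip. The following is a minimal sketch, not the benchmark itself: the helper name dump_fast and the in-memory buffer are assumptions made for illustration, and the import fallback is only there so the snippet also runs on Python 3, where cPickle no longer exists.

```python
import io

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3 has no cPickle module


def dump_fast(obj):
    """Hypothetical helper: pickle obj with protocol 2 and fast mode."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, 2)
    # Fast mode disables the memo table, so it must not be used
    # with self-referencing (recursive) objects.
    pickler.fast = 1
    pickler.dump(obj)
    return buf.getvalue()


data = {'people': [{'name': 'Alice'}]}
blob = dump_fast(data)
restored = pickle.loads(blob)
```

Fast mode trades safety for speed: because shared and recursive references are not tracked, it only suits plain, tree-shaped data like the dictionary here.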

Compatibility #

Compatibility with different Python versions #

marshal exists for Python's internal use (.pyc files), so its format may change in future versions of Python. As of Python 3.4.3, the format is at version 4. The version used by the running interpreter is available as marshal.version.
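As a small sketch of how the format version shows up in code: marshal.dumps accepts an optional version argument, so a specific format can be requested instead of the interpreter default. The choice of version 2 below is an assumption for illustration, and pinning a version does not by itself guarantee that another Python release can read the data.

```python
import marshal

data = {'people': [{'name': 'Alice'}]}

# Format version this interpreter writes by default
current = marshal.version

# Request an explicit, older format version instead of the default
blob = marshal.dumps(data, 2)
restored = marshal.loads(blob)
```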

Compatibility with other languages #

Implemented in other languages:

Export to JSON #

To communicate with other programs, we can export to JSON.

On Python 2.7+, the json module from the standard library encodes and decodes JSON quickly, so there is no need for other modules.
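A minimal sketch of the JSON export, reusing the sample data from the benchmark above. The variable names are arbitrary, and sort_keys is used only to make the output deterministic.

```python
import json

data = {'people': [{'name': 'Alice'}]}

# Encode to a JSON string that any other program can parse
text = json.dumps(data, sort_keys=True)

# Decode it back into Python objects
restored = json.loads(text)
```

Unlike marshal and pickle, JSON only covers basic types (dicts with string keys, lists, strings, numbers, booleans, and None), so other objects have to be converted first.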