Python Persistence
This survey is outdated and unmaintained. For compatibility with other programs, I would recommend JSON or SQLite instead.
tl;dr
If we care about compatibility with different Python versions, use cPickle with the binary protocol and the fast option.
Otherwise, just use marshal.
Performance
A benchmark of marshal, cPickle, json, and capnp (Cap'n Proto) on Python 2.7 on my machine:
import timeit

# Each pair is (statement to time, setup code).
md = "marshal.dumps({'people': [{'name': 'Alice'}]})"
im = 'import marshal'

pd = "pickle.dumps({'people': [{'name': 'Alice'}]})"
ip = 'import cPickle as pickle'

jd = "json.dumps({'people': [{'name': 'Alice'}]})"
ij = 'import json'

cd = "bm_capnp.AddressBook.new_message(people=[{'name': 'Alice'}])"
ic = 'import capnp; import bm_capnp'

def benchmark(d, i):
    # Run statement d 10000 times after executing setup i once.
    return timeit.timeit(d, i, number=10000)

# Benchmarking
m = benchmark(md, im)
p = benchmark(pd, ip)
j = benchmark(jd, ij)
c = benchmark(cd, ic)

# Output result
output = "{name}: {time}".format

print(output(name='marshal', time=m))
print(output(name='(c)pickle', time=p))
print(output(name='json', time=j))
print(output(name="Cap'n Proto", time=c))
Cap'n Proto requires an additional schema file:
@0x934efea7f017fff0;

struct Person {
  name @0 :Text;
}

struct AddressBook {
  people @0 :List(Person);
}
Result:
marshal: 0.08457612991333008
(c)pickle: 0.31447696685791016
json: 0.7639560699462891
Cap'n Proto: 0.4193081855773926
So marshal seems the fastest on my machine.
However, with the binary protocol and the fast option, cPickle is nearly as fast as marshal.
>>> md = 'marshal.dumps([1, 2, 3])'
>>> im = 'import marshal'
>>> pd = 'fp.dump([1, 2, 3])'
>>> ip = "import cPickle; fp = cPickle.Pickler(open('/tmp/1', 'wb'), 2); fp.fast = 1"
>>> timeit.timeit(md, im, number=1000)
0.012997150421142578
>>> timeit.timeit(pd, ip, number=1000)
0.017292022705078125
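As a minimal sketch of the configuration timed above, the following round-trips a small object through a Pickler that uses protocol 2 with the fast flag set. The in-memory buffer and the sample data are assumptions made for illustration. The fallback import lets the same code run on Python 3, where cPickle no longer exists as a separate module and the fast attribute is documented as deprecated.

```python
import io

try:
    import cPickle as pickle  # Python 2: C implementation
except ImportError:
    import pickle  # Python 3: the C accelerator is used automatically

data = {'people': [{'name': 'Alice'}]}  # sample data (assumption)

buf = io.BytesIO()  # in-memory stand-in for a file opened with 'wb'
pickler = pickle.Pickler(buf, 2)  # protocol 2 is a binary protocol
pickler.fast = True  # disables the memo; not safe for self-referential objects
pickler.dump(data)

restored = pickle.loads(buf.getvalue())
```

Fast mode skips the memo that tracks already-pickled objects, which is likely where the speedup comes from, but it also means recursive structures cannot be pickled this way.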
Compatibility
Compatibility with different Python versions
Marshal is meant for internal use by Python (.pyc files), so its format may change in future versions of Python.
As of Python 3.4.3, the marshal format has gone through versions 0 to 4.
The version used by the running interpreter can be viewed via marshal.version.
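A short sketch of how the format version shows up in practice: marshal.dumps accepts an optional version argument, so the format can be inspected and chosen explicitly. The sample data and the choice of version 2 are assumptions for illustration. Choosing a version does not turn marshal into a supported interchange format between Python releases.

```python
import marshal

data = {'people': [{'name': 'Alice'}]}  # sample data (assumption)

# Format version this interpreter writes by default.
print(marshal.version)

# Write with an explicit, older format version instead of the default.
blob = marshal.dumps(data, 2)
restored = marshal.loads(blob)
```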
Compatibility with other languages
The marshal format has implementations in other languages:
- Ruby: RMarshal
- Go: gopymarshal
- Perl: (read-only) https://github.com/gitpan/Python-Serialise-Marshal
Export to JSON
To communicate with other programs, we can export to JSON.
On Python 2.7+, the json module from the standard library is fast at encoding and decoding JSON, so there is no need to use other modules.
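A minimal sketch of such an export using only the standard library follows. The sample data mirrors the benchmark above, and sort_keys is an optional choice made here to keep the output stable.

```python
import json

data = {'people': [{'name': 'Alice'}]}  # sample data (assumption)

# Encode to a JSON string that any other program can parse.
text = json.dumps(data, sort_keys=True)
print(text)

# Decode it again to check that nothing was lost.
restored = json.loads(text)
```

Only JSON-compatible types (dicts, lists, strings, numbers, booleans, None) survive this round trip unchanged; other objects need a custom encoding step.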