Based on snippet #513 by obeattie.
Update 10/10/09: Further development is now occurring on GitHub, thanks to Shrubbery Software.
Incredibly useful for storing just about anything in the database (provided it is Pickle-able, of course) when there isn't a 'proper' field for the job.
PickledObjectField is database-agnostic, and should work with any database backend you can throw at it. You can pass in any Python object and it will automagically be converted behind the scenes. You never have to manually pickle or unpickle anything. Also works fine when querying; supports exact, in, and isnull lookups. It should be noted, however, that calling QuerySet.values() will only return the encoded data, not the original Python object.
Please note that this is supposed to be two files, one fields.py and one tests.py (if you don't care about the unit tests, just use fields.py).
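To make the note about QuerySet.values() more concrete, here is a minimal standalone sketch of the kind of round trip the field performs (pickle, then base64). It is written for Python 3 with only the standard library, whereas the snippet itself targets Python 2 and Django, and the names encode_for_db and decode_from_db are hypothetical stand-ins rather than the snippet's own API.

```python
import base64
import pickle


def encode_for_db(value, protocol=2):
    """Pickle a value, then base64-encode it so the stored text is pure ASCII."""
    return base64.b64encode(pickle.dumps(value, protocol)).decode("ascii")


def decode_from_db(text):
    """Reverse encode_for_db: base64-decode, then unpickle."""
    return pickle.loads(base64.b64decode(text))


original = {"name": "caf\u00e9", "sizes": [1, 2, 3]}
stored = encode_for_db(original)

# The stored form is an opaque ASCII string; a raw column read returns this,
# not the original object, until it is decoded again.
print(stored.isascii())
print(decode_from_db(stored) == original)
```

As with any pickle-based storage, decoding should only ever be applied to data you wrote yourself.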
This PickledObjectField has a few improvements over the one in snippet #513.
1. This one solves the DjangoUnicodeDecodeError problem when saving an object containing non-ASCII data by base64 encoding the pickled output stream. This ensures that all stored data is ASCII, eliminating the problem.
2. PickledObjectField will now optionally use zlib to compress (and uncompress) pickled objects on the fly. This can be set per-field using the keyword argument "compress=True". For most items this is probably not worth the small performance penalty, but for Models with larger objects, it can be a real space saver.
3. You can also now specify the pickle protocol per-field, using the protocol keyword argument. The default of 2 should always work, unless you are trying to access the data from outside of the Django ORM.
4. Worked around a rare issue when using the cPickle and performing lookups of complex data types. In short, cPickle would sometimes output different streams for the same object depending on how it was referenced. This of course could cause lookups for complex objects to fail, even when a matching object exists. See the docstrings and tests for more information.
5. You can now use the isnull lookup and have it function as expected. A consequence of this is that by default, PickledObjectField has null=True set (you can of course pass null=False if you want to change that). If null=False is set (the default for fields), then you wouldn't be able to store a Python None value, since None values aren't pickled or encoded (this in turn is what makes the isnull lookup possible).
6. You can now pass in an object as the default argument for the field without it being converted to a unicode string first. If you pass in a callable though, the field will still call it. It will not try to pickle and encode it.
7. You can manually import dbsafe_encode and dbsafe_decode from fields.py if you want to encode and decode objects yourself. This is mostly useful for decoding values returned from calling QuerySet.values(), which are still encoded strings.
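The compression option, the pinned pickle protocol, and the pass-through of None described in the list can be sketched without Django. This is an illustrative Python 3 sketch under stated assumptions, not the snippet's code: encode and decode are hypothetical helpers, and the protocol argument is wired through here purely for illustration.

```python
import base64
import pickle
import zlib


def encode(value, compress=False, protocol=2):
    """Hypothetical helper: pickle, optionally zlib-compress, then base64-encode."""
    if value is None:
        # None is passed through untouched so the column can hold a real NULL,
        # which is what keeps an isnull-style check meaningful.
        return None
    data = pickle.dumps(value, protocol)
    if compress:
        data = zlib.compress(data)
    return base64.b64encode(data).decode("ascii")


def decode(text, compress=False):
    """Hypothetical helper: reverse of encode(); the compress flag must match."""
    if text is None:
        return None
    data = base64.b64decode(text)
    if compress:
        data = zlib.decompress(data)
    return pickle.loads(data)


big = {"rows": [list(range(50)) for _ in range(200)]}
plain = encode(big)
packed = encode(big, compress=True)

# Repetitive data compresses well; small values may not benefit at all.
print(len(plain), len(packed))
print(decode(packed, compress=True) == big)

# With a fixed protocol, encoding the same object twice gives the same string,
# which is the property exact-match lookups rely on.
print(encode(big) == plain)
```

Pinning the protocol matters because a string produced under one protocol will not match a string produced under another, even for the same value.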
The tests have been updated to match the added features, but if you find any bugs, please post them in the comments. My goal is to make this an error-proof implementation.
Note: If you are trying to store other django models in the PickledObjectField, please see the comments for a discussion on the problems associated with doing that. The easy solution is to put django models into a list or tuple before assigning them to the PickledObjectField.
Update 9/2/09: Fixed the value_to_string method so that serialization should now work as expected. Also added deepcopy back into dbsafe_encode, fixing #4 above, since deepcopy had somehow managed to remove itself. This means that lookups should once again work as expected in all situations. Also made the field editable=False by default (which I swear I already did once before!) since it is never a good idea to have a PickledObjectField be user editable.
# --------------------------------------- fields.py --------------------------------------- #
from copy import deepcopy
from base64 import b64encode, b64decode
from zlib import compress, decompress
try:
    from cPickle import loads, dumps
except ImportError:
    from pickle import loads, dumps

from django.db import models
from django.utils.encoding import force_unicode

class PickledObject(str):
    """
    A subclass of string so it can be told whether a string is a pickled
    object or not (if the object is an instance of this class then it must
    [well, should] be a pickled one).

    Only really useful for passing pre-encoded values to ``default``
    with ``dbsafe_encode``, not that doing so is necessary. If you
    remove PickledObject and its references, you won't be able to pass
    in pre-encoded values anymore, but you can always just pass in the
    python objects themselves.
    """
    pass

def dbsafe_encode(value, compress_object=False):
    """
    We use deepcopy() here to avoid a problem with cPickle, where dumps
    can generate different character streams for same lookup value if
    they are referenced differently.

    The reason this is important is because we do all of our lookups as
    simple string matches, thus the character streams must be the same
    for the lookups to work properly. See tests.py for more information.
    """
    if not compress_object:
        value = b64encode(dumps(deepcopy(value)))
    else:
        value = b64encode(compress(dumps(deepcopy(value))))
    return PickledObject(value)

def dbsafe_decode(value, compress_object=False):
    if not compress_object:
        value = loads(b64decode(value))
    else:
        value = loads(decompress(b64decode(value)))
    return value

class PickledObjectField(models.Field):
    """
    A field that will accept *any* python object and store it in the
    database. PickledObjectField will optionally compress its values if
    declared with the keyword argument ``compress=True``.

    Does not actually encode and compress ``None`` objects (although you
    can still do lookups using None). This way, it is still possible to
    use the ``isnull`` lookup type correctly. Because of this, the field
    defaults to ``null=True``, as otherwise it wouldn't be able to store
    None values since they aren't pickled and encoded.
    """
    __metaclass__ = models.SubfieldBase

    def __init__(self, *args, **kwargs):
        self.compress = kwargs.pop('compress', False)
        self.protocol = kwargs.pop('protocol', 2)
        kwargs.setdefault('null', True)
        kwargs.setdefault('editable', False)
        super(PickledObjectField, self).__init__(*args, **kwargs)

    def get_default(self):
        """
        Returns the default value for this field.

        The default implementation on models.Field calls force_unicode
        on the default, which means you can't set arbitrary Python
        objects as the default. To fix this, we just return the value
        without calling force_unicode on it. Note that if you set a
        callable as a default, the field will still call it. It will
        *not* try to pickle and encode it.
        """
        if self.has_default():
            if callable(self.default):
                return self.default()
            return self.default
        # If the field doesn't have a default, then we punt to models.Field.
        return super(PickledObjectField, self).get_default()

    def to_python(self, value):
        """
        B64decode and unpickle the object, optionally decompressing it.

        If an error is raised in de-pickling and we're sure the value is
        a definite pickle, the error is allowed to propagate. If we
        aren't sure if the value is a pickle or not, then we catch the
        error and return the original value instead.
        """
        if value is not None:
            try:
                value = dbsafe_decode(value, self.compress)
            except:
                # If the value is a definite pickle and an error is raised
                # in de-pickling it, the error should be allowed to propagate.
                if isinstance(value, PickledObject):
                    raise
        return value

    def get_db_prep_value(self, value):
        """
        Pickle and b64encode the object, optionally compressing it.

        The pickling protocol is specified explicitly (by default 2),
        rather than as -1 or HIGHEST_PROTOCOL, because we don't want the
        protocol to change over time. If it did, ``exact`` and ``in``
        lookups would likely fail, since pickle would now be generating
        a different string.
        """
        if value is not None and not isinstance(value, PickledObject):
            # We call force_unicode here explicitly, so that the encoded string
            # isn't rejected by the postgresql_psycopg2 backend. Alternatively,
            # we could have just registered PickledObject with the psycopg
            # marshaller (telling it to store it like it would a string), but
            # since both of these methods result in the same value being stored,
            # doing things this way is much easier.
            #
            # NOTE: in this listing self.protocol is stored by __init__ but is
            # not passed on to dbsafe_encode(), so dumps() runs with the pickle
            # module's default protocol rather than the one set on the field.
            value = force_unicode(dbsafe_encode(value, self.compress))
        return value

    def value_to_string(self, obj):
        value = self._get_val_from_obj(obj)
        return self.get_db_prep_value(value)

    def get_internal_type(self):
        return 'TextField'

    def get_db_prep_lookup(self, lookup_type, value):
        if lookup_type not in ['exact', 'in', 'isnull']:
            raise TypeError('Lookup type %s is not supported.' % lookup_type)
        # The Field model already calls get_db_prep_value before doing the
        # actual lookup, so all we need to do is limit the lookup types.
        return super(PickledObjectField, self).get_db_prep_lookup(lookup_type, value)

# --------------------------------------- tests.py --------------------------------------- #
"""Unit testing for this module."""
from django.test import TestCase
from django.db import models
from fields import PickledObjectField

class TestingModel(models.Model):
    pickle_field = PickledObjectField()
    compressed_pickle_field = PickledObjectField(compress=True)
    default_pickle_field = PickledObjectField(default=({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]))

class TestCustomDataType(str):
    pass

class PickledObjectFieldTests(TestCase):
    def setUp(self):
        self.testing_data = (
            {1:2, 2:4, 3:6, 4:8, 5:10},
            'Hello World',
            (1, 2, 3, 4, 5),
            [1, 2, 3, 4, 5],
            TestCustomDataType('Hello World'),
        )
        return super(PickledObjectFieldTests, self).setUp()

    def testDataIntegrity(self):
        """
        Tests that data remains the same when saved to and fetched from
        the database, whether compression is enabled or not.
        """
        for value in self.testing_data:
            model_test = TestingModel(pickle_field=value, compressed_pickle_field=value)
            model_test.save()
            model_test = TestingModel.objects.get(id__exact=model_test.id)
            # Make sure that both the compressed and uncompressed fields return
            # the same data, even though it's stored differently in the DB.
            self.assertEquals(value, model_test.pickle_field)
            self.assertEquals(value, model_test.compressed_pickle_field)
            model_test.delete()

        # Make sure the default value for default_pickle_field gets stored
        # correctly and that it isn't converted to a string.
        model_test = TestingModel()
        model_test.save()
        model_test = TestingModel.objects.get(id__exact=model_test.id)
        self.assertEquals(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]), model_test.default_pickle_field)

    def testLookups(self):
        """
        Tests that lookups can be performed on data once stored in the
        database, whether compression is enabled or not.

        One problem with cPickle is that it will sometimes output
        different streams for the same object, depending on how they are
        referenced. It should be noted though, that this does not happen
        for every object, but usually only with more complex ones.

        >>> from pickle import dumps
        >>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, \
        ... 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
        >>> dumps(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, \
        ... 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]))
        "((dp0\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np1\n(I1\nI2\nI3\nI4\nI5\ntp2\n(lp3\nI1\naI2\naI3\naI4\naI5\natp4\n."
        >>> dumps(t)
        "((dp0\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np1\n(I1\nI2\nI3\nI4\nI5\ntp2\n(lp3\nI1\naI2\naI3\naI4\naI5\natp4\n."
        >>> # Both dumps() are the same using pickle.

        >>> from cPickle import dumps
        >>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
        >>> dumps(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]))
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
        >>> dumps(t)
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt(lp2\nI1\naI2\naI3\naI4\naI5\natp3\n."
        >>> # But with cPickle the two dumps() are not the same!
        >>> # Both will generate the same object when loads() is called though.

        We can solve this by calling deepcopy() on the value before
        pickling it, as this copies everything to a brand new data
        structure.

        >>> from cPickle import dumps
        >>> from copy import deepcopy
        >>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
        >>> dumps(deepcopy(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])))
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
        >>> dumps(deepcopy(t))
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
        >>> # Using deepcopy() beforehand means that now both dumps() are identical.
        >>> # It may not be necessary, but deepcopy() ensures that lookups will always work.

        Unfortunately calling copy() alone doesn't seem to fix the
        problem as it lies primarily with complex data types.

        >>> from cPickle import dumps
        >>> from copy import copy
        >>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
        >>> dumps(copy(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])))
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
        >>> dumps(copy(t))
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt(lp2\nI1\naI2\naI3\naI4\naI5\natp3\n."
        """
        for value in self.testing_data:
            model_test = TestingModel(pickle_field=value, compressed_pickle_field=value)
            model_test.save()
            # Make sure that we can do an ``exact`` lookup by both the
            # pickle_field and the compressed_pickle_field.
            model_test = TestingModel.objects.get(pickle_field__exact=value, compressed_pickle_field__exact=value)
            self.assertEquals(value, model_test.pickle_field)
            self.assertEquals(value, model_test.compressed_pickle_field)
            # Make sure that ``in`` lookups also work correctly.
            model_test = TestingModel.objects.get(pickle_field__in=[value], compressed_pickle_field__in=[value])
            self.assertEquals(value, model_test.pickle_field)
            self.assertEquals(value, model_test.compressed_pickle_field)
            # Make sure that ``isnull`` lookups are working.
            self.assertEquals(1, TestingModel.objects.filter(pickle_field__isnull=False).count())
            self.assertEquals(0, TestingModel.objects.filter(pickle_field__isnull=True).count())
            model_test.delete()

        # Make sure that lookups of the same value work, even when referenced
        # differently. See the above docstring for more info on the issue.
        value = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
        model_test = TestingModel(pickle_field=value, compressed_pickle_field=value)
        model_test.save()
        # Test lookup using an assigned variable.
        model_test = TestingModel.objects.get(pickle_field__exact=value)
        self.assertEquals(value, model_test.pickle_field)
        # Test lookup using direct input of a matching value.
        model_test = TestingModel.objects.get(
            pickle_field__exact = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]),
            compressed_pickle_field__exact = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]),
        )
        self.assertEquals(value, model_test.pickle_field)
        model_test.delete()
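
The testLookups docstring shows Python 2's cPickle producing different streams for equal values. A related hazard can be reproduced with the standard Python 3 pickle module, and the sketch below is offered only as an illustration of why byte-for-byte matching of pickles is delicate. It relies on a different mechanism (pickle's memo of already-seen objects) from the cPickle behaviour described in the docstring, and is not a reproduction of that bug.

```python
import pickle

inner = [1, 2, 3]
shared = (inner, inner)            # the same list object referenced twice
separate = ([1, 2, 3], [1, 2, 3])  # two distinct but equal lists

a = pickle.dumps(shared, protocol=2)
b = pickle.dumps(separate, protocol=2)

# The two tuples compare equal...
print(shared == separate)
# ...but the second reference to `inner` is written as a memo lookup,
# so the byte streams differ.
print(a == b)
# Both streams still load back to equal values.
print(pickle.loads(a) == pickle.loads(b))
```

Because the field compares encoded strings rather than decoded objects, any such difference in the stream would make an exact lookup miss a row that holds an equal value.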
Comments
I have a baffling problem with this. I am trying to save an (unconnected) model instance into the field.
It works when the instance is in a field in a new record: it gets nicely pickled in the INSERT and I can see it in the database.
But it does not work on update and NULL gets written to the database (regardless of my default). Programmatically, the field contains the instance prior to (and post) the save - but somewhere between there and the UPDATE SQL it goes missing.
(Everything works perfectly if the pickled instance is not of a subclass of models.Model. Even models.Manager can be pickled!)
I am no field-extension expert and I am having trouble tracking this down. In the meantime any thoughts?
#
To (mostly) answer my own question.
My issue is way down at the bottom of the code, just before the SQL execution of an UPDATE:
    if hasattr(val, 'prepare_database_save'):
        val = val.prepare_database_save(field)
    else:
        val = field.get_db_prep_save(val)
(It doesn't do this for INSERTs for reasons I don't quite understand... but that's why the inserts DO work)
Of course all models implement prepare_database_save (in order to get the ID for a foreign key relationship), and so the value turns into that key at the last minute (instead of going through your pickling code in get_db_prep_save).
And because my model is 'abstract' - in the sense that it hasn't gone into the database in the traditional way - it has no ID. Hence 'NULL' for the PickledObjectField value after an update.
Hard to find... not too hard to fix. (These 'picklable' models just need to derive from a super class that overrides that method to do get_db_prep_save instead).
Thought I'd go to the effort of writing it up, since I've seen at least one other person trying to do something similar (for an undo stack of model state).
Otherwise, a wonderful snippet.
#
James,
That's a nice find! I mainly use the field for storing dictionary data that is arbitrary and that I don't need to query against, so I probably never would have found that error.
I spent a little bit of time trying to find a true solution, but was unable to come up with one. An easy workaround however, is to wrap the model object inside of a list or tuple. Since the list/tuple would not have the prepare_database_save method, it will call the field's get_db_prep_value as usual. Not fully transparent, but it does prevent the problem from occurring.
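The dispatch James quoted, and the effect of the list/tuple workaround, can be mimicked in plain Python. This is a minimal sketch under stated assumptions: FakeField, FakeModel and prepare_for_update are hypothetical stand-ins written for illustration, not Django classes, and the "pickled:" prefix merely marks values that took the field's own encoding path.

```python
class FakeField:
    """Hypothetical stand-in for the pickled field."""

    def get_db_prep_save(self, value):
        return "pickled:%r" % (value,)


class FakeModel:
    """Hypothetical stand-in for an unsaved model instance (no primary key)."""

    pk = None

    def prepare_database_save(self, field):
        # Models hand back their primary key, as they would for a foreign key.
        return self.pk

    def __repr__(self):
        return "FakeModel()"


def prepare_for_update(value, field):
    """Mirror of the UPDATE-time dispatch quoted earlier in the thread."""
    if hasattr(value, "prepare_database_save"):
        return value.prepare_database_save(field)
    return field.get_db_prep_save(value)


field = FakeField()
instance = FakeModel()

# A bare model takes the primary-key path and ends up as None (NULL).
print(prepare_for_update(instance, field))    # None
# A list has no prepare_database_save, so the field's own encoding runs.
print(prepare_for_update([instance], field))  # pickled:[FakeModel()]
```

The cost of the workaround is that the stored value is a one-element list, so the caller has to unwrap it after loading.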
Another possibility is to write a proxy class for the model you wish to store, like so:
You can then use the proxy class when assigning a model to the PickledObjectField and it should work as expected (although I haven't tested this out explicitly). This probably won't work well if you're trying to store an arbitrary model though, since you'd need a proxy class for each and every model.
Let me know if you find any other problems; I'll do my best to help solve them.
In other news I've fixed a few bugs with the snippet. Despite my best efforts, a change I thought I made somehow wasn't included (although the docstrings mentioned it--so where did it go!?). To fix this, I've once again added the deepcopy function into dbsafe_encode, so now lookups should work in all cases.
Second, I fixed the snippet's value_to_string method, so that serialization should now actually work as expected. Before, serializing a model with a PickledObjectField would return not the encoded object as expected, but the encoded __repr__ of the object. Can't believe I missed that.
Finally, I've changed the field to now be editable=False by default. I had changed it to this earlier, but somehow (like with deepcopy) it managed to disappear. Having the object editable in the admin is a bad idea, since any stored object will be converted to a string for display and then upon save, the string will be written to the database instead of the original object.
#
I just got this error: ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '})' at line 1")
I'm working with MySQL on OS X, and maybe a too old version of django. I got the same unicode errors others got with the original snippet, and I'm trying to solve it.
Any ideas on what the hell is going on?
Thanks
#
ivankirigin,
Can you post the actual traceback you're getting and what data you're trying to save into the PickledObjectField? Without some more details I don't know what the problem could be. Also, are you getting both the ProgrammingError and the DjangoUnicodeError? More details would really help with troubleshooting this...
#
I made a form field to edit PickledObjectFields as JSON in the admin. This doesn't work if you're storing objects in your pickled field that can't be JSON-encoded. But for simple objects like dictionaries, it works very well. Add the following to the PickledObjectField class:
Then add this code to fields.py:
#
This may be obvious (to you) but the information may save somebody some time down the road: if you use erussel's code for turning on editing via JSON above, you must also delete the line
in the original class from above, otherwise you'll get an error in the admin that is difficult to debug.
#
I found an issue that I was hoping you could take a look at.
I have a model that has a PickledObjectField, which works fine. But then I run a QuerySet operation in which I defer() the PickledObjectField, then make a few edits to the model, and then perform a model.save().
But now when I attempt to read the PickledObjectField value in all future QuerySet operations, I get the raw base64 pickled string back instead of a python object! It seems like the data has been corrupted somehow. Sometimes I can manually call dbsafe_decode to get back the python object, but I shouldn't have to do that, and even dbsafe_decode only works some of the time.
When I looked into the SQL queries being executed on the model.save(), it runs a SELECT to get the value of the pickledobjectfield from the db first so that it has the full model which is then saved. It appears that the full pickled object field string is appropriately saved, but for some reason it still isn't working.
Any help would be great!
#