Encode Python objects as JSON strings, and decode JSON strings into Python objects.
The json module provides an API similar to pickle for converting in-memory Python objects to a serialized representation known as JavaScript Object Notation (JSON). Unlike pickle, JSON has the benefit of having implementations in many languages (especially JavaScript), making it suitable for inter-application communication. JSON is probably most widely used for communicating between the web server and client in an AJAX application, but is not limited to that problem domain.
Encoding and Decoding Simple Data Types
The encoder understands Python’s native types by default (string, unicode, int, float, list, tuple, dict).import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
print 'DATA:', repr(data)
data_string = json.dumps(data)
print 'JSON:', data_string
$ python json_simple_types.py
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
JSON: [{"a": "A", "c": 3.0, "b": [2, 4]}]
import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
data_string = json.dumps(data)
print 'ENCODED:', data_string
decoded = json.loads(data_string)
print 'DECODED:', decoded
print 'ORIGINAL:', type(data[0]['b'])
print 'DECODED :', type(decoded[0]['b'])
$ python json_simple_types_decode.py
ENCODED: [{"a": "A", "c": 3.0, "b": [2, 4]}]
DECODED: [{u'a': u'A', u'c': 3.0, u'b': [2, 4]}]
ORIGINAL: <type 'tuple'>
DECODED : <type 'list'>
Human-consumable vs. Compact Output
Another benefit of JSON over pickle is that the results are human-readable. The dumps() function accepts several arguments to make the output even nicer. For example, sort_keys tells the encoder to output the keys of a dictionary in sorted, instead of random, order.import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
print 'DATA:', repr(data)
unsorted = json.dumps(data)
print 'JSON:', json.dumps(data)
print 'SORT:', json.dumps(data, sort_keys=True)
first = json.dumps(data, sort_keys=True)
second = json.dumps(data, sort_keys=True)
print 'UNSORTED MATCH:', unsorted == first
print 'SORTED MATCH :', first == second
$ python json_sort_keys.py
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
JSON: [{"a": "A", "c": 3.0, "b": [2, 4]}]
SORT: [{"a": "A", "b": [2, 4], "c": 3.0}]
UNSORTED MATCH: False
SORTED MATCH : True
import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
print 'DATA:', repr(data)
print 'NORMAL:', json.dumps(data, sort_keys=True)
print 'INDENT:', json.dumps(data, sort_keys=True, indent=2)
$ python json_indent.py
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
NORMAL: [{"a": "A", "b": [2, 4], "c": 3.0}]
INDENT: [
{
"a": "A",
"b": [
2,
4
],
"c": 3.0
}
]
import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
print 'DATA:', repr(data)
print 'repr(data) :', len(repr(data))
print 'dumps(data) :', len(json.dumps(data))
print 'dumps(data, indent=2) :', len(json.dumps(data, indent=2))
print 'dumps(data, separators):', len(json.dumps(data, separators=(',',':')))
$ python json_compact_encoding.py
DATA: [{'a': 'A', 'c': 3.0, 'b': (2, 4)}]
repr(data) : 35
dumps(data) : 35
dumps(data, indent=2) : 76
dumps(data, separators): 29
Encoding Dictionaries
The JSON format expects the keys to a dictionary to be strings. If you have other types as keys in your dictionary, trying to encode the object will produce a ValueError. One way to work around that limitation is to skip over non-string keys using the skipkeys argument:import json
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0, ('d',):'D tuple' } ]
print 'First attempt'
try:
print json.dumps(data)
except (TypeError, ValueError) as err:
print 'ERROR:', err
print
print 'Second attempt'
print json.dumps(data, skipkeys=True)
$ python json_skipkeys.py
First attempt
ERROR: keys must be a string
Second attempt
[{"a": "A", "c": 3.0, "b": [2, 4]}]
Working with Your Own Types
All of the examples so far have used Pythons built-in types because those are supported by json natively. It isn’t uncommon, of course, to have your own types that you want to be able to encode as well. There are two ways to do that.First, we’ll need a class to encode:
class MyObj(object):
def __init__(self, s):
self.s = s
def __repr__(self):
return '<MyObj(%s)>' % self.s
import json
import json_myobj
obj = json_myobj.MyObj('instance value goes here')
print 'First attempt'
try:
print json.dumps(obj)
except TypeError, err:
print 'ERROR:', err
def convert_to_builtin_type(obj):
print 'default(', repr(obj), ')'
# Convert objects to a dictionary of their representation
d = { '__class__':obj.__class__.__name__,
'__module__':obj.__module__,
}
d.update(obj.__dict__)
return d
print
print 'With default'
print json.dumps(obj, default=convert_to_builtin_type)
$ python json_dump_default.py
First attempt
ERROR: <MyObj(instance value goes here)> is not JSON serializable
With default
default( <MyObj(instance value goes here)> )
{"s": "instance value goes here", "__module__": "json_myobj", "__class__": "MyObj"}
The object_hook is called for each dictionary decoded from the incoming data stream, giving us a chance to convert the dictionary to another type of object. The hook function should return the object it wants the calling application to receive instead of the dictionary.
import json
def dict_to_object(d):
if '__class__' in d:
class_name = d.pop('__class__')
module_name = d.pop('__module__')
module = __import__(module_name)
print 'MODULE:', module
class_ = getattr(module, class_name)
print 'CLASS:', class_
args = dict( (key.encode('ascii'), value) for key, value in d.items())
print 'INSTANCE ARGS:', args
inst = class_(**args)
else:
inst = d
return inst
encoded_object = '[{"s": "instance value goes here", "__module__": "json_myobj", "__class__": "MyObj"}]'
myobj_instance = json.loads(encoded_object, object_hook=dict_to_object)
print myobj_instance
$ python json_load_object_hook.py
MODULE: <module 'json_myobj' from '/Users/dhellmann/Documents/PyMOTW/src/PyMOTW/json/json_myobj.pyc'>
CLASS: <class 'json_myobj.MyObj'>
INSTANCE ARGS: {'s': u'instance value goes here'}
[<MyObj(instance value goes here)>]
Encoder and Decoder Classes
Besides the convenience functions we have already examined, the json module provides classes for encoding and decoding. When using the classes directly, you have access to extra APIs and can create subclasses to customize their behavior.The JSONEncoder provides an iterable interface for producing “chunks” of encoded data, making it easier for you to write to files or network sockets without having to represent an entire data structure in memory.
import json
encoder = json.JSONEncoder()
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
for part in encoder.iterencode(data):
print 'PART:', part
$ python json_encoder_iterable.py
PART: [
PART: {
PART: "a"
PART: :
PART: "A"
PART: ,
PART: "c"
PART: :
PART: 3.0
PART: ,
PART: "b"
PART: :
PART: [2
PART: , 4
PART: ]
PART: }
PART: ]
To encode arbitrary objects, we can override the default() method with an implementation similar to what we used above in convert_to_builtin_type().
import json
import json_myobj
class MyEncoder(json.JSONEncoder):
def default(self, obj):
print 'default(', repr(obj), ')'
# Convert objects to a dictionary of their representation
d = { '__class__':obj.__class__.__name__,
'__module__':obj.__module__,
}
d.update(obj.__dict__)
return d
obj = json_myobj.MyObj('internal data')
print obj
print MyEncoder().encode(obj)
$ python json_encoder_default.py
<MyObj(internal data)>
default( <MyObj(internal data)> )
{"s": "internal data", "__module__": "json_myobj", "__class__": "MyObj"}
import json
class MyDecoder(json.JSONDecoder):
def __init__(self):
json.JSONDecoder.__init__(self, object_hook=self.dict_to_object)
def dict_to_object(self, d):
if '__class__' in d:
class_name = d.pop('__class__')
module_name = d.pop('__module__')
module = __import__(module_name)
print 'MODULE:', module
class_ = getattr(module, class_name)
print 'CLASS:', class_
args = dict( (key.encode('ascii'), value) for key, value in d.items())
print 'INSTANCE ARGS:', args
inst = class_(**args)
else:
inst = d
return inst
encoded_object = '[{"s": "instance value goes here", "__module__": "json_myobj", "__class__": "MyObj"}]'
myobj_instance = MyDecoder().decode(encoded_object)
print myobj_instance
$ python json_decoder_object_hook.py
MODULE: <module 'json_myobj' from '/Users/dhellmann/Documents/PyMOTW/src/PyMOTW/json/json_myobj.pyc'>
CLASS: <class 'json_myobj.MyObj'>
INSTANCE ARGS: {'s': u'instance value goes here'}
[<MyObj(instance value goes here)>]
Working with Streams and Files
In all of the examples so far, we have assumed that we could (and should) hold the encoded version of the entire data structure in memory at one time. With large data structures it may be preferable to write the encoding directly to a file-like object. The convenience functions load() and dump() accept references to a file-like object to use for reading or writing.import json
import tempfile
data = [ { 'a':'A', 'b':(2, 4), 'c':3.0 } ]
f = tempfile.NamedTemporaryFile(mode='w+')
json.dump(data, f)
f.flush()
print open(f.name, 'r').read()
$ python json_dump_file.py
[{"a": "A", "c": 3.0, "b": [2, 4]}]
import json
import tempfile
f = tempfile.NamedTemporaryFile(mode='w+')
f.write('[{"a": "A", "c": 3.0, "b": [2, 4]}]')
f.flush()
f.seek(0)
print json.load(f)
$ python json_load_file.py
[{u'a': u'A', u'c': 3.0, u'b': [2, 4]}]
Mixed Data Streams
The JSONDecoder includes the raw_decode() method for decoding a data structure followed by more data, such as JSON data with trailing text. The return value is the object created by decoding the input data, and an index into that data indicating where decoding left off.import json
decoder = json.JSONDecoder()
def get_decoded_and_remainder(input_data):
obj, end = decoder.raw_decode(input_data)
remaining = input_data[end:]
return (obj, end, remaining)
encoded_object = '[{"a": "A", "c": 3.0, "b": [2, 4]}]'
extra_text = 'This text is not JSON.'
print 'JSON first:'
obj, end, remaining = get_decoded_and_remainder(' '.join([encoded_object, extra_text]))
print 'Object :', obj
print 'End of parsed input :', end
print 'Remaining text :', repr(remaining)
print
print 'JSON embedded:'
try:
obj, end, remaining = get_decoded_and_remainder(
' '.join([extra_text, encoded_object, extra_text])
)
except ValueError, err:
print 'ERROR:', err
$ python json_mixed_data.py
JSON first:
Object : [{u'a': u'A', u'c': 3.0, u'b': [2, 4]}]
End of parsed input : 35
Remaining text : ' This text is not JSON.'
JSON embedded:
ERROR: No JSON object could be decoded
0 comments :