In Python, one of the most common tasks is converting between Python data types and JSON data types.
There is an obvious problem, though: as a data-exchange format, JSON has a fixed set of data types, whereas Python, as a programming language, lets you define custom data types in addition to the built-in ones.
You have almost certainly run into a problem like this:
    >>> import json
    >>> import decimal
    >>>
    >>> data = {'key1': 'string', 'key2': 10, 'key3': decimal.Decimal('1.45')}
    >>> json.dumps(data)
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
        json.dumps(data)
      File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
        return _default_encoder.encode(obj)
      File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
        chunks = self.iterencode(o, _one_shot=True)
      File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
        return _iterencode(o, 0)
      File "/usr/lib/python3.6/json/encoder.py", line 180, in default
        o.__class__.__name__)
    TypeError: Object of type 'Decimal' is not JSON serializable
So the question is how to turn all kinds of Python data types into JSON data types. One simple approach is to first convert each value into something that maps directly onto a JSON type, and then dump it. This is direct and brute-force, but it copes poorly with a wide variety of custom data types.
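As a rough illustration of that convert-first approach, here is a minimal sketch. The helper name `to_jsonable` is hypothetical (it is not part of the json library), and it only knows about Decimal, which is exactly why this approach scales poorly as more types appear.

```python
import json
from decimal import Decimal


def to_jsonable(value):
    """Recursively replace Decimal values with floats (hypothetical helper)."""
    if isinstance(value, Decimal):
        return float(value)
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    # Everything else is assumed to be JSON-compatible already.
    return value


data = {'key1': 'string', 'key2': 10, 'key3': Decimal('1.45')}
encoded = json.dumps(to_jsonable(data))
print(encoded)  # {"key1": "string", "key2": 10, "key3": 1.45}
```

Every new type needs another branch in the helper, and the whole structure is walked once before json.dumps walks it again.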
Searching is one of the most important ways to solve a problem. After a quick search, you will find that the data can actually be converted during the encode stage of dumps.
So you have probably written something like this, which solves the problem neatly:
    >>> class DecimalEncoder(json.JSONEncoder):
    ...     def default(self, obj):
    ...         if isinstance(obj, decimal.Decimal):
    ...             return float(obj)
    ...         return super(DecimalEncoder, self).default(obj)
    ...
    >>>
    >>> json.dumps(data, cls=DecimalEncoder)
    '{"key1": "string", "key2": 10, "key3": 1.45}'
The code in this article is taken from github.com/python/cpyt… with almost all docstrings removed. Because the code is long, only the important fragments are quoted; the complete code is available through the link at the top of each fragment.
Anyone familiar with the json library knows that it has only four commonly used APIs: dump, dumps, load, and loads.
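For reference, a short sketch of how the four functions pair up: dumps and loads work with strings, while dump and load work with file-like objects. An in-memory `io.StringIO` buffer stands in for a real file here, purely for illustration.

```python
import io
import json

data = {'name': 'json', 'apis': ['dump', 'dumps', 'load', 'loads']}

# dumps / loads: Python object <-> str
text = json.dumps(data)
from_text = json.loads(text)

# dump / load: Python object <-> file-like object
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)
from_file = json.load(buffer)

print(from_text == from_file == data)  # True
```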
The source code lives in cpython/Lib/json. Here is dumps:
    # https://github.com/python/cpython/blob/master/Lib/json/__init__.py#L183-L238
    def dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True,
              allow_nan=True, cls=None, indent=None, separators=None,
              default=None, sort_keys=False, **kw):
        # cached encoder
        if (not skipkeys and ensure_ascii and
                check_circular and allow_nan and
                cls is None and indent is None and separators is None and
                default is None and not sort_keys and not kw):
            return _default_encoder.encode(obj)
        if cls is None:
            cls = JSONEncoder
        # the key part
        return cls(
            skipkeys=skipkeys, ensure_ascii=ensure_ascii,
            check_circular=check_circular, allow_nan=allow_nan, indent=indent,
            separators=separators, default=default, sort_keys=sort_keys,
            **kw).encode(obj)
Look straight at the final return. If no cls is passed, JSONEncoder is used by default; dumps then instantiates the class and calls the instance's encode method.
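The signature also shows a `default` keyword that dumps forwards to the encoder's constructor. As a small sketch, a plain function can therefore do the same job as a JSONEncoder subclass; the function name `decimal_default` is made up for this example.

```python
import json
from decimal import Decimal


def decimal_default(obj):
    """Called by the encoder only for objects it cannot serialize itself."""
    if isinstance(obj, Decimal):
        return float(obj)
    # Mirror the base encoder's behavior for anything still unsupported.
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')


data = {'key1': 'string', 'key2': 10, 'key3': Decimal('1.45')}
via_function = json.dumps(data, default=decimal_default)
print(via_function)  # {"key1": "string", "key2": 10, "key3": 1.45}
```

A function is convenient for one-off conversions, while a subclass is easier to reuse and extend.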
The encode method is also simple:
    # https://github.com/python/cpython/blob/191e993365ac3206f46132dcf46236471ec54bfa/Lib/json/encoder.py#L182-L202
    def encode(self, o):
        # a str is encoded directly and returned
        if isinstance(o, str):
            if self.ensure_ascii:
                return encode_basestring_ascii(o)
            else:
                return encode_basestring(o)
        # chunks are the pieces of the encoded data
        chunks = self.iterencode(o, _one_shot=True)
        if not isinstance(chunks, (list, tuple)):
            chunks = list(chunks)
        return ''.join(chunks)
As you can see, the final JSON string is simply the chunks joined together, and the chunks come from calling self.iterencode.
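This can be checked from the outside, since iterencode is a public method of JSONEncoder. The sketch below joins the chunks by hand and compares the result with json.dumps; the exact chunk boundaries are an implementation detail, so the example does not rely on them.

```python
import json

encoder = json.JSONEncoder()
data = {'a': [1, 2.5, None], 'b': True}

# iterencode yields the output piece by piece
chunks = list(encoder.iterencode(data))
joined = ''.join(chunks)

print(joined)                      # {"a": [1, 2.5, null], "b": true}
print(joined == json.dumps(data))  # True
```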
    # https://github.com/python/cpython/blob/191e993365ac3206f46132dcf46236471ec54bfa/Lib/json/encoder.py#L204-257
    if (_one_shot and c_make_encoder is not None
            and self.indent is None):
        _iterencode = c_make_encoder(
            markers, self.default, _encoder, self.indent,
            self.key_separator, self.item_separator, self.sort_keys,
            self.skipkeys, self.allow_nan)
    else:
        _iterencode = _make_iterencode(
            markers, self.default, _encoder, self.indent, floatstr,
            self.key_separator, self.item_separator, self.sort_keys,
            self.skipkeys, _one_shot)
    return _iterencode(o, 0)
The iterencode method is fairly long, so we only care about its last few lines. The _iterencode it calls at the end is the function returned by one of two higher-order functions, c_make_encoder or _make_iterencode. c_make_encoder comes from the _json module, which is a C module. We will not look at how that module is implemented and will instead study the equivalent _make_iterencode function.
    # https://github.com/python/cpython/blob/191e993365ac3206f46132dcf46236471ec54bfa/Lib/json/encoder.py#L259-441
    def _iterencode(o, _current_indent_level):
        if isinstance(o, str):
            yield _encoder(o)
        elif o is None:
            yield 'null'
        elif o is True:
            yield 'true'
        elif o is False:
            yield 'false'
        elif isinstance(o, int):
            # see comment for int/float in _make_iterencode
            yield _intstr(o)
        elif isinstance(o, float):
            # see comment for int/float in _make_iterencode
            yield _floatstr(o)
        elif isinstance(o, (list, tuple)):
            yield from _iterencode_list(o, _current_indent_level)
        elif isinstance(o, dict):
            yield from _iterencode_dict(o, _current_indent_level)
        else:
            if markers is not None:
                markerid = id(o)
                if markerid in markers:
                    raise ValueError("Circular reference detected")
                markers[markerid] = o
            o = _default(o)
            yield from _iterencode(o, _current_indent_level)
            if markers is not None:
                del markers[markerid]
    return _iterencode
The only thing to look at is the returned function. The if-elif-else chain converts the built-in types to JSON types one by one. When a type is not recognized, _default() is called, and its result is passed back through _iterencode recursively so that every value gets resolved. _default is the default method that was overridden earlier.
At this point it should be clear how Python encodes data into JSON.
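One practical consequence of that recursion is that default() does not have to return a finished JSON value. In the sketch below, the `Point` class and `PointEncoder` are invented for illustration: default() returns a dict that still contains Decimal values, and the encoder calls default() again for each of them.

```python
import json
from decimal import Decimal


class Point:
    """Hypothetical custom type used only for this example."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Point):
            # The returned dict still holds Decimals; they are re-encoded recursively.
            return {'x': obj.x, 'y': obj.y}
        if isinstance(obj, Decimal):
            return float(obj)
        return super().default(obj)


encoded = json.dumps({'p': Point(Decimal('1.5'), Decimal('2.25'))}, cls=PointEncoder)
print(encoded)  # {"p": {"x": 1.5, "y": 2.25}}
```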
To summarize the process: json.dumps() calls the encode() method of a JSONEncoder instance, iterencode() then converts the various types recursively, and finally the chunks are joined into a string and returned.
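To make the flow concrete, here is a deliberately simplified re-implementation. It is a sketch rather than the library's code: the names `mini_iterencode` and `mini_dumps` are invented, and it assumes string dictionary keys and skips indentation, NaN handling, and the circular-reference check.

```python
import json
from decimal import Decimal


def mini_iterencode(o, default):
    """Yield JSON text chunks for o, falling back to default() for unknown types."""
    if isinstance(o, str):
        yield json.dumps(o)  # reuse the real string escaping
    elif o is None:
        yield 'null'
    elif o is True:
        yield 'true'
    elif o is False:
        yield 'false'
    elif isinstance(o, int):
        yield int.__repr__(o)
    elif isinstance(o, float):
        yield float.__repr__(o)
    elif isinstance(o, (list, tuple)):
        yield '['
        for index, item in enumerate(o):
            if index:
                yield ', '
            yield from mini_iterencode(item, default)
        yield ']'
    elif isinstance(o, dict):
        yield '{'
        for index, (key, value) in enumerate(o.items()):
            if index:
                yield ', '
            yield json.dumps(key) + ': '  # assumes keys are already str
            yield from mini_iterencode(value, default)
        yield '}'
    else:
        # Unknown type: convert it, then encode whatever came back.
        yield from mini_iterencode(default(o), default)


def mini_dumps(o, default):
    """Join the chunks into the final string, as encode() does."""
    return ''.join(mini_iterencode(o, default))


data = {'key1': 'string', 'key2': [1, 2.5, None, True], 'key3': Decimal('1.45')}
result = mini_dumps(data, default=float)
print(result == json.dumps(data, default=float))  # True
```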
With the process analyzed, it is clear why subclassing JSONEncoder and overriding its default method is enough to handle custom types.
Later you may need to handle datetime values as well, and you will probably write something like this:
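    import datetime
    import decimal
    import json

    # Assumed format string; the original snippet leaves DATETIME_FORMAT undefined.
    DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'

    class ExtendJSONEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, decimal.Decimal):
                # float keeps the fractional part; int(obj) would truncate it
                return float(obj)
            if isinstance(obj, datetime.datetime):
                return obj.strftime(DATETIME_FORMAT)
            return super(ExtendJSONEncoder, self).default(obj)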
The final call to the parent class's default() method is there purely to raise the exception for unsupported types.
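A quick way to see this is an encoder whose default() does nothing but delegate to the parent. The class name `PassThroughEncoder` is made up for this sketch; the exact wording of the error message varies slightly between Python versions, so the example only prints it.

```python
import json


class PassThroughEncoder(json.JSONEncoder):
    def default(self, obj):
        # No custom handling: the base implementation raises TypeError.
        return super().default(obj)


message = None
try:
    json.dumps({'value': object()}, cls=PassThroughEncoder)
except TypeError as exc:
    message = str(exc)

print(message)
```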
Python's functools.singledispatch can be used to solve this kind of single-dispatch generic function problem.
    import json
    from datetime import datetime
    from decimal import Decimal
    from functools import singledispatch

    class MyClass:
        def __init__(self, value):
            self._value = value

        def get_value(self):
            return self._value

    # Create three instances of non built-in types
    mc = MyClass('i am class MyClass ')
    dm = Decimal('11.11')
    dt = datetime.now()

    @singledispatch
    def convert(o):
        raise TypeError('can not convert type')

    @convert.register(datetime)
    def _(o):
        return o.strftime('%b %d %Y %H:%M:%S')

    @convert.register(Decimal)
    def _(o):
        return float(o)

    @convert.register(MyClass)
    def _(o):
        return o.get_value()

    class ExtendJSONEncoder(json.JSONEncoder):
        def default(self, obj):
            try:
                return convert(obj)
            except TypeError:
                return super(ExtendJSONEncoder, self).default(obj)

    data = {
        'mc': mc,
        'dm': dm,
        'dt': dt
    }

    json.dumps(data, cls=ExtendJSONEncoder)
    # {"mc": "i am class MyClass ", "dm": 11.11, "dt": "Nov 10 2017 17:31:25"}
This style fits design-pattern principles better. When a new type comes along, there is no need to modify the ExtendJSONEncoder class; just register a suitable singledispatch handler, which is fairly Pythonic.
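As a sketch of what adding a type later might look like, the example below registers handlers for `uuid.UUID` and `set` after the encoder has been defined, without touching the encoder class. The choice of these two types, and of sorting the set into a list, is an assumption made purely for illustration.

```python
import json
import uuid
from functools import singledispatch


@singledispatch
def convert(o):
    raise TypeError('can not convert type')


class ExtendJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        try:
            return convert(obj)
        except TypeError:
            return super().default(obj)


# New handlers can be registered later, even from another module.
@convert.register(uuid.UUID)
def _(o):
    return str(o)


@convert.register(set)
def _(o):
    # JSON has no set type; a sorted list gives a stable output order.
    return sorted(o)


data = {'id': uuid.UUID(int=1), 'tags': {'b', 'a'}}
encoded = json.dumps(data, cls=ExtendJSONEncoder)
print(encoded)
# {"id": "00000000-0000-0000-0000-000000000001", "tags": ["a", "b"]}
```

Types with no registered handler still fall through to the base convert function, so they raise TypeError as before.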