Box has surpassed my expectations in popularity, even appearing in the Python Weekly newsletter and on the Python Bytes podcast. I am very grateful to everyone who uses it and hope that you find it to be another great tool in your Python tool-belt! Also, a huge shout-out to everyone who has provided feedback or code contributions!
I first posted about Box about a month ago and linked to it on /r/python, where it received great feedback and critique. With such interest, I have been working diligently over the past few weeks to make it better, and I feel confident it isn’t just better, it’s substantially improved.
Box 3 brings two huge changes to the table. First, Box can now take arguments that allow it to work in new and different ways, such as becoming a default_box or a frozen_box. Second, Box no longer converts all sub-dicts into Boxes upon creation, but rather upon retrieval. Together, these two changes mean that Box is now more powerful, customizable and faster than ever before.
New Features
Conversion Box
Conversion Box automatically ‘fixes’ all those pesky keys that normally could not be accessed as attributes, turning them into safe strings that work as valid lookups.
    my_box = Box({"Send him away!": True})
    my_box["Send him away!"] == my_box.Send_him_away
To be clear, this NEVER modifies the actual dictionary key; it only allows attribute lookup to search for the converted name. This is the only feature that is enabled by default. Also keep in mind it does not modify case or attempt to find ASCII equivalents for extended charset letters. It simply gets rid of anything outside the allowed range (in the example above, the spaces become underscores and the trailing exclamation mark is dropped).
You can also set those values through the attribute. The danger is that two improper strings can transform into the same attribute: a~b and a!b would both map to the attribute a_b, but the Box would only modify or retrieve the first one it happened to find.
    my_box.Send_him_away = False
    my_box.items()
    # dict_items([('Send him away!', False)])
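To make the lookup rule and the collision risk concrete, here is a minimal sketch in plain Python. It is not Box's actual implementation: safe_attr and find_key are hypothetical helpers, and the rule of swapping disallowed characters for underscores and trimming stray ones from the ends is an assumption inferred from the examples above.

```python
import re


def safe_attr(key):
    """Hypothetical helper: build an attribute-friendly name from a key."""
    # Assumption: anything outside letters, digits and "_" becomes "_",
    # and underscores left dangling at either end are trimmed.
    return re.sub(r"[^A-Za-z0-9_]", "_", str(key)).strip("_")


def find_key(data, attr):
    """Return the real key that an attribute name would resolve to."""
    if attr in data:
        return attr
    for key in data:  # dicts keep insertion order on Python 3.7+
        if safe_attr(key) == attr:
            return key  # first match wins, later collisions are never reached
    raise AttributeError(attr)


data = {"Send him away!": True, "a~b": 1, "a!b": 2}

print(find_key(data, "Send_him_away"))  # Send him away!
print(find_key(data, "a_b"))            # a~b
```

In this sketch "a!b" can only be reached by item lookup, which is why keys that collapse to the same name are best accessed with square brackets.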
Default Box
If you don’t like getting errors on failed lookups, or bothering to create all those in-between boxes, this is for you!
    default_box = Box(default_box=True)
    default_box.Not_There
    # <Box: {}>
    default_box.a.b.c.d.c.e.f.g.h = "x"
    # <Box: {'a': {'b': {'c': {'d': {'c': {'e': {'f': {'g': {'h': 'x'}}}}}}}}}>
You can specify what you want the default return value to be with default_box_attr, which by default is the Box class. If something callable is specified, it will be called and the resulting instance returned; otherwise the specified value is returned as is.
    bx3 = Box(default_box=True, default_box_attr=3)
    bx3.hello
    # 3
However, it now acts only like a standard default dict, and you lose the magical recursive creation.
bx3.hello.there # AttributeError: 'int' object has no attribute 'there'
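The callable versus non-callable rule can be sketched with a small dict subclass. This is only an illustration under stated assumptions: DefaultSketch and its default_attr argument are hypothetical names, and storing the default on first access (the way collections.defaultdict does) is a simplification that may differ from what Box does internally.

```python
class DefaultSketch(dict):
    """Hypothetical stand-in for a default box, built on dict.__missing__."""

    def __init__(self, *args, default_attr=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Kept outside the dict items; None means "create another DefaultSketch".
        object.__setattr__(self, "_default_attr", default_attr)

    def __missing__(self, key):
        default = self._default_attr
        if default is None:
            value = DefaultSketch()
        elif callable(default):
            value = default()  # callables are instantiated
        else:
            value = default  # anything else is returned as is
        self[key] = value  # assumption: store it, like collections.defaultdict
        return value

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return self[name]

    def __setattr__(self, name, value):
        self[name] = value


nested = DefaultSketch()
nested.a.b.c = "x"
print(nested)  # {'a': {'b': {'c': 'x'}}}

three = DefaultSketch(default_attr=3)
print(three.hello)  # 3
```

As with bx3 above, three.hello.there fails here too, because the int that comes back knows nothing about creating further levels.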
Frozen Box
Brrrrr, you feel that? It’s a box that’s too cool to modify. Handy if you want to ensure the incoming data is not mutilated during its usage. An immutable box cannot have data added or deleted. Like a tuple, it can be hashed if all of its content is immutable; if it holds mutable content, those objects can still be modified and the box is unhashable.
    frigid_box = Box({"Kung Fu Panda": "Pure Awesome"}, frozen_box=True)
    frigid_box.Kung_Fu_Panda = "Overrated kids movie"
    # Traceback (most recent call last):
    # ...
    # box.BoxError: Box is frozen
    hash(frigid_box)
    # -156221188
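The tuple comparison can be illustrated with a rough sketch. FrozenSketch is a hypothetical class, not Box's code: it raises TypeError where Box raises BoxError, and hashing a frozenset of the items is just one plausible way to get tuple-like behaviour.

```python
class FrozenSketch(dict):
    """Hypothetical stand-in for a frozen box; raises TypeError, not BoxError."""

    def _blocked(self, *args, **kwargs):
        raise TypeError("FrozenSketch is frozen")

    # Every method that adds, replaces or removes items is disabled.
    __setitem__ = __delitem__ = _blocked
    update = pop = popitem = clear = setdefault = _blocked

    def __hash__(self):
        # Like a tuple: hashing fails with TypeError if any value is unhashable.
        return hash(frozenset(self.items()))


frozen = FrozenSketch({"Kung Fu Panda": "Pure Awesome"})
same = FrozenSketch({"Kung Fu Panda": "Pure Awesome"})
print(hash(frozen) == hash(same))  # True

holds_list = FrozenSketch({"films": ["Kung Fu Panda"]})
holds_list["films"].append("Kung Fu Panda 2")  # the list itself is still mutable
```

Calling hash(holds_list) raises TypeError because of the list inside, which mirrors the unhashable case described above.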
Camel Killer Box
DontYouHateCamelCaseKeys? Well I do too, kill them with fire: camel_killer_box=True.
    snake_box = Box({"BadKey": "silly value"}, camel_killer_box=True)
    assert snake_box.bad_key == "silly value"
This transformation, including lower casing, is applied to everything, which means the conversion box example above would also become lower case.
snake_box["Send him away!"] == snake_box.send_him_away
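For a feel of what such a transformation involves, here is a common two-regex recipe for turning CamelCase into snake_case. The camel_to_snake function is a hypothetical name used for illustration, and I am not claiming this is exactly how Box does it.

```python
import re

# Split before a capitalised word ("dKey" -> "d_Key") ...
_first_cap = re.compile(r"(.)([A-Z][a-z]+)")
# ... then between a lowercase letter or digit and a capital ("uH" -> "u_H").
_all_cap = re.compile(r"([a-z0-9])([A-Z])")


def camel_to_snake(name):
    """Hypothetical helper: convert CamelCase to lower snake_case."""
    partial = _first_cap.sub(r"\1_\2", name)
    return _all_cap.sub(r"\1_\2", partial).lower()


print(camel_to_snake("BadKey"))                    # bad_key
print(camel_to_snake("DontYouHateCamelCaseKeys"))  # dont_you_hate_camel_case_keys
```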
Box It Up
Force the conversion of all sub objects on creation, just like old times. This is useful if you expect to reference every value and every sub-box value, and want to front-load that time at creation instead of paying it during operations. It is also a necessity if you plan to keep a reference to, or re-use, the original dictionary that was boxed.
boxed = Box({"a": {"b": [{"c": "d"}]}}, box_it_up=True)
Dangerous Waters
As stated above, sub box objects are now not recreated until they are retrieved or modified. This means that if you hold references to the original dict and modify it, you may inadvertently be updating the same objects that are in the box.
    a = {"a": {"b": {"c": {}}}}
    a_box = Box(a)
    a_box
    # <Box: {'a': {'b': {'c': {}}}}>
    a["a"]["b"]["d"] = "2"
    a_box
    # <Box: {'a': {'b': {'c': {}, 'd': '2'}}}>
So if you plan to keep the original dict around, make sure to box_it_up or do a deepcopy first.
    a = {"a": {"b": {"c": {}}}}  # start again from a fresh dict
    safe_box = Box(a, box_it_up=True)
    a["a"]["b"]["d"] = "2"
    safe_box
    # <Box: {'a': {'b': {'c': {}}}}>
Obviously this is not an issue if you are creating the Box from scratch, or with a new and unreferenced dict; it is just something to be aware of.
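Why the original dict leaks through can be seen in a stripped-down sketch of convert-on-retrieval. LazyBox is a hypothetical class written for this illustration and is far simpler than Box itself; the point is only that a shallow top-level copy still shares every nested dict until it is fetched.

```python
import copy


class LazyBox(dict):
    """Hypothetical stand-in: wraps nested dicts only when they are retrieved."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, dict) and not isinstance(value, LazyBox):
            value = LazyBox(value)  # copies this one level only
            super().__setitem__(key, value)
        return value

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


original = {"a": {"b": {"c": {}}}}
shared = LazyBox(original)  # nested dicts are still the original objects
original["a"]["b"]["d"] = "2"
print("d" in shared.a.b)  # True, the change leaked in

original = {"a": {"b": {"c": {}}}}
isolated = LazyBox(copy.deepcopy(original))  # deepcopy cuts the shared references
original["a"]["b"]["d"] = "2"
print("d" in isolated.a.b)  # False
```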
Speed and Memory Usage
Something a lot of people were worried about was speed and memory footprint compared to alternatives and the built-in dict. With the previous version of Box, I was more of the mindset that it lived to be a scripter, sysadmin or developer aid for writing code, and I worried less about performance. Simply put, it was foolish to think that way. If I really want this to be the go-to dot notation library, it needs to be as fast as, if not faster than, its peers while still providing more functionality.
Because Box is the only dot notation library that I know of that works by converting upon retrieval, it is hard to do a true one-to-one test versus the others. To the best of my abilities, I have attempted to capture the angles that show where the performance differences are most noticeable.
This test uses a modified version of HearthStone card data, available on github, which is 6.57 MB and has 2224 keys, with sub-dictionaries and data. To gather metrics, I am using reusables.time_it and memory_profiler.profile, and you can run the file yourself. Box is being compared against the two most applicable competitors I know of, addict and DotMap (as Bunch and EasyDict do not account for recursion or lack other common features), as well as the built-in dict type. All charts except the memory usage are in seconds.
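If you want to try a similar measurement without installing anything, a rough standard-library sketch could look like the following. It swaps in time.perf_counter and tracemalloc for reusables.time_it and memory_profiler, so the numbers will not be directly comparable to the charts, and the generated payload is only a synthetic stand-in for the card file. The timed helper is a hypothetical name.

```python
import json
import time
import tracemalloc


def timed(func, *args):
    """Return (result, seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Synthetic stand-in for the card data: 2224 top-level keys with nested dicts.
payload = json.dumps(
    {"card {}".format(i): {"name": "card {}".format(i), "stats": {"cost": i % 10}}
     for i in range(2224)}
)

# Time the load on its own, since tracing allocations slows everything down.
cards, load_seconds = timed(json.loads, payload)

# Measure memory in a separate pass.
tracemalloc.start()
traced_cards = json.loads(payload)
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("load took {:.4f} seconds".format(load_seconds))
print("traced memory: {:.2f} MiB".format(current_bytes / (1024 * 1024)))
```

Swapping json.loads for a function that builds one of the other object types would give a like-for-like comparison on your own machine.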
Lower is always better; these are speed and memory footprint charts.
Creation and Conversion
So first off, how long does it take to convert a JSON file to the object type, and back again?
The load is the first thing done; between then and the transformation back into a dictionary, all first level items are retrieved and 1000 new elements are added to the object. The new style of only transforming sub items upon retrieval greatly speeds up Box, making it very comparable to the built-in dict type, as seen in figure 1.
Load code comparison:
    # addict
    with open("hearthstone_cards.json", encoding="utf-8") as f:
        ad = Dict(json.load(f))

    # DotMap
    with open("hearthstone_cards.json", encoding="utf-8") as f:
        dm = DotMap(json.load(f))

    # Box
    bx = Box.from_json(filename="hearthstone_cards.json", encoding="utf-8")

    # dict
    with open("hearthstone_cards.json", encoding="utf-8") as f:
        dt = json.load(f)
Value Lookup
The load speed-up does come at the price of a heavier initial lookup call for each value. The following graph, figure 2, shows a retrieval of all 2224 first level values from the object.
As expected, Box took far and away the most time, but only on the initial lookup, and we are talking about nearly an order of magnitude less time than the creation times for the other libraries.
The lookup is done the same with every object type.
    for k in obj:
        obj[k]
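The reason only the first pass is expensive can be shown by counting conversions instead of timing them. CountingLazy is a hypothetical toy class, not Box: it converts a plain dict the first time it is fetched, stores the converted object back, and so has nothing left to do on later passes.

```python
class CountingLazy(dict):
    """Hypothetical toy: converts nested dicts on first retrieval and counts it."""

    conversions = 0  # class-level counter, for illustration only

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if type(value) is dict:
            CountingLazy.conversions += 1
            value = CountingLazy(value)
            super().__setitem__(key, value)  # cache the converted object
        return value


def lookup(obj):
    for k in obj:
        obj[k]


data = CountingLazy({"card {}".format(i): {"cost": i} for i in range(2224)})

lookup(data)
first_pass = CountingLazy.conversions
lookup(data)
second_pass = CountingLazy.conversions - first_pass

print(first_pass, second_pass)  # 2224 0
```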
Inserting items
Inserting 1000 new entries into the objects takes very small amounts of time all around.
The insert code looks the most different between libraries, as the objects behave differently on insertion. Box will insert the value as a plain dict and not convert it until retrieval. Addict will not automatically turn new items into addict.Dict objects, so they need to be converted manually. DotMap behaves as a default dict out of the box, so we can simply access the attribute we want to create right off the bat.
    # addict
    for i in range(1000):
        ad["new {}".format(i)] = Dict(a=i)

    # DotMap
    for i in range(1000):
        dm["new {}".format(i)]['a'] = i

    # Box
    for i in range(1000):
        bx["new {}".format(i)] = {'a': i}

    # dict
    for i in range(1000):
        dt["new {}".format(i)] = {'a': i}
Total Times
To summarize the times, dict is obviously the fastest, but Box is a comfortable second.
Memory Footprint
What I was most surprised by was memory usage. I have not had a lot of experience measuring this, so I leave room for something obvious I am missing, but at first look it seems that Box uses less memory than even the built-in dict.
"""
Line #    Mem usage    Increment   Line Contents
================================================
    85     28.9 MiB      0.0 MiB   @profile()
    86                             def memory_test():
    87     41.9 MiB     13.0 MiB       ad = load_addict()
    88     57.1 MiB     15.2 MiB       dm = load_dotmap()
    89     64.5 MiB      7.4 MiB       bx = load_box()
    90     74.6 MiB     10.1 MiB       dt = load_dict()
    91
    92     74.6 MiB      0.0 MiB       lookup(ad)
    93     74.6 MiB      0.0 MiB       lookup(ad)
    94
    95     74.6 MiB      0.0 MiB       lookup(dm)
    96     74.6 MiB      0.0 MiB       lookup(dm)
    97
    98     75.6 MiB      1.0 MiB       lookup(bx)
    99     75.6 MiB      0.0 MiB       lookup(bx)
   100
   101     75.6 MiB      0.0 MiB       lookup(dt)
   102     75.6 MiB      0.0 MiB       lookup(dt)
   103
   104     76.0 MiB      0.3 MiB       addict_insert(ad)
   105     76.6 MiB      0.6 MiB       dotmap_insert(dm)
   106     76.8 MiB      0.2 MiB       box_insert(bx)
   107     76.9 MiB      0.2 MiB       dict_insert(dt)
"""
Questions I can answer
How does it work like a traditional dict and still take keyword arguments?
You can create a Box object just like you would create a normal dict; the new arguments simply will not be part of the dict object (they are stored in a hidden ._box_config attribute). They were also given rather unique names, such as ‘conversion_box’ and ‘frozen_box’, so that it would be very rare for others to be using those kwargs.
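As a rough picture of the idea, a dict subclass can pull its own options out of the keyword arguments before handing the rest to dict. ConfigSketch and its _option_names tuple are hypothetical; only the ._box_config attribute name and the option names themselves come from this post, and the real class surely does more.

```python
class ConfigSketch(dict):
    """Hypothetical dict subclass that keeps its options out of the data."""

    # Assumption: only the option names mentioned in this post.
    _option_names = ("conversion_box", "default_box", "default_box_attr",
                     "frozen_box", "camel_killer_box", "box_it_up")

    def __init__(self, *args, **kwargs):
        config = {name: kwargs.pop(name)
                  for name in self._option_names if name in kwargs}
        super().__init__(*args, **kwargs)  # everything left over is ordinary data
        self._box_config = config  # a plain instance attribute, not a dict item


boxed = ConfigSketch({"a": 1}, b=2, frozen_box=True)
print(dict(boxed))        # {'a': 1, 'b': 2}
print(boxed._box_config)  # {'frozen_box': True}
```

The trade-off is the one described above: a key that really is named frozen_box could not be passed as a keyword argument and would have to go in through a dict instead.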
Is it safe to wait to convert dictionaries until retrieval?
As long as you aren’t storing malformed objects that subclass dict, yes.