I recently got side-tracked into exploring the basics of namedtuple() as
I got a glimpse of its usage in our engineering codebase. Here’s my summary:
Mutable and Hashable
To understand the behavior of namedtuple(), it is best to also visit the
concept of Python object’s mutability and hashability. These two concepts are
closely linked.
Hashability: an object is hashable if its hash value never changes during its lifetime and it can be compared to other objects.
Most of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not; immutable containers (such as tuples and frozensets) are only hashable if their elements are hashable. Objects which are instances of user-defined classes are hashable by default; their hash value is derived from their id().
Mutability: an object with a fixed value that cannot be altered is immutable
(for example: int, float, str, tuple).
In contrast, an object that can change its value while keeping its id() is mutable
(for example: list, dict).
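The hashability rules above can be checked directly with hash(). This is a minimal sketch; the Point class is a made-up name used only for illustration.

```python
# Immutable built-ins are hashable.
int_hash = hash(42)
str_hash = hash("spam")
tuple_hash = hash((1, 2, 3))

# A tuple is hashable only if every element is hashable.
try:
    hash((1, [2, 3]))
    tuple_with_list_hashable = True
except TypeError:
    tuple_with_list_hashable = False

# Mutable containers such as list are not hashable.
try:
    hash([1, 2, 3])
    list_hashable = True
except TypeError:
    list_hashable = False


# Instances of user-defined classes are hashable by default.
class Point:  # hypothetical class, for illustration only
    pass


point_hash = hash(Point())
```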
hash() and id()
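identity: id() returns an integer that is unique to an object for as long as that object exists. Two variables holding equal values do not necessarily share an identity, although CPython reuses some immutable objects (such as small integers), so they may.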
If two objects (that exist at the same time) have the same identity, they’re actually two references to the same object.
The is operator compares items by identity: a is b is equivalent to id(a) == id(b).
hash value: hash() returns an integer derived from an object’s value (or, by default,
from its identity for instances of user-defined classes), and it must remain the same
for the lifetime of the object. If an object’s value can change, a value-based hash
doesn’t make sense, which is why the built-in mutable containers have none.
The hash value is used to quickly compare dictionary keys and set elements during a look-up.
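A short sketch of the difference between identity, equality and hash value; the variable names are arbitrary.

```python
a = [1, 2, 3]
b = a          # a second reference to the same object
c = [1, 2, 3]  # an equal value, but a different object

same_object = a is b                # identity comparison
same_id = id(a) == id(b)            # equivalent to `a is b`
equal_but_distinct = (a == c) and (a is not c)

# Objects that compare equal must have equal hash values.
t1 = (1, 2, 3)
t2 = tuple([1, 2, 3])
equal_hashes = hash(t1) == hash(t2)
int_float_hashes = hash(1) == hash(1.0)  # 1 == 1.0, so the hashes match
```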
Why Hash?
Hash values are very useful because they enable quick look-up of a value
in a large collection of values; they are commonly used in set and dict.
Consider if x in elements:
- In a list, Python needs to go through the whole list and compare x's value with each value in the list elements.
- In a set, Python keeps track of each element’s hash. Python gets the hash value for x, looks that up in an internal structure, and then only compares x with the elements that have the same hash as x.
It also means we can have non-hashable objects in a list,
but not in a set or as keys in a dict.
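As a rough illustration of that last point, a list accepts unhashable items while a set or a dict key does not. The names below are arbitrary.

```python
# A list may hold unhashable items; membership is a linear scan using ==.
elements_list = [[1, 2], [3, 4]]
found_in_list = [3, 4] in elements_list

# A set needs hashable elements, so building one from lists fails.
try:
    bad_set = {[1, 2], [3, 4]}
    set_built = True
except TypeError:
    set_built = False

# The same applies to dict keys.
try:
    bad_dict = {[1, 2]: "value"}
    dict_built = True
except TypeError:
    dict_built = False

# Tuples of hashable items work, and membership uses the hash.
elements_set = {(1, 2), (3, 4)}
found_in_set = (3, 4) in elements_set
```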
Example:
There is no way to change an int object’s value; we can only re-assign the name
to a different object.
    x = 5
But for a list, we can edit its value after assignment while keeping its id()
the same. (Note: use the list’s in-place methods rather than re-assignment;
this is the difference between x.sort() and x = sorted(x).)
    x = [5]
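A minimal sketch of both cases, using id() to see when a name points to a new object. The variable names are chosen for illustration.

```python
# int: "changing" the value binds the name to a different object.
x = 5
old_x = x            # keep the original object alive for the comparison
x += 1
int_rebound = x is not old_x

# list: in-place methods keep the same object.
y = [5, 3, 1]
list_id = id(y)
y.append(0)
y.sort()             # sorts in place, returns None
list_same = id(y) == list_id

# sorted() builds a new list, so re-assignment changes the identity.
y = sorted(y)
list_rebound = id(y) != list_id
```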
NamedTuple
Data classes are just regular classes that are geared towards storing state
rather than containing a lot of logic; a class created by namedtuple() is one kind of data class.
Every time we create a class that mostly consists of attributes, we make a data class.
With namedtuple(), we can create immutable sequence types
that allow us to access their values using descriptive field names
and the dot notation instead of unclear integer indices.
Initialization
- typename: str, class name of the namedtuple
- field names: names that are used to access values in the namedtuple; they can be declared using any of the following:
  - an iterable of strings: ["a", "b", "c"]
  - a string with names separated by white spaces: "a b c"
  - a string with names separated by commas: "a, b, c"
Example:
    from collections import namedtuple
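The three ways of declaring field names produce equivalent classes. A small sketch, where Point and its fields are made-up names:

```python
from collections import namedtuple

# Field names as an iterable of strings.
PointA = namedtuple("Point", ["x", "y"])
# Field names as a whitespace-separated string.
PointB = namedtuple("Point", "x y")
# Field names as a comma-separated string.
PointC = namedtuple("Point", "x, y")

p = PointA(1, 2)        # positional arguments
q = PointB(x=1, y=2)    # keyword arguments

same_fields = PointA._fields == PointB._fields == PointC._fields
```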
Access and Edit Value
It is very straightforward to access a namedtuple’s field values using dot notation;
this gives namedtuple a great edge over a dict or a plain tuple.
    Person = namedtuple('Person', 'name children')
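A sketch of the access styles a namedtuple supports, reusing the Person definition above. The sample values are invented.

```python
from collections import namedtuple

Person = namedtuple('Person', 'name children')
jj = Person('JJ', ['Tim', 'Anna'])  # sample values, for illustration only

by_name = jj.name          # dot notation
by_index = jj[0]           # still a tuple, so indexing works
name, children = jj        # and so does unpacking
```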
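Since a namedtuple is immutable, we can’t assign a value to its attribute;
what we can do is use ._replace(), which returns a new instance with the given
fields replaced. Also, a field’s value can itself be mutable, like a list, and
can then be modified in place.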
    >>> jj.children = ['Tobby', 'Wang']
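A minimal sketch of these three behaviours, assuming the Person definition above; the sample values are invented.

```python
from collections import namedtuple

Person = namedtuple('Person', 'name children')
jj = Person('JJ', ['Tobby'])

# Direct assignment to a field raises AttributeError.
try:
    jj.children = ['Tobby', 'Wang']
    assigned = True
except AttributeError:
    assigned = False

# ._replace() returns a new instance; jj itself is unchanged.
updated = jj._replace(children=['Tobby', 'Wang'])
jj_after_replace = list(jj.children)

# The list stored in the field is still mutable in place.
jj.children.append('Wang')
```

Note that a namedtuple holding a list is not hashable, because its hash depends on the hash of every element.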
Using ._asdict()
The method ._asdict() converts a namedtuple instance into a dictionary.
    Person = namedtuple("Person", "name age height")
And to generate a namedtuple object from a dictionary:
    d = {
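A sketch of the round trip in both directions. The sample values are invented, and unpacking the dictionary with ** is one common way to build the instance, assuming its keys match the field names.

```python
from collections import namedtuple

Person = namedtuple("Person", "name age height")

# namedtuple -> dict (a plain dict in Python 3.8+, an OrderedDict before)
jane = Person("Jane", 25, 1.75)
as_dict = jane._asdict()

# dict -> namedtuple: the keys must match the field names exactly
d = {"name": "John", "age": 30, "height": 1.80}
john = Person(**d)
```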
@dataclass
@dataclass was introduced in Python 3.7. It is similar to namedtuple, but its instances are mutable by default;
thus, we can assign a value to a @dataclass attribute.
    from dataclasses import dataclass
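A minimal sketch of a mutable data class; the Person fields are chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


p = Person("Jane", 25)
p.age = 26  # allowed: dataclass instances are mutable by default
```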
frozen attribute
If we want a @dataclass to behave like a namedtuple, with “protected” attributes that can’t be re-assigned,
just use @dataclass(frozen=True).
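A short sketch of the frozen behaviour, with made-up fields. Assigning to a field raises dataclasses.FrozenInstanceError, and the instance becomes hashable because eq is on by default.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Person:
    name: str
    age: int


p = Person("Jane", 25)

try:
    p.age = 26
    changed = True
except FrozenInstanceError:
    changed = False

# frozen=True together with the default eq=True generates __hash__.
person_hash = hash(p)
```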
override __iter__()
Unlike namedtuple, @dataclass instances are not iterable by default. We can achieve
that by implementing the special method .__iter__():
    from dataclasses import astuple, dataclass
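One possible way to do this, sketched with made-up fields, is to delegate to astuple():

```python
from dataclasses import astuple, dataclass


@dataclass
class Person:
    name: str
    age: int

    def __iter__(self):
        # Iterate over the field values in declaration order.
        return iter(astuple(self))


name, age = Person("Jane", 25)      # unpacking now works
as_list = list(Person("John", 30))  # and so does any other iteration
```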
Subclassing namedtuple
Subclassing namedtuple gives us additional functionality.
    BasePerson = namedtuple("BasePerson", "name birthdate country")
In the above example, subclassing from namedtuple provides us better documentation
(i.e. Person.__doc__), a better string representation (i.e. print(jane)) and an
extra property computed from a Person instance’s attribute values.
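A sketch of what such a subclass might look like. The docstring, the __repr__ format, the age property and the sample values are assumptions made for illustration, and the age calculation is deliberately rough.

```python
from collections import namedtuple
from datetime import date

BasePerson = namedtuple("BasePerson", "name birthdate country")


class Person(BasePerson):
    """A namedtuple subclass that holds a person's data."""

    __slots__ = ()  # keep instances lightweight (no per-instance __dict__)

    def __repr__(self):
        return f"Name: {self.name}, age: {self.age} years old."

    @property
    def age(self):
        # Approximation: ignores leap years.
        return (date.today() - self.birthdate).days // 365


jane = Person("Jane", date(1996, 3, 5), "Canada")
text = repr(jane)
```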
__new__() constructor
Zechong Hu’s Blog - Inheritance for Python Namedtuples
To override the constructor for namedtuple class with default value:
    BasePerson = namedtuple("BasePerson", ["name", "birthdate", "country"])
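A minimal sketch of overriding __new__() to supply a default. The default country and the sample values are invented for illustration.

```python
from collections import namedtuple

BasePerson = namedtuple("BasePerson", ["name", "birthdate", "country"])


class Person(BasePerson):
    __slots__ = ()

    # A tuple is immutable, so its fields are set in __new__, not __init__.
    def __new__(cls, name, birthdate, country="Canada"):  # hypothetical default
        return super().__new__(cls, name, birthdate, country)


jane = Person("Jane", "1996-03-05")
john = Person("John", "1990-01-01", country="France")
```

Since Python 3.7, namedtuple() also accepts a defaults argument, which covers the simple cases without a subclass.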
__slots__
The special attribute __slots__ explicitly states which attributes we want
our class instances to have.
By default, when an instance (object) is created,
__dict__ is used to store an object’s (writable) attributes.
A dynamic dictionary:
- requires more memory
- takes longer to create.
Because namedtuple makes immutable instances that are lightweight,
we need to prevent the creation of __dict__ when subclassing to keep that benefit,
by setting __slots__ to an empty tuple.
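The effect of the empty __slots__ can be sketched as follows; the class names WithDict and WithoutDict are made up for the comparison.

```python
from collections import namedtuple

BasePerson = namedtuple("BasePerson", "name birthdate country")


class WithDict(BasePerson):
    pass  # no __slots__: every instance gets a __dict__


class WithoutDict(BasePerson):
    __slots__ = ()  # no __dict__ is created


a = WithDict("Jane", "1996-03-05", "Canada")
b = WithoutDict("Jane", "1996-03-05", "Canada")

a.nickname = "JJ"  # silently stored in a.__dict__

try:
    b.nickname = "JJ"
    extra_attribute_allowed = True
except AttributeError:
    extra_attribute_allowed = False
```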
On a more general note, consider using __slots__ when creating
a large number of objects; this saves memory and time when instantiating.
Comparison __dict__ vs. __slots__
    class Person(object):
    class Person(object):
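A sketch of the comparison for regular classes. The class names and attributes are invented so that both variants can sit side by side.

```python
class PersonWithDict:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class PersonWithSlots:
    __slots__ = ("name", "age")  # the only attributes instances may have

    def __init__(self, name, age):
        self.name = name
        self.age = age


p = PersonWithDict("Jane", 25)
s = PersonWithSlots("Jane", 25)

p.country = "Canada"  # fine: stored in p.__dict__

try:
    s.country = "Canada"  # not listed in __slots__
    slots_accept_new_attribute = True
except AttributeError:
    slots_accept_new_attribute = False
```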
Reference
Medium megha mohan - Mutable vs Immutable Objects in Python
Stack Overflow - What are data classes and how are they different from common classes?
GeeksforGeeks - Use of __slots__
Stack Overflow - Usage of __slots__?
Stack Overflow - Difference between hash() and id()
Stack Overflow - Two variables in Python have same id, but not lists or tuples