Dataclasses
Dataclasses (Python 3.7+) eliminate boilerplate code when creating classes that
primarily store data. The @dataclass decorator automatically generates
__init__, __repr__, __eq__, and other methods from
class attributes with type hints. What takes 15+ lines traditionally becomes just 4 lines.
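To make the line-count comparison concrete, here is a minimal sketch that puts a hand-written class next to its dataclass counterpart. The names ManualPoint and Point are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


class ManualPoint:
    """Hand-written version: every method is spelled out."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        return f"ManualPoint(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other: object) -> bool:
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


@dataclass
class Point:
    """Dataclass version: the same methods are generated from the fields."""
    x: int
    y: int


print(ManualPoint(1, 2))            # ManualPoint(x=1, y=2)
print(Point(1, 2))                  # Point(x=1, y=2)
print(Point(1, 2) == Point(1, 2))   # True
```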
What @dataclass Generates
- __init__: Initializer with all fields as parameters
- __repr__: String representation showing all field values
- __eq__: Equality comparison based on all fields
- __hash__: Hash function (generated when eq=True and frozen=True, or when unsafe_hash=True)
- Ordering methods: __lt__, __le__, __gt__, __ge__ (if order=True)
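The conditional entries in this list are easiest to see side by side. The sketch below uses three hypothetical classes (Mutable, Frozen, Version) to show how the decorator arguments affect hashing and ordering.

```python
from dataclasses import dataclass


@dataclass
class Mutable:            # eq=True, frozen=False: __hash__ is set to None
    value: int


@dataclass(frozen=True)
class Frozen:             # eq=True, frozen=True: __hash__ is generated from the fields
    value: int


@dataclass(order=True)
class Version:            # order=True adds __lt__, __le__, __gt__, __ge__
    major: int
    minor: int


print(Mutable.__hash__ is None)             # True: instances are unhashable
print(hash(Frozen(1)) == hash(Frozen(1)))   # True
print(Version(1, 2) < Version(1, 10))       # True: compared as (major, minor) tuples
```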
Dataclass Basics
A dataclass is defined by decorating a class with @dataclass and using type
annotations for fields. The decorator introspects the annotations and generates the
necessary methods automatically.
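As a minimal sketch of this introspection, the hypothetical Book class below mixes annotated and unannotated attributes. The standard fields() helper shows which ones the decorator treated as fields.

```python
from dataclasses import dataclass, fields


@dataclass
class Book:
    title: str
    author: str
    pages: int = 0        # annotated with a default, so optional in __init__
    shelf = "unsorted"    # no annotation: a plain class attribute, not a field


book = Book("Dune", "Frank Herbert", 412)
print(book)                             # Book(title='Dune', author='Frank Herbert', pages=412)
print([f.name for f in fields(Book)])   # ['title', 'author', 'pages']
```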
Type Hints Are Required, Not Enforced
Dataclasses require type annotations to identify fields, but Python doesn't enforce types
at runtime. Use type checkers like mypy or pyright for static
type validation during development.
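A short illustration of the point, using a hypothetical Temperature class: the annotation says float, yet a string is accepted without complaint at runtime. A static checker such as mypy or pyright would flag the call.

```python
from dataclasses import dataclass


@dataclass
class Temperature:
    celsius: float


# The annotation is not checked when the instance is created.
reading = Temperature("warm")
print(reading)                  # Temperature(celsius='warm')
print(type(reading.celsius))    # <class 'str'>
```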
Dataclass Fields
The field() function provides fine-grained control over individual fields.
Use it to set mutable defaults, exclude fields from repr/comparison, or add metadata.
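The sketch below exercises each of these uses on a hypothetical User class. The field names and the "pii" metadata key are assumptions made for illustration; dataclasses attaches no meaning to metadata.

```python
from dataclasses import dataclass, field, fields


@dataclass
class User:
    name: str
    tags: list = field(default_factory=list)                 # fresh list per instance
    password_hash: str = field(default="", repr=False)       # hidden from __repr__
    last_login: float = field(default=0.0, compare=False)    # ignored by __eq__
    email: str = field(default="", metadata={"pii": True})   # free-form metadata


a = User("ada", last_login=1.0)
b = User("ada", last_login=2.0)
print(a == b)       # True: last_login is excluded from comparison

a.tags.append("admin")
print(b.tags)       # []: each instance has its own list
print(a)            # User(name='ada', tags=['admin'], last_login=1.0, email='')

# Metadata is stored on the Field object and can be read back via fields().
print({f.name: dict(f.metadata) for f in fields(User) if f.metadata})
# {'email': {'pii': True}}
```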
Never Use Mutable Defaults Directly
Using items: list = [] would share one list across all instances!
Always use field(default_factory=list) for mutable defaults like
lists, dicts, or sets. Python raises ValueError if you try to use
a mutable default directly.
Frozen Dataclasses & __post_init__
Frozen dataclasses are immutable: attempting to assign to a field raises
dataclasses.FrozenInstanceError. The __post_init__ method runs after the
generated __init__ completes, which makes it a good place for validation or
for computing derived fields.
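A minimal sketch combining both ideas, using a hypothetical Money class: __post_init__ validates the amount, and frozen=True blocks later assignment while making instances usable as dict keys.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Money:
    amount: float
    currency: str = "USD"

    def __post_init__(self):
        # Runs after the generated __init__ has assigned the fields.
        if self.amount < 0:
            raise ValueError("amount must be non-negative")


price = Money(9.99)

try:
    price.amount = 0.0
except FrozenInstanceError:
    print("cannot assign to a frozen instance")

try:
    Money(-1)
except ValueError as exc:
    print(exc)                          # amount must be non-negative

print({price: "latte"}[Money(9.99)])    # latte: equal frozen instances hash alike
```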
InitVar for Init-Only Variables
InitVar[T] declares parameters that are passed to __init__ and forwarded to
__post_init__ but are not stored as instance attributes. Use this for values
that are needed during initialization but that shouldn't be kept (like passwords to hash).
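Here is one way the password case might look. The Account class is hypothetical, and the unsalted SHA-256 hash is an assumption made to keep the sketch short; it is not a recommendation for storing real passwords.

```python
import hashlib
from dataclasses import dataclass, field, fields, InitVar


@dataclass
class Account:
    username: str
    password: InitVar[str]                              # passed to __init__, never stored
    password_hash: str = field(init=False, repr=False)  # computed in __post_init__

    def __post_init__(self, password: str):
        # Illustrative only: real systems should use a salted, slow hash.
        self.password_hash = hashlib.sha256(password.encode()).hexdigest()


acct = Account("ada", "s3cret")
print(acct)                                 # Account(username='ada')
print(hasattr(acct, "password"))            # False
print([f.name for f in fields(Account)])    # ['username', 'password_hash']
```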
Advanced Features
Dataclasses support ordering, inheritance, conversion to dict/tuple, memory-efficient slots, and keyword-only arguments. These features make dataclasses suitable for complex real-world applications.
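The sketch below touches ordering, inheritance, and conversion in one place. Employee and Manager are hypothetical classes; asdict() and astuple() are the standard helpers from the dataclasses module.

```python
from dataclasses import dataclass, field, asdict, astuple


@dataclass(order=True)
class Employee:
    name: str = field(compare=False)    # excluded from ordering and equality
    salary: int = 0


@dataclass
class Manager(Employee):                # child fields are appended after the parent's
    reports: list = field(default_factory=list)


staff = [Employee("Bo", 70_000), Employee("Al", 50_000)]
print(min(staff).name)      # Al: ordered by salary only

boss = Manager("Cy", 90_000, ["Bo", "Al"])
print(asdict(boss))         # {'name': 'Cy', 'salary': 90000, 'reports': ['Bo', 'Al']}
print(astuple(boss))        # ('Cy', 90000, ['Bo', 'Al'])
```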
Python 3.10+ Features
slots=True reduces memory usage and prevents adding dynamic attributes.
kw_only=True forces all arguments to be keyword-only, improving code
readability for classes with many fields.
Common Mistakes
1. Using mutable default values directly
from dataclasses import dataclass, field

# Wrong - shared mutable default!
@dataclass
class BadInventory:
    items: list = []  # ValueError! Python prevents this

# Correct - each instance gets its own list
@dataclass
class GoodInventory:
    items: list = field(default_factory=list)
2. Putting defaults before non-defaults
from dataclasses import dataclass, field

# Wrong - defaults must come after non-defaults!
@dataclass
class BadOrder:
    status: str = "pending"  # Default
    order_id: str            # No default - TypeError!

# Correct - non-defaults first
@dataclass
class GoodOrder:
    order_id: str            # Required
    status: str = "pending"  # Optional

# Alternative - give every field a default so declaration order no longer matters
@dataclass
class FlexibleOrder:
    status: str = field(default="pending")
    order_id: str = field(default_factory=lambda: "ORD-001")
3. Forgetting that frozen dataclasses need object.__setattr__ in __post_init__
from dataclasses import dataclass, field

# Wrong - can't set attributes on frozen instance
@dataclass(frozen=True)
class BadVector:
    x: float
    y: float
    magnitude: float = field(init=False)

    def __post_init__(self):
        self.magnitude = (self.x**2 + self.y**2)**0.5  # FrozenInstanceError!

# Correct - use object.__setattr__ for frozen
@dataclass(frozen=True)
class GoodVector:
    x: float
    y: float
    magnitude: float = field(init=False)

    def __post_init__(self):
        object.__setattr__(self, 'magnitude', (self.x**2 + self.y**2)**0.5)
4. Inheriting from a dataclass whose fields have defaults
from dataclasses import dataclass, field

# Problem - parent has defaults, child has required fields
@dataclass
class Parent:
    name: str = "Unknown"

@dataclass
class Child(Parent):
    age: int  # TypeError! Non-default follows default

# Solution 1 - give child fields defaults too
@dataclass
class Child(Parent):
    age: int = 0

# Solution 2 - use field() with kw_only in Python 3.10+
@dataclass
class Parent:
    name: str = "Unknown"

@dataclass
class Child(Parent):
    age: int = field(kw_only=True)  # Now works!
5. Using dataclass when attrs or NamedTuple would be better
# Dataclass might be overkill for simple cases
from dataclasses import dataclass

@dataclass
class Point:  # Fine, but consider alternatives
    x: int
    y: int

# For simple immutable data, NamedTuple is lighter
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

# For more features (validators, converters), use attrs
import attrs

@attrs.define
class Point:
    x: int = attrs.field(validator=attrs.validators.instance_of(int))
    y: int = attrs.field(validator=attrs.validators.instance_of(int))

# Rule of thumb:
# - NamedTuple: simple immutable records
# - dataclass: most cases, stdlib solution
# - attrs: need validation, converters, advanced features
Exercise: E-Commerce Product System
Task: Create a product management system using dataclasses.
Requirements:
- Product dataclass with sku, name, price, quantity, category, and a total_value() method
- Order dataclass with an items list (default_factory), a discount (hidden from repr), and a total() method
- Address frozen dataclass that can be used as a dict key
Solution
from dataclasses import dataclass, field
from typing import List

@dataclass
class Product:
    sku: str
    name: str
    price: float
    quantity: int = 0
    category: str = "General"

    def total_value(self) -> float:
        return self.price * self.quantity

@dataclass
class Order:
    order_id: str
    customer: str
    items: List[Product] = field(default_factory=list)
    discount: float = field(default=0.0, repr=False)

    def add_item(self, product: Product):
        self.items.append(product)

    def subtotal(self) -> float:
        return sum(item.price for item in self.items)

    def total(self) -> float:
        return self.subtotal() - self.discount

@dataclass(frozen=True)
class Address:
    street: str
    city: str
    zip_code: str
    country: str = "USA"

# Test
p1 = Product("SKU001", "Laptop", 999.99, 2)
p2 = Product("SKU002", "Mouse", 29.99, 5)
p3 = Product("SKU003", "Keyboard", 79.99, 3, "Electronics")
print("Products:")
print(f" {p1}")
print(f" Total value: ${p1.total_value():.2f}")

order = Order("ORD-001", "Alice")
order.add_item(p1)
order.add_item(p2)
order.add_item(p3)
print(f"\nOrder: {order.order_id}")
print(f" Items: {len(order.items)}")
print(f" Subtotal: ${order.subtotal():.2f}")
order.discount = 100.00
print(f" After $100 discount: ${order.total():.2f}")

addr1 = Address("123 Main St", "Boston", "02101")
addr2 = Address("123 Main St", "Boston", "02101")
print(f"\nAddress: {addr1}")
print(f" addr1 == addr2: {addr1 == addr2}")
print(f" Hashable: {hash(addr1) == hash(addr2)}")
Summary
- @dataclass auto-generates __init__, __repr__, __eq__ from type-annotated fields
- Use field(default_factory=list) for mutable defaults
- field(repr=False, compare=False) controls which methods include a field
- frozen=True creates immutable, hashable dataclasses
- __post_init__ runs after init for validation or computed fields
- InitVar[T] declares init-only variables not stored as attributes
- order=True adds comparison methods (__lt__, __gt__, etc.)
- asdict() and astuple() convert to dict/tuple
- Python 3.10+: slots=True, kw_only=True for more control
What's Next?
You've completed the Object-Oriented Programming module! Next, explore
Iterators & Generators to learn about lazy evaluation,
the iterator protocol, yield, and memory-efficient data processing.