Dataclasses

Intermediate ~25 min read

Dataclasses (Python 3.7+) eliminate boilerplate when you write classes that primarily store data. The @dataclass decorator generates __init__, __repr__, __eq__, and optionally other methods from class attributes that carry type hints. A class that needs 15 or more lines of hand-written methods can often be declared in about 4.

What @dataclass Generates

  • __init__: Initializer with all fields as parameters
  • __repr__: String representation showing all field values
  • __eq__: Equality comparison based on all fields
  • __hash__: Hash function (generated when eq=True and frozen=True, or when unsafe_hash=True; with the default eq=True alone, __hash__ is set to None)
  • Ordering methods: __lt__, __le__, __gt__, __ge__ (if order=True)
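
The __hash__ rule is the least obvious item in the list, so here is a small check of it. The class names MutablePoint and FrozenPoint are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class MutablePoint:
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


# eq=True (the default) without frozen=True sets __hash__ to None,
# so instances cannot be used as dict keys or set members.
print(MutablePoint.__hash__ is None)  # True

# eq=True together with frozen=True generates __hash__ from the fields.
print(hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2)))  # True
```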

Dataclass Basics

A dataclass is defined by decorating a class with @dataclass and using type annotations for fields. The decorator introspects the annotations and generates the necessary methods automatically.
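
A minimal sketch of the idea, using a hypothetical Product class that is not part of the lesson's original example:

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float
    quantity: int = 0  # fields with defaults must come after required ones


laptop = Product("Laptop", 999.99, 2)
same_laptop = Product("Laptop", 999.99, 2)

print(laptop)                            # Product(name='Laptop', price=999.99, quantity=2)
print(laptop == same_laptop)             # True - generated __eq__ compares fields
print(Product("Mouse", 29.99).quantity)  # 0 - the default is applied
```

Only the three annotated lines describe the data; __init__, __repr__, and __eq__ come from the decorator.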

Type Hints Are Required, Not Enforced

Dataclasses require type annotations to identify fields, but Python doesn't enforce types at runtime. Use type checkers like mypy or pyright for static type validation during development.
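
The short sketch below illustrates the point with a made-up Reading class: the annotation says float, yet a string is accepted without complaint at runtime.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor: str
    value: float


# No error is raised here: the generated __init__ stores the value as-is.
bad = Reading("thermo-1", "not a number")
print(type(bad.value).__name__)  # str
```

A static checker such as mypy or pyright would be expected to report the mismatched argument before the code runs.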

Dataclass Fields

The field() function provides fine-grained control over individual fields. Use it to set mutable defaults, exclude fields from repr/comparison, or add metadata.
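
As an illustration, the sketch below uses a hypothetical Cart class to exercise default_factory, repr=False, compare=False, and metadata in one place.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Cart:
    owner: str
    items: list = field(default_factory=list)            # fresh list per instance
    session_token: str = field(default="", repr=False)   # hidden from repr
    last_viewed: str = field(default="", compare=False)  # ignored by ==
    currency: str = field(default="USD", metadata={"standard": "ISO 4217"})


a = Cart("alice", session_token="secret", last_viewed="mouse")
b = Cart("alice", session_token="secret", last_viewed="laptop")

print(a)        # Cart(owner='alice', items=[], last_viewed='mouse', currency='USD')
print(a == b)   # True - last_viewed is excluded from comparison

a.items.append("keyboard")
print(b.items)  # [] - each cart owns its own list

# Metadata is not used by dataclasses itself; it is available via fields().
currency_field = next(f for f in fields(Cart) if f.name == "currency")
print(currency_field.metadata["standard"])  # ISO 4217
```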

Never Use Mutable Defaults Directly

Writing items: list = [] would share one list across all instances, so @dataclass rejects it: the class definition itself raises ValueError. Always use field(default_factory=list) for mutable defaults such as lists, dicts, or sets.

Frozen Dataclasses & __post_init__

Frozen dataclasses are immutable: assigning to or deleting a field on an instance raises FrozenInstanceError. The __post_init__ method is called at the end of the generated __init__, which makes it a good place for validation or for computing derived fields.
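
A small sketch of both ideas together, assuming a hypothetical Money class whose __post_init__ only validates (it reads fields but does not assign them):

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Money:
    amount: float
    currency: str = "USD"

    def __post_init__(self):
        # Reading fields is allowed on a frozen instance, so plain
        # validation needs no special handling.
        if self.amount < 0:
            raise ValueError("amount must be non-negative")


price = Money(19.99)

try:
    price.amount = 5.0  # assignment is blocked on frozen instances
except FrozenInstanceError as exc:
    print(f"FrozenInstanceError: {exc}")

try:
    Money(-1)  # rejected by __post_init__
except ValueError as exc:
    print(f"ValueError: {exc}")
```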

InitVar for Init-Only Variables

InitVar[T] declares parameters that are passed to __init__ and forwarded to __post_init__ but not stored as instance attributes. Use it for values that are needed during initialization but should not be kept (such as a password that is hashed before storage).
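
The password case might look like the sketch below. The User class is hypothetical, and plain SHA-256 is used only to keep the example short; it is not a recommendation for real password storage.

```python
import hashlib
from dataclasses import dataclass, field, fields, InitVar


@dataclass
class User:
    username: str
    password: InitVar[str]  # accepted by __init__, never stored
    password_hash: str = field(init=False, repr=False)

    def __post_init__(self, password: str):
        # InitVar values arrive as __post_init__ arguments,
        # in the order they were declared.
        self.password_hash = hashlib.sha256(password.encode()).hexdigest()


user = User("alice", "s3cret")
print(user)                            # User(username='alice')
print(hasattr(user, "password"))       # False
print([f.name for f in fields(user)])  # ['username', 'password_hash']
```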

Advanced Features

Dataclasses support ordering, inheritance, conversion to dict/tuple, memory-efficient slots, and keyword-only arguments. These features make dataclasses suitable for complex real-world applications.
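
Ordering, inheritance, and conversion can be sketched as follows. Version, Employee, and Manager are illustrative names, not classes from the lesson.

```python
from dataclasses import dataclass, asdict, astuple


@dataclass(order=True)
class Version:
    major: int
    minor: int = 0


@dataclass
class Employee:
    name: str
    salary: float = 0.0


@dataclass
class Manager(Employee):
    # Inherited fields come first in __init__, followed by the new ones.
    reports: int = 0


# order=True compares instances as tuples of their fields, in field order.
print(Version(1, 2) < Version(1, 10))  # True
print(sorted([Version(2), Version(1, 5), Version(1)]))

boss = Manager("Dana", 120000.0, 4)
print(asdict(boss))   # {'name': 'Dana', 'salary': 120000.0, 'reports': 4}
print(astuple(boss))  # ('Dana', 120000.0, 4)
```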

Python 3.10+ Features

slots=True reduces memory usage and prevents adding dynamic attributes. kw_only=True forces all arguments to be keyword-only, improving code readability for classes with many fields.

Common Mistakes

1. Using mutable default values directly

from dataclasses import dataclass, field

# Wrong - shared mutable default!
@dataclass
class BadInventory:
    items: list = []  # ValueError at class definition: mutable default not allowed

# Correct - each instance gets its own list
@dataclass
class GoodInventory:
    items: list = field(default_factory=list)

2. Putting defaults before non-defaults

# Wrong - defaults must come after non-defaults!
@dataclass
class BadOrder:
    status: str = "pending"  # Default
    order_id: str            # No default - TypeError!

# Correct - non-defaults first
@dataclass
class GoodOrder:
    order_id: str            # Required
    status: str = "pending"  # Optional

# Alternative - give every field a default so the order no longer matters
@dataclass
class FlexibleOrder:
    status: str = field(default="pending")
    order_id: str = field(default_factory=lambda: "ORD-001")

3. Forgetting that frozen dataclasses need object.__setattr__ in __post_init__

# Wrong - can't set attributes on frozen instance
@dataclass(frozen=True)
class BadVector:
    x: float
    y: float
    magnitude: float = field(init=False)

    def __post_init__(self):
        self.magnitude = (self.x**2 + self.y**2)**0.5  # FrozenInstanceError!

# Correct - use object.__setattr__ for frozen
@dataclass(frozen=True)
class GoodVector:
    x: float
    y: float
    magnitude: float = field(init=False)

    def __post_init__(self):
        object.__setattr__(self, 'magnitude', (self.x**2 + self.y**2)**0.5)

4. Adding required fields to a child of a dataclass with defaults

# Problem - parent has defaults, child has required fields
@dataclass
class Parent:
    name: str = "Unknown"

@dataclass
class Child(Parent):
    age: int  # TypeError! Non-default follows default

# Solution 1 - give child fields defaults too
@dataclass
class Child(Parent):
    age: int = 0

# Solution 2 - use field() with kw_only in Python 3.10+
@dataclass
class Parent:
    name: str = "Unknown"

@dataclass
class Child(Parent):
    age: int = field(kw_only=True)  # Now works!

5. Using dataclass when attrs or namedtuple would be better

# Dataclass might be overkill for simple cases
from dataclasses import dataclass

@dataclass
class Point:  # Fine, but consider alternatives
    x: int
    y: int

# For simple immutable data, NamedTuple is lighter
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

# For more features (validators, converters), use attrs
# (third-party package: pip install attrs)
import attrs

@attrs.define
class Point:
    x: int = attrs.field(validator=attrs.validators.instance_of(int))
    y: int = attrs.field(validator=attrs.validators.instance_of(int))

# Rule of thumb:
# - NamedTuple: simple immutable records
# - dataclass: most cases, stdlib solution
# - attrs: need validation, converters, advanced features

Exercise: E-Commerce Product System

Task: Create a product management system using dataclasses.

Requirements:

  • Product dataclass with sku, name, price, quantity, category, and total_value() method
  • Order dataclass with items list (default_factory), discount (hidden from repr), and total() method
  • Address frozen dataclass that can be used as dict keys

Solution:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Product:
    sku: str
    name: str
    price: float
    quantity: int = 0
    category: str = "General"

    def total_value(self) -> float:
        return self.price * self.quantity


@dataclass
class Order:
    order_id: str
    customer: str
    items: List[Product] = field(default_factory=list)
    discount: float = field(default=0.0, repr=False)

    def add_item(self, product: Product):
        self.items.append(product)

    def subtotal(self) -> float:
        return sum(item.price for item in self.items)

    def total(self) -> float:
        return self.subtotal() - self.discount


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    zip_code: str
    country: str = "USA"


# Test
p1 = Product("SKU001", "Laptop", 999.99, 2)
p2 = Product("SKU002", "Mouse", 29.99, 5)
p3 = Product("SKU003", "Keyboard", 79.99, 3, "Electronics")

print("Products:")
print(f"  {p1}")
print(f"  Total value: ${p1.total_value():.2f}")

order = Order("ORD-001", "Alice")
order.add_item(p1)
order.add_item(p2)
order.add_item(p3)

print(f"\nOrder: {order.order_id}")
print(f"  Items: {len(order.items)}")
print(f"  Subtotal: ${order.subtotal():.2f}")
order.discount = 100.00
print(f"  After $100 discount: ${order.total():.2f}")

addr1 = Address("123 Main St", "Boston", "02101")
addr2 = Address("123 Main St", "Boston", "02101")
print(f"\nAddress: {addr1}")
print(f"  addr1 == addr2: {addr1 == addr2}")
print(f"  Hashable: {hash(addr1) == hash(addr2)}")

Summary

  • @dataclass auto-generates __init__, __repr__, __eq__ from type-annotated fields
  • Use field(default_factory=list) for mutable defaults
  • field(repr=False, compare=False) controls which methods include a field
  • frozen=True creates immutable dataclasses that are also hashable (with the default eq=True)
  • __post_init__ runs after init for validation or computed fields
  • InitVar[T] declares init-only variables not stored as attributes
  • order=True adds comparison methods (__lt__, __gt__, etc.)
  • asdict() and astuple() convert to dict/tuple
  • Python 3.10+: slots=True, kw_only=True for more control

What's Next?

You've completed the Object-Oriented Programming module! Next, explore Iterators & Generators to learn about lazy evaluation, the iterator protocol, yield, and memory-efficient data processing.