Files
sqlalchemy-pgview/docs/guide/materialized-views.md
2026-02-08 10:09:48 +01:00

9.6 KiB

Materialized Views

Materialized views store the query results physically, making them faster to query but requiring explicit refresh to update.

When to Use Materialized Views

Use Case Regular View Materialized View
Complex aggregations queried frequently Slow Fast
Data that changes frequently Current Stale
Reports and dashboards Slow Fast
Real-time data requirements Yes No
Large dataset summaries Slow Fast

Creating Materialized Views (Declarative)

The recommended way to create materialized views is using the declarative pattern with multiple inheritance:

from decimal import Decimal
from sqlalchemy import select, func, String, Numeric, Integer, DateTime
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from sqlalchemy_pgview import MaterializedViewBase

class Base(DeclarativeBase):
    pass

class Order(Base):
    __tablename__ = "orders"
    id: Mapped[int] = mapped_column(primary_key=True)
    total: Mapped[Decimal] = mapped_column(Numeric(10, 2))
    created_at: Mapped[DateTime] = mapped_column(DateTime)

# Define materialized view with data populated immediately
class MonthlySales(MaterializedViewBase, Base):
    __tablename__ = "monthly_sales"
    __select__ = select(
        func.date_trunc('month', Order.created_at).label("month"),
        func.count(Order.id).label("order_count"),
        func.sum(Order.total).label("revenue"),
    ).group_by(func.date_trunc('month', Order.created_at))
    __with_data__ = True  # Default: populate on creation

# Create tables and views together
engine = create_engine("postgresql://user:pass@localhost/mydb")
Base.metadata.create_all(engine)

MaterializedViewBase Attributes

Attribute Required Description
__tablename__ Yes Name of the materialized view in the database
__select__ Yes SELECT statement that defines the view
__schema__ No Database schema (default: None/public)
__with_data__ No Populate data on creation (default: True)

Querying Materialized Views

from sqlalchemy import select

with engine.connect() as conn:
    # Query all rows
    result = conn.execute(select(MonthlySales.as_table())).fetchall()

    # Query with filtering using .c accessor
    result = conn.execute(
        select(MonthlySales.as_table())
        .where(MonthlySales.c.order_count > 10)
        .order_by(MonthlySales.c.revenue.desc())
    ).fetchall()

    for row in result:
        print(f"{row.month}: {row.order_count} orders, ${row.revenue}")

Refreshing Materialized Views

Materialized views become stale when underlying data changes. Refresh to update:

with engine.begin() as conn:
    # Using the class method (declarative)
    MonthlySales.refresh(conn)

    # Or using the view object
    MonthlySales.as_view().refresh(conn)

    # Concurrent refresh (requires a unique index, doesn't block reads)
    MonthlySales.refresh(conn, concurrently=True)

    # Refresh without data (empties the view)
    MonthlySales.refresh(conn, with_data=False)

!!! warning "Concurrent Refresh Requirements" REFRESH MATERIALIZED VIEW CONCURRENTLY requires:

- A unique index on the materialized view
- The view must already contain data (can't be empty)

Stale Data Behavior

!!! warning "Materialized Views Don't Auto-Update" Unlike regular views, materialized views show stale data until refreshed.

with engine.begin() as conn:
    # Insert new data into underlying table
    conn.execute(insert(Order.__table__).values(...))

    # Materialized view still shows old data!
    # Refresh to see new data
    MonthlySales.refresh(conn)

    # Now the view reflects the changes
    result = conn.execute(select(MonthlySales.as_table())).fetchall()

Auto-Refresh (ORM)

SQLAlchemy-PGView can automatically refresh materialized views when watched tables are modified via ORM:

from sqlalchemy.orm import Session

# Enable auto-refresh when Order table changes
MonthlySales.auto_refresh_on(Session, Order.__table__)

# Now ORM commits automatically refresh the view
with Session(engine) as session:
    session.add(Order(total=Decimal("99.99"), created_at=datetime.now()))
    session.commit()  # MonthlySales is refreshed automatically!

!!! tip "Custom Session Class" For isolated testing or specific workflows, create a custom Session subclass:

```python
class AnalyticsSession(Session):
    pass

MonthlySales.auto_refresh_on(AnalyticsSession, Order.__table__)
```

Auto-Refresh (Core)

For SQLAlchemy Core (without ORM), use AutoRefreshContext:

from sqlalchemy_pgview import AutoRefreshContext

# Get the underlying view and table objects
mview = MonthlySales.as_view()
orders_table = Order.__table__

with engine.begin() as conn:
    with AutoRefreshContext(conn, mview, orders_table):
        conn.execute(insert(orders_table).values(total=100, created_at=datetime.now()))
        conn.execute(insert(orders_table).values(total=200, created_at=datetime.now()))
    # View is refreshed when exiting the context

    # View now shows updated data
    result = conn.execute(select(MonthlySales.as_table())).fetchone()

!!! warning "Auto-Refresh Considerations" - Performance: Each commit triggers a refresh. For high-frequency writes, use scheduled refresh instead. - Exceptions: AutoRefreshContext only refreshes on successful exit (no exception). - ORM-only: auto_refresh_on() only catches changes made through the ORM Session, not raw SQL.

Refresh Strategies

Data Freshness Requirement Strategy
Real-time (careful!) Auto-refresh on commit
Minutes Frequent scheduled refresh
Hours Hourly cron job
Daily Nightly refresh
On-demand Manual refresh before queries

Imperative API (Alternative)

For SQLAlchemy Core or when declarative style isn't suitable:

from sqlalchemy import MetaData, Table, Column, Integer, Numeric, DateTime, select, func
from sqlalchemy_pgview import MaterializedView, CreateMaterializedView, RefreshMaterializedView

metadata = MetaData()

orders = Table(
    "orders", metadata,
    Column("id", Integer, primary_key=True),
    Column("total", Numeric(10, 2)),
    Column("created_at", DateTime),
)

# Define materialized view
monthly_sales = MaterializedView(
    "monthly_sales",
    select(
        func.date_trunc('month', orders.c.created_at).label("month"),
        func.count(orders.c.id).label("order_count"),
        func.sum(orders.c.total).label("revenue"),
    ).group_by(func.date_trunc('month', orders.c.created_at)),
    metadata=metadata,
    with_data=True,
)

# Create tables and views
metadata.create_all(engine)

# Refresh
with engine.begin() as conn:
    monthly_sales.refresh(conn)
    # Or using DDL directly
    conn.execute(RefreshMaterializedView(monthly_sales, concurrently=True))

Dropping Materialized Views

Materialized views are automatically dropped when using metadata.drop_all(). For manual control:

from sqlalchemy_pgview import DropMaterializedView

with engine.begin() as conn:
    # Drop using view object
    conn.execute(DropMaterializedView(MonthlySales.as_view()))

    # Drop by name with options
    conn.execute(DropMaterializedView("old_mv", if_exists=True))

    # Drop with cascade (drops dependent objects)
    conn.execute(DropMaterializedView("base_mv", cascade=True))

List Registered Views

from sqlalchemy_pgview import get_views

# Get all registered views from metadata (includes materialized views)
views = get_views(Base.metadata)
print(views)  # {'monthly_sales': <MaterializedView 'monthly_sales'>}

Complete Example

from decimal import Decimal
from datetime import datetime
from sqlalchemy import create_engine, select, func, String, Numeric, Integer, DateTime
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, Session
from sqlalchemy_pgview import MaterializedViewBase

class Base(DeclarativeBase):
    pass

class Product(Base):
    __tablename__ = "products"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(100))
    category: Mapped[str] = mapped_column(String(50))

class Sale(Base):
    __tablename__ = "sales"
    id: Mapped[int] = mapped_column(primary_key=True)
    product_id: Mapped[int] = mapped_column(Integer)
    quantity: Mapped[int] = mapped_column(Integer)
    amount: Mapped[Decimal] = mapped_column(Numeric(10, 2))
    sold_at: Mapped[datetime] = mapped_column(DateTime)

class CategorySales(MaterializedViewBase, Base):
    __tablename__ = "category_sales"
    __select__ = select(
        Product.category,
        func.count(Sale.id).label("sale_count"),
        func.sum(Sale.quantity).label("total_quantity"),
        func.sum(Sale.amount).label("total_revenue"),
    ).select_from(
        Sale.__table__.join(Product.__table__, Sale.product_id == Product.id)
    ).group_by(Product.category)

# Setup
engine = create_engine("postgresql://...")
Base.metadata.create_all(engine)

# Enable auto-refresh
CategorySales.auto_refresh_on(Session, Sale.__table__)

# Add data - view refreshes automatically on commit
with Session(engine) as session:
    session.add(Sale(product_id=1, quantity=5, amount=Decimal("49.95"), sold_at=datetime.now()))
    session.commit()

# Query fresh data
with engine.connect() as conn:
    results = conn.execute(
        select(CategorySales.as_table()).order_by(CategorySales.c.total_revenue.desc())
    ).fetchall()

    for row in results:
        print(f"{row.category}: {row.sale_count} sales, ${row.total_revenue}")