Indexes¶

Indexes accelerate queries by avoiding full table scans. DictDB supports two index types.

Index Types¶

Hash Index¶

Hash indexes provide O(1) lookups for equality conditions:

employees.create_index("department", index_type="hash")

# This query uses the index - O(1) instead of O(n)
employees.select(where=Condition(employees.department == "IT"))

Best for:

Equality comparisons (==)
is_in() conditions
High-cardinality fields (many unique values)

Sorted Index¶

Sorted indexes support both equality and range queries:

employees.create_index("salary", index_type="sorted")

# Equality - uses index
employees.select(where=Condition(employees.salary == 75000))

# Range queries - uses index
employees.select(where=Condition(employees.salary > 70000))
employees.select(where=Condition(employees.salary <= 80000))
employees.select(where=Condition(employees.salary >= 60000))

Best for:

Range queries (<, <=, >, >=)
Ordered data access
Fields used in order_by

Creating Indexes¶

# Hash index (default)
employees.create_index("department")
employees.create_index("department", index_type="hash")

# Sorted index
employees.create_index("salary", index_type="sorted")

Indexes are automatically updated when records are inserted, updated, or deleted.

Checking Indexes¶

# List indexed fields
employees.indexed_fields()  # ["department", "salary"]

# Check if a field is indexed
employees.has_index("department")  # True
employees.has_index("name")  # False

Index Usage¶

DictDB automatically uses indexes when possible:

# Uses index (equality on indexed field)
employees.select(where=Condition(employees.department == "IT"))

# Uses index (range on sorted index)
employees.select(where=Condition(employees.salary > 70000))

# Uses index (is_in on indexed field)
employees.select(where=Condition(employees.department.is_in(["IT", "HR"])))

# Full table scan (no index on "name")
employees.select(where=Condition(employees.name == "Alice"))

Compound Conditions¶

For AND conditions, the index is used to narrow down candidates:

employees.create_index("department")

# Uses department index, then filters by salary
employees.select(
    where=Condition((employees.department == "IT") & (employees.salary > 70000))
)

Performance Considerations¶

When to Use Indexes¶

Fields frequently used in where conditions
Fields with high selectivity (few matches per value)
Range query fields (use sorted index)

When to Avoid Indexes¶

Tables with few records (< 100)
Rarely queried fields
Fields updated very frequently

Memory Overhead¶

Indexes consume additional memory:

Hash index: ~O(n) for n records
Sorted index: ~O(n) with tree structure overhead

Persistence¶

Indexes Are Not Persisted

Indexes are not saved when using db.save(). After loading a database, recreate indexes if needed:

# Save
db.save("data.json", file_format="json")

# Load
db = DictDB.load("data.json", file_format="json")

# Recreate indexes
employees = db.get_table("employees")
employees.create_index("department")
employees.create_index("salary", index_type="sorted")

Example¶

from dictdb import DictDB, Condition

db = DictDB()
db.create_table("orders", primary_key="order_id")
orders = db.get_table("orders")

# Create indexes
orders.create_index("customer_id")  # Hash for equality lookups
orders.create_index("total", index_type="sorted")  # Sorted for range queries
orders.create_index("status")  # Hash for status filtering

# Insert data
for i in range(10000):
    orders.insert({
        "customer_id": i % 100,
        "total": 10 + (i * 0.5),
        "status": "completed" if i % 3 == 0 else "pending"
    })

# Fast queries using indexes
customer_orders = orders.select(where=Condition(orders.customer_id == 42))
large_orders = orders.select(where=Condition(orders.total > 1000))
pending = orders.select(where=Condition(orders.status == "pending"))