Indexes¶
Indexes accelerate queries by avoiding full table scans. DictDB supports two index types.
Index Types¶
Hash Index¶
Hash indexes provide O(1) lookups for equality conditions:
employees.create_index("department", index_type="hash")
# This query uses the index - O(1) instead of O(n)
employees.select(where=Condition(employees.department == "IT"))
Best for:
- Equality comparisons (
==) is_in()conditions- High-cardinality fields (many unique values)
Sorted Index¶
Sorted indexes support both equality and range queries:
employees.create_index("salary", index_type="sorted")
# Equality - uses index
employees.select(where=Condition(employees.salary == 75000))
# Range queries - uses index
employees.select(where=Condition(employees.salary > 70000))
employees.select(where=Condition(employees.salary <= 80000))
employees.select(where=Condition(employees.salary >= 60000))
Best for:
- Range queries (
<,<=,>,>=) - Ordered data access
- Fields used in
order_by
Creating Indexes¶
# Hash index (default)
employees.create_index("department")
employees.create_index("department", index_type="hash")
# Sorted index
employees.create_index("salary", index_type="sorted")
Indexes are automatically updated when records are inserted, updated, or deleted.
Checking Indexes¶
# List indexed fields
employees.indexed_fields() # ["department", "salary"]
# Check if a field is indexed
employees.has_index("department") # True
employees.has_index("name") # False
Index Usage¶
DictDB automatically uses indexes when possible:
# Uses index (equality on indexed field)
employees.select(where=Condition(employees.department == "IT"))
# Uses index (range on sorted index)
employees.select(where=Condition(employees.salary > 70000))
# Uses index (is_in on indexed field)
employees.select(where=Condition(employees.department.is_in(["IT", "HR"])))
# Full table scan (no index on "name")
employees.select(where=Condition(employees.name == "Alice"))
Compound Conditions¶
For AND conditions, the index is used to narrow down candidates:
employees.create_index("department")
# Uses department index, then filters by salary
employees.select(
where=Condition((employees.department == "IT") & (employees.salary > 70000))
)
Performance Considerations¶
When to Use Indexes¶
- Fields frequently used in
whereconditions - Fields with high selectivity (few matches per value)
- Range query fields (use sorted index)
When to Avoid Indexes¶
- Tables with few records (< 100)
- Rarely queried fields
- Fields updated very frequently
Memory Overhead¶
Indexes consume additional memory:
- Hash index: ~O(n) for n records
- Sorted index: ~O(n) with tree structure overhead
Persistence¶
Indexes Are Not Persisted
Indexes are not saved when using db.save(). After loading a database, recreate indexes if needed:
# Save
db.save("data.json", file_format="json")
# Load
db = DictDB.load("data.json", file_format="json")
# Recreate indexes
employees = db.get_table("employees")
employees.create_index("department")
employees.create_index("salary", index_type="sorted")
Example¶
from dictdb import DictDB, Condition
db = DictDB()
db.create_table("orders", primary_key="order_id")
orders = db.get_table("orders")
# Create indexes
orders.create_index("customer_id") # Hash for equality lookups
orders.create_index("total", index_type="sorted") # Sorted for range queries
orders.create_index("status") # Hash for status filtering
# Insert data
for i in range(10000):
orders.insert({
"customer_id": i % 100,
"total": 10 + (i * 0.5),
"status": "completed" if i % 3 == 0 else "pending"
})
# Fast queries using indexes
customer_orders = orders.select(where=Condition(orders.customer_id == 42))
large_orders = orders.select(where=Condition(orders.total > 1000))
pending = orders.select(where=Condition(orders.status == "pending"))