Performance and Scalability Tips
This example highlights a few simple patterns that keep df_eval pipelines efficient and scalable.
It demonstrates:

- Reusing a single df_eval.Engine instance
- Using Engine.evaluate_many() instead of many single evaluate() calls
import time
import pandas as pd
from df_eval import Engine
Build a Moderately Sized DataFrame
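The construction code for this step was not captured in the page, but the timing functions below reference a DataFrame df with numeric columns a and b. A minimal sketch, assuming a hypothetical row count of 100,000, might look like:

```python
import pandas as pd

# Hypothetical build step: the original code is missing from this page.
# The only requirements implied by the expressions below are numeric
# columns named "a" and "b".
n = 100_000  # assumed size; "moderately sized" per the heading
df = pd.DataFrame({
    "a": range(n),
    "b": range(n),
})
```

Any construction that yields numeric columns a and b would work equally well here.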
Reuse a Single Engine Instance
engine = Engine()  # create once, reuse for every evaluation below

def time_many_single_calls() -> float:
    """Time 20 separate Engine.evaluate() calls on the same DataFrame."""
    start = time.perf_counter()
    for _ in range(20):
        engine.evaluate(df, "a + b")
    return time.perf_counter() - start
def time_evaluate_many() -> float:
    """Time a single Engine.evaluate_many() call with three expressions."""
    start = time.perf_counter()
    engine.evaluate_many(
        df,
        {
            "sum": "a + b",
            "product": "a * b",
            "avg": "(a + b) / 2",
        },
    )
    return time.perf_counter() - start
single_time = time_many_single_calls()
batch_time = time_evaluate_many()
print(f"Time for many single evaluate calls: {single_time:.4f}s")
print(f"Time for a single evaluate_many call: {batch_time:.4f}s")
Time for many single evaluate calls: 0.0139s
Time for a single evaluate_many call: 0.0039s
Total running time of the script: (0 minutes 0.020 seconds)