Performance and Scalability Tips

This example highlights a few simple patterns to keep df-eval pipelines efficient and scalable.

It demonstrates:

  • Reusing a df_eval.Engine instance

  • Using Engine.evaluate_many() instead of many single calls
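The intuition behind batching is that each call pays a roughly fixed parse-and-dispatch overhead on top of the actual column arithmetic. A toy cost model (hypothetical units, not measured from df_eval) makes the arithmetic concrete:

```python
# Toy cost model (hypothetical units, not measured from df_eval):
# every call pays a fixed overhead, so a batch pays it only once.
calls = 20
per_call_overhead = 10  # parse/dispatch cost per call
per_expr_work = 1       # actual column arithmetic per expression

single_total = calls * (per_call_overhead + per_expr_work)
batch_total = per_call_overhead + calls * per_expr_work

print(single_total)  # 220 -- overhead paid 20 times
print(batch_total)   # 30  -- overhead paid once
```

The fewer and cheaper the expressions, the more the fixed overhead dominates, which is what the timings below illustrate.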

import time

import pandas as pd

from df_eval import Engine

Build a Moderately Sized DataFrame

n = 50_000
df = pd.DataFrame({"a": range(n), "b": range(n, 2 * n)})
df.head()
   a      b
0  0  50000
1  1  50001
2  2  50002
3  3  50003
4  4  50004


Reuse a Single Engine Instance

engine = Engine()


def time_many_single_calls() -> float:
    start = time.perf_counter()
    for _ in range(20):
        engine.evaluate(df, "a + b")
    return time.perf_counter() - start


def time_evaluate_many() -> float:
    start = time.perf_counter()
    engine.evaluate_many(
        df,
        {
            "sum": "a + b",
            "product": "a * b",
            "avg": "(a + b) / 2",
        },
    )
    return time.perf_counter() - start


single_time = time_many_single_calls()
batch_time = time_evaluate_many()

print(f"Time for many single evaluate calls: {single_time:.4f}s")
print(f"Time for a single evaluate_many call: {batch_time:.4f}s")
Time for many single evaluate calls: 0.0139s
Time for a single evaluate_many call: 0.0039s
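The same batching idea exists in pandas itself. As an aside (this uses plain pandas, not df_eval), DataFrame.eval accepts a multi-line expression that derives several columns in one call, analogous to Engine.evaluate_many():

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# One eval() call assigns three derived columns, analogous to
# Engine.evaluate_many() (analogy only; df_eval is not used here).
out = df.eval(
    "sum = a + b\n"
    "product = a * b\n"
    "avg = (a + b) / 2"
)
print(list(out.columns))  # ['a', 'b', 'sum', 'product', 'avg']
```

The original frame is left untouched; the assigned columns are appended, in order, to the returned copy.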

Total running time of the script: (0 minutes 0.020 seconds)