Performance and Scalability Tips

This example highlights a few simple patterns to keep df-eval pipelines efficient and scalable.

It demonstrates:

  • Reusing a df_eval.Engine instance

  • Using Engine.evaluate_many() instead of many single calls
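The intuition behind batching is that each call pays a roughly fixed parse-and-dispatch overhead on top of the actual column arithmetic. A toy cost model (hypothetical units, not measured from df_eval) makes the arithmetic concrete:

```python
# Toy cost model (hypothetical units, not measured from df_eval):
# every call pays a fixed overhead, so a batch pays it only once.
calls = 20
per_call_overhead = 10  # parse/dispatch cost per call
per_expr_work = 1       # actual column arithmetic per expression

single_total = calls * (per_call_overhead + per_expr_work)
batch_total = per_call_overhead + calls * per_expr_work

print(single_total)  # 220 -- overhead paid 20 times
print(batch_total)   # 30  -- overhead paid once
```

The fewer and cheaper the expressions, the more the fixed overhead dominates, which is what the timings below illustrate.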

import time

import pandas as pd

from df_eval import Engine

Build a Moderately Sized DataFrame

n = 50_000
df = pd.DataFrame({"a": range(n), "b": range(n, 2 * n)})
df.head()
   a      b
0  0  50000
1  1  50001
2  2  50002
3  3  50003
4  4  50004


Reuse a Single Engine Instance

engine = Engine()


def time_many_single_calls() -> float:
    start = time.perf_counter()
    for _ in range(20):
        engine.evaluate(df, "a + b")
    return time.perf_counter() - start


def time_evaluate_many() -> float:
    start = time.perf_counter()
    engine.evaluate_many(
        df,
        {
            "sum": "a + b",
            "product": "a * b",
            "avg": "(a + b) / 2",
        },
    )
    return time.perf_counter() - start


single_time = time_many_single_calls()
batch_time = time_evaluate_many()

print(f"Time for many single evaluate calls: {single_time:.4f}s")
print(f"Time for a single evaluate_many call: {batch_time:.4f}s")
Time for many single evaluate calls: 0.0139s
Time for a single evaluate_many call: 0.0039s
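The same batching idea exists in pandas itself. As an aside (this uses plain pandas, not df_eval), DataFrame.eval accepts a multi-line expression that derives several columns in one call, analogous to Engine.evaluate_many():

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# One eval() call assigns three derived columns, analogous to
# Engine.evaluate_many() (analogy only; df_eval is not used here).
out = df.eval(
    "sum = a + b\n"
    "product = a * b\n"
    "avg = (a + b) / 2"
)
print(list(out.columns))  # ['a', 'b', 'sum', 'product', 'avg']
```

The original frame is left untouched; the assigned columns are appended, in order, to the returned copy.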

Total running time of the script: (0 minutes 0.020 seconds)