Basic Usage#
This guide covers the fundamental concepts and basic usage patterns of df-eval.
Creating an Engine#
The Engine is the main entry point for df-eval. Create one to start evaluating expressions:
from df_eval import Engine
engine = Engine()
Simple Expression Evaluation#
Evaluate a single expression on a DataFrame:
import pandas as pd
from df_eval import Engine
# Create a DataFrame
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
})
# Create an engine
engine = Engine()
# Evaluate an expression
result = engine.evaluate(df, "a + b")
print(result.tolist())  # [5, 7, 9]
The evaluate method returns a pandas Series with the result.
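For comparison, the same arithmetic can be written with pandas' own `DataFrame.eval`. This is only a plain-pandas sketch of the result shape described above (a Series aligned to the DataFrame's index); it does not show how df-eval evaluates expressions internally.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# pandas' built-in expression evaluator, used here only as a stand-in
result = df.eval("a + b")

print(type(result).__name__)  # Series
print(result.tolist())        # [5, 7, 9]
```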
Schema-Driven Derived Columns#
Define multiple derived columns at once using a schema dictionary:
# Define a schema with multiple derived columns
schema = {
    "sum": "a + b",
    "product": "a * b",
    "ratio": "a / b",
}
# Apply the schema to create new columns
df_with_derived = engine.apply_schema(df, schema)
The apply_schema method returns a new DataFrame with the derived columns
added. For a fuller, runnable walkthrough of schema-driven derived columns,
see the example Basic Engine Usage.
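As a rough mental model, applying a schema resembles calling `DataFrame.assign` with one evaluated expression per key. The sketch below uses only pandas and is an assumption about the observable result (new columns appended, input untouched), not a description of df-eval's implementation.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
schema = {"sum": "a + b", "product": "a * b", "ratio": "a / b"}

# Evaluate each expression against the original frame and attach it as a column
derived = df.assign(**{name: df.eval(expr) for name, expr in schema.items()})

print(list(derived.columns))  # ['a', 'b', 'sum', 'product', 'ratio']
print(list(df.columns))       # ['a', 'b'] (assign returns a copy)
```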
Using Built-in Functions#
df-eval provides several built-in safe functions that you can use in expressions:
Mathematical Functions#
schema = {
    "abs_value": "abs(a - 5)",
    "sqrt_value": "sqrt(b)",
    "log_value": "log(b)",
    "exp_value": "exp(a)",
}
result = engine.apply_schema(df, schema)
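For orientation, the NumPy functions of the same names produce the values below. This assumes the df-eval built-ins follow NumPy semantics (in particular that `log` is the natural logarithm), which the text above does not state explicitly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Element-wise NumPy equivalents of the four schema expressions
abs_value = np.abs(df["a"] - 5)
sqrt_value = np.sqrt(df["b"])
log_value = np.log(df["b"])   # natural logarithm
exp_value = np.exp(df["a"])

print(abs_value.tolist())   # [4, 3, 2]
print(sqrt_value.iloc[0])   # 2.0
```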
Clipping Values#
# Clip values to a range
schema = {
    "clipped": "clip(a, 1, 2)"  # Keep values between 1 and 2
}
result = engine.apply_schema(df, schema)
print(result["clipped"].tolist())  # [1, 2, 2]
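The same bounds applied with pandas' `Series.clip` give the values shown in the comment above. This is a plain-pandas sketch, assuming `clip(a, 1, 2)` treats its second and third arguments as inclusive lower and upper bounds.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Values below 1 become 1, values above 2 become 2
clipped = df["a"].clip(lower=1, upper=2)

print(clipped.tolist())  # [1, 2, 2]
```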
Conditional Operations#
# Use where for conditional logic
schema = {
    "category": "where(a > 2, 'high', 'low')"
}
result = engine.apply_schema(df, schema)
print(result["category"].tolist())  # ['low', 'low', 'high']
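The argument order matches `numpy.where(condition, value_if_true, value_if_false)`. The sketch below reproduces the example with NumPy directly; that df-eval's `where` follows the same convention is an assumption based on the output shown above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Pick 'high' where the condition holds, 'low' elsewhere
category = pd.Series(np.where(df["a"] > 2, "high", "low"), index=df.index)

print(category.tolist())  # ['low', 'low', 'high']
```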
Handling Missing Values#
df_with_nulls = pd.DataFrame({
    "a": [1, None, 3],
    "b": [4, 5, None],
})
schema = {
    "has_null": "isna(a)",
    "filled": "fillna(b, 0)",
}
result = engine.apply_schema(df_with_nulls, schema)
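To see what these two expressions would be expected to produce, here is the plain-pandas counterpart using `Series.isna` and `Series.fillna`. It assumes the df-eval functions mirror the pandas methods of the same names.

```python
import pandas as pd

df_with_nulls = pd.DataFrame({
    "a": [1, None, 3],
    "b": [4, 5, None],
})

has_null = df_with_nulls["a"].isna()   # True where 'a' is missing
filled = df_with_nulls["b"].fillna(0)  # missing 'b' values replaced by 0

print(has_null.tolist())  # [False, True, False]
print(filled.tolist())    # [4.0, 5.0, 0.0]
```

The filled values are floats because a numeric column containing `None` is stored as `float64` by pandas.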
Safe Division#
# Avoid division by zero errors
schema = {
    "safe_ratio": "safe_divide(a, b)"
}
# Returns NaN for division by zero instead of raising an error
result = engine.apply_schema(df, schema)
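The example DataFrame has no zeros in `b`, so the effect is easier to see with a denominator that contains one. The following is a pandas-only sketch of the behaviour described in the comment above (NaN where the denominator is zero); the data are made up for illustration and the masking approach is an assumption, not df-eval's actual code.

```python
import pandas as pd

num = pd.Series([1.0, 2.0, 3.0])
den = pd.Series([4.0, 0.0, 6.0])  # hypothetical data with a zero denominator

# Replace zero denominators with NaN before dividing, so the result is NaN there
safe_ratio = num / den.where(den != 0)

print(safe_ratio.isna().tolist())  # [False, True, False]
print(safe_ratio.iloc[0])          # 0.25
```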
Coalesce#
# Return first non-null value
df_multi = pd.DataFrame({
    "a": [1, None, None],
    "b": [None, 2, None],
    "c": [None, None, 3],
})
schema = {
    "first_valid": "coalesce(a, b, c)"
}
result = engine.apply_schema(df_multi, schema)
print(result["first_valid"].tolist())  # values 1, 2, 3 (likely as floats, since the inputs contain NaN)
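In plain pandas, a coalesce over several columns can be sketched by chaining `Series.combine_first`, which fills missing entries of one Series from another. This illustrates the "first non-null value" rule only; it is not taken from df-eval's source.

```python
import pandas as pd

df_multi = pd.DataFrame({
    "a": [1, None, None],
    "b": [None, 2, None],
    "c": [None, None, 3],
})

# Take 'a' where present, otherwise 'b', otherwise 'c'
first_valid = (
    df_multi["a"]
    .combine_first(df_multi["b"])
    .combine_first(df_multi["c"])
)

print(first_valid.tolist())  # [1.0, 2.0, 3.0]
```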
Batch Evaluation#
Evaluate multiple independent expressions at once:
expressions = {
    "sum": "a + b",
    "product": "a * b",
    "avg": "(a + b) / 2",
}
# Evaluate all expressions
results = engine.evaluate_many(df, expressions)
# results is a dictionary mapping names to Series
for name, series in results.items():
    print(f"{name}: {series.tolist()}")
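Because the result is described as a dictionary of Series, it can be turned into a single DataFrame with the `pd.DataFrame` constructor when that is more convenient. In the sketch below the dictionary is built by hand with pandas so the snippet runs on its own; with df-eval it would presumably come from `evaluate_many`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Stand-in for the dictionary of named Series described above
results = {
    "sum": df["a"] + df["b"],
    "product": df["a"] * df["b"],
    "avg": (df["a"] + df["b"]) / 2,
}

# A dict of equally indexed Series maps directly onto DataFrame columns
combined = pd.DataFrame(results)

print(combined.columns.tolist())  # ['sum', 'product', 'avg']
print(combined["avg"].tolist())   # [2.5, 3.5, 4.5]
```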
Specifying Data Types#
Control the output type of derived columns using the dtypes argument to
df_eval.Engine.apply_schema():
schema = {
    "float_sum": "a + b",
    "int_product": "a * b",
}
result = engine.apply_schema(
    df,
    schema,
    dtypes={"float_sum": "float64", "int_product": "int32"},
)
print(result.dtypes)
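The effect is comparable to casting the derived columns with `DataFrame.astype` after they are computed. The sketch below shows that cast in plain pandas; whether df-eval casts in exactly this way is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

derived = df.assign(float_sum=df["a"] + df["b"], int_product=df["a"] * df["b"])

# Cast only the named columns; other columns keep their dtype
derived = derived.astype({"float_sum": "float64", "int_product": "int32"})

print(derived.dtypes["float_sum"])    # float64
print(derived.dtypes["int_product"])  # int32
```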
Error Handling#
df-eval validates expressions and provides clear error messages:
try:
    # This will fail - column 'z' doesn't exist
    result = engine.evaluate(df, "z + 1")
except Exception as e:
    print(f"Error: {e}")
For a deeper tour of common error cases and debugging techniques, see the gallery example Error Handling and Debugging.
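One way to surface a missing column before evaluation is to inspect the names used in the expression. The sketch below uses Python's standard `ast` module; `missing_columns` is a hypothetical helper written for illustration, not part of df-eval, and it assumes the expression is valid Python syntax.

```python
import ast

import pandas as pd


def missing_columns(df, expression):
    """Hypothetical helper: variable names in `expression` that are not columns of `df`."""
    tree = ast.parse(expression, mode="eval")
    # Names in call position (e.g. `abs` in `abs(a)`) are functions, not columns
    func_nodes = [n.func for n in ast.walk(tree) if isinstance(n, ast.Call)]
    names = {
        n.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Name) and not any(n is f for f in func_nodes)
    }
    return sorted(names - set(df.columns))


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(missing_columns(df, "z + 1"))           # ['z']
print(missing_columns(df, "abs(a - 5) + b"))  # []
```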
Next Steps#
Learn about Advanced Usage for dependency management and custom functions
Explore Lookups for integrating external data sources