Quick Start Guide

This page describes the basic steps for using the package.

The package is designed to work with pandera YAML schema files in which each column entry has been extended with a metadata key.

A good way to create a YAML schema from a pandas DataFrame is to infer it with the pandera.infer_schema function and then export the result to YAML.

You can add the following keys under the metadata key of each column:

unit_of_measure
aliases
decimals
sentinel_values
category
calculation
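To make the shape of such an entry concrete, here is a hedged sketch of one column's metadata written as a Python dictionary (the YAML form would mirror it). Only the key names come from the list above; every value and value type shown is an assumption chosen for illustration, and calculation is left out because its expected format is not described here.

```python
# Key names taken from the list above.
ALLOWED_METADATA_KEYS = {
    "unit_of_measure",
    "aliases",
    "decimals",
    "sentinel_values",
    "category",
    "calculation",
}

# Hypothetical metadata for a single column; all values are assumptions.
column_metadata = {
    "unit_of_measure": "m",
    "aliases": ["Height", "height"],
    "decimals": 1,
    "sentinel_values": [-999],
    "category": "dimensions",
}

# Catch misspelled keys before they reach the schema file.
unknown_keys = set(column_metadata) - ALLOWED_METADATA_KEYS
print(unknown_keys)
```

A quick check like this can help catch typos in key names before you edit the schema file.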

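Next, import the processor class and create a processor from your schema.

# Assumes the import name is pandera_utils (a hyphen is not valid in a Python import).
from pandera_utils import DataFrameMetaProcessor
processor = DataFrameMetaProcessor(schema)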

Pre-process the dataframe to resolve aliases, apply rounding, and perform calculations.

processed_df = processor.preprocess(dataframe)
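For intuition only, the sketch below shows what alias handling and rounding could look like in plain pandas. It is not the package's implementation: the helper apply_aliases_and_rounding, the metadata layout, and the column names are all hypothetical.

```python
import pandas as pd

# Hypothetical per-column metadata using two of the keys listed earlier.
column_metadata = {
    "height_m": {"aliases": ["Height", "height"], "decimals": 1},
}


def apply_aliases_and_rounding(df, metadata):
    """Rename alias columns to their canonical names, then round them."""
    out = df.copy()
    for name, meta in metadata.items():
        # Rename the first alias found, unless the canonical column already exists.
        for alias in meta.get("aliases", []):
            if alias in out.columns and name not in out.columns:
                out = out.rename(columns={alias: name})
        # Round only when a number of decimals is specified.
        decimals = meta.get("decimals")
        if decimals is not None and name in out.columns:
            out[name] = out[name].round(decimals)
    return out


raw = pd.DataFrame({"Height": [1.234, 5.678]})
cleaned = apply_aliases_and_rounding(raw, column_metadata)
print(cleaned)
```

In this sketch the Height column is renamed to height_m and rounded to one decimal place; the package may handle these steps differently.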

Finally, you can validate the dataframe using the schema.

processor.validate(processed_df)