elphick.pandera_utils.pandera_utils.DataFrameMetaProcessor

class elphick.pandera_utils.pandera_utils.DataFrameMetaProcessor(schema)[source]

A class to preprocess and validate DataFrames based on metadata.

__init__(schema)[source]

Instantiate the DataFrameMetaProcessor object.

Parameters: schema (DataFrameSchema) – The DataFrameSchema object to use for preprocessing and validation.

Methods

`__init__`(schema)	Instantiate the DataFrameMetaProcessor object.
`apply_calculations`(df)	Apply calculations based on the calculation metadata.
`apply_category_maps`(df[, maps_to_apply, ...])	Apply category maps to create new columns based on the category metadata.
`apply_missing_sentinels`(df)	Apply missing sentinels based on the missing_sentinels metadata.
`apply_rename_from_alias`(df)	Rename columns in the DataFrame based on aliases.
`apply_rounding`(df[, columns])	Round columns based on the decimals metadata.
`check_schema`()	Check if the schema is valid.
`preprocess`(df[, round_before_calc, ...])	Preprocess a DataFrame based on the metadata.
`validate`(df[, return_calculated_columns])	Validate a DataFrame based on the schema.

Attributes

`alias_map`
`calculation_map`
`category_maps`
`category_ordered_map`
`decimals_map`
`missing_sentinels_map`
`unit_of_measure_map`

class OrderedDict

Dictionary that remembers insertion order

clear() → None. Remove all items from od.

copy() → a shallow copy of od

fromkeys(iterable, value=None): Create a new ordered dictionary with keys from iterable and values set to value.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

move_to_end(key, last=True)

Move an existing element to the end (or beginning if last is false).

Raise KeyError if the element does not exist.
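As a brief illustration of move_to_end, the following sketch uses the standard-library collections.OrderedDict. The keys and values are made up for the example.

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

# Default (last=True): move "a" to the end.
od.move_to_end("a")
order_after_end = list(od)       # ['b', 'c', 'a']

# last=False: move "c" to the beginning.
od.move_to_end("c", last=False)
order_after_front = list(od)     # ['c', 'b', 'a']

# A key that does not exist raises KeyError.
try:
    od.move_to_end("missing")
    raised = False
except KeyError:
    raised = True
```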

pop(key[, default]) → v, remove specified key and return the corresponding value. If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem(last=True)

Remove and return a (key, value) pair from the dictionary.

Pairs are returned in LIFO order if last is true or FIFO order if false.
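The LIFO/FIFO behaviour of popitem can be seen in a small sketch with the standard-library OrderedDict. The keys are illustrative only.

```python
from collections import OrderedDict

od = OrderedDict([("first", 1), ("second", 2), ("third", 3)])

# last=True (default): the most recently inserted pair is returned (LIFO).
newest = od.popitem()

# last=False: the oldest remaining pair is returned (FIFO).
oldest = od.popitem(last=False)

remaining = list(od.items())
```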

setdefault(key, default=None)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E is present and has a .keys() method, then does: for k in E.keys(): D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]
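The three forms of update described above can be sketched as follows, again with the standard-library OrderedDict and made-up keys. Note that updating an existing key changes its value but not its position.

```python
from collections import OrderedDict

od = OrderedDict(a=1)

od.update({"b": 2})        # E is a mapping (has .keys())
od.update([("c", 3)])      # E is an iterable of (key, value) pairs
od.update({"a": 10}, d=4)  # E first, then keyword arguments F

result = list(od.items())
```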

values() → an object providing a view on D's values

apply_calculations(df)[source]

Apply calculations based on the calculation metadata.

Return type: DataFrame

apply_category_maps(df, maps_to_apply=None, retain_orig_cat_col=True)[source]

Apply category maps to create new columns based on the category metadata.

Return type: DataFrame

apply_missing_sentinels(df)[source]

Apply missing sentinels based on the missing_sentinels metadata.

Return type: DataFrame

apply_rename_from_alias(df)[source]

Rename columns in the DataFrame based on aliases.

Return type: DataFrame

apply_rounding(df, columns=None)[source]

Round columns based on the decimals metadata.

Return type: DataFrame

check_schema()[source]: Check if the schema is valid.

preprocess(df, round_before_calc=False, cat_maps_to_apply=None, cat_retain_orig_cat_col=True)[source]

Preprocess a DataFrame based on the metadata.

Parameters:

df (DataFrame) – The DataFrame to preprocess.
round_before_calc (bool) – A boolean indicating whether to round columns before applying calculations (as well as after).
cat_maps_to_apply (Optional[list[str]]) – A list of category maps to apply. If None, all maps will be applied.
cat_retain_orig_cat_col (bool) – A boolean indicating whether to retain the original category columns.

Return type: DataFrame
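To see why round_before_calc can change results, here is a minimal plain-Python sketch of the arithmetic. It is not the library's implementation: the values, the DECIMALS constant, and the total function are hypothetical stand-ins for two columns with decimals metadata and one calculated column.

```python
# Hypothetical inputs: two measured values, both declared with 1 decimal place.
DECIMALS = 1
a, b = 1.26, 2.37


def total(x, y):
    """Hypothetical calculated column: the sum of two inputs."""
    return x + y


# Rounding applied only after the calculation.
round_after_only = round(total(a, b), DECIMALS)

# Inputs rounded first, then the calculation, then the result rounded again.
round_before_and_after = round(
    total(round(a, DECIMALS), round(b, DECIMALS)), DECIMALS
)
```

Here the first approach gives 3.6 and the second gives 3.7, so the choice matters whenever calculated columns must reconcile with the rounded inputs that a reader of the output would see.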

validate(df, return_calculated_columns=True)[source]

Validate a DataFrame based on the schema.

Return type: DataFrame