elphick.pandera_utils.pandera_utils.DataFrameMetaProcessor

class elphick.pandera_utils.pandera_utils.DataFrameMetaProcessor(schema)

A class to preprocess and validate DataFrames based on metadata.

__init__(schema)

Instantiate the DataFrameMetaProcessor object.

Parameters:

schema (DataFrameSchema) – The DataFrameSchema object to use for preprocessing and validation.

Methods

__init__(schema)

Instantiate the DataFrameMetaProcessor object.

apply_calculations(df)

Apply calculations based on the calculation metadata.

apply_category_maps(df[, maps_to_apply, ...])

Apply category maps to create new columns based on the category metadata.

apply_missing_sentinels(df)

Apply missing sentinels based on the missing_sentinels metadata.

apply_rename_from_alias(df)

Rename columns in the DataFrame based on aliases.

apply_rounding(df[, columns])

Round columns based on the decimals metadata.

check_schema()

Check if the schema is valid.

preprocess(df[, round_before_calc, ...])

Preprocess a DataFrame based on the metadata.

validate(df[, return_calculated_columns])

Validate a DataFrame based on the schema.

Attributes

alias_map

calculation_map

category_maps

category_ordered_map

decimals_map

missing_sentinels_map

unit_of_measure_map

class OrderedDict

Dictionary that remembers insertion order

clear() -> None.  Remove all items from od.

copy() -> a shallow copy of od

fromkeys(iterable, value=None)

Create a new ordered dictionary with keys from iterable and values set to value.

items() -> a set-like object providing a view on D's items

keys() -> a set-like object providing a view on D's keys

move_to_end(key, last=True)

Move an existing element to the end (or beginning if last is false).

Raise KeyError if the element does not exist.

pop(key[, default]) -> v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem(last=True)

Remove and return a (key, value) pair from the dictionary.

Pairs are returned in LIFO order if last is true or FIFO order if false.
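
The ordering behaviour of move_to_end and popitem described above can be illustrated with a short sketch. It uses only the standard-library collections.OrderedDict; the keys and values are made up for illustration and have no connection to the processor's own maps.

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

# Move "a" to the end (last=True is the default).
od.move_to_end("a")
order_after_move_end = list(od)

# Move "c" to the beginning.
od.move_to_end("c", last=False)
order_after_move_front = list(od)

# popitem() with last=True removes the pair at the end (LIFO).
last_pair = od.popitem()

# popitem(last=False) removes the pair at the front (FIFO).
first_pair = od.popitem(last=False)

remaining = list(od.items())
```

Calling move_to_end with a key that is not present raises KeyError, as noted above.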

setdefault(key, default=None)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) -> None.  Update D from mapping/iterable E and F.

If E is present and has a .keys() method, then does: for k in E.keys(): D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]
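
The three update cases can be seen in a small standard-library sketch. The data here is arbitrary and chosen only to show that an existing key keeps its original position while its value is overwritten.

```python
from collections import OrderedDict

od = OrderedDict(a=1)

# E has a .keys() method: entries are copied key by key.
od.update({"b": 2})

# E lacks .keys(): it is treated as an iterable of (key, value) pairs.
# "a" already exists, so only its value changes, not its position.
od.update([("c", 3), ("a", 10)])

# Keyword arguments (F) are applied after E, so d=40 overrides ("d", 4).
od.update([("d", 4)], d=40, e=5)

result = list(od.items())
```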

values() -> an object providing a view on D's values

apply_calculations(df)

Apply calculations based on the calculation metadata.

Return type:

DataFrame

apply_category_maps(df, maps_to_apply=None, retain_orig_cat_col=True)

Apply category maps to create new columns based on the category metadata.

Return type:

DataFrame

apply_missing_sentinels(df)

Apply missing sentinels based on the missing_sentinels metadata.

Return type:

DataFrame

apply_rename_from_alias(df)

Rename columns in the DataFrame based on aliases.

Return type:

DataFrame

apply_rounding(df, columns=None)

Round columns based on the decimals metadata.

Return type:

DataFrame

check_schema()

Check if the schema is valid.

preprocess(df, round_before_calc=False, cat_maps_to_apply=None, cat_retain_orig_cat_col=True)

Preprocess a DataFrame based on the metadata.

Parameters:
  • df (DataFrame) – The DataFrame to preprocess.

  • round_before_calc (bool) – Whether to round columns before applying calculations, as well as after.

  • cat_maps_to_apply (Optional[list[str]]) – A list of category maps to apply. If None, all maps will be applied.

  • cat_retain_orig_cat_col (bool) – A boolean indicating whether to retain the original category columns.

Return type:

DataFrame
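
Why round_before_calc matters can be shown with a plain-Python sketch. This is not the library's implementation: preprocess_row, the column names and the decimals and calculations dictionaries are all hypothetical stand-ins, and the only assumption taken from the parameter description is that rounding always happens after calculations and optionally before them as well.

```python
def preprocess_row(row, decimals, calculations, round_before_calc=False):
    """Hypothetical stand-in for the round/calculate ordering (not library code)."""

    def round_columns(values):
        # Round only the columns that have a decimals entry.
        return {
            name: round(value, decimals[name]) if name in decimals else value
            for name, value in values.items()
        }

    out = dict(row)
    if round_before_calc:
        out = round_columns(out)
    for name, func in calculations.items():
        out[name] = func(out)
    # Rounding after the calculations happens in both cases.
    return round_columns(out)


row = {"a": 2.4, "b": 2.4}
decimals = {"a": 0, "b": 0, "total": 0}
calculations = {"total": lambda r: r["a"] + r["b"]}

late = preprocess_row(row, decimals, calculations)
early = preprocess_row(row, decimals, calculations, round_before_calc=True)
```

In this sketch the calculated total differs between the two calls: it is computed from the raw inputs in the first and from the already rounded inputs in the second.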

validate(df, return_calculated_columns=True)

Validate a DataFrame based on the schema.

Return type:

DataFrame