elphick.pandera_utils.pandera_utils.DataFrameMetaProcessor
- class elphick.pandera_utils.pandera_utils.DataFrameMetaProcessor(schema)
A class to preprocess and validate DataFrames based on metadata.
- __init__(schema)
Instantiate the DataFrameMetaProcessor object.
- Parameters:
schema (DataFrameSchema) – The DataFrameSchema object to use for preprocessing and validation.
Methods
__init__(schema) – Instantiate the DataFrameMetaProcessor object.
apply_calculations(df) – Apply calculations based on the calculation metadata.
apply_category_maps(df[, maps_to_apply, ...]) – Apply category maps to create new columns based on the category metadata.
apply_missing_sentinels(df) – Apply missing sentinels based on the missing_sentinels metadata.
apply_rename_from_alias(df) – Rename columns in the DataFrame based on aliases.
apply_rounding(df[, columns]) – Round columns based on the decimals metadata.
Check if the schema is valid.
preprocess(df[, round_before_calc, ...]) – Preprocess a DataFrame based on the metadata.
validate(df[, return_calculated_columns]) – Validate a DataFrame based on the schema.
Attributes
alias_map
calculation_map
category_maps
category_ordered_map
decimals_map
missing_sentinels_map
unit_of_measure_map
- class OrderedDict
Dictionary that remembers insertion order
- clear() -> None. Remove all items from od.
- copy() -> a shallow copy of od
- fromkeys(iterable, value=None)
Create a new ordered dictionary with keys from iterable and values set to value.
- items() -> a set-like object providing a view on D's items
- keys() -> a set-like object providing a view on D's keys
- move_to_end(key, last=True)
Move an existing element to the end (or beginning if last is false).
Raise KeyError if the element does not exist.
- pop(key[, default]) -> v, remove specified key and return the corresponding value.
If the key is not found, return the default if given; otherwise, raise a KeyError.
- popitem(last=True)
Remove and return a (key, value) pair from the dictionary.
Pairs are returned in LIFO order if last is true or FIFO order if false.
- setdefault(key, default=None)
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) -> None. Update D from mapping/iterable E and F.
If E is present and has a .keys() method, then does: for k in E.keys(): D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
- values() -> an object providing a view on D's values
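The ordering behaviour of move_to_end and popitem is easiest to see on a small example. This uses only the standard-library collections.OrderedDict; nothing in it is specific to DataFrameMetaProcessor.

```python
from collections import OrderedDict

od = OrderedDict.fromkeys("abc", 0)  # keys a, b, c, all mapped to 0
od.move_to_end("a")                  # order is now b, c, a
od.move_to_end("c", last=False)      # order is now c, b, a
first = od.popitem(last=False)       # FIFO: removes ("c", 0)
last = od.popitem()                  # LIFO: removes ("a", 0)
print(list(od))                      # ['b']
```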
- apply_calculations(df)
Apply calculations based on the calculation metadata.
- Return type:
DataFrame
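The format of the calculation metadata is not shown here. As a minimal sketch, assuming each calculated column is described by a pandas expression string (the `calculations` dict and `apply_calculations_sketch` function are hypothetical, not the library's actual layout or code), applying calculations can reduce to `DataFrame.eval`:

```python
import pandas as pd

# Hypothetical calculation metadata: new column name -> pandas expression string.
# The real DataFrameMetaProcessor reads its calculations from the schema; this format is assumed.
calculations = {"mass_dry": "mass_wet * (1 - moisture / 100)"}


def apply_calculations_sketch(df: pd.DataFrame, calcs: dict) -> pd.DataFrame:
    """Return a copy of df with one new column per expression in calcs."""
    out = df.copy()
    for col, expr in calcs.items():
        # DataFrame.eval evaluates the expression against the existing columns,
        # so later expressions can also refer to columns created earlier in the loop.
        out[col] = out.eval(expr)
    return out


df = pd.DataFrame({"mass_wet": [100.0, 200.0], "moisture": [50.0, 25.0]})
result = apply_calculations_sketch(df, calculations)
print(result["mass_dry"].tolist())  # [50.0, 150.0]
```

Working on a copy keeps the caller's DataFrame unchanged, which matches the method returning a DataFrame.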
- apply_category_maps(df, maps_to_apply=None, retain_orig_cat_col=True)
Apply category maps to create new columns based on the category metadata.
- Return type:
DataFrame
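A minimal sketch of what mapping categories into new columns might look like, assuming the category metadata maps each code in a source column to a label under a named map. The `category_maps` layout, the use of the map name as the new column name, and `apply_category_maps_sketch` are all hypothetical; `maps_to_apply` and `retain_orig_cat_col` mirror the documented parameters.

```python
from typing import Optional

import pandas as pd

# Hypothetical category metadata: source column -> {map name -> {code -> label}}.
# The real DataFrameMetaProcessor builds its maps from the schema; this layout is assumed.
category_maps = {
    "rock_code": {
        "rock_type": {1: "granite", 2: "basalt"},
        "hardness": {1: "hard", 2: "medium"},
    }
}


def apply_category_maps_sketch(
    df: pd.DataFrame,
    maps: dict,
    maps_to_apply: Optional[list] = None,
    retain_orig_cat_col: bool = True,
) -> pd.DataFrame:
    """Create one new column per selected map; optionally drop the source column."""
    out = df.copy()
    for src_col, named_maps in maps.items():
        applied = False
        for map_name, mapping in named_maps.items():
            # None means "apply every map", mirroring the documented default.
            if maps_to_apply is not None and map_name not in maps_to_apply:
                continue
            # Series.map swaps each code for its label; codes without a label become NaN.
            out[map_name] = out[src_col].map(mapping)
            applied = True
        # Only drop the source column if at least one map was derived from it.
        if applied and not retain_orig_cat_col:
            out = out.drop(columns=src_col)
    return out


df = pd.DataFrame({"rock_code": [1, 2, 1]})
result = apply_category_maps_sketch(
    df, category_maps, maps_to_apply=["rock_type"], retain_orig_cat_col=False
)
print(result.columns.tolist())       # ['rock_type']
print(result["rock_type"].tolist())  # ['granite', 'basalt', 'granite']
```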
- apply_missing_sentinels(df)
Apply missing sentinels based on the missing_sentinels metadata.
- Return type:
DataFrame
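A missing sentinel is a placeholder value (such as -99) that stands in for a missing measurement. As a minimal sketch, assuming the missing_sentinels metadata lists such values per column (the `missing_sentinels` dict and `apply_missing_sentinels_sketch` are hypothetical), applying them amounts to replacing those values with NaN:

```python
import pandas as pd

# Hypothetical missing_sentinels metadata: column -> values that encode "missing".
missing_sentinels = {"grade": [-99, -999]}


def apply_missing_sentinels_sketch(df: pd.DataFrame, sentinels: dict) -> pd.DataFrame:
    """Replace sentinel values with NaN in the listed columns."""
    out = df.copy()
    for col, values in sentinels.items():
        if col in out.columns:
            # Series.mask sets a cell to NaN wherever the condition is True.
            out[col] = out[col].mask(out[col].isin(values))
    return out


df = pd.DataFrame({"grade": [1.5, -99.0, 2.0, -999.0]})
result = apply_missing_sentinels_sketch(df, missing_sentinels)
print(result["grade"].isna().tolist())  # [False, True, False, True]
```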
- apply_rename_from_alias(df)
Rename columns in the DataFrame based on aliases.
- Return type:
DataFrame
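A minimal sketch of alias-based renaming, assuming the alias metadata lists alternative raw column names for each canonical column. The `aliases` layout and `apply_rename_from_alias_sketch` are hypothetical; the direction of the mapping in the real alias_map may differ.

```python
import pandas as pd

# Hypothetical alias metadata: canonical column name -> alternative names in raw data.
aliases = {"mass_wet": ["wet_mass", "WMT"], "moisture": ["h2o", "H2O_pct"]}


def apply_rename_from_alias_sketch(df: pd.DataFrame, alias_lists: dict) -> pd.DataFrame:
    """Rename any column found among the aliases to its canonical name."""
    # Invert to alias -> canonical so the mapping can be passed to DataFrame.rename.
    lookup = {alias: canonical for canonical, names in alias_lists.items() for alias in names}
    # DataFrame.rename ignores mapping keys that are not columns (errors="ignore" by default).
    return df.rename(columns=lookup)


raw = pd.DataFrame({"WMT": [100.0], "h2o": [8.0], "sample_id": [1]})
renamed = apply_rename_from_alias_sketch(raw, aliases)
print(renamed.columns.tolist())  # ['mass_wet', 'moisture', 'sample_id']
```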
- apply_rounding(df, columns=None)
Round columns based on the decimals metadata.
- Return type:
DataFrame
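A minimal sketch of metadata-driven rounding, assuming the decimals metadata maps column names to a number of decimal places and that `columns=None` means "every column that has decimals metadata". The `decimals_map` contents and `apply_rounding_sketch` are hypothetical; the rounding itself is pandas' `DataFrame.round` with a dict argument.

```python
from typing import Optional

import pandas as pd

# Hypothetical decimals metadata: column -> number of decimal places.
decimals_map = {"mass_wet": 1, "moisture": 2}


def apply_rounding_sketch(
    df: pd.DataFrame, decimals: dict, columns: Optional[list] = None
) -> pd.DataFrame:
    """Round the selected columns (all columns with decimals metadata when None)."""
    targets = list(decimals) if columns is None else columns
    spec = {col: decimals[col] for col in targets if col in decimals}
    # DataFrame.round leaves columns missing from spec untouched and ignores
    # spec keys that are not columns of df.
    return df.round(spec)


df = pd.DataFrame({"mass_wet": [100.26], "moisture": [8.4567], "sample_id": [7]})
only_moisture = apply_rounding_sketch(df, decimals_map, columns=["moisture"])
all_columns = apply_rounding_sketch(df, decimals_map)
```

Restricting `columns` is useful when only some values should be rounded at a given step, for example before a calculation.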
- preprocess(df, round_before_calc=False, cat_maps_to_apply=None, cat_retain_orig_cat_col=True)
Preprocess a DataFrame based on the metadata.
- Parameters:
df (DataFrame) – The DataFrame to preprocess.
round_before_calc (bool) – If True, round columns before applying calculations as well as after them.
cat_maps_to_apply (Optional[list[str]]) – A list of category maps to apply. If None, all maps are applied.
cat_retain_orig_cat_col (bool) – If True, retain the original category columns alongside the mapped ones.
- Return type:
DataFrame
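The documentation does not state the order in which preprocess runs its steps. The sketch below assumes one plausible order (rename, missing sentinels, optional rounding, calculations, rounding, category maps) and uses hypothetical metadata constants and a hypothetical `preprocess_sketch` function, purely to show how `round_before_calc` changes a calculated value.

```python
from typing import Optional

import pandas as pd

# Hypothetical metadata; the real class derives its equivalents from the DataFrameSchema.
ALIASES = {"WMT": "mass_wet"}                            # alias -> canonical column name
SENTINELS = {"moisture": [-99]}                          # column -> codes meaning "missing"
DECIMALS = {"mass_wet": 0, "mass_dry": 1}                # column -> decimal places
CALCS = {"mass_dry": "mass_wet * (1 - moisture / 100)"}  # new column -> expression
CAT_MAPS = {"rock_code": {"rock_type": {1: "granite", 2: "basalt"}}}


def preprocess_sketch(
    df: pd.DataFrame,
    round_before_calc: bool = False,
    cat_maps_to_apply: Optional[list] = None,
    cat_retain_orig_cat_col: bool = True,
) -> pd.DataFrame:
    # The step order below is an assumption; the docstring does not state it.
    out = df.rename(columns=ALIASES)
    for col, codes in SENTINELS.items():
        out[col] = out[col].mask(out[col].isin(codes))
    if round_before_calc:
        # DataFrame.round ignores DECIMALS keys that are not (yet) columns.
        out = out.round(DECIMALS)
    for col, expr in CALCS.items():
        out[col] = out.eval(expr)
    # Rounding after the calculations happens regardless of round_before_calc.
    out = out.round(DECIMALS)
    for src, named_maps in CAT_MAPS.items():
        applied = False
        for name, mapping in named_maps.items():
            if cat_maps_to_apply is None or name in cat_maps_to_apply:
                out[name] = out[src].map(mapping)
                applied = True
        if applied and not cat_retain_orig_cat_col:
            out = out.drop(columns=src)
    return out


df = pd.DataFrame({"WMT": [100.4, 80.0], "moisture": [50.0, -99.0], "rock_code": [1, 2]})
after_only = preprocess_sketch(df)                                # mass_dry from unrounded 100.4
before_and_after = preprocess_sketch(df, round_before_calc=True)  # mass_dry from rounded 100.0
```

With rounding applied only afterwards, the calculation sees the full-precision input (mass_dry is about 50.2); rounding first feeds the calculation 100.0, giving 50.0.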