parq_tools.lazy_parquet.LazyParquetDataFrame
- class parq_tools.lazy_parquet.LazyParquetDataFrame(path, index_cols=None)
- __init__(path, index_cols=None)
Initialize a LazyParquetDataFrame.
- Parameters:
path (str or Path) – Path to the Parquet file.
index_cols (list[str], optional) – Column names to use as the index, if any. Defaults to None.
Methods
__init__(path[, index_cols]) – Initialize a LazyParquetDataFrame.
add_column(name, data[, position]) – Add a new column to the DataFrame.
assign(**kwargs) – Assign new columns to the DataFrame.
drop([labels, axis, index, columns, level, ...]) – Drop specified labels from the DataFrame.
head([n]) – Return the first n rows of the DataFrame.
insert(loc, column, value[, allow_duplicates]) – Insert a new column at a specific location.
iter_chunks([batch_size, columns]) – Yield pandas DataFrames in row-wise chunks, including extra columns.
rename([mapper, index, columns, axis, copy, ...]) – Rename the columns or index of the DataFrame.
reset_index([drop]) – Reset the index of the DataFrame, optionally dropping it.
save([path, batch_size]) – Save the DataFrame to Parquet in chunks to reduce memory usage.
set_index(columns) – Set the index of the DataFrame to the specified columns.
Convert the Parquet file to a pandas DataFrame, caching the result.
to_parquet(path) – Write the DataFrame to a Parquet file.
Attributes
columns
dtypes
index
loc
shape
- drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Drop specified labels from the DataFrame.
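Because the frame is lazy, dropping columns can plausibly be done by editing the list of visible column names without reading any data. The sketch below assumes that design and assumes pandas-style `errors` handling (`'raise'` raises `KeyError` for a missing label, `'ignore'` skips it). `drop_columns` is a hypothetical helper, not part of `parq_tools`, and it covers column labels only, not row labels or `level`.

```python
from typing import Iterable, List, Union


def drop_columns(
    current: List[str],
    columns: Union[str, Iterable[str]],
    errors: str = "raise",
) -> List[str]:
    """Return the column list with `columns` removed.

    Hypothetical helper assuming pandas-style semantics; this is not the
    parq_tools implementation.
    """
    if errors not in ("raise", "ignore"):
        raise ValueError("errors must be 'raise' or 'ignore'")
    # Accept a single label as well as a list of labels, as pandas does.
    to_drop = [columns] if isinstance(columns, str) else list(columns)
    missing = [c for c in to_drop if c not in current]
    if missing and errors == "raise":
        raise KeyError(f"{missing} not found in columns")
    # Only the list of names changes; no Parquet data is read here.
    drop_set = set(to_drop)
    return [c for c in current if c not in drop_set]


cols = ["x", "y", "z", "grade"]
print(drop_columns(cols, ["y", "grade"]))               # ['x', 'z']
print(drop_columns(cols, "missing", errors="ignore"))   # ['x', 'y', 'z', 'grade']
```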
- insert(loc, column, value, allow_duplicates=False)
Insert a new column at a specific location.
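A sketch of how `insert` might work without loading the file, assuming (not confirmed by the source) that each column is either still on disk or held in memory. The `(name, values)` layout, `insert_column`, and its `n_rows` argument are hypothetical; the duplicate-name check follows the `pandas.DataFrame.insert` rule that `allow_duplicates=False` rejects a name that already exists.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical layout: ordered (name, values) pairs, where values=None means
# the column still lives in the Parquet file and has not been loaded.
Column = Tuple[str, Optional[List[Any]]]


def insert_column(
    cols: List[Column],
    n_rows: int,
    loc: int,
    column: str,
    value: List[Any],
    allow_duplicates: bool = False,
) -> None:
    """Insert an in-memory column at position `loc` (hypothetical helper)."""
    if not 0 <= loc <= len(cols):
        raise IndexError(f"loc must be between 0 and {len(cols)}, got {loc}")
    names = [name for name, _ in cols]
    if column in names and not allow_duplicates:
        # Same rule as pandas.DataFrame.insert with allow_duplicates=False.
        raise ValueError(f"cannot insert {column}, already exists")
    if len(value) != n_rows:
        raise ValueError(f"expected {n_rows} values, got {len(value)}")
    cols.insert(loc, (column, list(value)))


cols = [("x", None), ("y", None)]  # both columns still on disk
insert_column(cols, n_rows=3, loc=1, column="flag", value=[True, False, True])
print([name for name, _ in cols])  # ['x', 'flag', 'y']
```

Keeping on-disk columns as placeholders means only the inserted values occupy memory until the frame is materialized or written.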
- iter_chunks(batch_size=100000, columns=None)
Yield pandas DataFrames in row-wise chunks, including extra columns.
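The chunking contract can be sketched without a Parquet reader. This assumes, without confirmation from the source, that "extra columns" means full-length columns held in memory (for example, ones added through `add_column` or `assign`) that must be sliced to line up with each on-disk batch. `iter_row_chunks` and its arguments are hypothetical, and plain dicts of lists stand in for pandas DataFrames; in a real reader the on-disk part would typically come from something like `pyarrow.parquet.ParquetFile.iter_batches`.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_row_chunks(
    disk_columns: Dict[str, List[Any]],
    extra_columns: Dict[str, List[Any]],
    batch_size: int = 100_000,
    columns: Optional[List[str]] = None,
) -> Iterator[Dict[str, List[Any]]]:
    """Yield row-wise chunks as dicts of column name -> values.

    Hypothetical helper: `disk_columns` stands in for data read from the
    Parquet file, and `extra_columns` are in-memory columns of the same length.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    all_columns = {**disk_columns, **extra_columns}  # extra names override disk names
    lengths = {len(values) for values in all_columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    n_rows = lengths.pop() if lengths else 0
    wanted = list(all_columns) if columns is None else list(columns)
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        # Slice every requested column to the same row window so rows stay aligned.
        yield {name: all_columns[name][start:stop] for name in wanted}


disk = {"x": [1, 2, 3, 4, 5]}
extra = {"flag": [0, 1, 0, 1, 0]}
for chunk in iter_row_chunks(disk, extra, batch_size=2):
    print(chunk)
# {'x': [1, 2], 'flag': [0, 1]}
# {'x': [3, 4], 'flag': [0, 1]}
# {'x': [5], 'flag': [0]}
```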
- rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')
Rename the columns or index of the DataFrame.
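A pandas-style sketch of column renaming, assuming `rename` treats its mapper the way `pandas.DataFrame.rename` does: a dict maps old names to new ones, a function is applied to every name, the default `errors='ignore'` silently skips dict keys that are not existing labels, and `errors='raise'` turns them into a `KeyError`. `rename_columns` is a hypothetical helper that handles column labels only.

```python
from typing import Callable, Dict, List, Union

# A mapper is either a dict of old -> new names or a function applied to each name.
Mapper = Union[Dict[str, str], Callable[[str], str]]


def rename_columns(current: List[str], mapper: Mapper, errors: str = "ignore") -> List[str]:
    """Return the renamed column list (hypothetical helper, pandas-style semantics)."""
    if errors not in ("raise", "ignore"):
        raise ValueError("errors must be 'raise' or 'ignore'")
    if callable(mapper):
        # Functions apply to every label, so no label can be "missing".
        return [mapper(name) for name in current]
    if errors == "raise":
        missing = [key for key in mapper if key not in current]
        if missing:
            raise KeyError(f"{missing} not found in axis")
    # Labels not in the mapping keep their current name.
    return [mapper.get(name, name) for name in current]


print(rename_columns(["x", "y"], {"x": "easting", "q": "unused"}))  # ['easting', 'y']
print(rename_columns(["x", "y"], str.upper))                        # ['X', 'Y']
```

In a lazy frame the original on-disk names would still need to be remembered so that later reads request the right Parquet columns; keeping a mapping from displayed name to stored name is one way to do that.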