parq_tools.lazy_parquet.LazyParquetDataFrame

class parq_tools.lazy_parquet.LazyParquetDataFrame(path, index_cols=None)
__init__(path, index_cols=None)

Initialize a LazyParquetDataFrame.

Parameters:
  • path (str or Path) – Path to the Parquet file.

  • index_cols (list[str], optional) – Column names to use as index columns. Defaults to None.

Methods

  • __init__(path[, index_cols]) – Initialize a LazyParquetDataFrame.
  • add_column(name, data[, position]) – Add a new column to the DataFrame.
  • assign(**kwargs) – Assign new columns to the DataFrame.
  • drop([labels, axis, index, columns, level, ...]) – Drop specified labels from the DataFrame.
  • head([n]) – Return the first n rows of the DataFrame.
  • insert(loc, column, value[, allow_duplicates]) – Insert a new column at a specific location.
  • iter_chunks([batch_size, columns]) – Yield pandas DataFrames in row-wise chunks, including extra columns.
  • rename([mapper, index, columns, axis, copy, ...]) – Rename the columns or index of the DataFrame.
  • reset_index([drop]) – Reset the index of the DataFrame, optionally dropping it.
  • save([path, batch_size]) – Save the DataFrame to Parquet in chunks to reduce memory usage.
  • set_index(columns) – Set the index of the DataFrame to the specified columns.
  • to_pandas() – Convert the Parquet file to a pandas DataFrame, caching the result.
  • to_parquet(path) – Write the DataFrame to a Parquet file.

Attributes

columns

dtypes

index

loc

shape

add_column(name, data, position=None)

Add a new column to the DataFrame.

assign(**kwargs)

Assign new columns to the DataFrame.

drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Drop specified labels from the DataFrame.

head(n=5)

Return the first n rows of the DataFrame.

insert(loc, column, value, allow_duplicates=False)

Insert a new column at a specific location.

iter_chunks(batch_size=100000, columns=None)

Yield pandas DataFrames in row-wise chunks, including extra columns.
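
As a hedged illustration of chunked iteration, the sketch below sums one column without materializing the whole file. It relies only on the iter_chunks(batch_size, columns) signature documented here. The helper name column_sum is hypothetical, and the sketch assumes that columns accepts a list of column names and that each chunk supports pandas-style chunk[column].sum().

```python
def column_sum(lazy_df, column, batch_size=100_000):
    """Sum one column chunk by chunk (hypothetical helper, not part of parq_tools)."""
    total = 0
    # Assumption: columns= takes a list of names and limits what each chunk carries.
    for chunk in lazy_df.iter_chunks(batch_size=batch_size, columns=[column]):
        # Each chunk is documented as a pandas DataFrame, so Series.sum() applies.
        total += chunk[column].sum()
    return total
```

With a real file this might be called as column_sum(LazyParquetDataFrame("data.parquet"), "price"), where the path and the column name are placeholders.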

rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')

Rename the columns or index of the DataFrame.

reset_index(drop=False)

Reset the index of the DataFrame, optionally dropping it.

save(path=None, batch_size=100000)

Save the DataFrame to Parquet in chunks to reduce memory usage.
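
Since batch_size sets how many rows are handled per chunk, one way to choose it is to work back from a memory budget. The sketch below is a rough heuristic under stated assumptions: pick_batch_size and save_within_budget are hypothetical helpers, est_bytes_per_row is the caller's own estimate, and actual peak memory may be higher than rows times bytes per row. Only the save(path, batch_size) signature comes from this page.

```python
def pick_batch_size(memory_budget_bytes, est_bytes_per_row):
    """Rows per chunk that roughly fit the budget (hypothetical heuristic)."""
    if est_bytes_per_row <= 0:
        raise ValueError("est_bytes_per_row must be positive")
    # Integer division, but never fewer than one row per chunk.
    return max(1, memory_budget_bytes // est_bytes_per_row)


def save_within_budget(lazy_df, dest, memory_budget_bytes, est_bytes_per_row):
    """Call save() with an explicit destination and a budget-derived batch_size."""
    batch_size = pick_batch_size(memory_budget_bytes, est_bytes_per_row)
    # An explicit path is passed so the sketch does not depend on what path=None does.
    lazy_df.save(path=dest, batch_size=batch_size)
    return batch_size
```

For example, a 64 MiB budget with an estimated 128 bytes per row gives 524288 rows per chunk.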

set_index(columns)

Set the index of the DataFrame to the specified columns.

to_pandas()

Convert the Parquet file to a pandas DataFrame, caching the result.
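
Because to_pandas() converts the file into an in-memory DataFrame, it can be worth checking the size first. The sketch below is one possible guard, not part of parq_tools: materialize_if_small is a hypothetical helper, and it assumes the shape attribute is a (rows, columns) tuple as in pandas, which this page does not confirm.

```python
def materialize_if_small(lazy_df, max_rows=1_000_000):
    """Return lazy_df.to_pandas() only if the row count is within max_rows."""
    # Assumption: shape is (n_rows, n_columns), mirroring pandas.DataFrame.shape.
    n_rows = lazy_df.shape[0]
    if n_rows > max_rows:
        # Too large to materialize; the caller can fall back to iter_chunks().
        return None
    return lazy_df.to_pandas()
```

A None result signals that chunked processing with iter_chunks() is probably the safer route.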

to_parquet(path)

Write the DataFrame to a Parquet file.