nyaggle.feature_store

nyaggle.feature_store.cached_feature(feature_name, directory='./features/', ignore_columns=None)[source]

Decorator that wraps a function returning a pd.DataFrame in a memoizing callable, which saves the dataframe using feature_store.save_feature.

Parameters
  • feature_name (Union[int, str]) – The name of the feature (used in save_feature).

  • directory (str) – The directory where the feature is stored.

  • ignore_columns (Optional[List[str]]) – The list of columns that will be dropped from the loaded dataframe.

Example

>>> import pandas as pd
>>> from nyaggle.feature_store import cached_feature
>>>
>>> @cached_feature('x')
... def make_feature_x(param) -> pd.DataFrame:
...     print('called')
...     ...
...     return df
...
>>> x = make_feature_x(...)  # if x.f does not exist, call the function and save the result to x.f
called
>>> x = make_feature_x(...)  # the second call loads the result from the file

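
To make the caching behaviour concrete, here is a minimal sketch of a file-backed memoizing decorator. It is not nyaggle's implementation: cached_frame is a hypothetical name, and the sketch assumes pickle files ({name}.pkl) as a stand-in for the Feather files ({feature_name}.f) the library writes, so that it runs with pandas alone.

```python
import functools
import os
import tempfile

import pandas as pd


def cached_frame(name, directory):
    """Hypothetical stand-in for cached_feature: cache a DataFrame on disk."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Assumption: pickle replaces Feather to avoid extra dependencies.
            path = os.path.join(directory, f"{name}.pkl")
            if os.path.exists(path):
                # Cache hit: skip the function and read the stored frame.
                return pd.read_pickle(path)
            # Cache miss: compute, store, and return the result.
            df = func(*args, **kwargs)
            os.makedirs(directory, exist_ok=True)
            df.to_pickle(path)
            return df
        return wrapper
    return decorator


calls = []  # records every real execution of the wrapped function
cache_dir = tempfile.mkdtemp()


@cached_frame("x", cache_dir)
def make_feature_x(n):
    calls.append(n)
    return pd.DataFrame({"x": range(n)})


first = make_feature_x(3)   # computed and written to disk
second = make_feature_x(3)  # read back from disk; the function body is skipped
```

In this sketch the cache is keyed by the name only, so a later call with different arguments still returns the stored frame.
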
nyaggle.feature_store.load_feature(feature_name, directory='./features/', ignore_columns=None)[source]

Load a feature as a pandas DataFrame.

Parameters
  • feature_name (Union[int, str]) – The name of the feature (used in save_feature).

  • directory (str) – The directory where the feature is stored.

  • ignore_columns (Optional[List[str]]) – The list of columns that will be dropped from the loaded dataframe.

Return type

DataFrame

Returns

The feature dataframe

nyaggle.feature_store.load_features(base_df, feature_names, directory='./features/', ignore_columns=None, create_directory=True, rename_duplicate=True)[source]

Load features and return the concatenated dataframe.

Parameters
  • base_df (Optional[DataFrame]) – The base dataframe. If not None, the resulting dataframe will consist of the base columns and the loaded feature columns.

  • feature_names (List[Union[int, str]]) – The list of feature names to be loaded.

  • directory (str) – The directory where the features are stored.

  • ignore_columns (Optional[List[str]]) – The list of columns that will be dropped from the loaded dataframe.

  • create_directory (bool) – If True, create the directory if it does not exist.

  • rename_duplicate (bool) – If True, duplicated column names will be renamed automatically (the feature name will be used as a suffix). If False, duplicated columns will be left as-is.

Return type

DataFrame

Returns

The merged dataframe
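
The renaming rule described for rename_duplicate can be illustrated with plain pandas. This is only a sketch of the described behaviour, not the library's code: concat_with_suffix is a hypothetical helper, and the underscore separator between the column name and the feature name is an assumption.

```python
import pandas as pd


def concat_with_suffix(base_df, features):
    """Hypothetical helper: append feature frames, suffixing clashing names."""
    seen = list(base_df.columns)
    renamed = []
    for feature_name, df in features.items():
        mapping = {}
        for col in df.columns:
            new_col = col
            # Append the feature name until the column name is unique.
            # Assumption: "_" is used as the separator.
            while new_col in seen:
                new_col = f"{new_col}_{feature_name}"
            mapping[col] = new_col
            seen.append(new_col)
        renamed.append(df.rename(columns=mapping))
    return pd.concat([base_df] + renamed, axis=1)


base = pd.DataFrame({"id": [1, 2], "price": [10.0, 20.0]})
features = {
    "log": pd.DataFrame({"price": [2.3, 3.0]}),  # clashes with base "price"
    "rank": pd.DataFrame({"rank": [2, 1]}),      # no clash, kept as-is
}
merged = concat_with_suffix(base, features)
print(list(merged.columns))  # ['id', 'price', 'price_log', 'rank']
```

Without the renaming step, the concatenated frame would carry two columns named price, which makes later column selection ambiguous.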

nyaggle.feature_store.save_feature(df, feature_name, directory='./features/', with_csv_dump=False, create_directory=True, reference_target_variable=None, overwrite=False)[source]

Save a pandas dataframe in Feather format.

Parameters
  • df (DataFrame) – The dataframe to be saved.

  • feature_name (Union[int, str]) – The name of the feature. The output file will be {feature_name}.f.

  • directory (str) – The directory where the feature will be stored.

  • with_csv_dump (bool) – If True, the first 1000 rows are also dumped to a CSV file for debugging.

  • create_directory (bool) – If True, create the directory if it does not exist.

  • reference_target_variable (Optional[Series]) – If not None, instant validation will be performed on the feature.

  • overwrite (bool) – If False and the file already exists, a RuntimeError will be raised.
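
The interaction between overwrite and with_csv_dump can be sketched as follows. This is an illustration of the documented semantics under stated assumptions, not nyaggle's implementation: save_frame is a hypothetical function, it writes a pickle file ({name}.pkl) instead of Feather so that it runs with pandas alone, and the CSV file name {name}.csv is an assumption.

```python
import os
import tempfile

import pandas as pd


def save_frame(df, name, directory, with_csv_dump=False, overwrite=False):
    """Hypothetical stand-in for save_feature, using pickle instead of Feather."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{name}.pkl")
    # Refuse to clobber an existing file unless explicitly allowed.
    if not overwrite and os.path.exists(path):
        raise RuntimeError(f"{path} already exists")
    df.to_pickle(path)
    if with_csv_dump:
        # Only the first 1000 rows go to the CSV, for quick inspection.
        df.head(1000).to_csv(os.path.join(directory, f"{name}.csv"), index=False)
    return path


out_dir = tempfile.mkdtemp()
df = pd.DataFrame({"x": range(1500)})
save_frame(df, "x", out_dir, with_csv_dump=True)

# The debug dump is truncated even though the frame has 1500 rows.
dump_rows = len(pd.read_csv(os.path.join(out_dir, "x.csv")))

try:
    save_frame(df, "x", out_dir)  # same name, overwrite left at False
    raised = False
except RuntimeError:
    raised = True

save_frame(df, "x", out_dir, overwrite=True)  # replaces the existing file
```

Defaulting overwrite to False guards against silently replacing a feature that other experiments may already depend on.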