nyaggle.feature_store
- nyaggle.feature_store.cached_feature(feature_name, directory='./features/', ignore_columns=None)
Decorator that wraps a function returning a pd.DataFrame with a memoizing callable, which saves the dataframe using feature_store.save_feature.
- Parameters
feature_name (Union[int, str]) – The name of the feature (used in save_feature).
directory (str) – The directory where the feature is stored.
ignore_columns (Optional[List[str]]) – The list of columns that will be dropped from the loaded dataframe.
Example
>>> import pandas as pd
>>> from nyaggle.feature_store import cached_feature
>>>
>>> @cached_feature('x')
>>> def make_feature_x(param) -> pd.DataFrame:
>>>     print('called')
>>>     ...
>>>     return df
>>>
>>> x = make_feature_x(...)  # if x.f does not exist, call the function and save the result to x.f
called
>>> x = make_feature_x(...)  # loaded from the file on the second call
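The caching behaviour in the example above can be illustrated with a small file-backed memoization sketch. This is not nyaggle's implementation: `cached_to_disk` is a hypothetical stand-in that uses only the standard library and stores results with pickle rather than feather. As in the example, the cache is assumed to be keyed by the feature name alone.

```python
import functools
import pickle
import tempfile
from pathlib import Path


def cached_to_disk(name, directory):
    """Hypothetical stand-in for cached_feature: memoize a result on disk."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            path = Path(directory) / f"{name}.pkl"
            if path.exists():
                # Cache hit: skip the function and read the stored result.
                with path.open("rb") as fh:
                    return pickle.load(fh)
            # Cache miss: compute, store, and return the result.
            result = func(*args, **kwargs)
            path.parent.mkdir(parents=True, exist_ok=True)
            with path.open("wb") as fh:
                pickle.dump(result, fh)
            return result
        return wrapper
    return decorator


calls = []  # records every real invocation of the wrapped function
cache_dir = tempfile.mkdtemp()


@cached_to_disk("x", cache_dir)
def make_feature_x(n):
    calls.append(n)
    return {"x": list(range(n))}


first = make_feature_x(3)   # computes the result and writes x.pkl
second = make_feature_x(3)  # read back from x.pkl; the function body is not run
```

Because the sketch keys the cache on the name only, calling the wrapped function with different arguments still returns the stored result until the file is removed.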
- nyaggle.feature_store.load_feature(feature_name, directory='./features/', ignore_columns=None)
Load a feature as a pandas DataFrame.
- Parameters
feature_name (Union[int, str]) – The name of the feature (used in save_feature).
directory (str) – The directory where the feature is stored.
ignore_columns (Optional[List[str]]) – The list of columns that will be dropped from the loaded dataframe.
- Return type
DataFrame
- Returns
The feature dataframe
- nyaggle.feature_store.load_features(base_df, feature_names, directory='./features/', ignore_columns=None, create_directory=True, rename_duplicate=True)
Load features and return the concatenated dataframe.
- Parameters
base_df (Optional[DataFrame]) – The base dataframe. If not None, the resulting dataframe will consist of the base columns followed by the loaded feature columns.
feature_names (List[Union[int, str]]) – The list of feature names to be loaded.
directory (str) – The directory where the features are stored.
ignore_columns (Optional[List[str]]) – The list of columns that will be dropped from the loaded dataframe.
create_directory (bool) – If True, create the directory if it does not exist.
rename_duplicate (bool) – If True, duplicated column names will be renamed automatically (the feature name is used as a suffix). If False, duplicated columns are left as-is.
- Return type
DataFrame
- Returns
The merged dataframe
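The effect of rename_duplicate can be sketched on plain lists of column names. This is an illustration only: `merge_column_names` is a hypothetical helper, and the exact suffix format (an underscore followed by the feature name) is an assumption, since the text above only says that the feature name is used as a suffix.

```python
def merge_column_names(base_columns, feature_columns, feature_name,
                       rename_duplicate=True):
    """Hypothetical helper: combine column names, optionally renaming clashes."""
    merged = list(base_columns)
    for col in feature_columns:
        new_col = col
        if rename_duplicate:
            # Append the feature name until the column name is unique
            # (assumed suffix format: "<column>_<feature_name>").
            while new_col in merged:
                new_col = f"{new_col}_{feature_name}"
        merged.append(new_col)
    return merged


# "price" exists in both the base columns and the loaded feature.
renamed = merge_column_names(["id", "price"], ["price", "area"], feature_name="f1")
as_is = merge_column_names(["id", "price"], ["price", "area"], feature_name="f1",
                           rename_duplicate=False)

print(renamed)  # ['id', 'price', 'price_f1', 'area']
print(as_is)    # ['id', 'price', 'price', 'area']
```

With rename_duplicate=False the merged result keeps two columns with the same name, which makes selecting that column by name ambiguous later on.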
- nyaggle.feature_store.save_feature(df, feature_name, directory='./features/', with_csv_dump=False, create_directory=True, reference_target_variable=None, overwrite=False)
Save a pandas dataframe in feather format.
- Parameters
df (DataFrame) – The dataframe to be saved.
feature_name (Union[int, str]) – The name of the feature. The output file will be {feature_name}.f.
directory (str) – The directory where the feature will be stored.
with_csv_dump (bool) – If True, the first 1000 lines are also dumped to a csv file for debugging.
create_directory (bool) – If True, create the directory if it does not exist.
reference_target_variable (Optional[Series]) – If not None, an instant validation of the feature is performed.
overwrite (bool) – If False and the file already exists, a RuntimeError will be raised.
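The interaction of feature_name, create_directory and overwrite can be sketched without any third-party dependency. `resolve_feature_path` is a hypothetical helper, not part of nyaggle, and the placeholder bytes stand in for the feather payload that save_feature would write.

```python
import tempfile
from pathlib import Path


def resolve_feature_path(feature_name, directory, create_directory=True,
                         overwrite=False):
    """Hypothetical helper: pick the output path and apply the overwrite check."""
    directory = Path(directory)
    if create_directory:
        directory.mkdir(parents=True, exist_ok=True)
    # The output file is named {feature_name}.f, as described above.
    path = directory / f"{feature_name}.f"
    if path.exists() and not overwrite:
        raise RuntimeError(f"File already exists: {path}")
    return path


root = Path(tempfile.mkdtemp()) / "features"

path = resolve_feature_path(1, root)   # the directory is created on demand
path.write_bytes(b"placeholder")       # stands in for the feather payload

try:
    resolve_feature_path(1, root)      # same name again, overwrite=False
    second_save = "allowed"
except RuntimeError:
    second_save = "refused"

third = resolve_feature_path(1, root, overwrite=True)  # explicitly allowed
```

Here the second call is refused because 1.f already exists, while the third succeeds because overwrite=True was passed.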