nyaggle.util

nyaggle.util.is_instance(obj, class_path_str)

Acts as a safe version of isinstance without having to explicitly import packages that may not exist in the user's environment. Checks whether obj is an instance of the type specified by class_path_str.

Parameters
  • obj (Any) – The object to test.

  • class_path_str (Union[str, List, Tuple]) – A string, or a list or tuple of strings, specifying full class paths. Example: sklearn.ensemble.RandomForestRegressor

Return type

bool

Returns

True if isinstance is true and the package exists, False otherwise
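The check can avoid importing anything because an object can only be an instance of a class whose module has already been loaded. The sketch below illustrates that idea with a hypothetical safe_is_instance helper that looks classes up in sys.modules; it is an assumption about the general approach, not nyaggle's actual source.

```python
import sys
from typing import Any, List, Tuple, Union


def safe_is_instance(obj: Any,
                     class_path_str: Union[str, List[str], Tuple[str, ...]]) -> bool:
    """Hypothetical sketch of an import-free isinstance check."""
    # Accept either a single class path or a list/tuple of class paths.
    paths = [class_path_str] if isinstance(class_path_str, str) else list(class_path_str)
    for path in paths:
        module_name, _, class_name = path.rpartition('.')
        # If obj really is an instance of this class, the defining package
        # must already be imported, so a sys.modules lookup is enough and
        # never triggers a new (possibly failing) import.
        module = sys.modules.get(module_name)
        if module is None:
            continue
        cls = getattr(module, class_name, None)
        if isinstance(cls, type) and isinstance(obj, cls):
            return True
    return False
```

With this approach, safe_is_instance(model, 'sklearn.ensemble.RandomForestRegressor') simply returns False when scikit-learn has never been imported, instead of raising ImportError.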

nyaggle.util.make_submission_df(test_prediction, sample_submission=None, y=None)

Make a dataframe formatted in the style of a Kaggle competition submission.

Parameters
  • test_prediction (ndarray) – A test prediction to be formatted.

  • sample_submission (Optional[DataFrame]) – A sample dataframe aligned with the test data (in Kaggle, this is usually available as sample_submission.csv). The submission is created with the same schema as this dataframe.

  • y (Optional[Series]) – Target variable used to infer the column name. Ignored if sample_submission is passed.

Return type

DataFrame

Returns

The formatted dataframe
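A minimal sketch of this kind of formatting, using a hypothetical format_submission helper. It assumes the first column of sample_submission holds the IDs and the remaining columns hold targets, and that without a template the output gets a 0-based "id" column plus a column named after y (falling back to "target"). These naming choices are assumptions for illustration, not confirmed nyaggle behavior.

```python
from typing import Optional

import numpy as np
import pandas as pd


def format_submission(test_prediction: np.ndarray,
                      sample_submission: Optional[pd.DataFrame] = None,
                      y: Optional[pd.Series] = None) -> pd.DataFrame:
    """Hypothetical sketch of Kaggle-style submission formatting."""
    pred = np.asarray(test_prediction)
    if pred.ndim == 1:
        pred = pred.reshape(-1, 1)  # treat a 1-D prediction as one target column

    if sample_submission is not None:
        # Assumption: first column = IDs, remaining columns = targets.
        submission = sample_submission.copy()  # never mutate the template
        target_cols = list(submission.columns[1:])
        if pred.shape[1] != len(target_cols):
            raise ValueError('prediction shape does not match sample_submission')
        for i, col in enumerate(target_cols):
            submission[col] = pred[:, i]
        return submission

    # Assumption: no template, so build a 0-based 'id' column and name
    # the target column after y (y is only used for its name).
    name = y.name if y is not None and y.name is not None else 'target'
    submission = pd.DataFrame({'id': np.arange(len(pred))})
    if pred.shape[1] == 1:
        submission[name] = pred[:, 0]
    else:
        for i in range(pred.shape[1]):
            submission[f'{name}_{i}'] = pred[:, i]
    return submission
```

Copying the template keeps its ID order and column names, which is what Kaggle's scorer checks; only the prediction values change.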

nyaggle.util.plot_importance(importance, path=None, top_n=100, figsize=None, title=None)

Plot feature importance and, if path is given, save the plot as an image.

Parameters
  • importance (DataFrame) – A dataframe with “feature” and “importance” columns

  • path (Optional[str]) – The file path where the image is saved

  • top_n (int) – The number of top-ranked features to visualize

  • figsize (Optional[Tuple[int, int]]) – The size of the figure

  • title (Optional[str]) – The title of the plot

Example

>>> import pandas as pd
>>> import lightgbm as lgb
>>> from nyaggle.util import plot_importance
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification()
>>> X = pd.DataFrame(X, columns=['col{}'.format(i) for i in range(X.shape[1])])
>>> booster = lgb.train({'objective': 'binary'}, lgb.Dataset(X, y))
>>> importance = pd.DataFrame({
...     'feature': X.columns,
...     'importance': booster.feature_importance('gain')
... })
>>> plot_importance(importance, 'importance.png')
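To show roughly what such a plot involves, here is a hypothetical plot_top_importance sketch that uses matplotlib directly. Averaging rows that share a feature name (for example, one row per CV fold) and the default figure height are assumptions of this sketch, not documented behavior of plot_importance.

```python
from typing import Optional, Tuple

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_importance(importance: pd.DataFrame,
                        path: Optional[str] = None,
                        top_n: int = 100,
                        figsize: Optional[Tuple[int, int]] = None,
                        title: Optional[str] = None):
    """Hypothetical sketch: horizontal bar chart of the top_n features."""
    # Assumption: rows sharing a feature name (e.g. one per fold) are averaged.
    mean_imp = (importance.groupby('feature')['importance']
                .mean()
                .sort_values(ascending=False)
                .head(top_n))

    if figsize is None:
        # Assumption: scale the figure height with the number of bars.
        figsize = (8, max(2, 0.25 * len(mean_imp)))
    fig, ax = plt.subplots(figsize=figsize)
    # Reverse the order so the most important feature is drawn at the top.
    ax.barh(list(mean_imp.index[::-1]), mean_imp.to_numpy()[::-1])
    ax.set_xlabel('importance')
    if title is not None:
        ax.set_title(title)
    fig.tight_layout()

    if path is not None:
        fig.savefig(path)  # the file format is inferred from the extension
    return fig, ax
```

Returning the figure and axes lets a caller tweak the plot further even when no path is given.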