nyaggle.util
- nyaggle.util.is_instance(obj, class_path_str)
Acts as a safe version of isinstance that does not require explicitly importing packages which may not exist in the user's environment. Checks whether obj is an instance of the type specified by class_path_str.
- Parameters
obj (Any) – The object to test.
class_path_str (Union[str, List, Tuple]) – A string, or a list or tuple of strings, specifying full class paths. Example: sklearn.ensemble.RandomForestRegressor
- Return type
bool
- Returns
True if isinstance is true and the package exists, False otherwise
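The idea of checking a type by its dotted path, without a hard import, can be sketched with the standard library alone. The helper name is_instance_sketch is hypothetical, and this is an illustration under the assumption that the class is looked up with a lazy import. It is not nyaggle's actual implementation.

```python
import importlib
from collections import OrderedDict
from typing import Any, List, Tuple, Union


def is_instance_sketch(obj: Any,
                       class_path_str: Union[str, List[str], Tuple[str, ...]]) -> bool:
    """Hypothetical stand-in for nyaggle.util.is_instance (not the library's code)."""
    # Accept either a single path or a list/tuple of paths.
    paths = [class_path_str] if isinstance(class_path_str, str) else list(class_path_str)
    for path in paths:
        module_name, _, class_name = path.rpartition(".")
        if not module_name:
            continue  # not a full "package.module.Class" path
        try:
            # Import lazily so that a missing package yields False, not an error.
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        cls = getattr(module, class_name, None)
        if isinstance(cls, type) and isinstance(obj, cls):
            return True
    return False


ordered = OrderedDict(a=1)
found = is_instance_sketch(ordered, "collections.OrderedDict")
missing = is_instance_sketch(ordered, "no_such_package_xyz.Model")
either = is_instance_sketch(ordered, ["no_such_package_xyz.Model", "builtins.dict"])
```

Here found and either evaluate to True, while missing evaluates to False because the package cannot be imported. Code written this way can branch on optional dependencies such as sklearn or lightgbm without requiring them to be installed.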
- nyaggle.util.make_submission_df(test_prediction, sample_submission=None, y=None)
Make a dataframe formatted in the style of a Kaggle competition submission.
- Parameters
test_prediction (ndarray) – A test prediction to be formatted.
sample_submission (Optional[DataFrame]) – A sample dataframe aligned with the test data (in Kaggle, it is usually available as sample_submission.csv). The submission file will be created with the same schema as this dataframe.
y (Optional[Series]) – Target variable, which is used for inferring the column name. Ignored if sample_submission is passed.
- Return type
DataFrame
- Returns
The formatted dataframe
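To make "the same schema as this dataframe" concrete, the following is a minimal pandas sketch of the sample_submission case. The helper fill_sample_submission is hypothetical. The sketch assumes that a one-dimensional prediction replaces the last column of the sample file and that all other columns are IDs. That detail is an assumption and is not stated in this reference.

```python
import numpy as np
import pandas as pd


def fill_sample_submission(test_prediction: np.ndarray,
                           sample_submission: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reuse the sample's schema and overwrite its last column."""
    if len(test_prediction) != len(sample_submission):
        raise ValueError("prediction and sample submission must have the same length")
    # Copy so the caller's sample dataframe is left untouched.
    submission = sample_submission.copy()
    # Assumption: the last column holds the target; the others are ID columns.
    submission[submission.columns[-1]] = test_prediction
    return submission


# Stand-in for a dataframe read from sample_submission.csv.
sample = pd.DataFrame({"id": [101, 102, 103], "target": [0.0, 0.0, 0.0]})
prediction = np.array([0.2, 0.7, 0.9])
submission = fill_sample_submission(prediction, sample)
```

The result keeps the id column and the column order of the sample file, so it can be written out with submission.to_csv(path, index=False).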
- nyaggle.util.plot_importance(importance, path=None, top_n=100, figsize=None, title=None)
Plot feature importance and write it to an image file.
- Parameters
importance (DataFrame) – The dataframe which has “feature” and “importance” columns.
path (Optional[str]) – The file path where the image is saved.
top_n (int) – The number of features to be visualized.
figsize (Optional[Tuple[int, int]]) – The size of the figure.
title (Optional[str]) – The title of the plot.
Example
>>> import pandas as pd
>>> import lightgbm as lgb
>>> from nyaggle.util import plot_importance
>>> from sklearn.datasets import make_classification

>>> X, y = make_classification()
>>> X = pd.DataFrame(X, columns=['col{}'.format(i) for i in range(X.shape[1])])
>>> booster = lgb.train({'objective': 'binary'}, lgb.Dataset(X, y))
>>> importance = pd.DataFrame({
...     'feature': X.columns,
...     'importance': booster.feature_importance('gain')
... })
>>> plot_importance(importance, 'importance.png')