nyaggle.ensemble

nyaggle.ensemble.averaging(test_predictions, oof_predictions=None, y=None, weights=None, eval_func=None, rank_averaging=False)

Perform averaging on model predictions.

Parameters

test_predictions (List[ndarray]) – List of predicted values on test data.
oof_predictions (Optional[List[ndarray]]) – List of predicted values on out-of-fold training data.
y (Optional[Series]) – Target values.
weights (Optional[List[float]]) – Weight assigned to each model's predictions.
eval_func (Optional[Callable]) – Evaluation metric used for calculating result score. Used only if oof_predictions and y are given.
rank_averaging (bool) – If True, predictions will be converted to rank before averaging.

Return type

EnsembleResult

Returns

Namedtuple with the following members:

test_prediction:
numpy array, Average prediction on test data.
oof_prediction:
numpy array, Average prediction on Out-of-Fold validation data. None if oof_predictions = None.
score:
float, Calculated score on Out-of-Fold data. None if eval_func is None.
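
To illustrate what weights and rank_averaging mean, the sketch below reproduces weighted averaging and rank averaging with plain NumPy. It is not nyaggle's implementation: the helper names `to_rank` and `weighted_average` are hypothetical, and scaling ranks to [0, 1] is an assumption made for this example.

```python
import numpy as np


def to_rank(pred):
    """Convert predictions to ranks scaled to [0, 1] (ties broken by position)."""
    order = np.argsort(pred, kind="stable")
    ranks = np.empty(len(pred), dtype=float)
    ranks[order] = np.arange(len(pred))
    return ranks / max(len(pred) - 1, 1)


def weighted_average(predictions, weights=None, rank_averaging=False):
    """Average a list of 1-D prediction arrays, optionally on their ranks."""
    if rank_averaging:
        predictions = [to_rank(p) for p in predictions]
    # np.average divides by the sum of the weights; None means equal weights.
    return np.average(np.stack(predictions), axis=0, weights=weights)


# Hypothetical predictions from two models on three test rows.
pred_a = np.array([0.1, 0.4, 0.9])
pred_b = np.array([0.3, 0.2, 0.7])

simple = weighted_average([pred_a, pred_b])
weighted = weighted_average([pred_a, pred_b], weights=[3.0, 1.0])
ranked = weighted_average([pred_a, pred_b], rank_averaging=True)
```

Rank averaging discards the scale of each model's output and keeps only the ordering, which can help when models are scored by a rank-based metric such as AUC but produce values on different scales.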

nyaggle.ensemble.averaging_opt(test_predictions, oof_predictions, y, eval_func, higher_is_better, weight_bounds=(0.0, 1.0), rank_averaging=False, method=None)

Perform averaging with optimal weights using scipy.optimize.

Parameters

test_predictions (List[ndarray]) – List of predicted values on test data.
oof_predictions (Optional[List[ndarray]]) – List of predicted values on out-of-fold training data.
y (Optional[Series]) – Target values.
eval_func (Optional[Callable[[ndarray, ndarray], float]]) – Evaluation metric f(y_true, y_pred) used for calculating result score. Used only if oof_predictions and y are given.
higher_is_better (bool) – Whether a higher eval_func value is better. Determines the direction in which eval_func is optimized.
weight_bounds (Tuple[float, float]) – Specify lower/upper bounds of each weight.
rank_averaging (bool) – If True, predictions will be converted to rank before averaging.
method (Optional[str]) – Type of solver. If None, SLSQP will be used.

Return type

EnsembleResult

Returns

Namedtuple with the following members:

test_prediction:
numpy array, Average prediction on test data.
oof_prediction:
numpy array, Average prediction on Out-of-Fold validation data. None if oof_predictions = None.
score:
float, Calculated score on Out-of-Fold data. None if eval_func is None.
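
The idea of searching for blend weights with scipy.optimize can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper `optimize_weights`, the `mse` metric, the synthetic data, and the constraint that weights sum to 1 are all choices made for this example.

```python
import numpy as np
from scipy.optimize import minimize


def mse(y_true, y_pred):
    """Mean squared error; lower is better."""
    return float(np.mean((y_true - y_pred) ** 2))


def optimize_weights(oof_predictions, y, eval_func, higher_is_better,
                     weight_bounds=(0.0, 1.0), method="SLSQP"):
    """Find blend weights that optimize eval_func on out-of-fold predictions."""
    stacked = np.stack(oof_predictions)        # shape: (n_models, n_samples)
    n_models = stacked.shape[0]
    sign = -1.0 if higher_is_better else 1.0   # minimize() always minimizes

    def objective(w):
        return sign * eval_func(y, w @ stacked)

    result = minimize(
        objective,
        x0=np.full(n_models, 1.0 / n_models),  # start from equal weights
        bounds=[weight_bounds] * n_models,
        # Assumption for this sketch: the weights must sum to 1.
        constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        method=method,
    )
    return result.x


# Synthetic data: the target is close to 0.7 * model_a + 0.3 * model_b.
rng = np.random.default_rng(0)
oof_a = rng.normal(size=200)
oof_b = rng.normal(size=200)
y = 0.7 * oof_a + 0.3 * oof_b + rng.normal(scale=0.1, size=200)

weights = optimize_weights([oof_a, oof_b], y, mse, higher_is_better=False)
blend_score = mse(y, weights @ np.stack([oof_a, oof_b]))
equal_score = mse(y, 0.5 * oof_a + 0.5 * oof_b)
```

Because the weights are tuned on the out-of-fold predictions themselves, the resulting score is optimistic to some degree. The effect grows with the number of models being blended.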

nyaggle.ensemble.stacking(test_predictions, oof_predictions, y, estimator=None, cv=None, groups=None, type_of_target='auto', eval_func=None)

Perform stacking on predictions.

Parameters

test_predictions (List[ndarray]) – List of predicted values on test data.
oof_predictions (List[ndarray]) – List of predicted values on out-of-fold training data.
y (Series) – Target values.
estimator (Optional[BaseEstimator]) – Estimator used for the 2nd-level model. If None, the default estimator (auto-tuned linear model) will be used.
cv (Union[int, Iterable, BaseCrossValidator, None]) – int, cross-validation generator, or an iterable that determines the cross-validation splitting strategy:
- None, to use the default KFold(5, random_state=0, shuffle=True),
- integer, to specify the number of folds in a (Stratified)KFold,
- CV splitter (the instance of BaseCrossValidator),
- An iterable yielding (train, test) splits as arrays of indices.
groups (Optional[Series]) – Group labels for the samples. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold).
type_of_target (str) – The type of the target variable. If 'auto', the type is inferred by sklearn.utils.multiclass.type_of_target. Otherwise, 'binary', 'continuous', or 'multiclass' are supported.
eval_func (Optional[Callable]) – Evaluation metric used for calculating result score. Used only if oof_predictions and y are given.

Return type

EnsembleResult

Returns

Namedtuple with the following members:

test_prediction:
numpy array, Prediction of the 2nd-level model on test data.
oof_prediction:
numpy array, Prediction of the 2nd-level model on Out-of-Fold validation data. None if oof_predictions = None.
score:
float, Calculated score on Out-of-Fold data. None if eval_func is None.
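
The stacking procedure described above can be approximated with scikit-learn directly. The following is a minimal sketch, not nyaggle's implementation: `Ridge` stands in for the default estimator (documented only as an auto-tuned linear model), the data is synthetic, and refitting on all training rows before predicting the test set is one possible choice that the library may not share.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic regression target and two hypothetical first-level models.
rng = np.random.default_rng(0)
y_train = rng.normal(size=100)
oof_predictions = [y_train + rng.normal(scale=0.3, size=100),
                   y_train + rng.normal(scale=0.5, size=100)]
test_predictions = [rng.normal(size=20), rng.normal(size=20)]

# Each first-level model becomes one feature column for the 2nd-level model.
X_train = np.column_stack(oof_predictions)   # shape: (100, 2)
X_test = np.column_stack(test_predictions)   # shape: (20, 2)

# Ridge is a stand-in for the 2nd-level estimator (assumption of this sketch).
estimator = Ridge(alpha=1.0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold predictions of the 2nd-level model: every row is predicted by
# a model that did not see that row during fitting.
stacked_oof = cross_val_predict(estimator, X_train, y_train, cv=cv)

# Refit on all training rows to predict the test set (one possible choice).
stacked_test = estimator.fit(X_train, y_train).predict(X_test)

stacked_mse = float(np.mean((y_train - stacked_oof) ** 2))
weaker_model_mse = float(np.mean((y_train - oof_predictions[1]) ** 2))
```

The first-level predictions passed in as features should themselves be out-of-fold. Otherwise the 2nd-level model is trained on predictions that have already seen the target, and its score will be inflated.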