nyaggle.ensemble

nyaggle.ensemble.averaging(test_predictions, oof_predictions=None, y=None, weights=None, eval_func=None, rank_averaging=False)

Perform averaging on model predictions.

Parameters
  • test_predictions (List[ndarray]) – List of predicted values on test data.

  • oof_predictions (Optional[List[ndarray]]) – List of predicted values on out-of-fold training data.

  • y (Optional[Series]) – Target values.

  • weights (Optional[List[float]]) – Weight for each prediction. If None, all predictions are weighted equally.

  • eval_func (Optional[Callable]) – Evaluation metric used to calculate the result score. Used only if oof_predictions and y are given.

  • rank_averaging (bool) – If True, predictions will be converted to rank before averaging.

Return type

EnsembleResult

Returns

Namedtuple with the following members:

  • test_prediction:

    numpy array, Average prediction on test data.

  • oof_prediction:

    numpy array, Average prediction on out-of-fold validation data. None if oof_predictions is None.

  • score:

    float, Score calculated on out-of-fold data. None if eval_func or oof_predictions is None.
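The following is a minimal NumPy sketch of what averaging computes, written under stated assumptions rather than copied from nyaggle: equal weights when weights is None, rank averaging done by ranking each 1-D prediction array on its own (ranks scaled to (0, 1], ties broken by position), and the metric called as eval_func(y_true, y_pred). The names averaging_sketch, to_rank and EnsembleResultSketch are hypothetical; the library's exact rank normalization and tie handling may differ.

```python
from typing import Callable, List, NamedTuple, Optional

import numpy as np


class EnsembleResultSketch(NamedTuple):
    # Same three members as the documented EnsembleResult.
    test_prediction: np.ndarray
    oof_prediction: Optional[np.ndarray]
    score: Optional[float]


def to_rank(p: np.ndarray) -> np.ndarray:
    # Ranks 1..n scaled to (0, 1]; ties are broken by position (stable sort).
    order = np.argsort(p, kind="stable")
    ranks = np.empty(len(p))
    ranks[order] = np.arange(1, len(p) + 1)
    return ranks / len(p)


def averaging_sketch(
    test_predictions: List[np.ndarray],
    oof_predictions: Optional[List[np.ndarray]] = None,
    y: Optional[np.ndarray] = None,
    weights: Optional[List[float]] = None,
    eval_func: Optional[Callable[[np.ndarray, np.ndarray], float]] = None,
    rank_averaging: bool = False,
) -> EnsembleResultSketch:
    n_models = len(test_predictions)
    # Assumption: equal weights when none are given.
    if weights is None:
        w = np.full(n_models, 1.0 / n_models)
    else:
        w = np.asarray(weights, dtype=float)

    if rank_averaging:
        # Assumption: each (1-D) prediction array is ranked on its own.
        test_predictions = [to_rank(p) for p in test_predictions]
        if oof_predictions is not None:
            oof_predictions = [to_rank(p) for p in oof_predictions]

    def weighted_average(preds: List[np.ndarray]) -> np.ndarray:
        # sum_i w[i] * preds[i]; np.stack raises if the arrays differ in shape.
        return np.tensordot(w, np.stack(preds).astype(float), axes=1)

    test_avg = weighted_average(test_predictions)
    oof_avg = weighted_average(oof_predictions) if oof_predictions is not None else None

    score = None
    if eval_func is not None and oof_avg is not None and y is not None:
        # Assumption: the metric is called as eval_func(y_true, y_pred).
        score = eval_func(np.asarray(y), oof_avg)
    return EnsembleResultSketch(test_avg, oof_avg, score)
```

Rank averaging is useful when models output scores on different scales (for example, probabilities from one model and raw margins from another), because only the ordering of each model's predictions contributes to the average.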

nyaggle.ensemble.averaging_opt(test_predictions, oof_predictions, y, eval_func, higher_is_better, weight_bounds=(0, 1), rank_averaging=False)

Perform weighted averaging, with the weights optimized on the out-of-fold predictions using scipy.optimize.

Parameters
  • test_predictions (List[ndarray]) – List of predicted values on test data.

  • oof_predictions (Optional[List[ndarray]]) – List of predicted values on out-of-fold training data.

  • y (Optional[Series]) – Target values.

  • eval_func (Optional[Callable]) – Evaluation metric to optimize. It is computed on the weighted average of oof_predictions against y while searching for the weights, and is also used to calculate the result score.

  • higher_is_better (bool) – Direction of optimization. If True, the weights are chosen to maximize eval_func; otherwise, eval_func is minimized.

  • weight_bounds (Tuple) – (lower, upper) bounds applied to each weight.

  • rank_averaging (bool) – If True, predictions will be converted to rank before averaging.

Return type

EnsembleResult

Returns

Namedtuple with the following members:

  • test_prediction:

    numpy array, Weighted average of the predictions on test data, using the optimized weights.

  • oof_prediction:

    numpy array, Weighted average of the predictions on out-of-fold validation data, using the optimized weights.

  • score:

    float, Score calculated on out-of-fold data with the optimized weights.
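A minimal sketch of the weight search, assuming scipy.optimize.minimize with the SLSQP method, equal starting weights, and a constraint that the weights sum to 1. The choice of optimizer method and the sum-to-1 constraint are assumptions, not something documented above, and optimize_weights is a hypothetical helper rather than part of nyaggle.

```python
from typing import Callable, List, Tuple

import numpy as np
from scipy.optimize import minimize


def optimize_weights(
    oof_predictions: List[np.ndarray],
    y: np.ndarray,
    eval_func: Callable[[np.ndarray, np.ndarray], float],
    higher_is_better: bool,
    weight_bounds: Tuple[float, float] = (0.0, 1.0),
) -> np.ndarray:
    stacked = np.stack(oof_predictions).astype(float)  # shape: (n_models, n_samples)
    n_models = stacked.shape[0]

    def objective(w: np.ndarray) -> float:
        # Score of the weighted out-of-fold average for candidate weights w.
        score = eval_func(y, np.tensordot(w, stacked, axes=1))
        # minimize() always minimizes, so negate metrics where higher is better.
        return -score if higher_is_better else score

    result = minimize(
        objective,
        x0=np.full(n_models, 1.0 / n_models),  # start from equal weights
        method="SLSQP",
        bounds=[weight_bounds] * n_models,  # the same (lower, upper) for every weight
        # Assumption: the weights are constrained to sum to 1.
        constraints=[{"type": "eq", "fun": lambda w: 1.0 - np.sum(w)}],
    )
    return result.x
```

The returned weights could then be passed as weights= to averaging to produce the test and out-of-fold predictions. Note that the weights are fitted on the same out-of-fold data the score is reported on, so that score is somewhat optimistic.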

nyaggle.ensemble.stacking(test_predictions, oof_predictions, y, estimator=None, cv=None, groups=None, type_of_target='auto', eval_func=None)

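Perform stacking: fit a 2nd-level model on the out-of-fold predictions of the 1st-level models, and use it to combine their test predictions.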

Parameters
  • test_predictions (List[ndarray]) – List of predicted values on test data.

  • oof_predictions (List[ndarray]) – List of predicted values on out-of-fold training data.

  • y (Series) – Target values.

  • estimator (Optional[BaseEstimator]) – Estimator used for the 2nd-level model. If None, the default estimator (auto-tuned linear model) will be used.

  • cv (Union[int, Iterable, BaseCrossValidator, None]) –

    Determines the cross-validation splitting strategy. Possible inputs are:

    • None, to use the default KFold(5, random_state=0, shuffle=True),

    • an integer, to specify the number of folds in a (Stratified)KFold,

    • a CV splitter (an instance of BaseCrossValidator),

    • an iterable yielding (train, test) splits as arrays of indices.

  • groups (Optional[Series]) – Group labels for the samples. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold).

  • type_of_target (str) – The type of the target variable. If 'auto', the type is inferred by sklearn.utils.multiclass.type_of_target. Otherwise, it must be one of 'binary', 'continuous', or 'multiclass'.

  • eval_func (Optional[Callable]) – Evaluation metric used to calculate the result score on out-of-fold data.
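The cv and type_of_target options map onto scikit-learn utilities. The sketch below, assuming scikit-learn is installed, lists the splits produced by the documented default KFold(5, random_state=0, shuffle=True), builds a hand-made iterable of (train, test) index arrays in which every sample is held out exactly once, and shows what sklearn.utils.multiclass.type_of_target infers for a few small targets. The variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils.multiclass import type_of_target

n_samples = 10
X_dummy = np.zeros((n_samples, 1))  # KFold.split only uses the number of rows

# Splits of the documented default, KFold(5, random_state=0, shuffle=True).
default_splits = list(KFold(5, random_state=0, shuffle=True).split(X_dummy))

# A hand-made iterable of (train, test) index arrays: two contiguous,
# unshuffled folds, so every sample is held out exactly once.
indices = np.arange(n_samples)
custom_splits = [
    (np.setdiff1d(indices, block), block)
    for block in np.array_split(indices, 2)
]

# What type_of_target='auto' would infer for a few targets.
inferred = {
    "two_labels": type_of_target(np.array([0, 1, 1, 0])),     # 'binary'
    "three_labels": type_of_target(np.array([0, 1, 2, 1])),   # 'multiclass'
    "real_values": type_of_target(np.array([0.5, 1.7, 3.2])), # 'continuous'
}
print(inferred)
```

Because 'auto' inference looks only at the values of y, an integer-valued regression target with just two distinct values would be read as binary; passing type_of_target='continuous' explicitly avoids that.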

Return type

EnsembleResult

Returns

Namedtuple with the following members:

  • test_prediction:

    numpy array, Prediction of the 2nd-level model on test data.

  • oof_prediction:

    numpy array, Out-of-fold prediction of the 2nd-level model on training data.

  • score:

    float, Score calculated on out-of-fold data. None if eval_func is None.
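A NumPy-only sketch of the stacking idea for a regression target with 1-D predictions. Each model's out-of-fold prediction becomes one feature column, a 2nd-level model is trained with shuffled 5-fold CV, the held-out folds give the out-of-fold prediction, and the fold models' test predictions are averaged. Several parts are assumptions rather than nyaggle's behavior: a plain least-squares linear model stands in for the default auto-tuned linear model, the folds come from NumPy's RNG (so they differ from sklearn's KFold), and stacking_sketch and kfold_indices are hypothetical names.

```python
from typing import Iterator, List, Tuple

import numpy as np


def kfold_indices(n: int, n_splits: int = 5, seed: int = 0) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    # Shuffled K-fold split using NumPy's RNG (folds differ from sklearn's KFold).
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_splits)
    for k, valid_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train_idx, valid_idx


def stacking_sketch(
    test_predictions: List[np.ndarray],
    oof_predictions: List[np.ndarray],
    y: np.ndarray,
    n_splits: int = 5,
) -> Tuple[np.ndarray, np.ndarray]:
    y = np.asarray(y, dtype=float)
    # 1st-level predictions become 2nd-level features: one column per model,
    # plus a constant column acting as the intercept.
    X_train = np.column_stack(list(oof_predictions) + [np.ones(len(y))])
    X_test = np.column_stack(list(test_predictions) + [np.ones(len(test_predictions[0]))])

    oof = np.zeros(len(y))
    test = np.zeros(len(X_test))
    for train_idx, valid_idx in kfold_indices(len(y), n_splits):
        # 2nd-level model: ordinary least squares on the training folds.
        coef, *_ = np.linalg.lstsq(X_train[train_idx], y[train_idx], rcond=None)
        # Held-out fold predictions fill in the out-of-fold vector.
        oof[valid_idx] = X_train[valid_idx] @ coef
        # Average the fold models' predictions on test data.
        test += (X_test @ coef) / n_splits
    return test, oof
```

Training the 2nd-level model on out-of-fold rather than in-sample 1st-level predictions is what keeps it from simply trusting whichever base model overfit the training data most.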