Advanced usage¶
Using low-level experiment API¶
While nyaggle provides run_experiment as a high-level API, the Experiment class can be used as a low-level API that provides primitive functionality for logging experiments.
It is useful when you want to track something other than CV, or when you need to implement your own CV logic.
from nyaggle.experiment import Experiment

with Experiment(logging_directory='./output/') as exp:
    # log a key-value pair as a parameter
    exp.log_param('lr', 0.01)
    exp.log_param('optimizer', 'adam')

    # log text
    exp.log('blah blah blah')

    # log a metric
    exp.log_metric('CV', 0.85)

    # log a numpy ndarray
    exp.log_numpy('predicted', predicted)

    # log a pandas dataframe
    exp.log_dataframe('submission', sub, file_format='csv')

    # log any file
    exp.log_artifact('path-to-your-file')

# you can continue logging from an existing result
with Experiment.continue_from('./output') as exp:
    ...
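For the second use case above, implementing your own CV logic amounts to a plain fold loop that logs one metric per fold and the aggregate at the end. The sketch below is dependency-free: a plain dict stands in for the Experiment logger (an assumption for illustration only), and the `exp.log_metric` calls it replaces are noted in comments; the "model" and metric are deliberately trivial.

```python
def kfold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n) if i not in set(valid)]
        yield train, valid
        start += size

log = {}  # stand-in for an Experiment; with nyaggle, call exp.log_metric instead

y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # toy targets
scores = []
for fold, (train_idx, valid_idx) in enumerate(kfold_indices(len(y), 3)):
    # toy "model": predict the mean of the training targets
    pred = sum(y[i] for i in train_idx) / len(train_idx)
    # score: mean absolute error on the validation fold
    mae = sum(abs(y[i] - pred) for i in valid_idx) / len(valid_idx)
    log[f'fold_{fold}_mae'] = mae          # exp.log_metric(f'fold_{fold}_mae', mae)
    scores.append(mae)

log['cv_mae'] = sum(scores) / len(scores)  # exp.log_metric('CV', ...)
```

Wrapping the same loop body in `with Experiment(...) as exp:` gives you per-fold metrics and the overall CV score in one logged experiment.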
If you are familiar with mlflow tracking, you may notice that these APIs are similar to mlflow's.
Experiment can be treated as a thin wrapper around mlflow if you pass with_mlflow=True to the constructor.
from nyaggle.experiment import Experiment

with Experiment(logging_directory='./output/', with_mlflow=True) as exp:
    # log as you want, and you can see the result in the mlflow ui
    ...
Logging extra parameters to run_experiment¶
By using the inherit_experiment parameter, you can mix any additional logging with the results run_experiment will create.
In the following example, nyaggle records the result of run_experiment under the same experiment as the parameter and metric logged outside of the function.
from nyaggle.experiment import Experiment, run_experiment

with Experiment(logging_directory='./output/') as exp:
    exp.log_param('my extra param', 'bar')
    run_experiment(..., inherit_experiment=exp)
    exp.log_metric('my extra metric', 0.999)
Tracking seed averaging experiment¶
If you train a bunch of models with different seeds to ensemble them, tracking each model as a separate mlflow run fills the GUI with these results and makes them difficult to manage. mlflow's nested-run functionality is useful for displaying multiple models together under one parent run.
import mlflow
import pandas as pd

from nyaggle.ensemble import averaging
from nyaggle.experiment import run_experiment
from nyaggle.util import make_submission_df

mlflow.start_run()

base_logging_dir = './seed-avg/'

results = []
for i in range(3):
    mlflow.start_run(nested=True)  # use nested runs to place each experiment under the parent run
    params['seed'] = i
    result = run_experiment(params,
                            X_train,
                            y_train,
                            X_test,
                            logging_directory=base_logging_dir + f'seed_{i}',
                            with_mlflow=True)
    results.append(result)
    mlflow.end_run()

ensemble = averaging([result.test_prediction for result in results])

sub = make_submission_df(ensemble.test_prediction, pd.read_csv('sample_submission.csv'))
sub.to_csv('ensemble_sub.csv', index=False)

mlflow.end_run()  # close the parent run
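The averaging step in this pipeline reduces the per-seed test predictions to their element-wise mean. A minimal, dependency-free sketch of that reduction (the prediction values below are made up for illustration):

```python
def average_predictions(preds):
    """Element-wise mean of equal-length prediction vectors."""
    n = len(preds)
    return [sum(col) / n for col in zip(*preds)]

seed_preds = [
    [0.2, 0.8, 0.5],  # predictions from seed 0
    [0.4, 0.6, 0.7],  # seed 1
    [0.3, 0.7, 0.6],  # seed 2
]
ensemble_pred = average_predictions(seed_preds)
# ensemble_pred == [0.3, 0.7, 0.6] (within float rounding)
```

Averaging over seeds smooths out the variance introduced by random initialization and data shuffling, which is why the ensemble usually scores slightly better than any single seed.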