## Your Superpowers with Hydra
### Override any config parameter from command line

```bash
uv run python -m {{cookiecutter.project_slug}}.scripts.train trainer.max_epochs=20 model.optimizer.lr=1e-4
```
> **Note**: You can also add new parameters with the `+` sign.

```bash
uv run python -m {{cookiecutter.project_slug}}.scripts.train +model.new_param="owo"
```
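Override paths mirror the nesting of the composed config. As a purely illustrative sketch (the file name and fields below are hypothetical, not necessarily what your generated project contains), an override like `model.optimizer.lr=1e-4` targets a structure such as:

```yaml
# configs/model/default.yaml  (hypothetical example)
optimizer:
  lr: 0.001  # <- overridden by `model.optimizer.lr=1e-4`
  weight_decay: 0.0
```

A key that does not exist yet in the composed config (like `model.new_param` above) has to be added with the `+` prefix.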
### Train on CPU, GPU, multi-GPU and TPU
```bash
# train on CPU
uv run python -m {{cookiecutter.project_slug}}.scripts.train trainer=cpu

# train on 1 GPU
uv run python -m {{cookiecutter.project_slug}}.scripts.train trainer=gpu

# train on TPU
uv run python -m {{cookiecutter.project_slug}}.scripts.train +trainer.tpu_cores=8

# train with DDP (Distributed Data Parallel) (4 GPUs)
uv run python -m {{cookiecutter.project_slug}}.scripts.train trainer=ddp trainer.devices=4

# train with DDP (Distributed Data Parallel) (8 GPUs, 2 nodes)
uv run python -m {{cookiecutter.project_slug}}.scripts.train trainer=ddp trainer.devices=4 trainer.num_nodes=2

# simulate DDP on CPU processes
uv run python -m {{cookiecutter.project_slug}}.scripts.train trainer=ddp_sim trainer.devices=2

# accelerate training on Mac
uv run python -m {{cookiecutter.project_slug}}.scripts.train trainer=mps
```
> **Warning**: There are currently known problems with DDP mode; read [this issue](https://github.com/ashleve/lightning-hydra-template/issues/393) to learn more.
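The `trainer=cpu|gpu|ddp|...` overrides above select different trainer configs. As a rough sketch of what such a config typically looks like (the exact contents of your generated `configs/trainer/gpu.yaml` may differ):

```yaml
# configs/trainer/gpu.yaml  (illustrative sketch)
defaults:
  - default
accelerator: gpu
devices: 1
```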
### Train with mixed precision

```bash
# train with PyTorch native automatic mixed precision (AMP)
uv run python -m {{cookiecutter.project_slug}}.scripts.train trainer=gpu +trainer.precision=16
```
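> **Note**: Depending on your Lightning version, the precision flag may expect values like `16-mixed` instead of `16` (e.g. `+trainer.precision=16-mixed`).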
### Train model with any logger available in PyTorch Lightning, like W&B or TensorBoard
```yaml
# set project and entity names in `configs/logger/wandb`
wandb:
  project: "your_project_name"
  entity: "your_wandb_team_name"
```

```bash
# train model with Weights&Biases (link to wandb dashboard should appear in the terminal)
uv run python -m {{cookiecutter.project_slug}}.scripts.train logger=wandb
```
> **Note**: Lightning provides convenient integrations with most popular logging frameworks. Learn more [here](#experiment-tracking).

> **Note**: Using wandb requires you to [set up an account](https://www.wandb.com/) first. After that, just complete the config as shown above.

> **Note**: Click [here](https://wandb.ai/hobglob/template-dashboard/) to see an example wandb dashboard generated with this template.
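For reference, logger configs typically instantiate the corresponding Lightning logger through a `_target_` key. A minimal sketch, assuming the standard `WandbLogger` class (your generated `configs/logger/wandb.yaml` may contain additional fields):

```yaml
# configs/logger/wandb.yaml  (illustrative sketch)
wandb:
  _target_: lightning.pytorch.loggers.wandb.WandbLogger
  project: "your_project_name"
  entity: "your_wandb_team_name"
```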
### Train model with chosen experiment config

```bash
uv run python -m {{cookiecutter.project_slug}}.scripts.train experiment=example
```
> **Note**: Experiment configs are placed in [configs/experiment/]({{cookiecutter.project_name}}/{{cookiecutter.project_slug}}/experiment/).
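An experiment config usually lives at the global package level and overrides parts of the default config tree. A hedged sketch of what such a config might contain (group names and values here are illustrative, not necessarily those of the bundled `example.yaml`):

```yaml
# @package _global_
# configs/experiment/example.yaml  (illustrative sketch)
defaults:
  - override /trainer: gpu

tags: ["mnist", "example"]
seed: 12345

trainer:
  max_epochs: 10

model:
  optimizer:
    lr: 0.002
```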
### Attach some callbacks to run

```bash
uv run python -m {{cookiecutter.project_slug}}.scripts.train callbacks=default
```
> **Note**: Callbacks can be used for things such as model checkpointing, early stopping and [many more](https://pytorch-lightning.readthedocs.io/en/latest/extensions/callbacks.html#built-in-callbacks).
> **Note**: Callbacks configs are placed in [configs/callbacks/]({{cookiecutter.project_name}}/{{cookiecutter.project_slug}}/callbacks/).
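Callback configs follow the same `_target_` pattern as loggers. A minimal sketch, assuming the built-in Lightning callbacks (your `configs/callbacks/default.yaml` may wire up more):

```yaml
# configs/callbacks/default.yaml  (illustrative sketch)
model_checkpoint:
  _target_: lightning.pytorch.callbacks.ModelCheckpoint
  monitor: "val/loss"
  mode: "min"
  save_top_k: 1

early_stopping:
  _target_: lightning.pytorch.callbacks.EarlyStopping
  monitor: "val/loss"
  patience: 3
  mode: "min"
```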
### Use different tricks available in PyTorch Lightning

```bash
# gradient clipping may be enabled to avoid exploding gradients
uv run python -m {{cookiecutter.project_slug}}.scripts.train +trainer.gradient_clip_val=0.5

# run validation loop 4 times during a training epoch
uv run python -m {{cookiecutter.project_slug}}.scripts.train +trainer.val_check_interval=0.25

# accumulate gradients
uv run python -m {{cookiecutter.project_slug}}.scripts.train +trainer.accumulate_grad_batches=10

# terminate training after 12 hours
uv run python -m {{cookiecutter.project_slug}}.scripts.train +trainer.max_time="00:12:00:00"
```
> **Note**: PyTorch Lightning provides [40+ useful trainer flags](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-flags).
### Easily debug

```bash
# runs 1 epoch in default debugging mode
# changes logging directory to `logs/debugs/...`
# sets level of all command line loggers to 'DEBUG'
# enforces debug-friendly configuration
uv run python -m {{cookiecutter.project_slug}}.scripts.train debug=default

# run 1 train, val and test loop, using only 1 batch
uv run python -m {{cookiecutter.project_slug}}.scripts.train debug=fdr

# print execution time profiling
uv run python -m {{cookiecutter.project_slug}}.scripts.train debug=profiler

# try overfitting to 1 batch
uv run python -m {{cookiecutter.project_slug}}.scripts.train debug=overfit

# raise exception if there are any numerical anomalies in tensors, like NaN or +/-inf
uv run python -m {{cookiecutter.project_slug}}.scripts.train +trainer.detect_anomaly=true

# use only 20% of the data
uv run python -m {{cookiecutter.project_slug}}.scripts.train +trainer.limit_train_batches=0.2 \
  +trainer.limit_val_batches=0.2 +trainer.limit_test_batches=0.2
```
> **Note**: Visit [configs/debug/]({{cookiecutter.project_name}}/{{cookiecutter.project_slug}}/debug/) for different debugging configs.
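As an example of how these debug configs are typically built, here is a hedged sketch of a fast-dev-run style config such as `configs/debug/fdr.yaml` (exact contents may differ in your project):

```yaml
# @package _global_
# configs/debug/fdr.yaml  (illustrative sketch)
defaults:
  - default

# run only 1 train, val and test batch
trainer:
  fast_dev_run: true
```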
### Resume training from checkpoint

```bash
uv run python -m {{cookiecutter.project_slug}}.scripts.train ckpt_path="/path/to/ckpt/name.ckpt"
```

> **Note**: The checkpoint can be either a path or a URL.

> **Note**: Currently, loading a checkpoint doesn't resume the logger experiment, but it is expected to be supported in a future Lightning release.
### Evaluate checkpoint on test dataset

```bash
uv run python -m {{cookiecutter.project_slug}}.scripts.eval ckpt_path="/path/to/ckpt/name.ckpt"
```

> **Note**: The checkpoint can be either a path or a URL.
### Create a sweep over hyperparameters

```bash
# this will run 6 experiments one after the other,
# each with a different combination of batch_size and learning rate
uv run python -m {{cookiecutter.project_slug}}.scripts.train -m data.batch_size=32,64,128 model.lr=0.001,0.0005
```
> **Note**: Hydra composes configs lazily at job launch time. If you change code or configs after launching a job/sweep, the final composed configs might be impacted.
### Create a sweep over hyperparameters with Optuna

```bash
# this will run hyperparameter search defined in `configs/hparams_search/mnist_optuna.yaml`
# over chosen experiment config
uv run python -m {{cookiecutter.project_slug}}.scripts.train -m hparams_search=mnist_optuna experiment=example
```
> **Note**: Using [Optuna Sweeper](https://hydra.cc/docs/next/plugins/optuna_sweeper) doesn't require you to add any boilerplate to your code; everything is defined in a [single config file]({{cookiecutter.project_name}}/{{cookiecutter.project_slug}}/hparams_search/mnist_optuna.yaml).
> **Warning**: Optuna sweeps are not failure-resistant (if one job crashes then the whole sweep crashes).
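A hedged sketch of the kind of search space such a config defines (parameter names and ranges here are illustrative, not necessarily those in `mnist_optuna.yaml`):

```yaml
# @package _global_
# configs/hparams_search/mnist_optuna.yaml  (illustrative sketch)
defaults:
  - override /hydra/sweeper: optuna

# metric returned by the train script that Optuna will optimize
optimized_metric: "val/acc_best"

hydra:
  sweeper:
    direction: maximize
    n_trials: 20
    params:
      model.optimizer.lr: interval(0.0001, 0.1)
      data.batch_size: choice(32, 64, 128)
```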
### Execute all experiments from a folder

```bash
uv run python -m {{cookiecutter.project_slug}}.scripts.train -m 'experiment=glob(*)'
```
> **Note**: Hydra provides special syntax for controlling behavior of multiruns. Learn more [here](https://hydra.cc/docs/next/tutorials/basic/running_your_app/multi-run). The command above executes all experiments from [configs/experiment/]({{cookiecutter.project_name}}/{{cookiecutter.project_slug}}/experiment/).
### Execute run for multiple different seeds

```bash
uv run python -m {{cookiecutter.project_slug}}.scripts.train -m seed=1,2,3,4,5 trainer.deterministic=True logger=csv tags=["benchmark"]
```
> **Note**: `trainer.deterministic=True` makes PyTorch more deterministic but impacts performance.
### Execute sweep on a remote AWS cluster
> **Note**: This should be achievable with simple config using [Ray AWS launcher for Hydra](https://hydra.cc/docs/next/plugins/ray_launcher). Example is not implemented in this template.
### Use Hydra tab completion
> **Note**: Hydra allows you to autocomplete config argument overrides in your shell as you write them, by pressing the `tab` key. Read the [docs](https://hydra.cc/docs/tutorials/basic/running_your_app/tab_completion).
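Completion has to be installed once per shell. Per the Hydra docs, the completion script is generated by the app itself via the `-sc` (shell completion) flag, so something along these lines should work (shown for bash):

```bash
# install tab completion for the train script in the current bash session
eval "$(uv run python -m {{cookiecutter.project_slug}}.scripts.train -sc install=bash)"
```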
### Apply pre-commit hooks

```bash
pre-commit run -a
```
> **Note**: Apply pre-commit hooks to do things like auto-formatting code and configs, performing code analysis or removing output from Jupyter notebooks. See [Best Practices](#best-practices) for more.
Update pre-commit hook versions in `.pre-commit-config.yaml` with:

```bash
pre-commit autoupdate
```
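Hooks are declared in `.pre-commit-config.yaml`. A minimal sketch using the standard `pre-commit-hooks` repository (the hooks actually configured in your generated project will differ):

```yaml
# .pre-commit-config.yaml  (illustrative sketch)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0  # pin whatever version `pre-commit autoupdate` resolves
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```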
### Run tests

```bash
# run all tests
pytest

# run tests from specific file
pytest tests/test_train.py

# run all tests except the ones marked as slow
pytest -k "not slow"
```
### Use tags

Each experiment should be tagged so that runs are easy to filter across files or in the logger UI:

```bash
uv run python -m {{cookiecutter.project_slug}}.scripts.train tags=["mnist","experiment_X"]
```
> **Note**: You might need to escape the bracket characters in your shell with `uv run python -m {{cookiecutter.project_slug}}.scripts.train tags=\["mnist","experiment_X"\]`.
If no tags are provided, you will be asked to input them from the command line:

```bash
>>> uv run python -m {{cookiecutter.project_slug}}.scripts.train tags=[]
[2022-07-11 15:40:09,358][src.utils.utils][INFO] - Enforcing tags! <cfg.extras.enforce_tags=True>
[2022-07-11 15:40:09,359][src.utils.rich_utils][WARNING] - No tags provided in config. Prompting user to input tags...
Enter a list of comma separated tags (dev):
```
If no tags are provided for a multirun, an error will be raised:

```bash
>>> uv run python -m {{cookiecutter.project_slug}}.scripts.train -m +x=1,2,3 tags=[]
ValueError: Specify tags before launching a multirun!
```
> **Note**: Appending to lists from the command line is currently not supported in Hydra :(