Evaluation

Evaluate trained models on test datasets using PyTorch Lightning's evaluation capabilities.

Basic Evaluation

Evaluate Latest Checkpoint

Evaluate the most recent checkpoint:

uv run python -m {{cookiecutter.project_slug}}.scripts.eval

This automatically finds and loads the best checkpoint from your latest training run.
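
The exact lookup logic lives in the eval script, but the idea is to pick the newest checkpoint file from the training log directory. A minimal sketch of that idea, assuming checkpoints are written under logs/train/ as in common Lightning/Hydra templates (the path layout here is an assumption, not part of this template's contract):

# Sketch only: find the most recently written checkpoint.
# Assumption: checkpoints live under logs/train/**/checkpoints/*.ckpt.
from pathlib import Path

def find_latest_checkpoint(log_dir: str = "logs/train") -> Path:
    ckpts = sorted(Path(log_dir).rglob("checkpoints/*.ckpt"), key=lambda p: p.stat().st_mtime)
    if not ckpts:
        raise FileNotFoundError(f"No checkpoints found under {log_dir}")
    return ckpts[-1]  # most recently written checkpoint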

Evaluate Specific Checkpoint

Evaluate a specific checkpoint file:

uv run python -m {{cookiecutter.project_slug}}.scripts.eval ckpt_path="/path/to/checkpoint.ckpt"

Evaluate with Custom Data

Override the default test dataset:

uv run python -m {{cookiecutter.project_slug}}.scripts.eval data.test_path="/path/to/test/data"

Hardware Configuration

CPU Evaluation

Force CPU evaluation:

uv run python -m {{cookiecutter.project_slug}}.scripts.eval trainer.accelerator=cpu

GPU Evaluation

Use GPU for faster evaluation:

# Single GPU (automatic if available)
uv run python -m {{cookiecutter.project_slug}}.scripts.eval

# Explicitly request a single GPU
uv run python -m {{cookiecutter.project_slug}}.scripts.eval trainer.devices=1 trainer.accelerator=gpu

Multi-GPU Evaluation

Distribute evaluation across multiple GPUs. Note that distributed evaluation may pad or duplicate a few samples so that every device receives the same number of batches, which can slightly skew the reported metrics:

uv run python -m {{cookiecutter.project_slug}}.scripts.eval trainer.devices=4 trainer.strategy=ddp

Evaluation Outputs

Results Logging

Evaluation results are automatically reported to:

  • Console: Immediate display of results
  • CSV Files: Detailed metrics under logs/eval/ (when the CSV logger is configured)
  • TensorBoard: Visual metrics in the TensorBoard logs (when the TensorBoard logger is configured)
  • Weights & Biases: Remote experiment tracking (when the wandb logger is configured)
  • Comet: ML experiment tracking and monitoring (when the Comet logger is configured)
  • Neptune: ML experiment management (when the Neptune logger is configured)
  • MLflow: Experiment tracking dashboard (when the MLflow logger is configured)
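
For quick offline inspection, the CSV output is the easiest to consume. A minimal sketch that loads the most recent metrics file with pandas, assuming the CSV logger is active and writes Lightning's default metrics.csv file (the exact directory layout depends on the template's logger config):

# Sketch only: read the latest CSV metrics produced by the CSV logger.
from pathlib import Path
import pandas as pd

latest = sorted(Path("logs/eval").rglob("metrics.csv"), key=lambda p: p.stat().st_mtime)[-1]
print(pd.read_csv(latest).tail())  # the last rows contain the test metrics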

Batch Size Optimization

Optimize batch size for evaluation speed:

# Larger batch for faster evaluation
uv run python -m {{cookiecutter.project_slug}}.scripts.eval data.batch_size=128

# Smaller batch if memory is constrained
uv run python -m {{cookiecutter.project_slug}}.scripts.eval data.batch_size=32

Debugging Evaluation

Verbose Output

Enable detailed logging during evaluation:

uv run python -m {{cookiecutter.project_slug}}.scripts.eval trainer.logger.level=DEBUG
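
Because the script is driven by Hydra, Hydra's own verbosity flag is another way to get DEBUG-level logging, independent of the trainer's logger configuration:

# Set Hydra-managed loggers to DEBUG
uv run python -m {{cookiecutter.project_slug}}.scripts.eval hydra.verbose=true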

Limit Evaluation Batches

Quick evaluation for debugging:

uv run python -m {{cookiecutter.project_slug}}.scripts.eval debug=limit
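
If you need finer control than the debug preset provides, the number of test batches can be capped directly through the standard Lightning Trainer argument (assuming trainer.* overrides map onto the Lightning Trainer, as the other overrides in this page suggest):

# Run only 10 test batches
uv run python -m {{cookiecutter.project_slug}}.scripts.eval ++trainer.limit_test_batches=10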

Profile Evaluation

Profile evaluation performance:

uv run python -m {{cookiecutter.project_slug}}.scripts.eval debug=profiler
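
If the profiler preset is not what you need, Lightning's built-in simple profiler can also be enabled directly on the Trainer (again assuming trainer.* overrides map onto Lightning Trainer arguments):

# Per-hook timing report from Lightning's simple profiler
uv run python -m {{cookiecutter.project_slug}}.scripts.eval ++trainer.profiler=simple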

Troubleshooting

Memory Issues

If evaluation runs out of memory:

# Reduce batch size
uv run python -m {{cookiecutter.project_slug}}.scripts.eval data.batch_size=16

# Use CPU evaluation
uv run python -m {{cookiecutter.project_slug}}.scripts.eval trainer.accelerator=cpu

# Use lower precision
uv run python -m {{cookiecutter.project_slug}}.scripts.eval ++trainer.precision=16-mixed

Checkpoint Loading Issues

If a checkpoint fails to load:

# Verify the checkpoint path (use an absolute path)
uv run python -m {{cookiecutter.project_slug}}.scripts.eval ckpt_path="/absolute/path/to/checkpoint.ckpt"

# Inspect the checkpoint's top-level keys
uv run python -c "import torch; ckpt = torch.load('path/to/checkpoint.ckpt', map_location='cpu'); print('Checkpoint keys:', list(ckpt.keys()))"
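
For a closer look, a slightly longer inspection script can confirm what the checkpoint contains. This sketch assumes a standard Lightning checkpoint, which typically stores keys such as epoch, global_step, state_dict, and hyper_parameters:

# Sketch only: inspect a Lightning checkpoint's contents.
import torch

# weights_only=False may be required on recent PyTorch releases if the checkpoint
# stores non-tensor objects such as hyperparameters.
ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu", weights_only=False)
print("Keys:", list(ckpt.keys()))
print("Epoch:", ckpt.get("epoch"), "| Global step:", ckpt.get("global_step"))
print("State dict entries:", len(ckpt.get("state_dict", {})))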

Slow Evaluation

Speed up evaluation:

# Increase data loading workers
uv run python -m {{cookiecutter.project_slug}}.scripts.eval data.num_workers=8

# Enable pinned memory for faster host-to-device transfer
uv run python -m {{cookiecutter.project_slug}}.scripts.eval data.pin_memory=true

# Use compiled model
uv run python -m {{cookiecutter.project_slug}}.scripts.eval model.torch_compile=true