Training Systems

Status: Supported runtime training reference

artifex.generative_models.training keeps the shared owner set narrow. The shared package owns Trainer, the typed optimizer and scheduler factories, callback modules, gradient accumulation helpers, distributed utilities, staged and streaming loop helpers, and typed RL trainer contracts. Family-specific trainer implementations live under artifex.generative_models.training.trainers.

Shared Trainer

Use Trainer when you want an explicit objective boundary and a callback-aware training loop:

from artifex.generative_models.core.configuration import (
    OptimizerConfig,
    SchedulerConfig,
    TrainingConfig,
)
from artifex.generative_models.training import Trainer, create_optimizer, create_scheduler
from artifex.generative_models.training.callbacks import (
    CallbackList,
    ProgressBarCallback,
    ProgressBarConfig,
)

optimizer_config = OptimizerConfig(
    name="adamw",
    optimizer_type="adamw",
    learning_rate=1e-3,
    weight_decay=0.01,
)
scheduler_config = SchedulerConfig(
    name="cosine",
    scheduler_type="cosine",
    warmup_steps=1_000,
    cycle_length=100_000,
    min_lr_ratio=0.1,
)
training_config = TrainingConfig(
    name="baseline-training",
    optimizer=optimizer_config,
    scheduler=scheduler_config,
    batch_size=64,
    num_epochs=20,
)

# Build the learning-rate schedule from the scheduler config defined above.
schedule = create_scheduler(
    scheduler_config,
    base_lr=optimizer_config.learning_rate,
)
# Build the optimizer from the optimizer config and attach the schedule.
optimizer = create_optimizer(
    optimizer_config,
    schedule=schedule,
)

# `model` and `loss_fn` are placeholders that must be defined before this point.
trainer = Trainer(
    model=model,
    training_config=training_config,
    optimizer=optimizer,
    loss_fn=loss_fn,
    callbacks=CallbackList([
        ProgressBarCallback(ProgressBarConfig(show_metrics=True)),
    ]),
)
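
To make the scheduler fields above concrete, the following is a minimal pure-Python sketch of a linear-warmup, cosine-decay schedule. It assumes that warmup_steps is counted inside cycle_length and that min_lr_ratio multiplies the base learning rate to give the floor. The name warmup_cosine_lr is hypothetical, and the schedule returned by create_scheduler may differ in detail.

```python
import math


def warmup_cosine_lr(
    step: int,
    base_lr: float = 1e-3,
    warmup_steps: int = 1_000,
    cycle_length: int = 100_000,
    min_lr_ratio: float = 0.1,
) -> float:
    """Hypothetical illustration only; not the Artifex implementation."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, cycle_length - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Cosine decay from base_lr down to base_lr * min_lr_ratio.
    return base_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)


# Print the learning rate at a few representative steps.
for step in (0, 500, 1_000, 50_500, 100_000):
    print(step, f"{warmup_cosine_lr(step):.2e}")
```

Under these assumptions the rate peaks at base_lr once warmup ends and settles at base_lr * min_lr_ratio from cycle_length onward.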

Family Trainers

The shared package does not hide model-specific objectives behind one universal trainer class. Use the trainer family that matches the model runtime you are actually training:

VAE Trainer
GAN Trainer
Diffusion Trainer
Flow Trainer; use FlowTrainingConfig(time_sampling="logit_normal") when you want the retained shared flow-matching configuration surface
Energy Trainer
Autoregressive Trainer
REINFORCE Trainer
PPO Trainer
GRPO Trainer
DPO Trainer
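
As a rough illustration of what time_sampling="logit_normal" usually means in flow matching, the sketch below draws time steps by passing Gaussian samples through a sigmoid, which concentrates them around the middle of the (0, 1) interval. This is a generic pure-Python sketch: the function name sample_logit_normal_times and its mean, std, and seed parameters are hypothetical, and the signature of the Artifex sample_logit_normal helper is not shown on this page.

```python
import math
import random


def sample_logit_normal_times(n, mean=0.0, std=1.0, seed=0):
    """Draw n time steps in (0, 1) from a logit-normal distribution.

    Hypothetical illustration only; not the Artifex helper.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        z = rng.gauss(mean, std)  # Gaussian sample in logit space
        times.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid maps it into (0, 1)
    return times


times = sample_logit_normal_times(10_000)
print(f"min={min(times):.3f} max={max(times):.3f} mean={sum(times) / len(times):.3f}")
```

Shifting mean moves the sampling mass toward earlier or later time steps, while std controls how tightly it clusters.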

Distributed Utilities

Artifex ships distributed helpers as utilities, not as trainer subclasses. The retained owners are:

DeviceMeshManager in mesh.md
DataParallel in data_parallel.md
DevicePlacement in device_placement.md
DistributedMetrics in distributed_metrics.md
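
The idea behind data-parallel helpers of this kind can be sketched without any device runtime: split a batch evenly across devices, compute a per-device value, and reduce the results. The snippet below is a conceptual pure-Python sketch under that assumption. The functions shard_batch and reduce_mean are hypothetical and do not reflect the API of DataParallel or DistributedMetrics.

```python
def shard_batch(batch, num_devices):
    """Split a batch into equal contiguous shards, one per device (hypothetical helper)."""
    if num_devices < 1 or len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by num_devices")
    per_device = len(batch) // num_devices
    return [batch[i * per_device:(i + 1) * per_device] for i in range(num_devices)]


def reduce_mean(per_device_values):
    """Average one scalar per device; valid because all shards have equal size."""
    return sum(per_device_values) / len(per_device_values)


batch = list(range(8))
shards = shard_batch(batch, num_devices=4)
# Each "device" computes a local mean; the reduction recovers the global mean.
local_means = [sum(shard) / len(shard) for shard in shards]
global_mean = reduce_mean(local_means)
print(shards, global_mean)
```

With unequal shard sizes a plain mean of per-device means would be biased, which is why this sketch rejects batches that do not divide evenly.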

Advanced Shared Utilities

GradientAccumulator and DynamicLossScaler live in gradient_accumulation.md
Shared helper functions such as sample_logit_normal live in utils.md
Callback surfaces live in base.md, checkpoint.md, early_stopping.md, logging.md, and profiling.md
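
For readers new to gradient accumulation, the sketch below shows the general technique: gradients from several micro-batches are summed, and an averaged update is released only once the configured number of steps has been collected. It is a pure-Python illustration with gradients stored as plain dicts of floats. The class SimpleGradientAccumulator and its add method are hypothetical and are not the GradientAccumulator API.

```python
class SimpleGradientAccumulator:
    """Hypothetical illustration of gradient accumulation; not the Artifex class."""

    def __init__(self, accumulation_steps):
        if accumulation_steps < 1:
            raise ValueError("accumulation_steps must be >= 1")
        self.accumulation_steps = accumulation_steps
        self._sums = {}
        self._count = 0

    def add(self, grads):
        """Add one micro-batch of gradients.

        Returns the averaged gradients once enough micro-batches have been
        collected, otherwise None.
        """
        for name, value in grads.items():
            self._sums[name] = self._sums.get(name, 0.0) + value
        self._count += 1
        if self._count < self.accumulation_steps:
            return None
        averaged = {name: total / self._count for name, total in self._sums.items()}
        # Reset state so the next accumulation window starts clean.
        self._sums, self._count = {}, 0
        return averaged


accumulator = SimpleGradientAccumulator(accumulation_steps=2)
for micro_grads in ({"w": 1.0}, {"w": 3.0}):
    update = accumulator.add(micro_grads)
    if update is not None:
        print("apply optimizer step with", update)
```

Averaging rather than summing keeps the effective gradient scale independent of the number of accumulation steps.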

Current Training Pages

Callbacks: base, checkpoint, early_stopping, logging, profiling
Factories and helpers: factory, gradient_accumulation, utils
Distributed utilities: data_parallel, device_placement, distributed_metrics, mesh
Family trainers: vae_trainer, gan_trainer, diffusion_trainer, flow_trainer, energy_trainer, autoregressive_trainer
RL trainers: reinforce, ppo, grpo, dpo

Coming Soon

Standalone optimizer and scheduler module pages remain roadmap-only until real modules exist. Use the current factory owners (create_optimizer and create_scheduler) instead.

Planned-only or future pages: adamw, adafactor, lion, scheduler, optax_wrappers, exponential, linear, cosine, mixed_precision, tracking, visualization, model_parallel