Production
Status: Supported runtime inference surface
Module: artifex.generative_models.inference.optimization.production
Source: src/artifex/generative_models/inference/optimization/production.py
Overview
This page documents the retained experimental production inference helpers.
The current runtime owns one real shared optimization step plus request-level monitoring:
- jit_compilation through ProductionOptimizer.optimize_for_production(...)
- request/latency monitoring through ProductionMonitor and ProductionPipeline
Quantization, pruning, caching, and dynamic batching remain internal placeholders and are not reported as applied optimization techniques.
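The reporting rule above (only techniques that were actually applied appear in the result) can be sketched in plain Python. This is an illustrative sketch, not the module's implementation: the TECHNIQUES registry, the optimize function, and the identity stand-in for JIT compilation are all hypothetical names introduced here.

```python
from __future__ import annotations

from typing import Callable, Optional

Transform = Callable[[Callable], Callable]


def _stand_in_jit(fn: Callable) -> Callable:
    """Hypothetical stand-in for a real JIT step; returns fn unchanged."""
    return fn


# Hypothetical registry: a technique maps to a transform when it is real,
# or to None while it is still an internal placeholder.
TECHNIQUES: dict[str, Optional[Transform]] = {
    "jit_compilation": _stand_in_jit,
    "quantization": None,
    "pruning": None,
    "caching": None,
    "dynamic_batching": None,
}


def optimize(fn: Callable, techniques=TECHNIQUES) -> tuple[Callable, list[str]]:
    """Apply every implemented technique and report only those applied."""
    applied: list[str] = []
    for name, transform in techniques.items():
        if transform is None:
            continue  # placeholder: neither applied nor reported
        fn = transform(fn)
        applied.append(name)
    return fn, applied
```

Under these assumptions, calling optimize on any callable returns a list containing only "jit_compilation", which mirrors the contract described for OptimizationResult.optimization_techniques.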
Retained Classes
- OptimizationTarget
- OptimizationResult
- MonitoringMetrics
- ProductionOptimizer
- ProductionPipeline
- ProductionMonitor
Current Semantics
- optimize_for_production(...) currently reports only jit_compilation in OptimizationResult.optimization_techniques.
- ProductionPipeline.predict(...) and predict_batch(...) record request count, latency, throughput, and error rate.
- memory_usage_gb and cache_hit_rate are unavailable in MonitoringMetrics and remain None until live instrumentation exists.
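To make the monitoring semantics concrete, the following is a minimal sketch of how request count, latency, throughput, and error rate could be derived from per-request timings. It is not the library's ProductionMonitor or MonitoringMetrics: SketchMonitor, SketchMetrics, and all field names other than memory_usage_gb and cache_hit_rate are hypothetical, and throughput is assumed here to mean requests per second of measured busy time.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SketchMetrics:
    """Hypothetical metrics snapshot (not the library's MonitoringMetrics)."""

    request_count: int
    avg_latency_ms: float
    throughput_rps: float
    error_rate: float
    memory_usage_gb: Optional[float] = None  # stays None: no live instrumentation
    cache_hit_rate: Optional[float] = None  # stays None: no live instrumentation


@dataclass
class SketchMonitor:
    """Hypothetical request-level monitor built on recorded latencies."""

    latencies_ms: list[float] = field(default_factory=list)
    errors: int = 0

    def record(self, latency_ms: float, ok: bool = True) -> None:
        """Store one request's latency and count it as an error if it failed."""
        self.latencies_ms.append(latency_ms)
        if not ok:
            self.errors += 1

    def timed(self, fn: Callable, *args):
        """Run fn(*args), recording its wall-clock latency and any failure."""
        start = time.perf_counter()
        try:
            out = fn(*args)
        except Exception:
            self.record((time.perf_counter() - start) * 1e3, ok=False)
            raise
        self.record((time.perf_counter() - start) * 1e3)
        return out

    def snapshot(self) -> SketchMetrics:
        """Aggregate everything recorded so far into one snapshot."""
        n = len(self.latencies_ms)
        total_ms = sum(self.latencies_ms)
        return SketchMetrics(
            request_count=n,
            avg_latency_ms=total_ms / n if n else 0.0,
            throughput_rps=n / (total_ms / 1e3) if total_ms > 0 else 0.0,
            error_rate=self.errors / n if n else 0.0,
        )
```

In this sketch, a pipeline's predict call would be wrapped as monitor.timed(model_fn, inputs), and the two uninstrumented fields are left as None rather than filled with placeholder numbers, matching the behavior described above.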
Example
from flax import nnx
import jax.numpy as jnp

from artifex.generative_models.inference.optimization.production import (
    OptimizationTarget,
    ProductionOptimizer,
)

# `model` is not defined in this snippet; it is assumed to be an existing
# nnx.Module that accepts the sample input below.
optimizer = ProductionOptimizer()
target = OptimizationTarget(latency_ms=50.0)
sample_inputs = (jnp.ones((8, 64)),)

# JIT compilation is the only technique currently applied and reported.
result = optimizer.optimize_for_production(model, target, sample_inputs)
assert result.optimization_techniques == ["jit_compilation"]

# Wrap the model in a pipeline that records request-level metrics.
pipeline = optimizer.create_production_pipeline(model, result)
outputs = pipeline.predict(sample_inputs[0])
metrics = pipeline.get_monitoring_metrics()