What is CFG scale in AI image generation?

CFG scale (Classifier-Free Guidance) controls how strictly the AI model follows your text prompt. A low scale of 1–4 gives the model significant creative freedom. A high scale of 7–12 produces outputs that closely match the prompt. Values above 15 often produce over-saturated, artificial-looking images. FLUX models typically work best at CFG 3–7.
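
To make the effect of the scale concrete, here is a minimal sketch of the guidance blend in the form commonly used by Stable Diffusion-style samplers: the model predicts noise once without the prompt and once with it, and the scale extrapolates from the first toward the second. The function name `apply_cfg` and the three-element lists are illustrative assumptions; real predictions are large tensors, and some models implement guidance differently.

```python
def apply_cfg(uncond, cond, scale):
    """Blend unconditional and prompt-conditioned noise predictions.

    guided = uncond + scale * (cond - uncond)
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy "noise predictions" (hypothetical values for illustration only).
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 0.0]

print(apply_cfg(uncond, cond, 0.0))  # ignores the prompt entirely
print(apply_cfg(uncond, cond, 1.0))  # plain conditional prediction
print(apply_cfg(uncond, cond, 7.0))  # pushes well past the conditional prediction
```

At a scale of 1 the result is simply the prompt-conditioned prediction. Larger values exaggerate the difference between the two predictions, which is one way to see why very high settings tend to look over-saturated.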

What is a LoRA in AI image generation?

A LoRA (Low-Rank Adaptation) is a lightweight fine-tuning method that trains a small set of additional weights on specific images, enabling a base model to consistently produce a specific style, character, or subject. FLUX LoRAs work differently from Stable Diffusion LoRAs and are not interchangeable between base models.
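
The "small set of additional weights" can be sketched as a low-rank update added to a frozen weight matrix: W' = W + weight × (B·A), where B and A are two thin matrices. The snippet below is a toy illustration in plain Python, not any library's implementation; `apply_lora`, `lora_param_count`, and the matrix values are assumptions, and the alpha/rank scaling used by real trainers is folded into the single `weight` factor.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(w, b, a, weight=1.0):
    """Return W + weight * (B @ A) without modifying the base matrix w."""
    delta = matmul(b, a)
    return [
        [w_ij + weight * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(dim, rank):
    """Trainable values in a rank-`rank` adapter for a dim x dim layer."""
    return 2 * dim * rank


# Frozen 3x3 base weight and a rank-1 adapter (B is 3x1, A is 1x3).
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [[1.0], [0.0], [2.0]]
a = [[0.5, 0.0, 1.0]]

adapted = apply_lora(w, b, a, weight=0.5)
print(adapted)

# Size comparison for one hypothetical 4096 x 4096 layer at rank 16.
print(4096 * 4096, "full weights vs", lora_param_count(4096, 16), "adapter weights")
```

Because the adapter is shaped to match specific layers of one base model, this also hints at why adapters trained for one model family do not load into another.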

What is the difference between text-to-image and image-to-image?

Text-to-image (T2I) generates a new image entirely from a text description. Image-to-image (img2img) takes an existing image as a starting point and modifies it based on a text prompt, preserving elements of the original composition and structure. Image-to-video (I2V) extends a still image into a short video clip.

What is temporal coherence in AI video generation?

Temporal coherence is the consistency of subjects, objects, and environments across video frames. A video with poor temporal coherence will show subjects that change appearance, flicker, or morph between frames. Veo 3 and Runway Gen-4.5 have the strongest temporal coherence among 2026 commercial video models.

AI Image Generation Glossary: Every Term You Need to Know in 2026

Why terminology matters in AI generation

The AI image generation field moves faster than documentation. Terms that were research jargon in 2022 — CFG scale, LoRA, inpainting, latent diffusion — now appear in every generation interface, model selector, and prompt guide. Understanding what these terms mean is not academic: it directly affects whether your output quality is controlled or accidental, and whether your model selection is intentional or based on marketing.

This glossary is organized by category. Start with the section most relevant to your current work: core generation terms for beginners, architecture terms for developers, output quality terms for practitioners, video terms for video creators, and smart routing terms for anyone building at scale.

Core generation terms (A–F)

CFG Scale (Classifier-Free Guidance): Controls how strictly a diffusion model follows your text prompt. A low CFG (1–4) gives the model more creative freedom — useful when you want surprising or varied outputs. A high CFG (7–12) produces outputs that closely adhere to the prompt — useful for controlled production work. Values above 15 often produce over-saturated, artificial images. FLUX models typically work best at CFG 3–7; Stable Diffusion models at 7–10.
Denoising Steps: The number of iterations a diffusion model runs to progressively refine a noisy starting image into a final output. More steps generally mean higher quality but slower generation. FLUX Schnell uses as few as 4 steps via distillation. FLUX Pro uses 20–50 steps. Most models have a "sweet spot" beyond which additional steps do not meaningfully improve quality.
Diffusion Model: The dominant architecture for AI image generation in 2026. A diffusion model learns to remove noise from a progressively noisy image, training on millions of image-text pairs. At inference, it starts from pure noise and progressively refines it into a coherent image guided by the text prompt. FLUX, Imagen, SDXL, and Stable Diffusion 3 are all diffusion models.
FLUX: The image generation model family from Black Forest Labs. FLUX 1.1 Pro is the commercial API version (highest quality, $0.04/image via fal.ai). FLUX Dev is the research version with slightly lower quality but lower cost. FLUX Schnell is a distilled, ultra-fast version at minimal cost but with quality trade-offs. FLUX Kontext specializes in image editing and regional inpainting. FLUX dominates photorealistic commercial use cases in 2026.

Model architecture terms (G–O)

GAN (Generative Adversarial Network): A predecessor architecture to diffusion models. GANs train two networks simultaneously: a generator that creates images and a discriminator that evaluates their realism. GANs are still widely used for image upscaling, face enhancement, and video frame interpolation — even in systems built on top of diffusion generators.
Image-to-Image (img2img): A generation mode where an existing image is provided as a starting point alongside a text prompt. The model modifies the existing image based on the prompt, preserving varying degrees of the original structure depending on a "denoising strength" setting. High denoising strength = major changes. Low denoising strength = subtle edits. Used for product retouching, style transfer, and background replacement.
Inpainting: Regenerating a specific masked region of an existing image while leaving the rest unchanged. Used for removing unwanted objects, replacing backgrounds, adding elements to a scene, and retouching specific details. FLUX Kontext and Stable Diffusion XL Inpainting are the leading inpainting models in 2026.
Latent Space: The compressed mathematical space where diffusion models operate internally. Rather than working directly on pixel values (which would be computationally prohibitive), models like FLUX operate in a lower-dimensional latent space and decode the final result to pixel space. The VAE (Variational Autoencoder) handles this encoding and decoding step.
LoRA (Low-Rank Adaptation): A lightweight fine-tuning technique that trains a small additional set of model weights on specific images — typically 10–50 examples of a consistent subject, style, or character. LoRAs enable consistent replication of a brand mascot, person, product, or visual style without retraining the full multi-billion-parameter base model. FLUX LoRAs use a different format from Stable Diffusion LoRAs and are not interchangeable. LoRA weight (0.5–1.0) controls how strongly the style is applied.
Model Distillation: Training a faster, smaller model using a larger model as a teacher. The distilled model learns to approximate the larger model's outputs in fewer computation steps. FLUX Schnell is a distilled version of FLUX Pro: it approaches the teacher's quality in about 4 steps instead of up to 50, with some quality trade-off, enabling generation speeds under 5 seconds.
Negative Prompt: A list of things the model should leave out of the generation. It tells the model what not to include: "no text," "no watermark," "no distorted hands," "no blur," "no extra limbs." Effective use of negative prompts significantly improves output quality for complex scenes. Not all models support negative prompts equally: FLUX handles them less reliably than SDXL.
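
The denoising strength setting from the Image-to-Image entry above can be read as "what fraction of the denoising schedule is applied to the input image." The sketch below assumes one convention used by some open-source img2img pipelines, where the number of steps actually run is the total step count multiplied by the strength; the function name `img2img_steps` is hypothetical, and individual tools may map the setting differently.

```python
def img2img_steps(total_steps, strength):
    """Return how many denoising steps run for a given strength.

    Assumed convention: strength in [0, 1] selects the fraction of the
    schedule applied to the noised input image.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(total_steps * strength), total_steps)


for strength in (0.25, 0.5, 1.0):
    print(strength, img2img_steps(40, strength))
```

Under this assumption, a low strength runs only the last few steps, so most of the original structure survives, while a strength of 1.0 runs the full schedule and behaves much like text-to-image.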

Output quality terms (P–S)

Prompt Adherence: How closely the generated output matches the written text prompt. Scored as a quality metric: how many specified attributes (objects, materials, lighting, composition, style, aspect ratio) appear correctly in the output. FLUX 1.1 Pro has the highest prompt adherence scores of 2026 commercial models for realistic imagery.
Resolution Tiers: The output resolution levels available from a model. Standard tier: 512–1024px (sufficient for web and social media). HD tier: 1536–2048px (suitable for print and large digital displays). Ultra tier: 3072–4096px (required for billboard and commercial print). Higher resolution tiers cost more per generation at most API providers.
Sampling Method / Scheduler: The algorithm controlling how noise is progressively removed during diffusion. Common schedulers: Euler A (fast, creative), DPM++ 2M Karras (high quality, recommended for most use cases), DDIM (deterministic, good for reproducibility), and LCM (lightning-fast, lower quality). FLUX uses its own scheduler. Stable Diffusion models let you choose your scheduler, which meaningfully affects output quality.
Seed: A random number that initializes the noise starting point for a generation. Using the same seed + same prompt + same settings produces the same output — enabling reproducibility. Changing the seed produces a different variation from the same prompt. When you find a direction you like, record the seed to regenerate it reliably.
Style Transfer: Applying the visual style of a reference image to new content while preserving the subject or composition of the original. Achieved through img2img with a reference image, or through a style LoRA. For brand work, style transfer allows new product images to match the aesthetic of an existing campaign without manual recreation.
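
The reproducibility described in the Seed entry above comes from seeding a pseudo-random number generator before the starting noise is drawn. This is a minimal sketch using Python's standard `random` module; `initial_noise` is a hypothetical helper, and real pipelines draw far larger noise tensors with their own generators.

```python
import random


def initial_noise(seed, size=4):
    """Draw a reproducible list of Gaussian noise values from a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(42)
same_b = initial_noise(42)
other = initial_noise(43)

print(same_a == same_b)  # same seed, same starting noise
print(same_a == other)   # different seed, different starting noise
```

In practice, identical outputs also depend on keeping the model version, scheduler, and other settings fixed, which is why it helps to record them alongside the seed.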

Video generation terms (T–Z)

Temporal Coherence: The consistency of subjects, environments, and objects across video frames. A video with poor temporal coherence shows flickering, morphing, or identity-shifting subjects between frames. Veo 3 and Runway Gen-4.5 have the strongest temporal coherence among 2026 commercial models. Temporal coherence is the primary quality metric that separates professional-grade AI video from amateur-tier output.
Text-to-Video (T2V): Generating a video clip from a text description, without any image input. The model interprets the prompt and generates motion, scene, camera movement, and (in Veo 3's case) audio from text alone. The quality of T2V outputs in 2026 has reached the threshold where they are usable as B-roll and establishing shots in commercial video production.
Image-to-Video (I2V): Animating a still image — giving it motion while preserving the visual identity of the input image. A product photo can be animated with subtle movement and ambient light changes. A portrait can be animated with breathing, blinking, and subtle head movement. Kling 3.0 and Runway Gen-4.5 have the strongest I2V capabilities in 2026.
Veo 3: Google DeepMind's text-to-video and image-to-video model, released in 2025. The first commercially available model with native audio generation: synchronized ambient sound, dialogue, and music generated alongside the video. Leading quality in temporal coherence and 4K output. Available via Vertex AI and fal.ai.
Kling 3.0: Kuaishou's video generation model, third major version released in 2026. The best cost-per-second ratio among commercial video models with strong temporal coherence for product and lifestyle scenes. Available via fal.ai with a stable, documented API. Best for high-volume B-roll production where quality needs to be "good" rather than "cinematic."
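
As a rough intuition for the flicker described in the Temporal Coherence entry above, one crude proxy is the average per-pixel change between consecutive frames. The sketch below is only that proxy, under the assumption that frames are flat lists of pixel intensities; `mean_frame_difference` is a hypothetical name, and because legitimate motion also raises the score, production metrics typically compensate for motion first.

```python
def mean_frame_difference(frames):
    """Average absolute per-pixel change between consecutive frames.

    Each frame is a flat list of pixel intensities of equal length.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append(sum(abs(c - p) for p, c in zip(prev, curr)) / len(curr))
    return sum(diffs) / len(diffs)


# Toy 4-pixel frames (hypothetical values for illustration only).
steady = [[10, 10, 10, 10], [10, 11, 10, 10], [10, 11, 11, 10]]
flicker = [[10, 10, 10, 10], [40, 5, 30, 0], [10, 10, 10, 10]]

print(mean_frame_difference(steady))
print(mean_frame_difference(flicker))
```

The flickering clip scores far higher than the steady one, which matches the visual impression of frames that jump rather than evolve.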

Smart routing and API terms

Smart Routing: An automatic model selection layer that analyzes each generation request and routes it to the optimal AI model based on task type, quality requirements, budget ceiling, and latency constraints. Instead of using one model for all generations, smart routing applies the best model per use case, which can reduce costs by 30–55% for mixed-use generation pipelines. Smart routing is eaxy's core feature.
Inference Endpoint: The API URL where generation requests are sent. Each model provider has a different endpoint, authentication method, and request format. A unified API (like eaxy's) provides a single endpoint that routes to multiple underlying model inference endpoints, abstracting provider-level differences.
Latency (p50 / p95): p50 is the median generation time: 50% of requests complete faster than this. p95 is the 95th-percentile time: 95% of requests complete faster than this. p95 matters for production systems because it represents the "slow tail" that affects user experience. A model with p50 of 10s and p95 of 45s will occasionally produce unacceptably slow responses even if median performance looks fine.
Webhook: A callback URL that receives a notification when an async generation completes. Instead of polling the API repeatedly to check if a generation is done, you provide a webhook URL and the provider calls it with the result when ready. Essential for production pipelines that generate large batches — polling is expensive and unreliable at scale.
Unified API: A single API endpoint that routes requests to multiple underlying AI models, abstracting provider differences behind a consistent interface. Eaxy's API is a unified API: one endpoint, one authentication method, and smart routing handles which of 10+ models actually generates your output. Reduces integration complexity from maintaining 5+ API integrations to maintaining one.
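
The p50 and p95 figures from the Latency entry above can be computed from a log of generation times. This is a minimal sketch using the nearest-rank definition of a percentile; the `percentile` helper and the sample latencies are made up for illustration, and monitoring tools may use slightly different interpolation rules.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical generation times in seconds for 20 requests.
latencies = [8, 9, 9, 10, 10, 10, 11, 11, 12, 12,
             12, 13, 13, 14, 14, 15, 16, 18, 44, 47]

print("p50:", percentile(latencies, 50))
print("p95:", percentile(latencies, 95))
```

In this sample the median looks healthy while the p95 is several times larger, which is the slow tail that a median alone would hide.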

How these terms connect to model selection on eaxy

When you understand the terms, model selection stops being guesswork. A higher CFG scale with FLUX Pro = strict prompt adherence for product photography. A style LoRA on FLUX Dev = consistent brand aesthetic at lower cost. Webhook integration = scalable async batch generation pipeline. Smart routing = optimal model per task without manual selection overhead.
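
To illustrate the general idea of a routing layer (not eaxy's actual logic, which is not described here), the sketch below picks the cheapest model that meets a request's quality, budget, and latency constraints. The catalog entries, quality scores, and latency figures are hypothetical placeholders; only the $0.04 price for FLUX 1.1 Pro comes from the glossary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_image: float  # USD; placeholder except for flux-1.1-pro
    quality: int           # 1 (draft) to 5 (best); placeholder scale
    p95_seconds: float     # placeholder latency figure


# Hypothetical catalog for illustration only.
CATALOG = [
    ModelOption("flux-schnell", 0.003, 3, 5.0),
    ModelOption("flux-dev", 0.025, 4, 20.0),
    ModelOption("flux-1.1-pro", 0.04, 5, 30.0),
]


def route(min_quality, max_cost, max_p95_seconds, catalog=CATALOG):
    """Return the cheapest model satisfying every constraint, or None."""
    candidates = [
        m for m in catalog
        if m.quality >= min_quality
        and m.cost_per_image <= max_cost
        and m.p95_seconds <= max_p95_seconds
    ]
    return min(candidates, key=lambda m: m.cost_per_image, default=None)


draft = route(min_quality=3, max_cost=0.01, max_p95_seconds=10)
hero = route(min_quality=5, max_cost=0.05, max_p95_seconds=60)
print(draft.name if draft else None)
print(hero.name if hero else None)
```

Returning `None` when nothing qualifies leaves the caller to decide whether to relax a constraint or reject the request.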

Eaxy's interface exposes these parameters progressively: basic mode uses smart defaults, advanced mode gives you control over CFG, steps, seed, and model selection for every generation.

Start generating with the right model for your use case — 10 free credits, no credit card required. The comparison tool and glossary are free for everyone.