# Video Generation

### How Video Generation Works

{% hint style="info" %}
Video generation requires a **PREMIUM** or **ULTIMATE** subscription.

Pricing depends on your **content mode**, **resolution**, **duration**, and **audio settings**.
{% endhint %}

Video generation takes a generated image and animates it based on your prompt, creating a short video clip with motion, effects, and life.

#### The Process

1. **Start with an Image**: Select the video icon on a generated image — either from the **Generate page** or from your **Collection page**
2. **Choose Content Mode**: Choose between Advanced and Adult mode
3. **Enter Video Prompt**: Describe the motion and effects you want
4. **Generate Video**: AI creates a video based on your image and prompt
5. **View Result**: Watch your animated video

{% hint style="info" %}
You can generate videos from both the Generate page and directly from your **Collection page** — just tap the video icon on any generated image.
{% endhint %}

<figure><img src="/files/k0IMdcuElkJ2aBW5Ct73" alt="" width="563"><figcaption></figcaption></figure>

#### Content Mode & Settings

Video generation supports two content modes:

* **Advanced**: The primary video generation model with flexible duration and audio on by default.
* **Adult**: Uses legacy settings (720p, 8 seconds).

#### Advanced Options

When generating **Advanced** videos, you can choose:

* **Resolution**: 480p, 720p, 1080p
* **Duration**: 4–15 seconds
* **Audio**: On by default

### Video Prompting Guide

{% hint style="info" %}
Advanced mode prompts can be up to **5,000 characters** long. The model has excellent prompt adherence — the more detail you provide about the motion you want, the better your results will be.
{% endhint %}

The Advanced video model is a massive leap in quality. It handles complex motion, realistic physics, multi-subject interactions, and synchronized audio generation all in a single pass. To get the most out of it, follow these prompting principles.

#### Core Prompt Structure

Since the image already provides the visual context, build your prompt around motion and change:

1. **Action & motion** — What moves and how (the most important element)
2. **Camera movement** — How the "camera" moves (one movement per shot)
3. **Pacing** — How fast or slow things happen
4. **Audio cues** — Dialogue, sounds, or atmosphere you want to hear
5. **Constraints** — What to avoid (no distortion, no jitter, etc.)

{% hint style="info" %}
**Aim for 50–200 words.** This is the sweet spot for quality. You don't need to re-describe what's already in the image — focus on the animation.
{% endhint %}
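The five-part structure above can be sketched as a small helper that assembles the elements in order and checks the 50–200 word guideline. This is a hypothetical illustration for composing prompts locally — it is not part of the product or any official API:

```python
def build_video_prompt(action, camera="Camera holds still.",
                       pacing=None, audio=None,
                       constraints="No distortion. No jitter."):
    """Assemble a video prompt from the five elements, in order:
    action, camera movement, pacing, audio cues, constraints."""
    parts = [action, camera]
    if pacing:
        parts.append(pacing)
    if audio:
        parts.append(audio)
    parts.append(constraints)  # quality constraints always go last
    prompt = " ".join(parts)

    # Soft check against the 50-200 word sweet spot from the guide
    words = len(prompt.split())
    if not 50 <= words <= 200:
        print(f"note: {words} words -- aim for 50-200 for best quality")
    return prompt
```

Because the constraint line is appended by default, every prompt built this way ends with the recommended "No distortion. No jitter." suffix.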

#### Key Principles

**1. Focus on motion, not the scene**

The image is your scene. Don't re-describe the setting, outfit, or appearance — the model already sees all of that. Instead, describe what *changes*.

* ✅ "She slowly turns her head and smiles, hair swaying gently in a breeze"
* ✅ "Leans forward slightly, eyes narrowing with a playful expression"
* ❌ "A woman standing on a beach in a red dress at sunset" — the image already shows this

**2. Be specific about intensity and pacing**

Use degree words to control how movements feel. The model is very responsive to pacing cues.

* ✅ "Slowly brushes hair behind her ear"
* ✅ "Gentle breeze lifts her hair, subtle movement"
* ✅ "Quickly glances over her shoulder"
* ❌ "Moves" — too vague, unpredictable result

**3. One camera movement per shot**

The model handles camera movement well, but only when you give it one clear instruction. Combining multiple movements causes jitter.

* ✅ "Slow dolly push-in"
* ✅ "Smooth pan to the right"
* ✅ "Camera holds still" — perfectly valid, keeps the focus on subject motion
* ❌ "Dolly in while panning left and tilting up" — too many movements at once

Use pacing words like "slow," "smooth," or "gentle" rather than technical parameters.

**4. Separate camera movement from subject movement**

This is the most common prompting mistake. Be explicit about what moves and what stays still.

* ✅ "She slowly turns her head to the right. Camera holds fixed framing."
* ✅ "Camera slowly pushes in. She holds her pose, only her hair moves in the wind."
* ❌ "Everything moving at once" — confuses the model

**5. Use sequential actions for complex scenes**

List actions in the order you want them to happen. The model follows temporal sequences well:

* "Looks down at her phone with a surprised expression, then looks up at the camera with excitement, gasps softly"
* "Takes a sip of coffee, sets the cup down, then glances out the window with a thoughtful expression"

**6. Add quality constraints at the end**

Append a short constraint line to any prompt for consistently better output:

* "No distortion. No jitter. Face stable, no deformation."

This works across all content types and noticeably reduces common artifacts.

#### Example Prompts

**Simple & effective:**

> She brushes hair behind her ear with a soft smile, gentle breeze, smooth and natural motion. No distortion.

**With camera movement:**

> Slow dolly push-in. She looks up toward the camera with a warm, inviting expression, lips parting slightly. Hair shifts gently. No jitter, face stable.

**Complex sequential action:**

> She glances down at her phone and her eyes widen with surprise, then she looks up at camera with excitement and laughs softly. Camera holds still. No distortion, no deformation.

<figure><img src="/files/rApE9DqFYUUeTdyZNJoA" alt="" width="375"><figcaption></figcaption></figure>

<figure><img src="/files/eX4irqEQa22eZd6SYBWb" alt="" width="375"><figcaption></figcaption></figure>

#### Prompting With Audio

Audio is generated alongside the video in Advanced mode — not as a separate step. The model reads the source image and your prompt to produce matching sound automatically.

**Three audio layers are generated simultaneously:**

* **Lip-synced speech** — Include short dialogue in your prompt with emotional context for best results (e.g., "She softly whispers 'Hey, I missed you'"). Keep dialogue to **5–10 words per line** for the cleanest lip sync.

<figure><img src="/files/PLtnatJc0uJOh9jiqgkh" alt="" width="375"><figcaption></figcaption></figure>

* **Sound effects** — Tied to the actions you describe. Footsteps, cloth movement, glass clinking, a sigh — the model generates these based on what's happening in the scene.

<figure><img src="/files/ps2PsNwoC8n8QW82hcWy" alt="" width="375"><figcaption></figcaption></figure>

<figure><img src="/files/nYO22uuuk4ZshajHgSav" alt="" width="375"><figcaption></figcaption></figure>

* **Ambient sounds & music** — Background atmosphere inferred from the source image and prompt (city noise, ocean waves, quiet room tone, soft background music).

{% hint style="info" %}
**Audio tip:** Label emotions before dialogue for much better results. "She whispers softly" produces noticeably better lip sync and tone than just writing the dialogue alone.
{% endhint %}

#### Things to Avoid

| Don't do this                         | Why                                                                           | Do this instead                          |
| ------------------------------------- | ----------------------------------------------------------------------------- | ---------------------------------------- |
| Re-describing the image               | The model already sees the image — redundant prompting wastes space           | Focus on motion and change               |
| Multiple camera movements in one shot | Causes jitter and incoherent motion                                           | One clear camera movement per shot       |
| Contradicting the source image        | Creates visual confusion (e.g., describing a beach when the image is indoors) | Stay consistent with what's in the image |
| Overloading a short clip              | 4–15 seconds can't contain an entire story                                    | One clear moment per generation          |
| Vague motion ("she moves")            | Unpredictable results                                                         | Be specific about what moves and how     |
| Long dialogue lines                   | Lip sync degrades with long speech                                            | Keep dialogue to 5–10 words              |

{% hint style="info" %}
**Pro tip:** For clips longer than \~8 seconds, consider whether two shorter generations might give you better results. Quality stays highest in the 4–8 second range.
{% endhint %}
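The rules in the table above lend themselves to a quick pre-flight check. The sketch below is a heuristic linter — a hypothetical helper, not a product feature — that flags multiple camera movements, over-long dialogue, and prompts short enough to suggest vague motion:

```python
import re

# Common camera-movement keywords (matched as substrings, heuristic only)
CAMERA_MOVES = ("dolly", "pan", "tilt", "zoom", "push-in", "pull-out", "orbit")


def lint_prompt(prompt):
    """Return warnings for the common mistakes listed in the table."""
    warnings = []
    lower = prompt.lower()

    # One camera movement per shot
    moves = [m for m in CAMERA_MOVES if m in lower]
    if len(moves) > 1:
        warnings.append(
            f"multiple camera movements ({', '.join(moves)}) -- use one per shot")

    # Keep dialogue to 5-10 words per quoted line
    for line in re.findall(r'[\'"](.+?)[\'"]', prompt):
        if len(line.split()) > 10:
            warnings.append(f"dialogue over 10 words: {line!r}")

    # Very short prompts tend to mean vague motion
    if len(prompt.split()) < 50:
        warnings.append("prompt under 50 words -- consider adding motion detail")
    return warnings
```

A prompt like "Slow dolly push-in while panning left and tilting up." would be flagged twice: once for combining camera movements and once for being too short.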

<figure><img src="/files/1v4kVwLvstAdCIE3xVFa" alt="" width="375"><figcaption></figcaption></figure>

### Requirements & Limitations

#### Subscription Requirements

* **FREE Tier**: ❌ Not available
* **PREMIUM Tier**: ✅ Available
* **ULTIMATE Tier**: ✅ Available

### Understanding Video Results

#### What to Expect

The Advanced model handles complex motion, realistic physics, and multi-subject interactions far better than previous models. That said, AI video generation still has some limitations:

* **Hands & fingers** — Occasional extra or fused digits can appear in complex hand movements
* **Fine patterns** — Tight weaves, tiny checks, and detailed textures can shimmer or flicker. Simpler patterns work better.
* **Variation** — Results vary between generations, even with the same prompt
* **Iteration** — You may need to regenerate or tweak your prompt for the best result

### Frequently Asked Questions

**Q: How much do videos cost?**

* **Advanced**: Cost varies by resolution and duration. Audio is on by default.
* **Adult**: 600 moments (720p, 8 seconds).

**Q: Do I need a subscription?**

* Yes, video generation requires a PREMIUM or ULTIMATE subscription.

**Q: Can I use any image?**

* Yes, any generated image can be used as source for video.

**Q: What resolutions can I generate?**

* **Advanced**: 480p, 720p, or 1080p.
* **Adult**: 720p.

**Q: Can I add audio?**

* In **Advanced** mode, audio is on by default.
* Audio supports speech lip sync, ambient sounds, action sounds, and background music.

**Q: Can I download videos?**

* Yes, you can download generated videos from the videos tab.

**Q: Can I generate multiple videos from the same image?**

* Yes! Generate as many videos as you want from the same image with different prompts.

**Q: What motion can I create?**

* The Advanced model excels at complex, realistic motion — walking, turning, hair blowing in wind, clothing movement, facial micro-expressions, multi-subject interactions, and much more. It handles physics naturally, so don't be afraid to get creative.

**Q: How do I get the best results?**

* Focus your prompt on motion rather than re-describing the image, use one camera movement per shot, be specific about pacing, and add "No distortion. No jitter." at the end. See the [Video Prompting Guide](#video-prompting-guide) for detailed tips.

***

Want to learn more? Check out [Image Generation](/create/image-generation.md) or explore [Custom Characters](/create/custom-characters.md)!

