Seedance 2.0 User Manual

Multimodal input, @reference system, camera replication, creative templates, video extension, and more.

Seedance 2.0 is Now Live on JiMeng! Kill the Game!

Ever since the days when we could only "tell stories" with text and first/last frames, we've dreamed of building a video model that truly understands what you want to express. Today, it's finally here!

JiMeng Seedance 2.0 now supports four input modalities (image, video, audio, and text), giving you richer ways to express yourself and more controllable generation.

You can use an image to set the visual style, a video to define character motion and camera movement, a few seconds of audio to establish rhythm and atmosphere... combine these with text prompts to make your creative process more natural, more efficient, and more like being a real "director."

In this upgrade, "reference capability" is the biggest highlight:

  • Reference images can precisely reproduce composition and character details
  • Reference videos support replication of camera language, complex motion rhythms, and creative effects
  • Videos support smooth extension and continuation, generating consecutive shots based on user prompts: the model doesn't just generate, it can "keep shooting"
  • Editing capabilities are also enhanced, supporting character replacement, removal, and addition in existing videos

1. Parameter Overview

Core Dimension       Seedance 2.0
Image Input          ≤ 9 images
Video Input          ≤ 3 videos, total duration no more than 15s (reference videos cost a bit more)
Audio Input          Supports MP3 upload, ≤ 3 files, total duration no more than 15s
Text Input           Natural language
Generation Duration  ≤ 15s, freely selectable from 4-15s
Sound Output         Built-in sound effects / background music

Interaction limit: The current maximum for mixed input is 12 files total. We recommend prioritizing uploads that have the greatest impact on visuals or rhythm, and allocating file counts wisely across different modalities.
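If you prepare materials in a script before uploading, the limits above can be checked ahead of time. The following is a minimal sketch, not part of JiMeng itself: `Material`, `check_materials`, and the constant names are hypothetical, and the numbers are simply copied from the parameter table and the 12-file interaction limit.

```python
from dataclasses import dataclass

# Limits copied from the parameter table; all names here are hypothetical.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO_FILES = 3
MAX_TOTAL_SECONDS = 15.0  # applies separately to video and to audio
MAX_FILES = 12            # mixed-input cap across all modalities


@dataclass
class Material:
    kind: str             # "image", "video", or "audio"
    seconds: float = 0.0  # clip length; ignored for images


def check_materials(materials):
    """Return a list of limit violations; an empty list means the set fits."""
    problems = []
    count = {"image": 0, "video": 0, "audio": 0}
    seconds = {"video": 0.0, "audio": 0.0}
    for m in materials:
        if m.kind not in count:
            problems.append(f"unknown material kind: {m.kind}")
            continue
        count[m.kind] += 1
        if m.kind in seconds:
            seconds[m.kind] += m.seconds
    if len(materials) > MAX_FILES:
        problems.append(f"{len(materials)} files exceeds the {MAX_FILES}-file cap")
    if count["image"] > MAX_IMAGES:
        problems.append(f"{count['image']} images exceeds {MAX_IMAGES}")
    if count["video"] > MAX_VIDEOS:
        problems.append(f"{count['video']} videos exceeds {MAX_VIDEOS}")
    if count["audio"] > MAX_AUDIO_FILES:
        problems.append(f"{count['audio']} audio files exceeds {MAX_AUDIO_FILES}")
    for kind in ("video", "audio"):
        if seconds[kind] > MAX_TOTAL_SECONDS:
            problems.append(f"{kind} total {seconds[kind]}s exceeds {MAX_TOTAL_SECONDS}s")
    return problems
```

For example, nine images plus three 5-second videos pass every check, while three 6-second videos are flagged because their combined 18 seconds is over the 15-second total.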

2. Interaction Methods

⚠️ Note:
  • JiMeng Seedance 2.0 supports "First/Last Frame" and "Universal Reference" entry points. Smart Multi-Frame and Subject Reference cannot be selected. If you only upload a first frame image + prompt, you can use the First/Last Frame entry; for multimodal (image, video, audio, text) combined input, use the Universal Reference entry.
  • The current interaction method uses "@material-name" to specify the purpose of each image, video, and audio. For example: @Image1 as first frame, @Video1 reference camera language, @Audio1 for background music.
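When prompts are drafted outside the app, it can help to check that every "@" label points at a material you actually plan to upload. The sketch below is an illustration only: it assumes labels follow the `@Image1` / `@Video1` / `@Audio1` pattern seen in this manual, and `find_references` and `unresolved_references` are hypothetical helper names, not JiMeng features.

```python
import re

# Assumed label format, based on the examples in this manual:
# @Image1, @Video1, @Audio1.
REFERENCE = re.compile(r"@(Image|Video|Audio)(\d+)")


def find_references(prompt):
    """Return (kind, index) pairs in the order they appear in the prompt."""
    return [(kind, int(num)) for kind, num in REFERENCE.findall(prompt)]


def unresolved_references(prompt, uploaded):
    """List references that point past the number of uploaded files.

    `uploaded` maps a kind to how many files of that kind were uploaded,
    e.g. {"Image": 2, "Video": 1}.
    """
    return [
        (kind, index)
        for kind, index in find_references(prompt)
        if index < 1 or index > uploaded.get(kind, 0)
    ]
```

With the example prompt above and `uploaded = {"Image": 1, "Video": 1}`, only `("Audio", 1)` is reported as unresolved. A range written as "@Image1-5" is read as `@Image1` alone, so ranges need a separate check.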
Main Interface

  • Image 1: Entry: Seedance 2.0 - Universal Reference / First & Last Frame
  • Image 2: Open local file dialog
  • Image 3: Select files and add to input box

How to Use @ in Universal Reference Mode

Method 1: Type "@" to invoke reference

  • Image 1: Type "@"
  • Image 2: Select reference, drops into input box
  • Image 3: Enter prompt

Method 2: Click the "@" parameter tool to invoke reference

  • Image 1: Click "@" to select reference
  • Image 2: Enter prompt

After uploading materials, images, videos, and audio all support hover preview.

Below are some use cases and creative approaches for different scenarios, to help you better understand how Seedance 2.0 has improved in generation quality, control capabilities, and creative expression. If you're not sure where to start, check out these examples for inspiration!


JiMeng Seedance 2.0 Capabilities / Improvement Preview

1. Significantly Enhanced Fundamentals: More Stable, Smoother, More Realistic!

Beyond multimodality, Seedance 2.0 has made significant improvements at the foundational level — physics are more realistic, motions are more natural and fluid, instruction comprehension is more precise, and style consistency is more stable. It can now reliably handle complex actions, continuous motion, and other challenging generation tasks, making overall video output more realistic and smoother. This is a comprehensive evolution of core capabilities!

Prompt

A girl elegantly hanging laundry, after hanging one piece she reaches into the basket for another, giving it a firm shake.

Prompt

The character in the painting has a guilty expression, eyes darting left and right as they peek out of the frame, quickly reaching out to grab a cola and take a sip, then showing a look of pure satisfaction. Then footsteps are heard, and the character hurriedly puts the cola back. A cowboy picks up the cola and walks away. Finally the camera pushes in as the screen fades to black with only a spotlight illuminating a canned cola, with stylish subtitles appearing at the bottom: "Yi Kou Cola — A Taste Not to Be Missed!"

Ultra Realism
Prompt

Camera slowly pulls back (revealing the full street view) and follows the heroine as she walks along a 19th-century London street, the wind ruffling her skirt. A steam-powered car comes speeding from the right side of the street, rushing past her — the gust lifts her skirt and she gasps in shock, quickly pressing it down with both hands. Background sounds include footsteps, crowd noise, and car sounds.

Prompt

Camera follows a man in black sprinting away, a crowd chasing behind him. Camera switches to a side tracking shot as he panics and crashes into a fruit stand, scrambles to his feet and keeps running. Sounds of a frantic crowd.

2. Comprehensive Multimodal Upgrade: Video Creation Enters the "Free Combination" Era!

2.1 Introduction to Seedance 2.0 Multimodal

Seedance 2.0 = Multimodal Reference (reference anything) + Strong Creative Generation + Precise Instruction Response (excellent comprehension)

Supports uploading text, images, videos, and audio, all of which can be used as subjects or references. You can reference the motion, effects, style, camera movement, characters, scenes, and sound of any material. As long as your prompt is clear, the model can understand it.

Just describe the visuals and actions you want in natural language — be clear about whether it's a reference or an edit. When you have multiple materials, we recommend double-checking that each @reference is properly labeled so images, videos, and characters don't get mixed up.

2.2 Special Usage Patterns (No Limits, Just Suggestions)

Have a first/last frame image? Also want to reference video actions?

→ Write it clearly in the prompt, e.g.: "@Image1 as first frame, reference @Video1's fighting actions"

Want to extend an existing video?

→ Specify the extension duration, e.g. "Extend @Video1 by 5s". Note: the selected generation duration should be for the "new portion" only.

Want to merge multiple videos?

→ Describe the composition logic in the prompt, e.g.: "I want to add a scene between @Video1 and @Video2, with content about xxx"

No audio files?

→ You can directly reference the sound from a video.

Want to generate continuous action?

→ Add continuity descriptions to the prompt, e.g.: "The character transitions from a jump directly into a roll, keeping the motion smooth and fluid" @Image1 @Image2 @Image3...
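For the video-extension suggestion above, the arithmetic is simple but easy to get wrong: the duration you select covers only the new portion, not the finished clip. The sketch below assumes that the final clip length is the original length plus the extension and that the 4-15s generation range from the parameter table also applies to extensions. `plan_extension` is a hypothetical helper, not a JiMeng API.

```python
MIN_GENERATION_SECONDS = 4   # from the parameter table
MAX_GENERATION_SECONDS = 15


def plan_extension(original_seconds, extend_seconds, video_label="@Video1"):
    """Return the duration to select, the prompt text, and the final length."""
    if not MIN_GENERATION_SECONDS <= extend_seconds <= MAX_GENERATION_SECONDS:
        raise ValueError(
            f"extension must be {MIN_GENERATION_SECONDS}-{MAX_GENERATION_SECONDS}s, "
            f"got {extend_seconds}s"
        )
    return {
        # Select only the new portion as the generation duration.
        "selected_duration": extend_seconds,
        "prompt": f"Extend {video_label} by {extend_seconds}s",
        # Assumption: the result is the original clip plus the new portion.
        "final_length": original_seconds + extend_seconds,
    }
```

Extending a 10-second clip by 5 seconds therefore means selecting 5s, not 15s, as the generation duration.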

2.3 Those Persistently Difficult Video Problems? Now Actually Solvable!

Making videos always comes with headaches: faces changing mid-shot, motions not matching, unnatural video extensions, editing that throws off the entire rhythm... This multimodal upgrade tackles all these "long-standing pain points" at once. Below are specific use cases.

2.3.1 Comprehensive Consistency Improvement

You've probably experienced these frustrations: characters looking different between shots, product details getting lost, small text becoming blurry, scene jumps, inconsistent camera styles... These common consistency issues in creative work can now all be resolved in 2.0. From faces to clothing to font details, overall consistency is more stable and accurate.

Prompt

The man @Image1 walks tiredly down a hallway after work, his steps slowing, finally stopping at his front door. Close-up on his face — he takes a deep breath, adjusts his emotions, lets go of the negativity and relaxes. Then a close-up of him fishing out his keys, inserting them into the lock. After entering, his little daughter and a pet dog run over joyfully to greet him with hugs. The interior is very warm and cozy. Natural dialogue throughout.

Prompt

Replace the woman in @Video1 with a Chinese opera huadan performer on an elaborate stage. Reference @Video1's camera work and transitions, matching the camera to the character's movements for ultimate stage beauty and enhanced visual impact.

Prompt

Reference all transitions and camera movements from @Video1, one continuous take, starting with a chess game.

Prompt

0-2 seconds: Rapid four-panel flash cuts — red, pink, purple, and leopard-print bows each frozen in frame.

Prompt

Create a commercial-style showcase of the handbag in @Image2. The side of the bag references @Image1, the surface texture references @Image3. All bag details should be displayed. Grand and majestic background music.

Prompt

Use @Image1 as the first frame. First-person perspective, reference @Video1's camera movements. Upper scene references @Image2, left scene references @Image3, right scene references @Image4.

Attached material: First Frame Camera Work

2.3.2 Camera Movement Replication

Previously, getting the model to mimic cinematic blocking, camera work, or complex actions either required extremely detailed prompts or was simply impossible. Now, just upload a reference video and you're set.

Prompt

Reference the man's appearance from @Image1, he is in the elevator from @Image2, fully replicate all camera movements and the protagonist's facial expressions from @Video1.

Prompt

Reference the man's appearance from @Image1, he is in the corridor from @Image2, fully replicate all camera movements from @Video1.

Prompt

The tablet from @Image1 as the main subject, camera movements reference @Video1.

Attached material: Focus Rotation

Prompt

The female star from @Image1 as the main subject, reference @Video1's camera style for rhythmic push-pull-pan movements.

Attached material: Push-Pull Dance

Prompt

Reference @Image1 @Image2 for the spear-wielding character, @Image3 @Image4 for the dual-blade character. Mimic @Video1's actions, fighting in the maple leaf forest from @Image5.

Prompt

Reference Video1's character actions, reference Video2's orbiting camera language, generate a fighting scene between Character 1 and Character 2.

Prompt

Reference Video1's camera movements and scene transition rhythm, replicate using the red supercar from Image1.

Attached material: Car Camera Work

2.3.3 Creative Templates / Precise Complex Effects Replication

Beyond generating images and writing stories, Seedance 2.0 also supports "follow-the-reference" — creative transitions, finished ads, film clips, complex edits. As long as you have reference images or videos, the model can identify action rhythms, camera language, visual structure, and precisely replicate them.

Prompt

Replace the person in @Video1 with @Image1. @Image1 as first frame, the person wearing virtual sci-fi glasses. Reference @Video1's camera work.

Prompt

Reference the model's facial features from the first image. The model wears the outfits from reference images 2-6 while approaching the camera.

Prompt

Reference the video's ad concept, use the provided down jacket images with ad copy to generate a new down jacket commercial.

Prompt

Black and white ink wash style. The character from @Image1 references @Video1's effects and movements, performing a segment of ink wash tai chi kung fu.

Prompt

Replace the first frame character in @Video1 with @Image1, fully reference @Video1's effects and movements.

Attached material: Outfit Change

Prompt

Starting from the ceiling in @Image1, reference @Video1's jigsaw-shattering effect for the transition.

Prompt

Open with a black screen, reference Video1's particle effects and material, golden gilded sand particles.

Attached material: AE Intro

Prompt

The character from @Image1 references @Video1's actions and expression changes, showcasing an exaggerated instant noodle eating performance.

2.3.4 Model Creativity and Story Completion

Prompt

Animate @Image1 as a comic strip, reading left to right, top to bottom.

Prompt

Reference the storyboard from @Image1, create a 15s healing-style opening sequence about "The Four Seasons of Childhood."

Prompt

Reference Video1's audio, use Images 1-5 as inspiration to create an emotion-driven video.

2.3.5 Video Extension

Prompt

Extend 15s of video. Reference the donkey-riding-motorcycle character from @Image1 and @Image2, add a whimsical ad segment.

Duration: 15s

Prompt

Extend the video by 6s. An intense electric guitar riff kicks in, with "JUST DO IT" ad text appearing in the center of the video.

Duration: 6s

Prompt

Extend @Video1 by 15 seconds. 1-5 seconds: Light and shadow slowly glide through venetian blinds across the wooden table and cup.

Duration: 15s

Prompt

Extend backward by 10s. In the warm afternoon light, the camera begins at the row of awnings on the street corner, gently fluttering in the breeze.

Duration: 10s
Attached material: Sunflower Scooter

2.3.6 More Accurate Tone, More Realistic Sound

Prompt

Fixed camera, center fisheye lens peering downward through a circular opening.

Prompt

Based on the provided office building promotional photos, generate a 15-second cinematic-realistic style real estate documentary.

Prompt

A roast-style dialogue in the "Cat & Dog Roast Room," with rich emotions matching a stand-up comedy performance.

Prompt

The opening instrumental of the classic Yu Opera segment "The Case of Chen Shimei" begins to play.

Prompt

Generate a 15-second music video. Keywords: steady composition / gentle push-pull / low-angle heroic feel / documentary but premium.

Prompt

The girl with a hat in the center of frame gently sings "I'm so proud of my family!"

Prompt

Fixed camera. The standing muscular man (captain) clenches his fist, waves his arm and says in Spanish: "Assault in three minutes!"

Prompt

0-3 seconds: Opening with an alarm clock ringing, the blurry image fades in to reveal Image 1.

Prompt

The monkey from @Image1 walks toward the bubble tea shop counter, camera following behind him.

Prompt

In a science-explainer style and tone, bring the content of Image 1 to life.

2.3.7 Stronger Shot Continuity (One-Take Shots)

Prompt

@Image1-5, a continuous one-take tracking shot, following a runner from the street up stairs, through a corridor, onto the rooftop, finally overlooking the city.

Prompt

Starting with @Image1 as the first frame, the view zooms out to outside an airplane window.

Prompt

Spy thriller style. @Image1 as the first frame, camera tracking the female spy in a red trench coat from the front.

Prompt

From the exterior shot of @Image1, first-person POV with a fast push into the wooden cabin interior.

Prompt

@Image1-5, a thrilling roller coaster ride from a first-person POV in one continuous take.

2.3.8 Highly Usable Video Editing

Sometimes you already have a video and don't want to start over finding images or rebuilding from scratch — you just want to tweak a motion segment, extend a few seconds, or make a character's performance better match your vision. Now you can directly use existing video as input and make targeted modifications to specific segments, actions, or rhythms without changing anything else.

Prompt

Subvert the storyline in @Video1 — the man's gaze shifts from tender to ice-cold and ruthless.

Prompt

Subvert the entire storyline of @Video1. 0-3 seconds: A man in a suit sits at a bar.

Prompt

Replace the female lead singer in Video1 with the male lead singer from Image1, movements fully mimicking the original video.

Prompt

Change the woman's hairstyle in Video1 to long red hair. The great white shark from Image1 slowly emerges.

Prompt

Video1 camera pans right, the fried chicken shop owner busily hands fried chicken to customers in line.

2.3.9 Music Beat Syncing

Prompt

The girl in the poster keeps changing outfits, clothing styles reference @Image1 and @Image2.

Attached material: Music Beat Sync

Prompt

Images from @Image1-7 sync to keyframes in @Video's visuals for beat-matching.

Prompt

Scenic landscape images from @Image1-6, synced to @Video's visual rhythm for beat-matching.

Prompt

8-second strategic-battle anime clip, matching a revenge theme.

2.3.10 Better Emotional Performance

Prompt

The woman from @Image1 walks to a mirror, looks at her reflection, pauses in thought, then suddenly breaks down screaming.

Prompt

This is a range hood commercial. @Image1 as the first frame, a woman elegantly cooking.

Prompt

@Image1 as the first frame, camera rotates and pushes in, the character suddenly looks up and begins roaring.
