Multimodal input, @reference system, camera replication, creative templates, video extension, and more.
Ever since the days when we could only "tell stories" with text and first/last frames, we've dreamed of building a video model that truly understands what you want to express. Today, it's finally here!
JiMeng Seedance 2.0 now supports four input modalities — image, video, audio, and text — giving you richer ways to express yourself with more controllable generation.
You can use an image to set the visual style, a video to define character motion and camera movement, a few seconds of audio to establish rhythm and atmosphere... combine these with text prompts to make your creative process more natural, more efficient, and more like being a real "director."
In this upgrade, "reference capability" is the biggest highlight:
| Capability | Seedance 2.0 |
|---|---|
| Image Input | ≤ 9 images |
| Video Input | ≤ 3 videos, total duration no more than 15s (reference videos cost a bit more) |
| Audio Input | Supports MP3 upload, ≤ 3 files, total duration no more than 15s |
| Text Input | Natural language |
| Generation Duration | ≤ 15s, freely selectable from 4-15s |
| Sound Output | Built-in sound effects / background music |
Interaction limit: The current maximum for mixed input is 12 files total. We recommend prioritizing uploads that have the greatest impact on visuals or rhythm, and allocating file counts wisely across different modalities.
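The limits above can be summed up in a quick sanity check. This is purely an illustrative sketch, not an official API: the `validate` helper and the `LIMITS` table are hypothetical names, encoding only the per-modality caps listed in the table and the 12-file mixed-input ceiling.

```python
# Illustrative sketch (not an official API): checking a mixed-input
# upload set against the Seedance 2.0 limits listed above.

LIMITS = {
    "image": {"max_files": 9},
    "video": {"max_files": 3, "max_total_s": 15},
    "audio": {"max_files": 3, "max_total_s": 15},
}
MAX_TOTAL_FILES = 12  # current ceiling for mixed input

def validate(uploads):
    """uploads: list of (kind, duration_s) tuples; use duration 0 for images."""
    if len(uploads) > MAX_TOTAL_FILES:
        return False, f"more than {MAX_TOTAL_FILES} files in total"
    for kind, limit in LIMITS.items():
        durations = [d for k, d in uploads if k == kind]
        if len(durations) > limit["max_files"]:
            return False, f"too many {kind} files"
        if "max_total_s" in limit and sum(durations) > limit["max_total_s"]:
            return False, f"{kind} duration exceeds {limit['max_total_s']}s"
    return True, "ok"

# 9 images plus two videos totalling 17s: file counts are fine,
# but the combined video duration breaks the 15s cap.
ok, reason = validate([("image", 0)] * 9 + [("video", 8), ("video", 9)])
print(ok, reason)
```

The same check mirrors the advice above: when you hit the 12-file ceiling, drop the uploads that matter least to visuals or rhythm first.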



Method 1: Type "@" to invoke reference





After uploading, you can hover over any material (image, video, or audio) to preview it.



Below are some use cases and creative approaches for different scenarios, to help you better understand how Seedance 2.0 has improved in generation quality, control capabilities, and creative expression. If you're not sure where to start, check out these examples for inspiration!
Beyond multimodality, Seedance 2.0 has made significant improvements at the foundational level — physics are more realistic, motions are more natural and fluid, instruction comprehension is more precise, and style consistency is more stable. It can now reliably handle complex actions, continuous motion, and other challenging generation tasks, making overall video output more realistic and smoother. This is a comprehensive evolution of core capabilities!
A girl elegantly hanging laundry, after hanging one piece she reaches into the basket for another, giving it a firm shake.
The character in the painting has a guilty expression, eyes darting left and right as they peek out of the frame, quickly reaching out to grab a cola and take a sip, then showing a look of pure satisfaction. Then footsteps are heard, and the character hurriedly puts the cola back. A cowboy picks up the cola and walks away. Finally the camera pushes in as the screen fades to black with only a spotlight illuminating a canned cola, with stylish subtitles appearing at the bottom: "Yi Kou Cola — A Taste Not to Be Missed!"
Camera slowly pulls back (revealing the full street view) and follows the heroine as she walks along a 19th-century London street, the wind ruffling her skirt. A steam-powered car comes speeding from the right side of the street, rushing past her — the gust lifts her skirt and she gasps in shock, quickly pressing it down with both hands. Background sounds include footsteps, crowd noise, and car sounds.
Camera follows a man in black sprinting away, a crowd chasing behind him. Camera switches to a side tracking shot as he panics and crashes into a fruit stand, scrambles to his feet and keeps running. Sounds of a frantic crowd.
Seedance 2.0 = Multimodal Reference (reference anything) + Strong Creative Generation + Precise Instruction Response (excellent comprehension)
Supports uploading text, images, videos, and audio — all of which can be used as subjects or references. You can reference anything's motion, effects, style, camera movement, characters, scenes, and sound. As long as your prompt is clear, the model can understand it.
Just describe the visuals and actions you want in natural language — be clear about whether it's a reference or an edit. When you have multiple materials, we recommend double-checking that each @reference is properly labeled so images, videos, and characters don't get mixed up.
Have a first/last frame image? Also want to reference video actions?
→ Write it clearly in the prompt, e.g.: "@Image1 as first frame, reference @Video1's fighting actions"
Want to extend an existing video?
→ Specify the extension duration, e.g. "Extend @Video1 by 5s". Note: the selected generation duration should be for the "new portion" only.
Want to merge multiple videos?
→ Describe the composition logic in the prompt, e.g.: "I want to add a scene between @Video1 and @Video2, with content about xxx"
No audio files? You can directly reference the sound from a video.
Want to generate continuous action?
→ Add continuity descriptions to the prompt, e.g.: "The character transitions from a jump directly into a roll, keeping the motion smooth and fluid" @Image1 @Image2 @Image3...
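The patterns above all reduce to the same habit: label every @reference explicitly so materials don't get mixed up. As a minimal sketch (the product itself takes free-form text, and `tag` / `build_prompt` are hypothetical helper names, not part of any real API), assembling such a prompt might look like:

```python
# Hypothetical helpers for composing an @-referenced prompt in the
# style shown above; Seedance 2.0 itself accepts plain natural language.

def tag(kind, index):
    """Return an @-reference like '@Image1' or '@Video2'."""
    return f"@{kind}{index}"

def build_prompt(first_frame=None, motion_ref=None, action=""):
    parts = []
    if first_frame:
        # e.g. "@Image1 as first frame"
        parts.append(f"{first_frame} as first frame")
    if motion_ref:
        # e.g. "reference @Video1's actions"
        parts.append(f"reference {motion_ref}'s actions")
    if action:
        parts.append(action)
    return ", ".join(parts)

prompt = build_prompt(first_frame=tag("Image", 1),
                      motion_ref=tag("Video", 1),
                      action="keep the motion smooth and fluid")
print(prompt)
# → @Image1 as first frame, reference @Video1's actions, keep the motion smooth and fluid
```

The point is not the helper itself but the structure it enforces: one labeled reference per material, plus a continuity description, matches what the model parses most reliably.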
Making videos always comes with headaches: faces changing mid-shot, motions not matching, unnatural video extensions, editing that throws off the entire rhythm... This multimodal upgrade tackles all these "long-standing pain points" at once. Below are specific use cases.
You've probably experienced these frustrations: characters looking different between shots, product details getting lost, small text becoming blurry, scene jumps, inconsistent camera styles... These common consistency issues in creative work can now all be resolved in 2.0. From faces to clothing to font details, overall consistency is more stable and accurate.
The man @Image1 walks tiredly down a hallway after work, his steps slowing, finally stopping at his front door. Close-up on his face — he takes a deep breath, adjusts his emotions, lets go of the negativity and relaxes. Then a close-up of him fishing out his keys, inserting them into the lock. After entering, his little daughter and a pet dog run over joyfully to greet him with hugs. The interior is very warm and cozy. Natural dialogue throughout.
Replace the woman in @Video1 with a Chinese opera huadan performer on an elaborate stage. Reference @Video1's camera work and transitions, matching the camera to the character's movements for ultimate stage beauty and enhanced visual impact.
Reference all transitions and camera movements from @Video1, one continuous take, starting with a chess game.
0-2 seconds: Rapid four-panel flash cuts — red, pink, purple, and leopard-print bows each frozen in frame.

Create a commercial-style showcase of the handbag in @Image2. The side of the bag references @Image1, the surface texture references @Image3. All bag details should be displayed. Grand and majestic background music.

Use @Image1 as the first frame. First-person perspective, reference @Video1's camera movements. Upper scene references @Image2, left scene references @Image3, right scene references @Image4.
Previously, getting the model to mimic cinematic blocking, camera work, or complex actions required either writing extremely detailed prompts or was simply impossible. Now, just upload a reference video and you're set.
Reference the man's appearance from @Image1, he is in the elevator from @Image2, fully replicate all camera movements and the protagonist's facial expressions from @Video1.
Reference the man's appearance from @Image1, he is in the corridor from @Image2, fully replicate all camera movements from @Video1.




The tablet from @Image1 as the main subject, camera movements reference @Video1.

The female star from @Image1 as the main subject, reference @Video1's camera style for rhythmic push-pull-pan movements.
Reference @Image1 @Image2 for the spear-wielding character, @Image3 @Image4 for the dual-blade character. Mimic @Video1's actions, fighting in the maple leaf forest from @Image5.

Reference @Video1's character actions, reference @Video2's orbiting camera language, generate a fighting scene between Character 1 and Character 2.


Reference @Video1's camera movements and scene transition rhythm, replicate using the red supercar from @Image1.
Beyond generating images and writing stories, Seedance 2.0 also supports "follow-the-reference" — creative transitions, finished ads, film clips, complex edits. As long as you have reference images or videos, the model can identify action rhythms, camera language, visual structure, and precisely replicate them.
Replace the person in @Video1 with @Image1. @Image1 as first frame, the person wearing virtual sci-fi glasses. Reference @Video1's camera work.

Reference the model's facial features from the first image. The model wears the outfits from reference images 2-6 while approaching the camera.



Reference the video's ad concept, use the provided down jacket images with ad copy to generate a new down jacket commercial.
Black and white ink wash style. The character from @Image1 references @Video1's effects and movements, performing a segment of ink wash tai chi kung fu.
Replace the first frame character in @Video1 with @Image1, fully reference @Video1's effects and movements.

Starting from the ceiling in @Image1, reference @Video1's jigsaw-shattering effect for the transition.


Open with a black screen, reference @Video1's particle effects and material, golden gilded sand particles.

The character from @Image1 references @Video1's actions and expression changes, showcasing an exaggerated instant noodle eating performance.
Animate @Image1 as a comic strip, reading left to right, top to bottom.

Reference the storyboard from @Image1, create a 15s healing-style opening sequence about "The Four Seasons of Childhood."

Reference @Video1's audio, use @Image1-5 as inspiration to create an emotion-driven video.





Extend 15s of video. Reference the donkey-riding-motorcycle character from @Image1 and @Image2, add a whimsical ad segment.

Extend the video by 6s. An intense electric guitar riff kicks in, with "JUST DO IT" ad text appearing in the center of the video.

Extend @Video1 by 15 seconds. 1-5 seconds: Light and shadow slowly glide through venetian blinds across the wooden table and cup.
Extend backward by 10s. In the warm afternoon light, the camera begins at the row of awnings on the street corner, gently fluttering in the breeze.
Fixed camera, center fisheye lens peering downward through a circular opening.
Based on the provided office building promotional photos, generate a 15-second cinematic-realistic style real estate documentary.



A roast-style dialogue in the "Cat & Dog Roast Room," with rich emotions matching a stand-up comedy performance.

The opening instrumental of the classic Yu Opera segment "The Case of Chen Shimei" begins to play.

Generate a 15-second music video. Keywords: steady composition / gentle push-pull / low-angle heroic feel / documentary but premium.

The girl with a hat in the center of frame gently sings "I'm so proud of my family!"

Fixed camera. The standing muscular man (captain) clenches his fist, waves his arm and says in Spanish: "Assault in three minutes!"

0-3 seconds: Opening with an alarm clock ringing, the blurry image fades in to reveal Image 1.


The monkey from @Image1 walks toward the bubble tea shop counter, camera following behind him.



In a science-explainer style and tone, bring the content of Image 1 to life.
@Image1-5, a continuous one-take tracking shot, following a runner from the street up stairs, through a corridor, onto the rooftop, finally overlooking the city.





Starting with @Image1 as the first frame, the view zooms out to outside an airplane window.



Spy thriller style. @Image1 as the first frame, camera tracking the female spy in a red trench coat from the front.




From the exterior shot of @Image1, first-person POV with a fast push into the wooden cabin interior.




@Image1-5, a thrilling roller coaster ride from a first-person POV in one continuous take.





Sometimes you already have a video and don't want to start over finding images or rebuilding from scratch — you just want to tweak a motion segment, extend a few seconds, or make a character's performance better match your vision. Now you can directly use existing video as input and make targeted modifications to specific segments, actions, or rhythms without changing anything else.
Subvert the storyline in @Video1 — the man's gaze shifts from tender to ice-cold and ruthless.
Subvert the entire storyline of @Video1. 0-3 seconds: A man in a suit sits at a bar.
Replace the female lead singer in @Video1 with the male lead singer from @Image1, movements fully mimicking the original video.

Change the woman's hairstyle in @Video1 to long red hair. The great white shark from @Image1 slowly emerges.

@Video1's camera pans right, the fried chicken shop owner busily hands fried chicken to customers in line.

The girl in the poster keeps changing outfits, clothing styles reference @Image1 and @Image2.




Images from @Image1-7 sync to keyframes in @Video's visuals for beat-matching.






Scenic landscape images from @Image1-6, synced to @Video's visual rhythm for beat-matching.
8-second strategic-battle anime clip, matching a revenge theme.
The woman from @Image1 walks to a mirror, looks at her reflection, pauses in thought, then suddenly breaks down screaming.


This is a range hood commercial. @Image1 as the first frame, a woman elegantly cooking.




@Image1 as the first frame, camera rotates and pushes in, the character suddenly looks up and begins roaring.



