Seedance 2.0 Model Introduction

Model Overview

Seedance 2.0 is a cinematic multimodal audio-video co-generation model developed by ByteDance's Seed team. It uses a dual-branch diffusion transformer (DB-DiT) architecture and accepts mixed input across four modalities: text, images, audio, and video. A request can include up to 12 reference files in total, with at most 9 images, 3 video clips, and 3 audio clips. The model outputs 2K-resolution video with native stereo audio in a single forward pass, which addresses common industry problems such as audio-visual timing misalignment and out-of-sync lip movement.

The model has strong 3D spatial awareness and dynamic memory, producing stable motion, physically realistic results, and consistent subjects across shots. It can automatically handle multi-shot narratives, storyboard design, and smooth camera movement, faithfully reproducing complex scripts and director-level creative intent.

It leads the industry in instruction following, visual aesthetics, and audio reproduction, and is well suited to professional use cases such as film, advertising, and social media marketing. In these settings it can produce audiovisual content that meets professional delivery standards while significantly reducing production cost and turnaround time.
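The reference-file limits can be checked on the client side before a request is sent. The sketch below reads the 9-image, 3-video, and 3-audio figures as per-type maximums under an overall cap of 12 files (the three maximums sum to 15, so they cannot all be reached at once). The function name `reference_problems` and the `"image"`/`"video"`/`"audio"` labels are hypothetical and are not part of any documented Seedance API.

```python
from collections import Counter

# Limits as stated in the overview, read as per-type caps plus a total cap.
# These names are hypothetical, for illustration only.
MAX_TOTAL = 12
MAX_PER_KIND = {"image": 9, "video": 3, "audio": 3}


def reference_problems(kinds: list[str]) -> list[str]:
    """Return human-readable problems with a proposed set of reference files.

    `kinds` holds one label per file ("image", "video", or "audio").
    An empty list means the set fits within the stated limits.
    """
    counts = Counter(kinds)
    problems = []
    # Flag any file type the model does not accept as a reference.
    unknown = sorted(set(counts) - set(MAX_PER_KIND))
    if unknown:
        problems.append(f"unsupported kinds: {', '.join(unknown)}")
    # Check each per-type maximum.
    for kind, limit in MAX_PER_KIND.items():
        if counts[kind] > limit:
            problems.append(f"too many {kind} files: {counts[kind]} > {limit}")
    # Check the overall cap across all types.
    if len(kinds) > MAX_TOTAL:
        problems.append(f"too many files in total: {len(kinds)} > {MAX_TOTAL}")
    return problems
```

For example, 9 images plus 3 videos (12 files) passes, while adding 3 audio clips on top reports a single problem: 15 files exceeds the 12-file total even though each per-type cap is respected.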

Pricing

Resolution	Credits per second
480p	6
720p	14
1080p	32
4k	66
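Because prices are quoted per second of output, the cost of a clip can be estimated as rate × duration. The sketch below assumes simple linear billing with no minimum charge, rounding, or surcharges; the actual billing rules may differ, and `estimate_credits` is a hypothetical helper rather than a platform API.

```python
# Per-second rates transcribed from the pricing table.
CREDITS_PER_SECOND = {"480p": 6, "720p": 14, "1080p": 32, "4k": 66}


def estimate_credits(resolution: str, duration_s: int) -> int:
    """Estimate credits for one clip, assuming cost = rate * duration.

    Assumes linear per-second billing with no minimum charge, rounding,
    or surcharges (an assumption, not documented behavior).
    """
    if resolution not in CREDITS_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return CREDITS_PER_SECOND[resolution] * duration_s
```

Under this assumption, a 10-second 1080p clip costs 32 × 10 = 320 credits, and the longest 4k clip (15 seconds) costs 66 × 15 = 990 credits.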

Technical Specifications

Parameter	Specification
Core Capability	image_to_video
Resolution	480p, 720p, 1080p, 4k
Aspect Ratio	16:9, 4:3, 1:1, 3:4, 9:16, 21:9
Duration	4 to 15 s (whole seconds)
License	✔
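The specification table can be turned into a simple pre-flight check. In the sketch below, the `GenerationRequest` class and its field names are hypothetical; only the allowed values come from the table, and duration is assumed to be given in whole seconds.

```python
from dataclasses import dataclass

# Allowed values transcribed from the specification table.
RESOLUTIONS = {"480p", "720p", "1080p", "4k"}
ASPECT_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9"}
MIN_DURATION_S, MAX_DURATION_S = 4, 15


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request shape; field names are illustrative only."""

    resolution: str
    aspect_ratio: str
    duration_s: int

    def validate(self) -> None:
        """Raise ValueError if any field falls outside the listed options."""
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution!r}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio!r}")
        # The table lists whole seconds only, so reject floats and bools.
        if type(self.duration_s) is not int or not (
            MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S
        ):
            raise ValueError(
                f"duration must be a whole number of seconds from "
                f"{MIN_DURATION_S} to {MAX_DURATION_S}: {self.duration_s!r}"
            )
```

For example, `GenerationRequest("1080p", "21:9", 12).validate()` returns without error, while a `"1440p"` resolution, a `"2:1"` aspect ratio, or a 16-second duration each raises `ValueError`.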
