The Realities of AI Video in China (AI Translation)

By Guan Cong, Caixin Weekly
Seemingly overnight, artificial intelligence (AI) has pushed video generation technology to another milestone. In an AI-generated video that has drawn more than 2.4 million views on the social media platform X, the characters in every scene exclaim excitedly: “We can talk now!”
What lets them speak is Veo 3, the new AI video model Google released on May 20. Its most distinctive feature is the ability to integrate audio into video, directly generating characters who speak fluently with natural lip movements, along with sound effects that fit the scene. Until then, AI video had been stuck in a "silent film era": voiceovers had to be added in post-production, and specialized tools were needed to make the characters’ lip movements look plausible.
Among Chinese companies, ByteDance ramped up promotion of its AI video generation app Dreamina (即梦) in mid-May. The app at one point topped the free-download chart in China’s Apple App Store, overtaking two other apps the company has pushed heavily, Doubao and Hongguo Short Drama. In April, Kuaishou Technology (01024.HK) upgraded the AI model behind its rival application, Keling (可灵), to version 2.0. Before the May Day holiday, Kuaishou set up a dedicated Keling AI business unit whose head reports directly to Kuaishou CEO Cheng Yixiao.
- DIGEST HUB
- Chinese AI video platforms, led by Kuaishou’s Keling and ByteDance’s Dreamina (Jimeng), are rapidly advancing, surpassing some international peers in performance and commercialization, with Keling attracting 22 million users and over 100 million yuan in revenue by February 2025.
- AI video generation’s main workflow remains text-to-image-to-video, with platforms charging users via a points system; production costs have dropped, but labor and creative expertise are still critical.
- The AI content industry is commercializing, especially in advertising, and is evolving amid regulatory scrutiny and ongoing challenges in creating consistent, high-definition narrative video.
Artificial intelligence (AI) has rapidly advanced video generation technology, with Google unveiling its Veo 3 AI video model on May 20. Veo 3 allows AI-generated characters to speak naturally, complete with synchronized lip movements and relevant sound effects. That is a significant leap from earlier AI-generated "silent films," which required separately produced voiceovers and lip-syncing with specialized tools. A video showcasing the capability has drawn more than 2.4 million views on X, and the release marks a major step forward in the integration of AI into video content creation [para. 1][para. 2].
Chinese tech giants are also moving aggressively into AI video generation. ByteDance's Dreamina app at one point became the most downloaded free app in China’s Apple App Store, outpacing Doubao and Hongguo Short Drama. At the same time, Kuaishou upgraded its own AI video app, Keling, and launched a dedicated business unit for it whose head reports directly to the CEO [para. 3]. Following OpenAI’s impactful release of Sora, Chinese companies capitalized on their own vast video databases and quickly surpassed initial overseas leaders such as Luma and Runway. By March and June 2024, ByteDance and Kuaishou launched their flagship AI video apps, Dreamina (Jimeng) and Keling, attracting professional creators with competitive pricing and performance. Kuaishou's Keling platform, for instance, had registered 22 million users and over 100 million yuan in revenue by February 2025, using a model that includes subscriptions and API sales [para. 4][para. 5].
Despite these advances, fully automated, seamless text-to-video is still out of reach. Current workflows require multiple steps and tools: creators first generate images from text and then convert those images into video [para. 6]. Productions like "Muye Guishi" (《牧野诡事》), an AI-generated short drama, illustrate the complexity and human oversight still involved in creating credible, emotionally nuanced AI-driven narratives [para. 7][para. 8].
The commercial ecosystem for AI-generated video in China is taking shape, driven by integration with creator communities and content distribution platforms. However, regulatory challenges are emerging, especially regarding the misuse of AI-generated visuals, prompting a crackdown on false information, pornography, and fraud by the Cyberspace Administration of China [para. 13][para. 14].
In terms of market performance, Keling excels at handling multi-shot sequences and micro-expressions, while Dreamina (Jimeng) offers faster, cheaper production. In March 2024, Dreamina had 8.93 million monthly active users, while Keling had 1.799 million [para. 15][para. 16]. Yet both platforms still mostly produce 720p standard-definition video [para. 17]. Kuaishou has focused on foundational models for video over chatbots and has built immense computing clusters to support this AI initiative. By June 2024, Keling had seen its user base soar 25-fold in ten months, with over 108 million videos produced [para. 19][para. 20]. ByteDance, meanwhile, centralizes its AI development via the Seed division, supporting multiple business units [para. 26].
Production costs for AI-generated video have plummeted: whereas a traditional short film might cost hundreds of thousands of dollars, experimental AI projects have been completed for just a few thousand [para. 40]. However, significant labor and expertise are still required for quality productions, with subscription and points-based payment models becoming the standard [para. 42][para. 43].
Monetization extends beyond content creation itself, as AI-generated ads are boosting revenues for platforms like Tencent, Bilibili, and iQIYI. There are wide variations in pricing models between China and overseas, with Western markets showing a higher willingness to pay for creative tools [para. 54][para. 57].
In sum, China’s AI video ecosystem is rapidly evolving, marked by intense competition, rapid iteration, commercial integration, and growing regulatory oversight, while both opportunities and challenges persist around quality, workflow complexity, and monetization [para. 61][para. 62][para. 70].
- Google
Google - According to the article, Google released a new AI video model, Veo 3, on May 20. Its main feature is the ability to generate videos with synchronized audio, allowing characters to speak naturally with matching lip movements and contextual sound effects. This represents a significant advancement over previous AI video models, which typically produced silent videos requiring post-production dubbing and lip-sync adjustments. Google also introduced Flow, a platform for creating cinematic AI videos.
- ByteDance
ByteDance - ByteDance accelerated its AI video efforts in 2024, launching the Dreamina (即梦) app, which at one point topped China’s App Store downloads. It offers fast, affordable AI video generation targeting creators. ByteDance's video models, developed by its Seed team, focus on product maturity over first-mover advantage. It also collaborates with film festivals for AIGC (AI-generated content) projects, positioning itself as a major player in China's commercial AI video ecosystem.
- Kuaishou
Kuaishou - Kuaishou has rapidly advanced its AI video generation tool Keling (可灵), upgrading it to version 2.0, which excels at multi-shot coherence and physical consistency. With over 22 million users, Keling has achieved over 100 million yuan in cumulative revenue. Kuaishou invests heavily in computing power to support model training and focuses on commercializing its video generation models, offering both consumer subscriptions and B2B API access.
- OpenAI
OpenAI - According to the article, OpenAI’s Sora video model set a benchmark in early 2024 with its highly realistic visuals, pushing “AI-generated video” from concept to reality. Sora’s debut prompted major Chinese companies like ByteDance and Kuaishou to quickly follow with their own AI video products, accelerating progress in the field and increasing competition among global AI video generation platforms.
- Luma
Luma - According to the article, Luma is mentioned as one of the overseas AI video companies that initially had a first-mover advantage in AI-generated video, alongside companies like Runway. However, after OpenAI's Sora launch in early 2024, Chinese companies such as ByteDance and Kuaishou quickly caught up and began to surpass overseas competitors like Luma with more affordable and effective AI video products. Luma is also used by creators for generating videos.
- Runway
Runway - According to the article, Runway is one of the overseas early leaders in AI video generation, along with companies like Luma. Chinese companies such as ByteDance and Kuaishou quickly caught up, launching consumer AI video products that challenged Runway’s market share. In practice, creators often use a mix of tools—including Runway—to generate video content, indicating that no single AI video product fully dominates or replaces others at this stage.
- Tencent
Tencent - According to the article, Tencent released its Hunyuan Image 2.0 model on May 16, which can generate images in real time as the user types. However, Tencent’s video generation capabilities remain at the model and middleware layer, without a dedicated product for ordinary users. Following the rise of DeepSeek, Tencent's chatbot Yuanbao integrated it early and benefited from this move.
- Alibaba
Alibaba - According to the article, Alibaba’s video generation capabilities are still focused at the model and middleware layers, and it has not released video generation products targeted at ordinary users. This contrasts with companies like ByteDance and Kuaishou, which have developed user-facing AI video tools. Alibaba is currently not prioritizing the end-user AI video application market.
- DeepSeek
DeepSeek - DeepSeek is a Chinese-developed AI model known for its low training costs and strong inference capabilities, challenging the dominance of large language model products like ChatGPT. DeepSeek has been integrated into products such as Kimi and Tencent’s Yuanbao and, notably, into ByteDance’s Dreamina (即梦) to assist with instruction generation. Its efficiency and effectiveness have made it a standout in the AI-generated content and chatbot landscape in China.
- Pika
Pika - According to the article, Pika is mentioned as one of several video generation tools used in the production of the AI short drama "Muye Guishi" (《牧野诡事》). It is listed alongside other AI video tools such as Dreamina (即梦), Keling (可灵), MiniMax, PixVerse, and Runway, indicating that Pika is part of the diverse toolkit adopted by Chinese AI video creators for generating and editing video content.
- MiniMax
MiniMax - According to the article, MiniMax (Hailuo, 海螺) is one of the AI video generation tools used in the production of the short drama "Muye Guishi" (《牧野诡事》). It is mentioned alongside other tools such as ByteDance's Dreamina, Kuaishou's Keling, PixVerse, Runway, and Pika, indicating its role as part of a diverse toolkit for AI-driven video content creation in China.
- PixVerse
PixVerse - According to the article, PixVerse is mentioned as one of the AI video generation tools used in the production of the AI short drama "Muye Guishi" (《牧野诡事》). The team used multiple tools, including PixVerse, to generate different scenes, illustrating the current industry practice of combining various AI video products to achieve desired effects, as no single tool is yet comprehensive enough to fully replace the others.
- Midjourney
Midjourney - According to the article, Midjourney is an overseas text-to-image AI model that was developed relatively early. It is favored by creators in the image generation process due to its leading creativity and realistic effects, making it a top choice for generating images, especially when the workflow involves creating images first before generating videos.
- Stable Diffusion
Stable Diffusion - Stable Diffusion is an image generation model mentioned in the article as one of the popular subscription-based tools (like Midjourney) for creating AI-generated images. It allows users to generate images from text prompts and follows a paid subscription model. Since its launch, Stable Diffusion has become widely used among creators, especially for tasks involving text-to-image generation before making AI videos.
- PROMISE
PROMISE - PROMISE is an AI video studio founded by George Strompolos and Jamie Byrne, who previously built YouTube's creator monetization system. In May 2024, PROMISE secured a new round of funding involving Google’s AI Futures Fund and other VCs. Despite focusing on AI video, their latest film "Ninja Punk" still relied on traditional motion capture and 3D modeling, reflecting their belief that AI videos cannot be fully separated from human creativity.
- Bilibili
Bilibili - According to the article, AIGC-generated advertisements now account for 30% of Bilibili’s overall performance advertising expenditure. This demonstrates that Bilibili is actively leveraging AI-generated content to enhance its advertising business, contributing significantly to its monetization and operational strategies.
- iQIYI
iQIYI - According to the article, iQIYI reported that AI-generated advertising materials have significantly boosted ad effectiveness, with AI-produced ads increasing the advertiser's return on investment by over 20%. This highlights AI's growing role in enhancing the efficiency and impact of advertising in the internet industry, particularly for companies like iQIYI that operate video platforms.