Exclusive Interview with Kuaishou’s Kling: Video Models Must Overcome Computing Power and Data Limitations; Google Is a Formidable Rival (AI Translation)


文|财新 关聪
By Guan Cong, Caixin
【财新网】“谷歌从业务技术到各方面资源都是可灵非常强劲的对手,谷歌不仅在AI基建、人才等各方面能力积淀深,旗下还有视频平台YouTube,包括谷歌联合创始人谢尔盖·布林(Sergey Brin)也亲自盯着谷歌视频模型的研发。我们和谷歌有差距,但并不大。”在上海举行的2025年世界人工智能(AI)大会(WAIC)期间,快手(01024.HK)视觉生成与互动中心负责人万鹏飞接受财新专访时直言,视频模型极具市场潜力,但仍需突破算力和数据局限。
[Caixin Global] “In business, technology, and resources of every kind, Google is an extremely formidable competitor for Kling. Google has deep reserves of capability in AI infrastructure, talent, and other areas, and it also owns the video platform YouTube. Even Google co-founder Sergey Brin is personally keeping watch over the development of Google’s video models. There is a gap between us and Google, but it is not large,” said Wan Pengfei, head of the Visual Generation and Interaction Center at Kuaishou (01024.HK), in an exclusive interview with Caixin during the 2025 World Artificial Intelligence Conference (WAIC) in Shanghai. Wan said video models have enormous market potential but still need to break through limits on computing power and data.
在大模型激烈战局中,快手并未把重心放在聊天机器人上。2024年初OpenAI的视频模型Sora以逼真效果震惊科技界,在它迟迟未能面向用户开放时,快手在去年6月推出基于与Sora类似的DiT架构的视频大模型可灵,迅速完成产品上线、商业化、出海以及创作者生态搭建。视觉生成与互动中心是快手可灵视频生成模型背后的核心技术团队。
In the fierce contest among large models, Kuaishou has not centered its efforts on chatbots. In early 2024, OpenAI’s video model Sora stunned the tech world with its lifelike results. While Sora remained unavailable to users, Kuaishou launched Kling in June 2024, a large video model built on a DiT architecture similar to Sora’s, and quickly completed its product launch, commercialization, overseas expansion, and the build-out of a creator ecosystem. The Visual Generation and Interaction Center is the core technical team behind Kuaishou’s Kling video generation model.
万鹏飞告诉财新,快手很早关注视频生成技术,也意识到市场的潜力,对视频里各种内容、产业的形态、创作者的需求都比较了解,所以想尝试用AI辅助用户创作和表达,可灵是跟着新一轮技术突破出现的。其次,视频媒介是很高效的内容承载形式,也被证明能解决很多问题,市场规模本身就大,对于创造性的场景,能够在其中提升一些效率,或占一部分市场都会很可观。“可灵是一个通用模型,视频创作又是全球普遍的需求,所以也会放大原本的市场规模。”
Wan Pengfei told Caixin that Kuaishou began following video generation technology early and recognized the market’s potential. The company has a good understanding of the many kinds of video content, the shape of the industry, and the needs of creators, so it wanted to try using AI to help users create and express themselves; Kling emerged on the back of the latest round of technological breakthroughs. Second, he said, video is a highly efficient medium for carrying content and has proven able to solve many problems. The market is already large, so improving efficiency in creative scenarios, or capturing even part of that market, would be significant. “Kling is a general-purpose model, and video creation is a universal need worldwide, so it will also enlarge the original market,” he said.

- DIGEST HUB
- Kuaishou’s video generation model Kling has gone through more than 30 iterations, with over 45 million creators worldwide generating 200 million videos and 400 million images; 70% of customers are overseas.
- Key competitors include Google’s Veo 3 and ByteDance’s Seedance 1.0 Pro; challenges remain in computing power and video data preparation.
- Kuaishou plans sustained investment in Kling, whose monthly paid revenue exceeded 100 million yuan in both April and May 2025.
The 2025 World Artificial Intelligence Conference (WAIC) in Shanghai served as a backdrop for an in-depth interview with Wan Pengfei, head of the Visual Generation and Interaction Center at Kuaishou, discussing the company's progress and strategy in AI-powered video generation. Wan noted that globally, Google remains Kuaishou’s strongest competitor due to its deep resources in AI infrastructure, talent, and the presence of YouTube. Yet, he believes the gap between Kuaishou and Google is not large, thanks to rapid advances in their own technology [para. 1].
Kuaishou’s main AI focus is not on chatbots, but rather on video generation models. OpenAI’s Sora stunned the tech community in early 2024, and Kuaishou launched Kling, based on a Diffusion Transformer (DiT) architecture similar to Sora’s, in June 2024. Kling quickly moved through commercialization and built up a creator ecosystem, demonstrating Kuaishou’s deep understanding of the video market and AI’s potential to power new creative tools [para. 2][para. 3].
A key strength of Kling is its applicability to universal video creation demands. Video is an efficient medium with vast market opportunities, and Kling aims to enhance productivity and expand the market for both consumers and creators. Kling’s flexibility means it is used for both consumption and self-created content, with wide potential as long as the underlying technology continues to innovate [para. 4][para. 5].
Kling has undergone over 30 iterations, most recently with the 2.1 series. Wan explained that Kling 1.0 validated video generation’s practicality, while 2.0 helped standardize the industry. Rapid version cycles, every three months on average, are now a necessity in the competitive large model era [para. 6].
The model’s ability to generate local personalities and elements has kept it popular among creators, many of whom cite this as a primary reason for choosing Kling. As of July 27, more than 45 million creators globally had used Kling to generate over 200 million videos and 400 million images. Product development is often user-driven, with functionalities co-explored by users and engineers [para. 7][para. 8].
Scalability and data requirements remain primary challenges. Video model training needs vast and carefully structured datasets, which present higher barriers than text data. Objectively, the industry, Kling included, has room for progress in this regard [para. 9].
Kuaishou faces competition from ByteDance, whose Seedance 1.0 Pro offers industry-low prices, and from global startups like PixVerse (over 60 million users) and MiniMax (300 million videos generated in six months), all vying for market share [para. 10].
Google’s Veo 3 is noted for its ability to generate synchronized audio with video, outpacing Kling, which adds audio as a separate step afterward. Veo 3’s capability to produce videos with lifelike speech and ambient sound is considered a milestone for the field [para. 11][para. 12].
AI video’s commercial value is already recognized, with adoption in film, advertising, and personal content creation, although production still requires multiple tools and manual optimization. The fundamental workflow—prompt-to-image, then image-to-video—remains unchanged [para. 13][para. 14][para. 15].
Kling is focused on versatility, aiming to faithfully deliver user instructions without imposing its own aesthetic. Model improvements often require higher computation, influencing product pricing and optimization cycles [para. 16][para. 17].
Around 70% of Kling’s customers are overseas, illustrating its global reach and the universality of video creation needs. Foreign users, in particular, show stronger payment willingness due to income opportunities [para. 18].
Kuaishou has prioritized its video AI initiatives, establishing the Kling AI business unit in April 2025. Although the supporting team accounts for less than 1% of Kuaishou’s 24,700 employees, Kling achieved monthly paid revenue of over 100 million yuan in both April and May. The company plans sustained, intensive investment, scaling up compute capacity to support ever-larger foundation models [para. 19].
- Kuaishou
- Kuaishou (01024.HK) is a Chinese tech company that has ventured into AI video generation with its Kling model. Kling, released in June 2024, has developed rapidly, with over 30 iterations; its latest version is 2.1. The model is known for generating realistic human figures and local elements, making it popular among creators. Kuaishou plans significant long-term investment in Kling, aiming to expand its computing power for training and inference.
- ByteDance
- The article notes that ByteDance possesses extensive training data and, like Kuaishou, has models, products, and content distribution platforms. ByteDance’s video generation model, Seedance 1.0 Pro, launched on June 11, is claimed to be the lowest priced in the industry. Its consumer-facing app, Jimeng, which is comparable to Kling, is actively acquiring users.
- Google
- Google is seen as a strong competitor in AI, especially in infrastructure and talent. Its video model, Veo 3, can generate video with integrated audio, unlike Kling, which requires an additional step for audio. Google’s co-founder, Sergey Brin, is reportedly involved in the development of its video models.
- Morgan Stanley
- Morgan Stanley noted that 70% of Kling’s current customers are from overseas. Kling is a general model product with global appeal, and overseas users show a greater willingness to pay, especially as their creations with Kling generate revenue.
- PixVerse
- PixVerse, developed by Aisite Technology (爱诗科技), is a rapidly growing AI video generation tool that has gained popularity overseas. It is known for its user-friendly interface and creative content, making it easy to create engaging videos. The platform has attracted over 60 million registered users globally, demonstrating its significant impact on the AI video generation market.
- MiniMax
- MiniMax is a company that has developed a video model called “海螺” (Hailuo). In the past six months, the model has generated over 300 million videos globally, showing how active it is within the AI video generation space.
- Early 2024:
- OpenAI’s video model Sora was revealed to the tech world (not yet released to users).
- June 2024:
- Kuaishou launched its large video model Kling, based on a DiT architecture similar to Sora’s.
- April 2025:
- Kuaishou elevated the importance of its video large model business and established the Kling AI business unit.
- April 2025:
- Kling’s monthly paid revenue exceeded 100 million yuan.
- May 2025:
- Kling’s monthly paid revenue exceeded 100 million yuan.
- May 20, 2025:
- Google unveiled its AI video model Veo 3.
- June 11, 2025:
- ByteDance unveiled its video generation model Seedance 1.0 Pro.
- By July 2025:
- Hailuo, the video model from MiniMax, had generated more than 300 million videos globally over the preceding six months.
- July 27, 2025:
- Kuaishou disclosed that more than 45 million creators are using Kling, generating over 200 million videos and 400 million images.
- 2025:
- World Artificial Intelligence Conference (WAIC) was held in Shanghai, where Wan Pengfei, Head of Visual Generation and Interaction Center at Kuaishou, was interviewed by Caixin.