Caixin
Mar 21, 2024 04:08 PM
CAIXIN WEEKLY SNEAK PEEK

Caixin Weekly | Sora Takes Another Step Forward (AI Translation)

This article was translated from Chinese using AI. The translation may contain inaccuracies.
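The Sora team describes itself as still a research project rather than a product, with no timetable for opening up in the near term. After the past year's large language model boom, investors and entrepreneurs alike understand that a product needs commercial application to survive. Photo: Jonathan Raa/VCG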

By Caixin Weekly's Du Zhihang, Guan Cong

"Aishi Technology will concentrate manpower and resources to surpass the current level of Sora within three to six months."

"Shengshu Technology will be able to match the results of the current version of OpenAI's Sora within this year."

Disclaimer
Caixin is acclaimed for its high-quality, investigative journalism. This section offers you a glimpse into Caixin’s flagship Chinese-language magazine, Caixin Weekly, via AI translation. The English translation may contain inaccuracies.
Three Things to Know
  • Two Chinese AI startups, Aishi Technology and Shengshu Technology, announced financing rounds aimed at catching up with OpenAI's Sora, a generative video large model unveiled in February 2024. Sora can generate videos of up to a minute from text prompts, showcasing advanced AI video technology.
  • OpenAI's progression from ChatGPT to Sora spans advances in natural language understanding, text-to-image generation, and text-to-video models, with commercial applications and the potential for leapfrog gains in technology.
  • Despite the enthusiasm for AI-generated content, Chinese companies face challenges in computing power and regulatory aspects to match Sora's capabilities. Meanwhile, industries like film, television, and gaming are exploring the potential impacts of AI on production efficiency and content creation.
AI generated, for reference only
A Deeper Dive

In the wake of OpenAI's unveiling of Sora, a groundbreaking artificial intelligence (AI) generative video model, on February 15, Chinese AI startups Aishi Technology and Shengshu Technology have announced significant financing rounds aimed at developing technologies to rival or surpass Sora within months. The announcements underscore how quickly AI capabilities are advancing from text and image generation to complex video creation [para. 1].

OpenAI's journey began with ChatGPT in November 2022, evolving through various iterations of its neural network architecture to achieve remarkable milestones in natural language understanding, image generation from text, and now video generation with Sora. Sora's ability to generate videos of up to a minute from text prompts represents a significant leap forward in both technology and potential commercial applications [para. 2].

The response to Sora has been enthusiastic. Since its launch on TikTok on February 15, 2024, OpenAI's account has amassed over 230,000 followers and received more than 1.4 million likes, highlighting the public's fascination with AI-generated content [para. 3]. This surge in interest mirrors the excitement around ChatGPT during China's Spring Festival in 2023, prompting companies like Aochuang Guangnian to reevaluate their product development timelines due to the rapid advancements in generative video technology [para. 4].

Despite this enthusiasm, major Chinese internet companies have adopted a more cautious approach towards investing in technologies similar to Sora. The primary bottleneck for domestic firms is computing power; no company has yet achieved the scale necessary to compete directly with Sora. This challenge is compounded by recent U.S. export controls on semiconductors to China and stringent domestic regulations [para. 5][para. 6].

Baidu CEO Robin Li emphasized an "application-driven" approach for developing video generation models during an earnings call on February 28. Market observers note that short video platforms could significantly benefit from generative videos due to their inherent demand for fresh content [para. 7]. However, there remains a considerable gap between Chinese models' capabilities and those of leading international models like Sora [para. 8].

Sora is highly regarded by OpenAI as a step towards achieving Artificial General Intelligence (AGI), combining Transformer architecture with Diffusion models for enhanced video generation capabilities. Despite its potential, it remains a research project with no immediate plans for public release [para. 9]. The cost of training such advanced models is substantial; estimates suggest that training costs could be in the tens of millions of dollars due to the immense computational resources required [para. 10].

The film, television, and gaming industries are closely monitoring developments around Sora and similar technologies. There is recognition that AI-generated content could significantly impact production processes and content creation efficiency. Some industry insiders foresee a future where AI might replace certain roles within these industries or at least augment existing processes [para. 11][para. 12].

Chinese startups are not deterred by these challenges. Aishi Technology aims to surpass Sora within six months with a focus on consumer markets (2C), while Shengshu Technology has completed a new round of financing worth hundreds of millions of yuan to develop native multimodal large models capable of generating images and videos from text [para. 13][para. 14]. These efforts highlight the ongoing race among Chinese companies to close the technological gap with global leaders like OpenAI.

The emergence of technologies like Sora has prompted reassessment among Chinese internet giants regarding their strategies for catching up with overseas advancements in AI-generated content. ByteDance appears most likely among domestic firms to develop a short video application akin to Sora but faces challenges related to computing power availability and regulatory constraints [para. 15][para. 16].

Overall, while there is optimism within China's tech ecosystem about closing the technological gap with innovations like Sora, significant hurdles remain in computational resources, the regulatory environment, and practical application scenarios. These must be addressed before such ambitious goals can be fully realized [para. 17][para. 18].

Who’s Who
Aishi Technology
爱诗科技
Summary: Aishi Technology, founded in April 2023, is a startup specializing in the development of AI video large models and applications. The company recently completed a Series A1 financing round worth 100 million yuan, led by Dacheng Capital, with Guangyuan Capital as the exclusive financial advisor. Its international product, PixVerse, officially launched in January 2024 and offers text-to-video generation, producing videos several seconds long. The Chinese version, the Aishi video large model, has completed its registration and began internal testing on March 11. The company's strategy focuses primarily on the consumer market (2C): it collects extensive feedback from users in China and abroad so that it can iterate the underlying model based on user experience. Aishi Technology aims to provide useful tools for content producers, to explore possibilities on the content consumption side, and ultimately to create a platform for AI-native video production and consumption.
Shengshu Technology
生数科技
Summary: Shengshu Technology, a startup established in March 2023, focuses on the research and development of native multimodal large models for images, 3D, and video. The company's core team includes members from Tsinghua University's Institute for Artificial Intelligence, with Zhu Jun, the Vice President of the institute, serving as Chief Scientist. Shengshu Technology's product features include text generation from images, joint image-text generation, image-text rewriting, and conversion of flat images into three-dimensional content viewable from multiple angles. The company recently announced the completion of a new round of financing worth hundreds of millions of yuan, led by Qiming Venture Partners, with participation from DTA Capital, Hongfu Houtte, Zhipu AI, existing shareholder BV Baidu Ventures, and Zhuoyuan Asia.
ByteDance
字节跳动
Summary: Key points about ByteDance from the article:
  • ByteDance has been using the interfaces of mature large models, such as OpenAI's GPT, for products outside China, but relies on its self-developed models for products within China.
  • In 2023, ByteDance consolidated a dedicated division for large models, named "Flow", and began allocating resources and manpower to turning the technology into products in the second half of 2023.
  • ByteDance's self-developed large model, "Yunque", has been applied to the chatbot Doubao.
  • ByteDance is pursuing two strategic paths for AIGC product development: building AI-native products such as Doubao and Cici, and using AI to enhance existing businesses, for example by improving video editing tools like Jianying.
  • ByteDance has recently been actively recruiting AI talent in Silicon Valley.
  • ByteDance's large model work focuses on two directions: text, and multimodality including image and video generation.
  • ByteDance CEO Liang Rubo criticized the company for being less sensitive to opportunities than startups and for discussing GPT only in 2023, whereas the more successful large model startups were founded between 2018 and 2021.
  • Because of OpenAI's restrictions, ByteDance relies on its self-developed models for products within China; OpenAI suspended its account in December 2023.
In summary, ByteDance is actively developing its own large models and AIGC products, but faces challenges in catching up with leading models like Sora.
Baidu
百度
Summary: Key points about Baidu from the article:
  • Baidu has developed its own large language model, "Wenxin Yiyan", and does not currently have a clear objective for developing generative video content.
  • Baidu CEO Robin Li discussed the possibility of developing a video generation large model during the company's earnings call on February 28, 2024. He said Baidu would adopt an "application-driven" approach, letting users and customers tell the company which aspects of the large model should be improved.
  • Baidu is taking a more measured approach to Sora than it did to ChatGPT, reflecting an awareness of the technological gap.
  • Baidu has established its own AI lab dedicated to the gaming industry, focusing on the development of small models and vertical applications.
  • Baidu has launched AI-generated video products, but these are not directly related to its large model.
  • A Baidu insider pointed out that products like Sora require extremely high computational power, and that without clear application prospects the company is unlikely to follow Sora's lead immediately.
Tencent
腾讯
Summary: Key points about Tencent from the article:
  • Tencent has established an AI lab dedicated to the gaming industry, which focuses on developing small models and vertical applications in collaboration with its gaming studios.
  • Tencent has launched AI-generated video products, including DynamiCrafter and VideoCrafter2, which are still in the research stage and not directly related to Tencent's large models.
  • In March 2024, Tencent introduced a new image-to-video model called "Follow-Your-Click", developed in collaboration with Tsinghua University and the Hong Kong University of Science and Technology. The model animates static areas of an image with just a click and a few prompt words.
  • Tencent's AI Lab is known to be the only profit-making team within its Technology Engineering Group. It primarily serves Tencent's gaming studios and receives a share of gaming revenue.
  • Overall, Tencent has been actively exploring AI applications in gaming and video, but its large model development lags behind leaders like OpenAI. The company is focusing more on specialized models for gaming and other verticals.
Alibaba
阿里巴巴
Summary: According to the article, Alibaba Cloud has launched several AI audio and video products, including:
  • Qwen-VL, a model with 7 billion parameters, built on the foundation of Tongyi Qianwen.
  • An AI audio generation model.
  • Animate Anyone, a tool that generates animated character videos from a single static image using skeletal animation.
  • Outfit Anyone, a virtual dressing tool that can be integrated into Alibaba's e-commerce platforms.
  • EMO, a tool that creates dynamic images with sound from an image and an audio clip.
In summary, these products span audio and video generation models as well as tools for animated video and virtual try-on, enabling new forms of content creation and interaction.