AI mind games: Google squares up to GPT-4

Developers of large models say Gemini has already demonstrated text capabilities comparable to GPT-4’s.

Photo by Kuang Da

By LI Jingya

 

It’s been nearly a week since Google introduced Gemini to the world and AI companies are busily exploring the power of the large model.

Unlike most previous large models, Gemini does not rely on text alone: it also uses sound and vision to understand the world. Generative AI companies believe that Gemini’s most significant feature is its multi-modal nature.

Native-speaker advantages

“In theory, native multi-modal models perform better than ‘patched’ models because the latter are prone to bottlenecks during training,” said Chen Yujun, head of Recurrent AI.

Even though Ultra, the largest of the Gemini series, is still under wraps, people are already making favorable comparisons to GPT-4, beloved of Microsoft.

In demonstration videos, Gemini seems able to observe humans in real time and respond appropriately. It shows strong understanding, reasoning, creativity, and interaction capabilities, beyond what OpenAI currently offers.

Three months ago, OpenAI released GPT-4V, but its performance was disappointing, and it relies on collaboration with other models. These issues apart, Gemini probably hasn’t completely overtaken GPT-4 yet. GPT-4 and Gemini have not met head-to-head in text generation, GPT-4’s forte.

Gemini significantly outperformed GPT-4 in image searches. Zhuiyi AI’s Liu Yunfeng believes that Google’s search business naturally has better data in text, images and other modalities.

Pointing the way

Any move by Google in AI takes the entire industry along with it. However, before the release of Gemini, the trend toward multi-modal models was already clear.

As early as March, when GPT-4 was released, OpenAI intended to integrate multi-modal capabilities in this iteration. Various multi-modal products have been appearing since September.

Multi-modal models are a clear development direction, and not just because of Google. Still, the arrival of Gemini will stimulate companies to accelerate their research and development.

Meanwhile in China

In China, Baidu’s Ernie Bot 4.0 has made significant progress in cross-modal text-image understanding. Zhipu AI, the large model startup with the highest publicly disclosed funding in China, has a competitive advantage in the visual domain through its generative AI assistant Zhipu Qingyan.

Source: Jiemian News

Reproduction of this article without formal authorization is strictly prohibited; infringement will be pursued.