AI mind games: Google squares up to GPT-4

By LI Jingya

It’s been nearly a week since Google introduced Gemini to the world and AI companies are busily exploring the power of the large model.

Unlike most previous large models, Gemini bypasses the text aspect and relies on sound and vision to understand the world. Generative AI companies believe that Gemini’s most significant feature is its multi-modal nature.

Native-speaker advantages

“In theory, native multi-modal models perform better than ‘patched’ models because the latter are prone to bottlenecks during training,” said Chen Yujun, head of Recurrent AI.

Even though Ultra, the largest of the Gemini series, is still under wraps, people are already making favorable comparisons to GPT-4, beloved of Microsoft.

In demonstration videos, Gemini seems able to observe humans in real time and respond appropriately. It shows strong understanding, reasoning, creativity, and interaction capabilities, beyond what OpenAI currently offers.

Three months ago, OpenAI released GPT-4V, the vision-enabled version of GPT-4, but its performance was disappointing, and it relies on collaboration with other models. These issues apart, Gemini probably hasn’t completely overtaken GPT-4 yet. The two have not met head-to-head in text generation, GPT-4’s forte.

Gemini significantly outperformed GPT-4 in image searches. Zhuiyi AI’s Liu Yunfeng believes that Google’s search business naturally gives it better data across text, images and other modalities.

Pointing the way

Any move by Google in AI takes the entire industry along with it. However, before the release of Gemini, the trend toward multi-modal models was already clear.

As early as March, when GPT-4 was released, OpenAI intended to integrate multi-modal capabilities into that iteration. Various multi-modal products have been appearing since September.

Multi-modal models are a clear development direction, and not just because of Google. Still, the arrival of Gemini will stimulate companies to accelerate their research and development.

Meanwhile in China

In China, Baidu’s Ernie Bot 4.0 has made significant progress in cross-modal text-image understanding. Zhipu AI, the large-model startup with the most publicly disclosed funding in China, has a competitive advantage in the visual domain through its generative AI assistant Zhipu Qingyan.
