Even the Worst Version of Claude AI Is Better Than GPT-3.5, Researchers Say

06.10.2023

The AI industry is witnessing a riveting competition between the notable ChatGPT and Claude AI models. The Large Model Systems Organization (LMSO), responsible for creating the Chatbot Arena and the renowned Vicuna Model, has just updated its Chatbot Arena Leaderboard, reflecting how each AI chatbot measures up to its competitors. Turns out Anthropic is giving OpenAI a run for its money, even while its models are still free to use.

GPT-4, the powerhouse behind ChatGPT Plus and Bing AI, reigns supreme with the highest score, setting the gold standard for Large Language Models (LLMs). But as we move down the leaderboard, an unexpected underdog story unfolds. Anthropic’s Claude models — Claude 1, Claude 2, and Claude Instant — all outperform GPT-3.5, the engine that powers the free version of ChatGPT. This implies that every Large Language Model developed by Anthropic can outclass the free version of ChatGPT.

The meticulous ranking system by the LMSO provided insight into the performance metrics of these models. According to the leaderboard, GPT-4 holds an Arena Elo Rating of 1181, significantly leading the chart, while the Claude models follow closely with ratings ranging from 1119 to 1155. GPT-3.5, on the other hand, lags with a rating of 1115.

To rank the models, the LMSO makes them “battle” in head-to-head matches on the same prompt. The model with the better answer wins and the other loses. Users decide who wins based on their own preferences, without knowing which models are competing when they vote.
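For readers curious what a rating gap means in practice, here is a minimal sketch of the textbook Elo formula applied to the ratings quoted above. It assumes the standard base-10 logistic curve with a 400-point scale; the K-factor of 32 and the function names are illustrative choices, not details taken from the leaderboard itself.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Chance that A beats B under the textbook Elo model (assumed here)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both ratings after one battle.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    k is a hypothetical K-factor chosen for illustration.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Ratings quoted in the article
gpt4, gpt35 = 1181, 1115

p = expected_score(gpt4, gpt35)
print(f"GPT-4 expected win rate vs GPT-3.5: {p:.2f}")

# An upset: GPT-3.5 wins a battle, so points move from GPT-4 to GPT-3.5
new_gpt4, new_gpt35 = update(gpt4, gpt35, score_a=0.0)
```

Under these assumptions, the 66-point gap between GPT-4 and GPT-3.5 translates into an expected win rate of roughly 59% for GPT-4, which is a lead but far from a sweep.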

Image: LMSO

As Decrypt previously reported, the gap in token processing capacity between ChatGPT Plus and Claude Pro, although not a factor in the LMSO ranking, is another major advantage for Claude models over GPT.

“Claude Pro, based on the Claude 2 LLM, can process up to 100K tokens of information, while ChatGPT Plus, powered by the GPT-4 LLM, handles 8,192 tokens,” we recalled. This differential in token processing ability underscores the edge Claude models hold in managing extensive contextual inputs, which is crucial for a nuanced and enriched user experience.
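To put those two limits side by side, the short sketch below checks which service could accept a prompt of a given size. The limits are the figures quoted above; the helper name, the 50,000-token example and the idea of reserving tokens for the reply are assumptions made for illustration.

```python
# Context limits in tokens, as quoted in the article
CONTEXT_LIMITS = {
    "Claude Pro (Claude 2)": 100_000,
    "ChatGPT Plus (GPT-4)": 8_192,
}


def models_that_fit(prompt_tokens: int, reserved_for_reply: int = 0) -> list[str]:
    """List the services whose limit covers the prompt plus a reply budget.

    reserved_for_reply is a hypothetical allowance for the model's answer.
    """
    needed = prompt_tokens + reserved_for_reply
    return [name for name, limit in CONTEXT_LIMITS.items() if needed <= limit]


# How many times larger the bigger window is
ratio = CONTEXT_LIMITS["Claude Pro (Claude 2)"] / CONTEXT_LIMITS["ChatGPT Plus (GPT-4)"]
print(f"Window ratio: {ratio:.1f}x")

# A hypothetical 50,000-token document
print(models_that_fit(50_000))
```

By these figures the larger window is about 12 times the size of the smaller one, so a document of 50,000 tokens would fit in one prompt for Claude Pro but would have to be split up for ChatGPT Plus.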

Moreover, Claude 2 has shown an edge over GPT on long prompts, handling larger inputs more efficiently. When prompts are comparable, however, Claude 1 and Claude Instant give results similar to or slightly better than GPT-3.5’s, which shows how competitive these models are. With Claude’s context capabilities, a poor initial answer can be dramatically improved with a more refined, larger and richer prompt.

Open-source models are not far behind in this race.

WizardLM, a 70-billion-parameter model trained on Meta’s Llama 2, stands out as the best open-source LLM. Following closely are Vicuna 33B and the original Llama 2, released by Meta.

🎉The @lmsysorg just updated the Chatbot Arena Leaderboard!

Our WizardLM-70B is now the🥇Top-1 open-source model on both ⚔️Arena Elo and 📈MT-bench.

❤️Main Contributors:@CanXu20 @victorsungo_ai@ChiYeung_Law@hpluo12@tangmensan

Leaderboard: https://t.co/1gkZKGVutQ
Model… pic.twitter.com/bsJ0jv2i7I

— WizardLM (@WizardLM_AI) October 5, 2023

Open-source models play an important role in the development of the AI space for several reasons. They can be run locally, which gives users the opportunity to fine-tune them and engages the community in a collective effort to perfect the model. They are also cheaper to run due to their licenses, which is why the space has dozens of open-source LLMs and only a handful of proprietary models.

But the game of AI chatbots isn’t solely about numbers. It’s about real-world implications.

As chatbots become integral in various sectors from customer service to personal assistants, their efficacy, adaptability, and accuracy become paramount. With Claude models ranking higher than GPT-3.5, businesses and individual users might find themselves at a crossroads, evaluating which model aligns best with their needs. Decrypt has prepared two guides to help you decide what model suits you best.

For the uninitiated, this might seem like just another leaderboard update. But for those closely watching the AI industry, it’s a testament to how fierce the competition is and how swiftly the tides can turn. And as for the rest of us who sit in between those two camps, it’s a reminder that in the AI world, today’s most popular model could fall to the most efficient.
