LLM crypto trading contest finds LLMs can’t trade crypto
Four out of six large language models (LLMs) pitted against each other in the “Alpha Arena” crypto trading competition finished in the red, with OpenAI’s ChatGPT faring worst after losing 63% of its funds.
The competition, which concluded on Monday evening, was created by Nof1 and involved various popular LLMs trading crypto under the same set of prompts for just over a fortnight.
The final results were less than stellar. ChatGPT, Google’s Gemini, xAI’s Grok, and Anthropic’s Claude Sonnet all finished with less than the $10,000 they started with.

ChatGPT lost $6,267, Gemini lost $5,671, Grok lost $4,531, and Claude Sonnet lost $3,081.
The only two victors were High-Flyer’s DeepSeek and Alibaba’s QWEN3 MAX, which finished with a profit of $489 and $2,232, respectively.
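As a rough check on these figures, the dollar results can be converted into percentage returns on the $10,000 starting balance. The sketch below is illustrative only: the names `pnl_usd` and `pct_return` are made up for this example and are not part of Nof1’s setup.

```python
# Starting balance per model, as reported above.
STARTING_BALANCE = 10_000

# Reported profit or loss in dollars (negative means a loss).
pnl_usd = {
    "ChatGPT": -6_267,
    "Gemini": -5_671,
    "Grok": -4_531,
    "Claude Sonnet": -3_081,
    "DeepSeek": 489,
    "QWEN3 MAX": 2_232,
}


def pct_return(pnl, start=STARTING_BALANCE):
    """Express a dollar PnL as a percentage of the starting balance."""
    return 100 * pnl / start


returns = {name: round(pct_return(pnl), 2) for name, pnl in pnl_usd.items()}

# Print the models from worst to best performer.
for name, r in sorted(returns.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} {r:+.2f}%")
```

On these numbers, ChatGPT’s loss works out to roughly 63% of its starting balance, which matches the headline figure, while QWEN3 MAX’s gain is a little over 22%.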
Gemini made a total of 238 trades, while Claude Sonnet only conducted 38. The “win rate” for all six LLMs ranged between 25 and 30%.
QWEN3 MAX coughed up the most in fees, a total of $1,654. Gemini, despite losing hard, also paid $1,331 in fees.
Nof1 noted that “PnL (profit and loss) was dominated by trading costs in early runs as agents over-traded and took quick, tiny gains that fees erased.”
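To put the fee figures in context, they can be expressed as a share of the $10,000 starting balance. The sketch below also estimates what each result would have looked like before fees, but that part rests on an assumption the source does not confirm: that the reported profit and loss figures are net of fees. The helper name `fee_breakdown` is hypothetical.

```python
STARTING_BALANCE = 10_000

# (reported PnL, reported fees) in dollars, taken from the figures above.
reported = {
    "QWEN3 MAX": (2_232, 1_654),
    "Gemini": (-5_671, 1_331),
}


def fee_breakdown(pnl, fees, start=STARTING_BALANCE):
    """Summarize how large fees were relative to the starting balance.

    ASSUMPTION: `pnl` is net of fees. The article does not say whether
    the reported results include or exclude trading costs.
    """
    return {
        "fees_pct_of_start": round(100 * fees / start, 2),
        "pnl_before_fees": pnl + fees,
    }


breakdown = {name: fee_breakdown(p, f) for name, (p, f) in reported.items()}

for name, b in breakdown.items():
    print(name, b)
```

Under that assumption, QWEN3 MAX’s fees came to about 16.5% of its starting balance and Gemini’s to about 13.3%, enough on their own to wipe out a string of small winning trades.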
The LLMs hit their peak on October 27. QWEN3 MAX and DeepSeek had managed to double their money by this point, while Claude and Grok were also briefly in the green.
ChatGPT and Gemini, however, stayed in the red for almost the entire competition.
The LLMs will trade crypto again
Nof1’s Jay Azhang launched the competition with the goal of one day creating his own crypto trading AI model.
After this round finished, he noted that all the models showed “consistent biases” across the competition, which he described as “something like an investing ‘personality.’”
Azhang also claims to have made it intentionally difficult for the LLMs.
“LLMs don’t really handle numerical time series data very well, but that’s all the context we gave them,” he said, adding that they were “given a constrained asset universe and a fairly limited action-space.”
Nof1’s roundup noted, “We’ve worked to give the models a fair shot, but the harness imposes real constraints.
Each agent must parse noisy market features, relate them to current account state, reason under strict rules, and return a structured action, all inside a limited context window.”
Nof1 says another trading competition is planned, with better prompts and “statistical rigor” in place.
