Of course not. Here’s how leading AI labs mislead consumers, journalists, and each other.

Every month or two, an AI lab releases a new model that it claims leaves most, if not all, of its competitors in the dust. Recent examples include OpenAI’s GPT-4o, Anthropic’s Claude 3 Opus, and Google DeepMind’s Gemini Ultra. These announcements are generally followed by a wave of hype from the tech media, like Ars Technica’s breathless “The AI wars heat up with Claude 3, claimed to have ‘near-human’ abilities.” People tend to take these claims seriously. They use the numbers reported in these announcements to decide which labs are “ahead” of the others, and often to decide which LLMs to use. However, they would do better to treat these reports with a healthy dose of skepticism.