Every week someone in a training session asks me: “Should I be using ChatGPT or Claude?” And every week my answer is the same: it depends on what you’re trying to do.
There is no single best LLM. There’s the best for coding, the best for long-document analysis, the best for quick everyday tasks, and the best if you don’t want to pay anything. The landscape moves fast — models that were cutting-edge six months ago are now mid-tier — so I update this page regularly.
Here’s my honest take on the models worth knowing about in 2026, based on what I actually see working in training sessions with real professionals.
If You Only Have 30 Seconds
Most people don’t need to read this whole page. Here’s what I tell clients who just want an answer:
- For writing — Claude. It follows nuanced instructions better than anything else I’ve tested. It doesn’t add filler, it doesn’t ignore half your brief, and the tone control is noticeably better.
- For coding — Claude again. I’ve watched developers switch mid-session after seeing the difference. It holds context across long files and catches edge cases the others miss.
- For everyday questions and quick research — Gemini or Grok. Gemini is wired into Google’s index, so it’s fast and current. Grok pulls from X/Twitter in real-time, which is useful for anything news-related.
- For deep research — ChatGPT’s deep research mode or Perplexity. Both produce proper cited reports. ChatGPT goes deeper; Perplexity is faster and easier.
- For image and video generation — ChatGPT or Gemini. Claude and Perplexity don’t do this at all.
- If you’re already in Microsoft 365 — Copilot is built into Word, Excel, Outlook, and Teams, so the integration is unmatched. But be aware: Microsoft doesn’t have a frontier model. The underlying AI is behind Claude, Gemini, and even Grok on the leaderboards right now. And tools like Claude can actually work with Microsoft apps in some ways better than Copilot does, through MCP and Cowork. So the integration advantage is narrowing.
- If you want free — Gemini gives you the most on the free tier. Google’s been generous. DeepSeek R1 is also free and surprisingly capable for reasoning tasks.
- If you want one subscription to try everything — Start with ChatLLMs. It gives you access to all the frontier models (GPT-5.4, Claude, Gemini, Grok) and the best open-source ones in a single interface for $10/month. It’s the best way to figure out which LLM suits you before committing to one ecosystem. I recommend it to most of my clients as a starting point.
Once you’ve found the model you prefer, move to that platform’s native subscription. The native apps tend to perform slightly better with their own models, and you get the full ecosystem: integrations, agents, desktop apps, the lot.

Platform Feature Comparison
How the seven main AI platforms compare across key features — based on their current consumer offerings.
| Feature | ChatGPT (OpenAI) | Claude (Anthropic) | Gemini (Google) | Grok (xAI) | Copilot (Microsoft) | Manus (Manus AI) | Perplexity |
|---|---|---|---|---|---|---|---|
| Everyday answers | Available | Available | Best in class | Best in class | Available | Available | Available |
| Writing | Available | Best in class | Available | Available | Available | Available | Available |
| Coding | Available | Best in class | Available | Available | Available | Available | Available |
| Thinking | Best in class | Best in class | Available | Available | Available | Available | Available |
| Deep research | Best in class | Available | Available | Available | Available | Available | Best in class |
| Web search | Available | Available | Available | Best in class | Available | Available | Available |
| Voice chat | Best in class | Available | Available | Available | Available | Not available | Available |
| Image gen | Best in class | Not available | Best in class | Available | Available | Available | Not available |
| Video gen | Available | Not available | Best in class | Available | Not available | Not available | Not available |
| Live camera | Available | Not available | Best in class | Available | Not available | Not available | Not available |
| Desktop control | Best in class | Available | Available | Not available | Best in class | Best in class | Available |
| Agents | Available | Available | Available | Best in class | Available | Best in class | Available |
What the table doesn’t tell you
A checkmark means the feature exists. It doesn’t mean it’s good.
ChatGPT has voice chat and it’s the best — genuinely conversational, low-latency, handles interruptions. Claude has voice chat too, but it’s newer and less polished. Perplexity has it, but it’s basically text-to-speech over search results.
Same with “everyday answers.” Gemini and Grok both get the star, but for different reasons. Gemini is fast and pulls from Google’s search index, so it’s great for factual lookups. Grok is plugged into X, so it’s better for “what’s happening right now” questions.
The features that matter most to professionals I work with? Writing quality, coding ability, and deep research. Everything else is nice to have.
What Are AI Agents? (And Why They’re on This List)
You’ll see “Agents” in the comparison table above. It’s worth explaining what this means, because it’s the biggest shift in how these tools work right now.
An AI agent is a system that can use tools, make decisions, and keep working towards a goal without you supervising every step. Instead of you typing a prompt, reading the answer, typing another prompt, it goes off and does the work: browsing websites, writing code, creating files, calling APIs.
The simple version: perception → reasoning → action → repeat.
You give it a goal. It figures out the steps. It executes them. It keeps going until it’s done or gets stuck.
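The loop above can be sketched in a few lines of Python. This is a toy illustration of the perceive → reason → act cycle, not any platform's actual implementation; the goal, actions, and stub "reasoning" function are all invented for the example (a real agent would call an LLM to plan each step):

```python
# Toy agent loop: perceive state, reason about the next action, act, repeat.

def plan_next_action(goal, state):
    """Stub reasoning step: decide the next action from the current state.
    A real agent would ask an LLM; here we hard-code a two-step plan."""
    if "data" not in state:
        return ("fetch", None)           # first: gather information
    if "report" not in state:
        return ("write", state["data"])  # then: produce the output
    return ("done", None)                # goal reached

def run_agent(goal, max_steps=10):
    state = {}
    for _ in range(max_steps):           # keep going until done or stuck
        action, arg = plan_next_action(goal, state)
        if action == "done":
            return state
        if action == "fetch":
            state["data"] = f"facts about {goal}"   # stand-in for browsing/API calls
        elif action == "write":
            state["report"] = f"Report: {arg}"      # stand-in for file creation
    raise RuntimeError("agent got stuck")           # loop guard: no infinite runs

result = run_agent("competitor pricing")
```

The important bit is the loop guard and the "done" check: the agent decides for itself when the goal is met, which is exactly what separates it from a one-shot prompt.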
This matters because most white-collar work happens in browser apps — Salesforce, Jira, Google Docs, email. As soon as agents can operate those tools as well as (or better than) a human, the economics of that work change fundamentally. We’re not there yet for everything, but we’re getting closer every month.
Here’s what agent capabilities look like across platforms right now:
Manus is the most agent-forward. It opens a browser, uses a terminal, builds files, and completes multi-step tasks autonomously. It's still rough around the edges, but it's improving fast.
ChatGPT has Operator and custom GPTs that can browse and take actions. Claude has computer use and Cowork mode. Gemini has agent-like features through extensions and Google Workspace integration. Copilot has agents built into Microsoft 365 that can act across Word, Excel, Outlook, and Teams. Grok is experimenting with multi-agent approaches where specialised agents collaborate before giving you an answer.
This is moving fast. I expect this section to look very different in six months.
What Each Platform Is Actually Best At
ChatGPT (OpenAI)
The Swiss Army knife. It does everything — text, images, voice, video, code, research, desktop control. No other platform matches it for breadth. It’s still the best for voice, deep research, and image generation. If you want one tool that covers 80% of use cases, this is a solid all-rounder.
Here’s the thing people don’t realise: ChatGPT’s models are falling behind. On the Arena leaderboard, GPT-5.4 is no longer the top model. Claude and Gemini have overtaken it on writing, coding, and reasoning benchmarks. But because it’s ChatGPT — because it’s the name everyone knows — it’s still the most popular. Brand recognition is doing a lot of heavy lifting right now.
Writing quality is the other weak point. ChatGPT has a recognisable “voice” that’s hard to shake. It loves bullet points, bold text, and phrases like “Here are some key considerations.” In training sessions, I can usually spot ChatGPT output within two sentences.
Claude (Anthropic)
The best model in the world right now. Claude sits at or near the top of the Arena leaderboard for text, and it’s the clear leader on the coding leaderboard. It follows complex instructions more faithfully than anything else I’ve tested. It doesn’t add unnecessary qualifiers, it doesn’t pad, and it respects tone.
Anthropic is turning into a $100 billion company on the back of this. They’re cleaning up on the enterprise front, and there’s a pattern I keep seeing in sessions: once people start using Claude properly, they don’t go back to anything else. It’s sticky in a way ChatGPT isn’t.
It’s also the strongest coder right now. I’ve run side-by-side tests in sessions — Claude catches things GPT misses, especially in longer codebases.
The downside: no image generation, no video, no live camera. If you need those, you’ll need a second tool. But for the actual thinking and working part of AI, nothing else is as good right now.
Gemini (Google)
The dark horse. Gemini has improved more than any other platform in the last year. The 1M+ context window is real and usable: you can drop an entire codebase or a 500-page document into it and get coherent answers. Very few consumer models offer anything like it.
It’s also the most generous free tier. Google clearly wants market share.
Where it’s ahead of everyone: image and video generation (Veo is impressive), live camera input, and deep integration with Google Workspace. If your team lives in Google Docs and Gmail, Gemini is the natural fit.
Grok (xAI)
The real-time model. Grok's main advantage is its connection to X/Twitter. For anything happening right now (news, trends, public sentiment) it's faster than the others because it pulls from live social data, giving you the best window into what people are actually thinking and talking about.
The SuperGrok tier at $30/month is pricier than the competition, and the heavy tier at $300/month is clearly aimed at developers, not regular users.
Grok’s writing and coding are decent but not class-leading. It’s a supplementary tool for most people, not a primary one. But for real-time social intelligence, nothing else comes close.
Microsoft Copilot
The enterprise play. Copilot isn’t competing on raw model quality — it’s competing on integration. If your company runs on Microsoft 365, Copilot is embedded directly into Word, Excel, PowerPoint, Outlook, and Teams. No other AI tool can summarise a Teams meeting, draft a follow-up email in Outlook, and update a spreadsheet in Excel without leaving the apps you already use.
The free tier is decent for basic chat. Copilot Pro at $20/month adds priority access and integration with your M365 apps. For businesses, it’s $30/user/month on top of your existing M365 licence.
Here’s what I’m seeing in practice though: the underlying model is falling behind. Microsoft doesn’t have a frontier model of its own — it relies on OpenAI’s GPT under the hood, which as I mentioned above is no longer top of the leaderboards. A lot of companies are being pushed into Copilot by their Microsoft licensing deals, rolling it out to their workforce, and then being disappointed when adoption stays low. People try it, find the output quality lacking compared to Claude or even ChatGPT itself, and quietly go back to whatever they were using before.
The integration is real. But a well-integrated mediocre model is still a mediocre model.
Manus (Manus AI)
The autonomous agent. Manus is different from the others — it’s less of a chatbot and more of a worker. You give it a task (“research these 20 companies and build a spreadsheet comparing their pricing”) and it goes off and does it, using multiple tools: browsing, coding, file creation.
In sessions, most people are blown away the first time they see Manus open a terminal and start executing commands. It’s one thing to ask a chatbot a question. It’s another to watch it actually do the work — browsing websites, writing scripts, building output files. It’s still quite new, but it’s getting better every month.
The credit-based pricing is the catch. You don’t pay a flat monthly rate for unlimited use — you burn credits per task, and complex tasks burn more. This makes costs unpredictable, which frustrates people.
As soon as agents like Manus become as proficient as a human at operating a terminal and browser, this is where you start seeing real disruption to white-collar work. Lots of companies run their operations through browser apps — Salesforce, Jira, Google Workspace. Now you can have agents orchestrate updates for you.
Perplexity
The research assistant. Perplexity isn’t trying to be everything — it’s built for search. You ask a question, it searches the web, reads the sources, and gives you an answer with citations. Think of it less as a chatbot and more as a research analyst.
The Pro tier at $20/month is worth it if you do serious research. It produces what I call “McKinsey-style reports” — structured, sourced, professional. For quick factual questions, it’s often better than asking ChatGPT because it always cites where the information came from.
Integrations: Which Platforms Connect to Your Tools?
This is something people overlook when choosing an LLM. The model quality matters, but so does whether it plugs into the tools you already use.
Key: Native = deep first-party integration. Plugin/connector = supported via a plugin or connector.

| Integration | ChatGPT | Claude | Gemini | Grok | Copilot | Manus | Perplexity |
|---|---|---|---|---|---|---|---|
| Gmail / Email | Plugin/connector | Native | Native | Not supported | Plugin/connector | Native | Native |
| Calendar | Plugin/connector | Native | Native | Not supported | Plugin/connector | Native | Native |
| Google Docs / Sheets | Plugin/connector | Native | Native | Not supported | Not supported | Plugin/connector | Plugin/connector |
| Word / Excel / PPT | Plugin/connector | Plugin/connector | Not supported | Not supported | Native | Plugin/connector | Not supported |
| Slack | Plugin/connector | Native | Plugin/connector | Not supported | Plugin/connector | Native | Plugin/connector |
| Project tools (Jira, Linear, Asana) | Plugin/connector | Native | Plugin/connector | Not supported | Plugin/connector | Plugin/connector | Native |
| Salesforce / CRM | Plugin/connector | Native | Plugin/connector | Not supported | Plugin/connector | Plugin/connector | Not supported |
| Code editors (VS Code, etc.) | Native | Plugin/connector | Plugin/connector | Not supported | Not supported | Not supported | Not supported |
| Custom API / MCP | Native | Native | Native | Plugin/connector | Plugin/connector | Native | Plugin/connector |
Why this matters
Native integrations are becoming more important as LLMs get more capable. For the first time, we can do work across multiple SaaS systems and have them all updated consistently. In theory, that raises both productivity and the quality of shared knowledge across a team.
There are two ways this works right now. The first is native integrations, where the platform connects directly to your tools. Gemini talks to Gmail, Calendar, and Drive natively. Copilot does the same across Outlook, Teams, Word, and Excel. These are the fastest and most reliable routes, because the platform is using its own first-party connection rather than driving a user interface.
The second is through the browser. Claude’s Cowork mode, for example, can use the browser on your machine — so it can operate Gmail, Calendar, Linear, and anything else you have open, directly. Manus does this too: it opens a browser, navigates to Jira, and clicks buttons like a human would. It’s more flexible, but it’s slower and occasionally fragile.
Then there’s MCP (Model Context Protocol), an open standard created by Anthropic that lets a model plug into almost anything: Slack, Linear, GitHub, custom internal tools. Claude leans on it most heavily, and other platforms are adopting it too. It’s the most flexible option, but it requires some setup.
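The core idea behind a protocol like MCP can be sketched in plain Python. This is a toy illustration, not the actual MCP wire format or SDK; the tool names, descriptions, and handlers are invented. The point is the shape: the client advertises a catalogue of tools to the model, then routes the model's tool calls by name:

```python
# Toy tool catalogue: each entry has a description (shown to the model)
# and a handler (executed by the client). Names are illustrative only.
TOOLS = {
    "search_issues": {
        "description": "Search project-tracker issues by keyword",
        "handler": lambda query: [f"ISSUE-1: {query} bug"],
    },
    "post_message": {
        "description": "Post a message to a team chat channel",
        "handler": lambda text: f"posted: {text}",
    },
}

def describe_tools():
    """What the client advertises to the model: tool names and descriptions."""
    return [{"name": n, "description": t["description"]} for n, t in TOOLS.items()]

def dispatch(tool_name, argument):
    """Route a model-issued tool call to the matching handler."""
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name]["handler"](argument)

issues = dispatch("search_issues", "login")
```

Because the catalogue is just data, adding a new integration means registering one more entry; the model-facing side doesn't change. That uniformity is why a shared protocol beats bespoke per-tool plugins.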
This is still fairly new. People are struggling to make all the integrations work together reliably. But the trajectory is clear: the LLM that connects best to your existing tools will be the one that saves you the most time.
What Does It Actually Cost?
All seven platforms have a free tier. Here's what you get if you pay:
| Platform | Free | Standard Tier | Premium Tier |
|---|---|---|---|
| ChatGPT | Limited GPT-5.2 | Plus — $20/mo (~£18) | Pro — $200/mo |
| Claude | Basic access | Pro — $20/mo (~£18) | Max — $100–200/mo |
| Gemini | Generous free tier | Pro — $19.99/mo | Ultra — ~$42/mo |
| Grok | ~10 prompts/2hrs | SuperGrok — $30/mo | Heavy — $300/mo |
| Copilot | Basic chat | Pro — $20/mo | M365 Copilot — $30/user/mo |
| Manus | Small credit pool | Starter — $39/mo | Pro — $199/mo |
| Perplexity | Limited Pro searches | Pro — $20/mo | Max — $200/mo |
Note: Most platforms charge in USD. UK prices vary depending on exchange rates and VAT. ChatGPT Plus and Claude Pro typically work out to around £18/month on a UK card.
My honest take on value: pay for the premium models. The difference between the free tier and a paid subscription is the difference between a good intern and a senior employee, and you would pay extra for someone who thinks better and has more experience, every single day of the week. It’s the same here.
ChatLLMs at $10/month is a good place to start — try all the models, find your favourite. But you must dive into a proper ecosystem when you’re ready. The native apps perform better, the integrations go deeper, and you get the full feature set.
If you’re stacking subscriptions, the most common combo I see with clients is Claude Pro + Perplexity Pro — roughly $40/month total. Claude for the actual work (writing, coding, analysis) and Perplexity for research. Add ChatGPT if you need image generation or voice.
Personally, I pay £90/month for Claude Max. I rarely hit the limits and it serves me perfectly well for building my AI training business and software for myself and our clients. For what it gives me, it’s absurdly good value.

Open-Source vs Closed-Source: Why It Matters
A year ago, there was a meaningful gap between open-source and proprietary models. That gap has effectively vanished in 2026. Open models like DeepSeek, Kimi K2, and LLaMA 4 now match or exceed older closed models on most benchmarks.
Why care? Three reasons: cost (open models are free to use), privacy (you can run them on your own hardware), and control (you can fine-tune them for your specific needs). If you’re a business handling sensitive data — legal firms, healthcare, financial services — the open-source option is worth serious consideration.
Most of the professionals I train won’t self-host anything. That’s fine. But if you’re a developer or you work in a regulated industry, knowing these exist matters.
Closed-Source (Paid) Models
| Model | Lab | Context Window | Standout Strength |
|---|---|---|---|
| Claude Sonnet 4.6 | Anthropic | 200K | Best balance of speed and quality |
| Claude Opus 4.6 | Anthropic | 200K | Deepest reasoning, best for complex writing |
| GPT-5.4 | OpenAI | 1M | Broadest feature set, strongest multimodal |
| Grok-4.1 | xAI | 128K | Real-time data from X/Twitter |
| Gemini 3.0 Pro | Google | 1M+ | Largest context window by far |
Open-Source Models
These are free to use and can be self-hosted, which matters for privacy-sensitive work:
| Model | Lab | Context Window | Standout Strength |
|---|---|---|---|
| DeepSeek R1 | DeepSeek | 131K | Best overall performance, MIT license |
| Kimi K2 | Moonshot AI | 256K | Competitive with closed models on benchmarks |
| LLaMA 4 | Meta | 1M | Natively multimodal, strong generalist |
| Mistral Large 3 | Mistral AI | 256K | Strong European alternative, good multilingual |
| Qwen 3 | Alibaba | 128K | Solid general-purpose, good at maths |
Want Help Choosing?
In a 90-minute 1:1 session, I’ll help you figure out which models and tools make sense for your specific work. We’ll set up the ones that fit, build your first prompts together, and you’ll leave knowing exactly what to use and when.
Book a Session

Last updated: February 2026. This page is updated monthly as new models are released.
Written by Riz Pabani, AI Trainer based in London. MIT AI Certified, 20+ years in technology.