Over the past year I've been integrating AI into client projects — customer support platforms, internal tooling, content pipelines. Claude Opus 4.8 is the model I reach for when the task genuinely needs deep reasoning. Here's an honest account of what that looks like in practice, from a frontend developer who bills by the result, not the token.
Why reach for Opus over Sonnet or Haiku
The Claude 4 family gives you three tiers. Haiku is fast and cheap: great for classification, quick rewrites, autocomplete-style tasks. Sonnet sits in the middle and covers most things well. Opus is slower and more expensive, but in my experience it's the one that holds up on genuinely complex, multi-step reasoning where the smaller models start to slip.
The clearest signal to reach for Opus:
- The task requires holding many constraints in mind at once (contract analysis, legal summaries, architectural decisions)
- You're doing agentic work — tool use, multi-step chains where one wrong step breaks everything downstream
- Accuracy matters more than latency (batch jobs, async processing, background tasks)
- You've already tried Sonnet and it's failing on edge cases you can't tune around
Most of the time, Sonnet is the right call. I'm not exaggerating when I say 80% of what I ship runs on Sonnet. But that other 20% — the tasks where quality is load-bearing — that's where Opus earns its cost.
Setting up the Anthropic SDK
Install the SDK and you're done with the boilerplate:
npm install @anthropic-ai/sdk
A basic call looks like this:
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const message = await client.messages.create({
  model: 'claude-opus-4-8',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
      content: 'Summarise this support ticket and suggest a resolution category.',
    },
  ],
});

console.log(message.content[0].text);
Keep ANTHROPIC_API_KEY server-side. Never expose it in browser code. Route all Claude calls through a backend endpoint or a Next.js API route.
Real use cases I've shipped
Support ticket triage
The SaaS platform I work on handles customer support for a large user base. I built a triage layer that feeds incoming tickets to Claude, extracts intent, urgency, and a suggested category, then writes structured JSON back to the database. Agents still handle replies — Claude just handles the sorting and tagging work that was burning their time.
const response = await client.messages.create({
  model: 'claude-opus-4-8',
  max_tokens: 512,
  system: `You are a support triage assistant. Extract the following from the ticket:
- intent (one sentence)
- urgency: low | medium | high | critical
- category: billing | technical | feature-request | account | other
- suggested_action (one sentence)
Respond with valid JSON only. No commentary.`,
  messages: [{ role: 'user', content: ticketBody }],
});

const triage = JSON.parse(response.content[0].text);
Forcing JSON output is reliable when you're explicit in the system prompt. With Opus, I haven't needed to add schema validation retries — it follows the format consistently. With smaller models I sometimes do.
Content pipeline for a media client
One client publishes high volumes of articles. I built a pipeline that takes a raw draft, runs it through Claude for tone consistency, SEO keyword integration, and readability rewriting — then returns a diff for the editor to approve. Opus handles the nuance of preserving the author's voice while improving the structure. Sonnet was flattening things too much.
Internal knowledge base Q&A
Another client had documentation spread across Notion, Confluence, and a SharePoint graveyard. I built a simple RAG setup: documents are chunked and embedded, relevant chunks are retrieved per query, and Claude synthesises an answer with citations. Opus is better here because it reasons across multiple conflicting chunks rather than just returning the closest one.
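The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the client's actual setup: the `Chunk` shape, `topKChunks`, and `buildContext` are hypothetical names, and the embeddings are assumed to come from whatever embedding model you already use. It ranks chunks by cosine similarity and numbers them so the model can cite `[1]`, `[2]`, and so on.

```typescript
// Hypothetical shape for an embedded chunk; producing the embedding is out of
// scope here and depends on your embedding provider.
interface Chunk {
  id: string;
  source: string; // e.g. 'Notion', 'Confluence'
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks most similar to the query embedding, best first.
function topKChunks(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map(chunk => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(entry => entry.chunk);
}

// Formats the retrieved chunks as numbered, source-labelled context so the
// answer can reference them as [1], [2], ...
function buildContext(chunks: Chunk[]): string {
  return chunks
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.text}`)
    .join('\n\n');
}
```

The string from `buildContext` would go into the user message ahead of the question, with the system prompt asking for citations by number. A linear scan like this is fine for a few thousand chunks; beyond that a vector index is the usual next step.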
Streaming responses in React
For anything user-facing, streaming is the difference between feeling slow and feeling alive. The Anthropic SDK supports streaming natively:
// api/chat/route.ts (Next.js App Router)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

export async function POST(req: Request) {
  const { message } = await req.json();
  const stream = client.messages.stream({
    model: 'claude-opus-4-8',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  });
  return new Response(stream.toReadableStream());
}
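// React component
const [output, setOutput] = useState('');

async function sendMessage(text: string) {
  setOutput('');
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: text }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    // A network chunk can end mid-line, so hold the trailing partial line for the next read
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.trim()) continue;
      // Events may arrive as bare JSON lines or with an SSE-style 'data: ' prefix,
      // depending on how the route serialises them
      const payload = line.startsWith('data: ') ? line.slice(6) : line;
      try {
        const data = JSON.parse(payload);
        if (data.type === 'content_block_delta' && data.delta?.type === 'text_delta') {
          setOutput(prev => prev + data.delta.text);
        }
      } catch {
        // Skip lines that are not valid JSON
      }
    }
  }
}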
The perceived speed difference is significant. Even if total time is the same, watching text appear token by token feels responsive in a way that a three-second spinner doesn't.
Prompt caching — the thing everyone skips
This is the most underused feature in the Anthropic API. If your system prompt or context documents are long and repeated across requests, prompt caching can cut your costs dramatically.
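Mark content with cache_control and Anthropic caches the prompt up to that breakpoint. The default lifetime is 5 minutes, refreshed each time the cache is hit, and a 1-hour TTL is available at a higher cache-write price. Cache hits are charged at 10% of the normal input token price; the request that writes the cache costs somewhat more than normal input, so caching pays off once the same prefix is reused.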
const response = await client.messages.create({
  model: 'claude-opus-4-8',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: longSystemPrompt, // stays the same across requests
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: largeContextDocument, // also repeated
          cache_control: { type: 'ephemeral' },
        },
        { type: 'text', text: userQuestion },
      ],
    },
  ],
});
In my knowledge base Q&A project, the system prompt plus retrieved documents averaged around 8,000 tokens. With caching on, repeated queries in the same session dropped to a fraction of the cost. On Opus, that difference matters.
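To make "a fraction of the cost" concrete, here is a rough back-of-the-envelope sketch. It works in units of normal input tokens, so no dollar price is needed. The 0.1 read multiplier is the 10% figure above; the 1.25 write multiplier is my assumption for the 5-minute cache, and `billedInputUnits` is a hypothetical helper, so check current pricing before relying on the numbers. It also assumes every request lands while the cache is still warm.

```typescript
// Multipliers relative to the normal input-token price.
// 0.1 (cache read) is the 10% figure from the text; 1.25 (cache write) is an
// assumption to verify against current pricing.
const CACHE_READ_MULTIPLIER = 0.1;
const CACHE_WRITE_MULTIPLIER = 1.25;

// Billed input for a run of requests sharing the same prefix, expressed in
// "normal input token" units. Assumes all requests hit a warm cache after
// the first one writes it.
function billedInputUnits(
  prefixTokens: number,          // cached system prompt + documents
  freshTokensPerRequest: number, // the part that changes each time
  requests: number,
  cached: boolean,
): number {
  if (requests <= 0) return 0;
  if (!cached) return requests * (prefixTokens + freshTokensPerRequest);
  const firstRequest = prefixTokens * CACHE_WRITE_MULTIPLIER;
  const laterRequests = (requests - 1) * prefixTokens * CACHE_READ_MULTIPLIER;
  return firstRequest + laterRequests + requests * freshTokensPerRequest;
}

const withoutCache = billedInputUnits(8000, 200, 10, false);
const withCache = billedInputUnits(8000, 200, 10, true);
console.log({ withoutCache, withCache });
```

Under those assumptions, ten requests with an 8,000-token prefix and a 200-token question come to 82,000 units uncached versus roughly 19,200 cached. Output tokens are billed the same either way.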
Cost in practice
Opus is the most expensive model in the Claude 4 family. That's not a reason to avoid it — it's a reason to be deliberate about where you use it.
Things that keep costs in check:
- Use prompt caching for any repeated system context or documents. First request builds the cache; subsequent ones are cheap.
- Set a realistic max_tokens. If you're extracting structured data, you don't need 4096 tokens. Cap it tightly.
- Batch where latency doesn't matter. Background processing jobs can queue requests rather than firing them simultaneously.
- Route to Sonnet first. For tasks with variable complexity, you can try Sonnet, check output quality programmatically, and fall back to Opus only when needed.
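The Sonnet-first routing in the last point can be sketched like this. It is one possible shape rather than a prescribed pattern: `routeWithFallback`, `CallModel`, and the `isAcceptable` check are hypothetical names, and the model call is injected so the routing logic stays independent of the SDK. In a real app `callModel` would wrap `client.messages.create`.

```typescript
// Sends a prompt to the named model and resolves with its text output.
// Injected by the caller so this logic can run without network access.
type CallModel = (model: string, prompt: string) => Promise<string>;

interface RoutedResult {
  model: string;      // the model whose output was kept
  output: string;
  escalated: boolean; // true when the cheaper model's output was rejected
}

// Tries the cheaper model first and escalates only when the programmatic
// quality check rejects its output.
async function routeWithFallback(
  prompt: string,
  callModel: CallModel,
  isAcceptable: (output: string) => boolean,
  cheapModel: string,
  strongModel: string,
): Promise<RoutedResult> {
  const first = await callModel(cheapModel, prompt);
  if (isAcceptable(first)) {
    return { model: cheapModel, output: first, escalated: false };
  }
  const second = await callModel(strongModel, prompt);
  return { model: strongModel, output: second, escalated: true };
}
```

The quality check is the hard part. For structured output it can be as simple as "does this parse and match the expected fields"; for free text you need something task-specific. Logging the `escalated` flag shows how often the fallback fires, which tells you whether the two-step route is actually saving money.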
In my triage pipeline, I log usage.input_tokens and usage.output_tokens per request and track monthly spend against the business value. The ROI conversation is easier when you have the numbers.
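A minimal sketch of that logging, assuming you store one record per request and roll them up by month and model. `UsageRecord`, `toUsageRecord`, and `summariseUsage` are hypothetical names; the only thing taken from the API is the `usage.input_tokens` and `usage.output_tokens` pair on each response.

```typescript
interface UsageRecord {
  month: string; // 'YYYY-MM', derived from the request time in UTC
  model: string;
  inputTokens: number;
  outputTokens: number;
}

interface UsageTotals {
  requests: number;
  inputTokens: number;
  outputTokens: number;
}

// Builds a record from the usage object on a Messages API response.
function toUsageRecord(
  model: string,
  usage: { input_tokens: number; output_tokens: number },
  at: Date,
): UsageRecord {
  return {
    month: at.toISOString().slice(0, 7),
    model,
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
  };
}

// Rolls records up into totals keyed by 'YYYY-MM:model'.
function summariseUsage(records: UsageRecord[]): Map<string, UsageTotals> {
  const totals = new Map<string, UsageTotals>();
  for (const r of records) {
    const key = `${r.month}:${r.model}`;
    const current = totals.get(key) ?? { requests: 0, inputTokens: 0, outputTokens: 0 };
    current.requests += 1;
    current.inputTokens += r.inputTokens;
    current.outputTokens += r.outputTokens;
    totals.set(key, current);
  }
  return totals;
}
```

Multiplying the totals by your current per-token prices gives monthly spend per model. Keeping tokens and prices separate means a price change doesn't invalidate the history.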
Mistakes I made early on
Treating the API like a search engine
Early on I was writing prompts like queries — short, keyword-heavy, no context. LLMs aren't search engines. They respond to the same level of clarity and context you'd give a capable human colleague. The more clearly you define the task, the constraints, and the output format, the better the result.
Not validating structured output
When you ask for JSON, validate it. Opus is reliable, but not infallible — especially when the user input contains unusual characters or your instructions have edge cases. Wrap your JSON.parse in a try/catch and have a fallback path.
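For the triage example earlier, that validation might look like the sketch below. `parseTriage` is a hypothetical helper, and the field names and allowed values are taken from the triage system prompt. It returns `null` rather than throwing, so the caller decides on the fallback: retry, route to a human, or tag the ticket as unclassified.

```typescript
type Urgency = 'low' | 'medium' | 'high' | 'critical';
type Category = 'billing' | 'technical' | 'feature-request' | 'account' | 'other';

interface Triage {
  intent: string;
  urgency: Urgency;
  category: Category;
  suggested_action: string;
}

const URGENCIES: readonly string[] = ['low', 'medium', 'high', 'critical'];
const CATEGORIES: readonly string[] = ['billing', 'technical', 'feature-request', 'account', 'other'];

// Parses the model's reply and checks it against the expected shape.
// Returns null on malformed JSON or on any missing or unexpected field value.
function parseTriage(raw: string): Triage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON at all
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) return null;
  const t = data as Record<string, unknown>;
  if (typeof t.intent !== 'string') return null;
  if (typeof t.suggested_action !== 'string') return null;
  if (typeof t.urgency !== 'string' || !URGENCIES.includes(t.urgency)) return null;
  if (typeof t.category !== 'string' || !CATEGORIES.includes(t.category)) return null;
  return {
    intent: t.intent,
    urgency: t.urgency as Urgency,
    category: t.category as Category,
    suggested_action: t.suggested_action,
  };
}
```

A schema library would do the same job with less code; the hand-rolled version is here only to keep the example dependency-free.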
Running everything synchronously
For anything non-interactive, make API calls asynchronous and off the main request path. A user submitting a form shouldn't wait for Claude to finish processing — queue the job, return immediately, and update the UI via WebSocket or polling when the result is ready.
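The queue-and-poll flow can be sketched with an in-memory job store. This is for illustration only: `JobStore` is a hypothetical class, and a production setup would use a durable queue so jobs survive restarts. The point is the shape: `enqueue` returns an id straight away, the work runs in the background, and a polling endpoint reads `status`.

```typescript
type JobStatus<T> =
  | { state: 'pending' }
  | { state: 'done'; result: T }
  | { state: 'failed'; error: string };

// In-memory job tracker. Jobs are lost on restart, so treat this as a sketch
// of the flow rather than something to deploy as-is.
class JobStore<T> {
  private jobs = new Map<string, JobStatus<T>>();
  private nextId = 1;

  // Registers the job, starts the work in the background, and returns the
  // job id immediately without waiting for the work to finish.
  enqueue(work: () => Promise<T>): string {
    const id = String(this.nextId++);
    this.jobs.set(id, { state: 'pending' });
    Promise.resolve()
      .then(work)
      .then(result => {
        this.jobs.set(id, { state: 'done', result });
      })
      .catch(err => {
        this.jobs.set(id, { state: 'failed', error: String(err) });
      });
    return id;
  }

  // Returns the current status, or undefined for an unknown id.
  status(id: string): JobStatus<T> | undefined {
    return this.jobs.get(id);
  }
}
```

In a form handler, the submit route would call `enqueue` with a function that wraps the Claude request and respond with the id; a second route returns `status(id)` for the client to poll.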
Ignoring the system prompt
The system prompt is where you define the model's role, output format, constraints, and tone. I've seen integrations that put everything in the user message and then wonder why results are inconsistent. Separate concerns: system prompt for persona and rules, user message for the actual task.