May 29, 2026 9 min read by Mykola Samila

Over the past year I've been integrating AI into client projects — customer support platforms, internal tooling, content pipelines. Claude Opus 4.8 is the model I reach for when the task genuinely needs deep reasoning. Here's an honest account of what that looks like in practice, from a frontend developer who bills by the result, not the token.

Why reach for Opus over Sonnet or Haiku

The Claude 4 family gives you three tiers. Haiku is fast and cheap, great for classification, quick rewrites, and autocomplete-style tasks. Sonnet sits in the middle and covers most things well. Opus is slower and more expensive, but in my experience it's the only one that handles genuinely complex, multi-step reasoning without slipping.

The clearest signal to reach for Opus:

Most of the time, Sonnet is the right call. I'm not exaggerating when I say 80% of what I ship runs on Sonnet. But that other 20% — the tasks where quality is load-bearing — that's where Opus earns its cost.
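
One way to make that 80/20 split explicit is to route by task type in one place rather than hard-coding a model at every call site. This is a minimal sketch under assumptions: the task kinds are hypothetical, and only `claude-opus-4-8` comes from this article. The Haiku and Sonnet IDs are placeholders you would replace with the model IDs you actually use.

```typescript
// Hypothetical task kinds; adapt these to your own product.
type TaskKind = 'classify' | 'quick-rewrite' | 'general' | 'multi-step-reasoning';

interface ModelIds {
  haiku: string;
  sonnet: string;
  opus: string;
}

// Placeholders: only 'claude-opus-4-8' is taken from this article.
const MODELS: ModelIds = {
  haiku: 'your-haiku-model-id',
  sonnet: 'your-sonnet-model-id',
  opus: 'claude-opus-4-8',
};

// Default to the cheaper tiers; reserve Opus for work where quality is load-bearing.
function pickModel(kind: TaskKind, models: ModelIds = MODELS): string {
  switch (kind) {
    case 'classify':
    case 'quick-rewrite':
      return models.haiku; // fast and cheap: tagging, short rewrites
    case 'multi-step-reasoning':
      return models.opus; // the ~20% where Opus earns its cost
    default:
      return models.sonnet; // everything else
  }
}

console.log(pickModel('classify')); // your-haiku-model-id
console.log(pickModel('general')); // your-sonnet-model-id
```

Keeping the mapping in one function also makes it cheap to re-test a task on a different tier later.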

Setting up the Anthropic SDK

Install the SDK and you're done with the boilerplate:

npm install @anthropic-ai/sdk

A basic call looks like this:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const message = await client.messages.create({
  model: 'claude-opus-4-8',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
      content: 'Summarise this support ticket and suggest a resolution category.',
    },
  ],
});

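// content is an array of blocks; check it's a text block before reading .text
const block = message.content[0];
if (block?.type === 'text') {
  console.log(block.text);
}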

Keep ANTHROPIC_API_KEY server-side. Never expose it in browser code. Route all Claude calls through a backend endpoint or a Next.js API route.

Real use cases I've shipped

Support ticket triage

The SaaS platform I work on handles customer support for a large user base. I built a triage layer that feeds incoming tickets to Claude, extracts intent, urgency, and a suggested category, then writes structured JSON back to the database. Agents still handle replies — Claude just handles the sorting and tagging work that was burning their time.

const response = await client.messages.create({
  model: 'claude-opus-4-8',
  max_tokens: 512,
  system: `You are a support triage assistant. Extract the following from the ticket:
- intent (one sentence)
- urgency: low | medium | high | critical
- category: billing | technical | feature-request | account | other
- suggested_action (one sentence)

Respond with valid JSON only. No commentary.`,
  messages: [{ role: 'user', content: ticketBody }],
});

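// Narrow to a text block before parsing (content can also hold other block types)
const first = response.content[0];
const triage = first?.type === 'text' ? JSON.parse(first.text) : null;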

Asking for JSON-only output is reliable when the system prompt spells out the exact keys and allowed values. With Opus, I haven't needed to add schema-validation retries; it follows the format consistently. With smaller models I sometimes do.

Content pipeline for a media client

One client publishes a high volume of articles. I built a pipeline that takes a raw draft, runs it through Claude for tone consistency, SEO keyword integration, and readability rewriting, then returns a diff for the editor to approve. Opus handles the nuance of preserving the author's voice while improving the structure; Sonnet was flattening things too much.
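
The article doesn't say how the diff is produced. As one possibility, here is a minimal line-level diff built on a longest-common-subsequence (LCS) table. It assumes the draft and Claude's rewrite are both plain strings. The `diffLines` name and the `DiffOp` shape are hypothetical, and a production pipeline would more likely use an established diff library.

```typescript
// One diff entry: a line kept, removed from the draft, or added by the rewrite.
type DiffOp = { kind: 'same' | 'removed' | 'added'; line: string };

// Line-level diff via a longest-common-subsequence (LCS) table.
function diffLines(draft: string, rewrite: string): DiffOp[] {
  const a = draft.split('\n');
  const b = rewrite.split('\n');

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table to emit kept, removed, and added lines in document order.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ kind: 'same', line: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: 'removed', line: a[i++] });
    } else {
      ops.push({ kind: 'added', line: b[j++] });
    }
  }
  while (i < a.length) ops.push({ kind: 'removed', line: a[i++] });
  while (j < b.length) ops.push({ kind: 'added', line: b[j++] });
  return ops;
}

// diffLines('Intro\nOld line\nOutro', 'Intro\nNew line\nOutro')
// → same "Intro", removed "Old line", added "New line", same "Outro"
```

The table costs O(n·m) memory in the number of lines, which is fine for article-sized drafts. An editor UI would then render removed and added lines side by side for approval.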

Internal knowledge base Q&A

Another client had documentation spread across Notion, Confluence, and a SharePoint graveyard. I built a simple RAG setup: documents are chunked and embedded, relevant chunks are retrieved per query, and Claude synthesises an answer with citations. Opus is better here because it reasons across multiple conflicting chunks rather than just returning the closest one.
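
The retrieval half of that setup fits in a few lines. This sketch assumes each chunk already has an embedding vector from whatever embedding model you use; this article doesn't name one, and the Anthropic SDK doesn't produce embeddings. `Chunk`, `topK`, and `buildPrompt` are hypothetical names, and the wording of the conflict instruction is illustrative. The code ranks chunks by cosine similarity and numbers them so the answer can cite sources as [1], [2], and so on.

```typescript
// A document chunk with a precomputed embedding (from any embedding model).
interface Chunk {
  source: string; // e.g. a Notion or Confluence page title
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
function topK(queryEmbedding: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map(chunk => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ chunk }) => chunk);
}

// Number each retrieved chunk so the model can cite it, and ask it to flag conflicts.
function buildPrompt(question: string, retrieved: Chunk[]): string {
  const sources = retrieved
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.text}`)
    .join('\n\n');
  return (
    `Sources:\n${sources}\n\nQuestion: ${question}\n` +
    'Answer using only the sources above and cite them as [n]. ' +
    'If sources conflict, say so and explain which one you rely on.'
  );
}
```

The prompt string would go in the user message, with the answering rules kept in the system prompt. Retrieving several chunks rather than just the closest one is what gives the model conflicting sources to reason across.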

Streaming responses in React

For anything user-facing, streaming is the difference between feeling slow and feeling alive. The Anthropic SDK supports streaming natively:

// app/api/chat/route.ts (Next.js App Router)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

export async function POST(req: Request) {
  const { message } = await req.json();

  const stream = client.messages.stream({
    model: 'claude-opus-4-8',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  });

  return new Response(stream.toReadableStream());
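}

// React client component
'use client';

import { useState } from 'react';

export function Chat() {
  const [output, setOutput] = useState('');

  async function sendMessage(text: string) {
    setOutput('');
    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: text }),
    });
    if (!res.ok || !res.body) throw new Error(`Chat request failed: ${res.status}`);

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // toReadableStream() sends one JSON-encoded event per line (not SSE "data:" lines).
      // A line can be split across chunks, so keep the unfinished tail in the buffer.
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';

      for (const line of lines) {
        if (!line.trim()) continue;
        const event = JSON.parse(line);
        // Only text deltas carry visible text; skip other delta types.
        if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
          setOutput(prev => prev + event.delta.text);
        }
      }
    }
  }

  return (
    <div>
      <button onClick={() => sendMessage('Hello')}>Send</button>
      <pre>{output}</pre>
    </div>
  );
}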

The perceived speed difference is significant. Even if total time is the same, watching text appear token by token feels responsive in a way that a three-second spinner doesn't.

Prompt caching — the thing everyone skips

This is the most underused feature in the Anthropic API. If your system prompt or context documents are long and repeated across requests, prompt caching can cut your costs dramatically.

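Mark content with cache_control and Anthropic caches the prompt prefix up to that breakpoint. The default lifetime is 5 minutes, refreshed each time the cache is hit, and a 1-hour TTL is available at a higher write cost. Writing to the 5-minute cache costs 1.25x the normal input token price, and cache hits are charged at 10% of it. The prefix has to match exactly, so put stable content (system prompt, shared documents) ahead of anything that changes per request.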

const response = await client.messages.create({
  model: 'claude-opus-4-8',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: longSystemPrompt, // stays the same across requests
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: largeContextDocument, // also repeated
          cache_control: { type: 'ephemeral' },
        },
        { type: 'text', text: userQuestion },
      ],
    },
  ],
});

In my knowledge base Q&A project, the system prompt plus retrieved documents averaged around 8,000 tokens. With caching on, repeated queries in the same session dropped to a fraction of the cost. On Opus, that difference matters.
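
To see why, here is the arithmetic for ten queries in one session, expressed in base-input-token equivalents so no specific price is assumed. The sketch assumes the 8,000-token prefix is identical each time. The first request writes the cache at 1.25x the base input rate (the 5-minute TTL write premium), and every later request reads it at 0.1x. The small per-query question tokens are ignored. `prefixCost` is a hypothetical helper.

```typescript
// Input cost of `requests` calls sharing a `prefixTokens`-long prefix,
// in base-input-token equivalents (multiply by your model's input price).
function prefixCost(prefixTokens: number, requests: number, cached: boolean): number {
  if (requests <= 0) return 0;
  if (!cached) return prefixTokens * requests;
  const WRITE_MULTIPLIER = 1.25; // first request writes the cache (5-minute TTL)
  const READ_MULTIPLIER = 0.1; // later requests read from the cache
  return prefixTokens * WRITE_MULTIPLIER + prefixTokens * READ_MULTIPLIER * (requests - 1);
}

const uncached = prefixCost(8000, 10, false); // 80,000
const withCache = prefixCost(8000, 10, true); // 10,000 + 9 × 800 = 17,200
console.log(withCache / uncached); // about 0.215 of the uncached input cost
```

This only holds while requests keep arriving within the TTL. If queries are spaced further apart than that, each one pays the write premium again.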

Cost in practice

Opus is the most expensive model in the Claude 4 family. That's not a reason to avoid it — it's a reason to be deliberate about where you use it.

Things that keep costs in check:

In my triage pipeline, I log usage.input_tokens and usage.output_tokens per request and track monthly spend against the business value. The ROI conversation is easier when you have the numbers.
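
A minimal version of that logging might look like this. It assumes hypothetical `toUsageRecord` and `totalsByFeature` helpers and an in-memory array instead of a real metrics store. Besides `input_tokens` and `output_tokens`, the `usage` object on each response reports cache activity in `cache_creation_input_tokens` and `cache_read_input_tokens`. `input_tokens` covers only the uncached part of the prompt, so the sketch tracks the cache fields separately.

```typescript
// Subset of the `usage` field on a messages.create() response.
// The cache fields can be null when caching isn't used.
interface UsageLike {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

interface UsageRecord {
  feature: string; // hypothetical label, e.g. 'triage'
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheWriteTokens: number;
  cacheReadTokens: number;
  at: string; // ISO timestamp
}

interface FeatureTotals {
  input: number;
  cacheWrite: number;
  cacheRead: number;
  output: number;
}

// Flatten one response's usage into a record you can store and aggregate.
function toUsageRecord(feature: string, model: string, usage: UsageLike): UsageRecord {
  return {
    feature,
    model,
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    cacheWriteTokens: usage.cache_creation_input_tokens ?? 0,
    cacheReadTokens: usage.cache_read_input_tokens ?? 0,
    at: new Date().toISOString(),
  };
}

// Sum token counts per feature, e.g. for a monthly spend report.
function totalsByFeature(records: UsageRecord[]): Map<string, FeatureTotals> {
  const totals = new Map<string, FeatureTotals>();
  for (const r of records) {
    const t = totals.get(r.feature) ?? { input: 0, cacheWrite: 0, cacheRead: 0, output: 0 };
    t.input += r.inputTokens;
    t.cacheWrite += r.cacheWriteTokens;
    t.cacheRead += r.cacheReadTokens;
    t.output += r.outputTokens;
    totals.set(r.feature, t);
  }
  return totals;
}
```

In the triage handler this would be a single line after each call, for example `records.push(toUsageRecord('triage', 'claude-opus-4-8', response.usage))`. Multiplying each total by its per-token price then gives the monthly spend figure.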

Mistakes I made early on

Treating the API like a search engine

Early on I was writing prompts like queries: short, keyword-heavy, no context. LLMs aren't search engines. They respond best to the kind of clarity and context you'd give a capable human colleague. The more clearly you define the task, the constraints, and the output format, the better the result.

Not validating structured output

When you ask for JSON, validate it. Opus is reliable, but not infallible — especially when the user input contains unusual characters or your instructions have edge cases. Wrap your JSON.parse in a try/catch and have a fallback path.
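
Applied to the triage prompt from earlier, a guarded parse might look like this. It checks the keys and allowed values the system prompt asks for, and returns `null` so the caller can take a fallback path, such as sending the ticket to a human queue. `parseTriage` and that fallback are hypothetical. Tolerating a Markdown code fence around the JSON is an extra precaution, not something the article reports seeing.

```typescript
type Urgency = 'low' | 'medium' | 'high' | 'critical';
type Category = 'billing' | 'technical' | 'feature-request' | 'account' | 'other';

interface Triage {
  intent: string;
  urgency: Urgency;
  category: Category;
  suggested_action: string;
}

const URGENCIES: readonly Urgency[] = ['low', 'medium', 'high', 'critical'];
const CATEGORIES: readonly Category[] = ['billing', 'technical', 'feature-request', 'account', 'other'];

const isUrgency = (v: unknown): v is Urgency => URGENCIES.includes(v as Urgency);
const isCategory = (v: unknown): v is Category => CATEGORIES.includes(v as Category);

// Returns a validated Triage object, or null if the model output is unusable.
function parseTriage(raw: string): Triage | null {
  // Strip an optional Markdown code fence (three backticks, optionally tagged json).
  const cleaned = raw.trim().replace(/^`{3}(?:json)?\s*/i, '').replace(/\s*`{3}$/, '');
  let data: unknown;
  try {
    data = JSON.parse(cleaned);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== 'object' || data === null) return null;

  const { intent, urgency, category, suggested_action } = data as Record<string, unknown>;
  if (
    typeof intent !== 'string' ||
    typeof suggested_action !== 'string' ||
    !isUrgency(urgency) ||
    !isCategory(category)
  ) {
    return null; // valid JSON, but not the shape the system prompt asked for
  }
  return { intent, urgency, category, suggested_action };
}
```

Returning `null` rather than throwing keeps the fallback decision in the calling code. That is also where you would log the raw output for later inspection.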

Running everything synchronously

For anything non-interactive, take the API call off the request path. A user submitting a form shouldn't wait for Claude to finish processing: queue the job, return immediately, and update the UI via WebSocket or polling when the result is ready.
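
Here is a minimal in-process sketch of that pattern. The request handler enqueues a job and returns its ID at once, a worker runs the Claude call later, and a polling endpoint reads the job's status. Everything here is hypothetical, including the `JobQueue` class and the injected `handler` that would wrap the SDK call. In production you would back this with a durable queue rather than an in-memory Map, since these jobs disappear on restart.

```typescript
type JobStatus = 'queued' | 'running' | 'done' | 'failed';

interface Job<T> {
  id: string;
  status: JobStatus;
  input: string;
  result?: T;
  error?: string;
}

class JobQueue<T> {
  private readonly jobs = new Map<string, Job<T>>();
  private readonly pending: string[] = [];
  private nextId = 1;
  private readonly handler: (input: string) => Promise<T>;

  // `handler` would wrap the actual Claude call, e.g. the triage request.
  constructor(handler: (input: string) => Promise<T>) {
    this.handler = handler;
  }

  // Request path: record the job and return its ID immediately.
  enqueue(input: string): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { id, status: 'queued', input });
    this.pending.push(id);
    return id;
  }

  // Polling endpoint: report the current state of a job.
  get(id: string): Job<T> | undefined {
    return this.jobs.get(id);
  }

  // Worker: process pending jobs one at a time, off the request path.
  async drain(): Promise<void> {
    while (this.pending.length > 0) {
      const job = this.jobs.get(this.pending.shift()!);
      if (!job) continue;
      job.status = 'running';
      try {
        job.result = await this.handler(job.input);
        job.status = 'done';
      } catch (err) {
        job.status = 'failed';
        job.error = err instanceof Error ? err.message : String(err);
      }
    }
  }
}
```

A form handler would call `enqueue(ticketBody)` and respond with the ID. A background worker calls `drain()`, and the UI polls `get(id)` until the status is `done` or `failed`.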

Ignoring the system prompt

The system prompt is where you define the model's role, output format, constraints, and tone. I've seen integrations that put everything in the user message and then wonder why results are inconsistent. Separate concerns: system prompt for persona and rules, user message for the actual task.

