I stopped writing prompts as strings

5 min read
ai baml llm typescript

The problem with JSON.parse(completion)

The first version of every AI feature I’ve ever built looks the same: a template literal full of instructions, a call to some SDK, and a hopeful JSON.parse of whatever came back. It works in the demo. Then a model returns a number where you expected a string, or wraps its JSON in a markdown fence, or drops a field, and the failure surfaces four function calls downstream as a blank PDF, with no stack trace pointing anywhere near the prompt.
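To make that failure mode concrete, here is a minimal sketch of the fragile pattern. The `Job` interface, `parseJobNaively`, and the sample completions are invented for illustration and are not taken from RoleReady's code.

```typescript
// Hypothetical shape the rest of the app expects.
interface Job {
  title: string;
  salary: string;
}

// The fragile pattern: trust the completion and cast the result.
function parseJobNaively(completion: string): Job {
  return JSON.parse(completion) as Job; // the cast checks nothing at runtime
}

// Case 1: the model wraps its JSON in a markdown fence, so JSON.parse throws.
const fence = "`".repeat(3);
const fenced = `${fence}json\n{"title": "Engineer", "salary": "120k"}\n${fence}`;
let fencedError: string | null = null;
try {
  parseJobNaively(fenced);
} catch (err) {
  fencedError = err instanceof SyntaxError ? "SyntaxError" : "other";
}

// Case 2: the model returns a number where a string was expected.
// Nothing throws; the bad value simply travels downstream.
const wrongType = parseJobNaively('{"title": "Engineer", "salary": 120000}');
const salaryType = typeof wrongType.salary; // a number at runtime, a string to the compiler

// Case 3: a dropped field is silently undefined.
const missing = parseJobNaively('{"title": "Engineer"}');
const salaryMissing = missing.salary === undefined;

console.log({ fencedError, salaryType, salaryMissing });
```

Only the first case fails loudly. The other two pass straight through the parse step, which is why the eventual symptom shows up far from the prompt.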

RoleReady has 19 AI features in production — resume tailoring, cover letters, interview question generation, offer comparison, company research, job extraction from a pasted URL. Not one of them parses a string. They’re BAML functions, and BAML treats “the model returns this shape” as a type the compiler enforces.

What a function looks like

BAML is a small language for declaring AI functions. You write the signature, the return type as a class, and the prompt as a template. It generates typed TypeScript you call like any other async function.

function ExtractJob(rawContent: string, source: SourceType, jsonLdHint: string?) -> JobExtractionResult {
  client RoleReadyFast
  prompt #"
    You extract structured data from job posting content.
    Source type: {{ source }}
    {{ rawContent }}
    {{ ctx.output_format }}
  "#
}

class JobExtractionResult {
  title string?
  company string?
  responsibilities Responsibility[]?
  qualifications Qualifications?
  skills SkillCategory[]?
  extractionConfidence int?
}

That {{ ctx.output_format }} is the load-bearing piece. BAML injects model-appropriate formatting instructions and then parses the response against JobExtractionResult. If the model omits a required field or returns the wrong type, the call throws at the boundary — right where the prompt is — instead of leaking a malformed object into the rest of the app. On the TypeScript side it’s just:

import { b } from '@/lib/baml_client';
const job = await b.ExtractJob(html, 'Url', jsonLd); // job: JobExtractionResult, fully typed

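The editor autocompletes job.responsibilities?.[0]?.priority (optional chaining, because responsibilities is declared optional), and the enum values it suggests are the ones declared in the BAML source. The types and the prompt can’t drift, because the type is the contract the prompt is graded against.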
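Because every field on JobExtractionResult is declared optional, calling code has to tolerate missing values. The sketch below uses hand-written stand-ins for a subset of the generated types. The `text` field, the `Priority` values, and the `headline` helper are hypothetical, and the `?: T | null` shape is an assumption about what the generator emits.

```typescript
// Hand-written stand-ins for the generated types (assumed shapes).
type Priority = "High" | "Medium" | "Low"; // hypothetical enum values

interface Responsibility {
  text: string; // hypothetical field
  priority: Priority; // field name taken from the post
}

interface JobExtractionResult {
  title?: string | null;
  company?: string | null;
  responsibilities?: Responsibility[] | null;
  extractionConfidence?: number | null;
}

// Summarize a result without assuming any optional field is present.
function headline(job: JobExtractionResult): string {
  const title = job.title ?? "Untitled role";
  const company = job.company ?? "unknown company";
  const top = job.responsibilities?.[0]?.priority ?? "n/a";
  return `${title} at ${company} (top priority: ${top})`;
}

console.log(
  headline({
    title: "Engineer",
    company: "Vercel",
    responsibilities: [{ text: "Ship features", priority: "High" }],
  }),
);
console.log(headline({}));
```

The compiler forces the `??` and `?.` handling here, so a posting with no extracted responsibilities produces a default rather than a runtime error.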

Three clients, three price points, real fallbacks

The decision I’d repeat on any future product is tiering the models. I defined three clients and pointed each at a different OpenRouter model through environment variables, so I can swap models without touching code:

retry_policy RoleReadyRetry {
  max_retries 2
  strategy {
    type exponential_backoff
    delay_ms 300
    multiplier 2.0
    max_delay_ms 10000
  }
}

client<llm> RoleReadyPremium {
  provider openai-generic
  retry_policy RoleReadyRetry
  options {
    base_url "https://openrouter.ai/api/v1"
    model env.AI_MODEL_PREMIUM
  }
}

The Fast and Standard clients follow the same pattern, each with its own model variable. Each tier has a distinct job:

  • Fast runs the high-volume, latency-sensitive work: extracting a job posting, categorizing a job-bank row.
  • Standard runs balanced work: profile extraction, company intelligence, match explanations.
  • Premium runs the reasoning-heavy features where quality is the product: resume tailoring, cover letters, interview answer guides, offer comparison.

Each tier’s primary model is wrapped in a fallback client that hands the request to a different backup model, and the backup is configured with no retries. This is the detail that matters in production: when an OpenRouter upstream has a bad afternoon, the request doesn’t fail; it degrades to another model. A single provider going down becomes “slightly different output this hour,” not an incident. Routing everything through OpenRouter means one API key and one place to change a model when a better one ships.
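The retry-then-fallback behavior can be modeled in a few lines of TypeScript. This is a sketch of the semantics as described above, not BAML's implementation. It assumes the delay before retry n is delay_ms × multiplier^n capped at max_delay_ms, and that the backup gets a single attempt. `backoffDelays`, `callWithFallback`, and the stub model calls are hypothetical names.

```typescript
// Delays before each retry, assuming delayMs * multiplier^n capped at maxDelayMs.
function backoffDelays(
  maxRetries: number,
  delayMs: number,
  multiplier: number,
  maxDelayMs: number,
): number[] {
  const delays: number[] = [];
  for (let n = 0; n < maxRetries; n++) {
    delays.push(Math.min(delayMs * Math.pow(multiplier, n), maxDelayMs));
  }
  return delays;
}

type ModelCall<T> = () => Promise<T>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Try the primary once plus one retry per delay, then call the backup once.
async function callWithFallback<T>(
  primary: ModelCall<T>,
  backup: ModelCall<T>,
  delays: number[],
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  const remaining = [...delays];
  for (;;) {
    try {
      return await primary();
    } catch {
      const next = remaining.shift();
      if (next === undefined) break; // retries exhausted
      await wait(next);
    }
  }
  return backup(); // single attempt; its error propagates to the caller
}

// Demo with stubbed model calls and no real waiting.
let primaryCalls = 0;
const flakyPrimary: ModelCall<string> = async () => {
  primaryCalls++;
  throw new Error("upstream unavailable");
};
const steadyBackup: ModelCall<string> = async () => "backup output";
const noWait = async (_ms: number): Promise<void> => {};

callWithFallback(flakyPrimary, steadyBackup, backoffDelays(2, 300, 2.0, 10000), noWait).then(
  (output) => console.log(output, primaryCalls),
);
```

Under these assumptions the RoleReadyRetry settings give delays of 300 ms and 600 ms, so the primary is tried three times before the backup model takes over.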

Types flow all the way through

Shared enums live in one file and every function reuses them: SkillLevel, QualificationImportance, InterviewQuestionCategory, a 12-variant AnswerComponentType for the structured interview answer guides. Because they’re declared in BAML, they compile to TypeScript types, and a downstream Zod schema can apply post-hoc transforms (normalizing a salary field, mapping a legacy enum) without re-declaring the fields themselves. BAML is the source of truth for shape; Zod is the source of truth for the handful of post-hoc rules. They don’t overlap, so they can’t disagree.
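The two post-hoc rules mentioned above might look roughly like this. To keep the sketch dependency-free it uses plain functions rather than Zod, so treat it as the logic a Zod transform would wrap. The SkillLevel variants, the legacy values, and the accepted salary formats are all invented for illustration.

```typescript
// Stand-in for the generated enum; the variants are hypothetical.
type SkillLevel = "Beginner" | "Intermediate" | "Advanced";

// Hypothetical legacy values mapped onto the current enum.
const LEGACY_SKILL_LEVELS = new Map<string, SkillLevel>([
  ["junior", "Beginner"],
  ["mid", "Intermediate"],
  ["senior", "Advanced"],
]);

// Returns null for values with no known mapping.
function mapLegacySkillLevel(value: string): SkillLevel | null {
  return LEGACY_SKILL_LEVELS.get(value.trim().toLowerCase()) ?? null;
}

// Accepts strings like "$120,000" or "120k"; returns null for anything else.
function normalizeSalary(raw: string): number | null {
  const match = raw
    .trim()
    .toLowerCase()
    .match(/^\$?\s*(\d[\d,]*(?:\.\d+)?)\s*(k)?$/);
  if (!match) return null;
  const [, digits = "", suffix] = match;
  const amount = Number(digits.replace(/,/g, ""));
  if (!Number.isFinite(amount)) return null;
  return Math.round(suffix === "k" ? amount * 1000 : amount);
}

console.log(normalizeSalary("$120,000"), normalizeSalary("120k"), normalizeSalary("competitive"));
console.log(mapLegacySkillLevel(" Senior "), mapLegacySkillLevel("wizard"));
```

Neither function re-declares the shape of a job or a skill. They only adjust values inside a shape that is defined elsewhere, which is the division of labor the paragraph describes.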

The pipeline composes the way you’d hope. Paste a job URL and ExtractJob (Fast) parses it; GenerateJobIntelligence (Standard) enriches it with success milestones and resume keywords; click “tailor” and TailorResume (Premium) writes against that structured intel; start interview prep and GenerateInterviewPrepPlan (Premium) seeds categories of talking points. Every arrow between those steps is a typed object, not a parsed string.
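A stripped-down sketch of that composition, with stubs in place of the generated client so it runs without calling a model. The function names come from the pipeline above, but the signatures, the trimmed types, the `stubClient` object, and the stub logic are all hypothetical.

```typescript
// Hypothetical, trimmed-down shapes; the real generated types have more fields.
interface JobExtractionResult {
  title: string;
  company: string;
}
interface JobIntelligence {
  job: JobExtractionResult;
  resumeKeywords: string[];
}
interface TailoredResume {
  headline: string;
  keywordsUsed: string[];
}

// Stubs standing in for the generated client. No model is called.
const stubClient = {
  async ExtractJob(rawContent: string): Promise<JobExtractionResult> {
    const [title = "", company = ""] = rawContent.split("|").map((part) => part.trim());
    return { title, company };
  },
  async GenerateJobIntelligence(job: JobExtractionResult): Promise<JobIntelligence> {
    const resumeKeywords = job.title.toLowerCase().split(/\s+/).filter(Boolean);
    return { job, resumeKeywords };
  },
  async TailorResume(resume: string, intel: JobIntelligence): Promise<TailoredResume> {
    const text = resume.toLowerCase();
    return {
      headline: `${intel.job.title} candidate for ${intel.job.company}`,
      keywordsUsed: intel.resumeKeywords.filter((keyword) => text.includes(keyword)),
    };
  },
};

// Each step consumes the typed output of the previous one.
async function tailorFromPosting(rawContent: string, resume: string): Promise<TailoredResume> {
  const job = await stubClient.ExtractJob(rawContent);
  const intel = await stubClient.GenerateJobIntelligence(job);
  return stubClient.TailorResume(resume, intel);
}

tailorFromPosting("Staff Engineer | Vercel", "Engineer with ten years of TypeScript").then(
  (result) => console.log(result.headline, result.keywordsUsed),
);
```

If a step's return type changes, the next call stops compiling, so a shape mismatch between stages is caught at build time instead of at runtime.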

The honest tradeoffs

It’s not free. BAML is a build step — bun run baml:generate regenerates the client, and the repo’s rule is that any PR touching baml_src/** must regenerate and commit the output, enforced in CI. There’s a real language to learn, and the prompt lives in .baml files instead of next to the code that calls it. For a one-off prompt, that’s overhead you don’t want.

What you get for it: smoke tests that live with the function (test extract_job_smoke { args { rawContent "Vercel is hiring..." } }), so you can run a prompt against a fixture without booting the app; a single grep-able place where every model and prompt lives; and the thing I value most — when a feature breaks, the error is a parse failure on a named function, not a mystery three screens away.

I’m not going back to template literals. Nineteen features in, the typed-function framing is the only reason I can still reason about the AI layer at all.