No testcontainers, no rollback: how I actually test a SaaS

The shape of the test suite

RoleReady has roughly 490 test files. The split is lopsided on purpose: about 465 unit, ~24 integration, and 4 end-to-end specs. That ratio isn’t an accident or a backlog; it’s the strategy. Unit tests mock everything and finish in seconds, so I run them constantly. Integration tests hit a real database and run in CI. E2E tests cover the handful of flows where “did the page even render” is the actual risk. And one whole category, the LLM calls, I deliberately don’t test at all. This post walks through each of those decisions and why I’d defend it in review.

Everything is Vitest with a jsdom environment, plus Playwright for the four e2e specs. Tests live in a top-level tests/ tree organized by layer (tests/unit/**, tests/integration/**, tests/e2e/**), not colocated with source — which keeps the “run only the fast ones” globs trivial.

Unit tests mock at the boundary

A unit test here touches no database and no network. The pattern leans hard on Vitest’s vi.hoisted so the mocks exist before the module under test imports its dependencies:

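import { vi } from 'vitest';

// vi.mock calls are hoisted above the imports, so the mock fns they reference
// are created inside vi.hoisted to make sure they exist when the factories run.
const { mockGetUserJob, mockCreateContact, mockLinkContactToJob } = vi.hoisted(() => ({
  mockGetUserJob: vi.fn(),
  mockCreateContact: vi.fn(),
  mockLinkContactToJob: vi.fn(),
}));

// Replace each boundary module with a factory that exposes only the mocks.
vi.mock('@/lib/database/queries', () => ({ getUserJob: mockGetUserJob }));
vi.mock('@/lib/database/contacts', () => ({ createContactForUser: mockCreateContact }));
vi.mock('@/lib/inngest/client', () => ({ inngest: { send: vi.fn() } }));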

What this buys me is the ability to test real business logic without a running database or job queue. The biggest single file is the assistant command executor’s test, nearly a thousand lines long, and it verifies things that genuinely break in production: that a plan of create_job → create_contact → link resolves its temp references ($job_1, $contact_1) to real IDs in order, and that if enqueuing the Inngest job fails after a feature quota was consumed, the quota gets refunded under the same requestKey. None of that needs Postgres. All of it needs to be right.

it('chains create_job → create_contact → link via temp refs', async () => {
  jobInsertOk('job-new');
  mockCreateContact.mockResolvedValue({ id: 'contact-new', name: 'Bob' });

  const result = await executeCommandPlan({ plan, userId: 'user-1', requestKey: 'req-123' });

  expect(result.results.map((r) => r.status)).toEqual(['success', 'success', 'success']);
  expect(mockLinkContactToJob).toHaveBeenCalledWith('contact-new', 'job-new', 'user-1', false);
});
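
To make the temp-reference idea concrete, here is a minimal sketch of how $job_1-style placeholders might be resolved step by step. It is not the executor’s actual code: PlanStep, Handler, resolveRefs, and runPlan are hypothetical names, and each action’s handler is injected so the sketch runs without a database.

```typescript
// Hypothetical shapes; the real executor's types are not shown in this post.
type PlanStep = {
  ref?: string;                 // temp reference this step defines, e.g. "$job_1"
  action: string;               // e.g. "create_job"
  args: Record<string, string>; // values may be temp references
};

type Handler = (args: Record<string, string>) => Promise<{ id: string }>;

// Swap every argument value that is a temp reference for its real ID.
// Assumption: literal values never start with "$".
function resolveRefs(
  args: Record<string, string>,
  known: Map<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(args)) {
    if (value.startsWith('$')) {
      const real = known.get(value);
      if (real === undefined) throw new Error(`Unresolved reference: ${value}`);
      out[key] = real;
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Run steps strictly in order so later steps can use IDs from earlier ones.
async function runPlan(
  steps: PlanStep[],
  handlers: Record<string, Handler>,
): Promise<Map<string, string>> {
  const known = new Map<string, string>();
  for (const step of steps) {
    const handler = handlers[step.action];
    if (!handler) throw new Error(`Unknown action: ${step.action}`);
    const { id } = await handler(resolveRefs(step.args, known));
    if (step.ref) known.set(step.ref, id);
  }
  return known;
}
```

The ordering is the part worth testing: a link step that names $contact_1 and $job_1 can only succeed if both earlier steps have already run and recorded their IDs, and a reference nothing has defined fails loudly instead of being passed through as a string.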

React hooks get the same treatment — mock the authenticatedFetch wrapper, drive the hook through renderHook + act, and assert it walked parse → review → execute → success and called the right endpoints in the right order. No snapshot tests anywhere. Snapshots rot into “press u to accept” noise; I’d rather assert the specific thing I care about.

Integration tests hit a real Postgres — and clean up by hand

The integration tests are where I broke with the orthodoxy twice, and both breaks were deliberate.

No testcontainers. In CI, the database is just a GitHub Actions postgres:18-alpine service. The schema is pushed with drizzle-kit push --force to two databases — the main one and a _test copy — and the test utilities route to the _test one by rewriting the connection string:

const getTestDatabaseUrl = () => {
  const baseUrl = process.env.DATABASE_URL ?? 'postgres://localhost:5432/roleready';
  const url = new URL(baseUrl);
  // Look only at the database name, so "_test" in a host or password can't skip the rewrite.
  if (url.pathname.includes('_test')) return baseUrl;
  url.pathname = `/${url.pathname.slice(1)}_test`;
  return url.toString();
};

Testcontainers is great and I didn’t want it. It’s another dependency, another Docker lifecycle, and slower startup, to solve a problem — “give me an ephemeral Postgres” — that a CI service container already solves for free. Locally I point at a _test database I created once.

No transaction-rollback isolation. The fashionable pattern is to wrap each test in a transaction and roll it back, so the database is pristine every time. I don’t. Tests create their rows in beforeEach and delete them in afterEach, in foreign-key dependency order, by hand:

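afterEach(async () => {
  // Child rows first: both tables point at the job through jobId.
  await db.delete(events).where(eq(events.jobId, jobId));
  await db.delete(jobOffers).where(eq(jobOffers.jobId, jobId));
  // Parent row last, once nothing references it.
  await db.delete(userJobs).where(eq(userJobs.id, jobId));
});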

It’s more code per test and it’s fragile if I forget a table. What I get back is that the tests run against the real commit path — real triggers, real constraints, real onConflictDoNothing, real concurrency — instead of inside a transaction that subtly behaves differently from production. That matters here because the bugs I’m hunting are concurrency bugs. One test fires ten concurrent inserts at the same canonical URL and asserts all ten rows survive:

const inserts = Array.from({ length: 10 }, (_, i) =>
  db.insert(userJobs).values({ userId, canonicalUrl, sourceUrl: `${canonicalUrl}?n=${i}`, /* … */ }).returning(),
);
await Promise.all(inserts);
const rows = await db.select().from(userJobs).where(eq(userJobs.canonicalUrl, canonicalUrl));
expect(rows.length).toBe(10);

You cannot meaningfully test that inside a single rolled-back transaction. The whole point is that they’re concurrent and committed.
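
On the cleanup side, “foreign-key dependency order” just means deleting referencing rows before the rows they point at. I write that order by hand, but a small sketch shows the rule itself. The table names come from the afterEach above; the references map and the deletionOrder helper are hypothetical, and the two edges are my reading of that snippet, not a dump of the schema.

```typescript
// Maps each table to the tables it references via foreign keys (illustrative edges).
const references: Record<string, string[]> = {
  events: ['userJobs'],
  jobOffers: ['userJobs'],
  userJobs: [],
};

// Return tables in a safe deletion order: referencing tables before referenced ones.
function deletionOrder(refs: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();

  const visit = (table: string): void => {
    if (state.get(table) === 'done') return;
    if (state.get(table) === 'visiting') throw new Error(`Cycle at ${table}`);
    state.set(table, 'visiting');
    for (const parent of refs[table] ?? []) visit(parent);
    state.set(table, 'done');
    order.push(table); // a table is pushed only after every table it references
  };

  Object.keys(refs).forEach(visit);
  return order.reverse(); // reversed, so referencing tables come first
}
```

For the three tables here, userJobs always comes out last, which matches the hand-written afterEach. Forgetting a table is the same failure in both versions: a missing entry in the map is just as silent as a missing delete.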

I don’t test the LLM. On purpose.

Nineteen-plus AI features and zero tests that call a model. Every BAML function is mocked at its async boundary — the test stubs the function that creates the AI job and returns a fake job ID; the real model is never invoked in the suite.

This is the decision people flinch at, so here’s the reasoning. An assertion over a model’s output is either so loose it proves nothing (expect(result).toBeDefined()) or so tight it’s flaky the next time the model is nudged. It costs money and latency on every CI run. And the failure mode I actually care about — “the model returned the wrong shape” — is already caught at the BAML boundary at runtime, because BAML parses the response against a typed class and throws if it doesn’t match. The type system is the test. What I do test is everything around the model: that the route enqueues, that the worker persists, that quota is charged and refunded correctly, that ownership is enforced. The model is a black box with a typed contract, and I test up to the contract and stop.
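
As one example of testing around the model, here is a minimal sketch of the refund-on-enqueue-failure path, with the enqueue step injected so a test can stub it and return a fake job ID. Deps, startAiJob, consumeQuota, refundQuota, and enqueue are hypothetical names for illustration, not RoleReady’s actual API.

```typescript
// Hypothetical dependency surface; the real quota and queue modules are not shown here.
type Deps = {
  consumeQuota: (userId: string, requestKey: string) => Promise<void>;
  refundQuota: (userId: string, requestKey: string) => Promise<void>;
  // Resolves to a job ID once the work is queued.
  enqueue: (payload: { userId: string; requestKey: string }) => Promise<string>;
};

async function startAiJob(deps: Deps, userId: string, requestKey: string): Promise<string> {
  // Charge first, so work is never queued without quota behind it.
  await deps.consumeQuota(userId, requestKey);
  try {
    return await deps.enqueue({ userId, requestKey });
  } catch (err) {
    // Enqueue failed after the charge: refund under the same requestKey, then rethrow.
    await deps.refundQuota(userId, requestKey);
    throw err;
  }
}
```

A unit test passes an enqueue that rejects and asserts that refundQuota was called with the same requestKey that consumeQuota saw. No model, queue, or database is involved, which is why this kind of check stays in the fast suite.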

The CI order is the local order

CI runs three parallel jobs — lint+typecheck, unit tests, integration tests (the last with the Postgres service) — and the gate sequence is the same one I run locally:

baml:generate → lint → typecheck → typecheck:tests → test:unit → (integration) → build

baml:generate first because everything type-checks against generated AI client code. Typecheck before tests because a type error is cheaper to surface than a failing assertion. Build last because it’s the most expensive step and there’s no reason to pay for it if lint already failed.
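
The fail-fast ordering can be sketched as a tiny runner. The gate names are taken from the sequence above (with the optional integration step left out); runGates and its injected run callback are hypothetical, so the sketch makes no claim about which package manager or CI syntax actually executes each step.

```typescript
// Gate names from the sequence above; the optional integration step is omitted.
const gates = ['baml:generate', 'lint', 'typecheck', 'typecheck:tests', 'test:unit', 'build'];

// Run each gate in order and stop at the first failure.
// `run` is injected (hypothetical) and returns true when the gate passes.
function runGates(
  names: string[],
  run: (name: string) => boolean,
): { passed: string[]; failed: string | null } {
  const passed: string[] = [];
  for (const name of names) {
    if (!run(name)) return { passed, failed: name };
    passed.push(name);
  }
  return { passed, failed: null };
}
```

In a real script, run would shell out to the corresponding package script and report whether it exited cleanly. The property that matters is visible in the loop: if lint fails, the expensive build gate is never invoked.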

What isn’t tested, stated honestly

I’d rather name the gaps than imply coverage I don’t have. There are exactly four e2e tests — login, navigation, mobile nav, a responsive sweep — and no full “capture a job → schedule an interview → generate a cover letter” journey. Email (Resend), file uploads (S3), and third-party ingestion are mocked, never hit for real. Retry and timeout paths are thinly covered. Load testing lives in a separate directory of manual scripts, not in CI.

That’s the honest shape of it: deep unit coverage of logic, real-database coverage of the data layer and its concurrency, a thin e2e smoke layer, and a clear-eyed decision to not test the parts where a test would cost more than it proves. The goal was never a coverage number. It was a suite fast enough that I actually run it, and trustworthy enough that green means something.