The stack vs. the reality: shipping a Next.js app to Cloud Run

The list lies a little

If you ask what RoleReady runs on, the honest one-liner is: Next.js 16 standalone, Bun, Postgres 18, Doppler, Cloud Run, Terraform. True, and almost useless, because the decisions that actually cost me time weren’t framework choices; they were the boring seams between them: the Dockerfile, where secrets live, what runs serverless versus what runs in a container. This post is about those seams.

Why Cloud Run and not just Vercel

The marketing site is on Vercel and should be — it’s a static Astro build, Vercel is the path of least resistance, done. The app is a different question. It’s a heavy Next.js 16 server with background workers, a long-lived connection pool to Cloud SQL, and AI work that wants a real runtime, not a function that’s reborn on every request.

Cloud Run gave me a container I control, a healthcheck I define, scale-to-zero when nobody’s around, and — the deciding factor — a private path to the database. Cloud SQL sits on a private IP with no public address; the app reaches it over Direct VPC egress through the Cloud SQL Auth Proxy, IAM-authenticated. The database is not on the internet. Getting that topology on a pure serverless platform is possible but fights you; on Cloud Run it’s the default shape. Postgres runs on a modest db-custom-1-3840 instance (1 vCPU, 3.75 GB) that autoresizes its disk, and the whole thing — Cloud Run service, Cloud SQL, VPC, KMS, Artifact Registry — is Terraformed so I can read the infrastructure instead of clicking through a console trying to remember what I set.

The Dockerfile is multi-stage on purpose

A naive Dockerfile installs everything and ships it, and you get a 1.5 GB image where the production container carries a compiler and your entire node_modules to run code that’s already built. The fix is multi-stage: each stage does one job, and the final image copies only what it needs from the earlier ones.

The build leans on Next’s standalone output, which traces exactly the files the server needs and emits a self-contained .next/standalone. The runner stage copies that and nothing else:

FROM oven/bun:1 AS base
# ... deps stage: bun install --frozen-lockfile
# ... builder stage: bun run build  (regenerates BAML, emits .next/standalone)
# ... doppler stage: source of the Doppler CLI binary copied below (elided here)

FROM base AS runner
# Copy only runtime artifacts: the CLI, the traced server, static assets, public files.
COPY --from=doppler  /usr/local/bin/doppler        /usr/local/bin/doppler
COPY --from=builder  /app/.next/standalone          ./
COPY --from=builder  /app/.next/static              ./.next/static
COPY --from=builder  /app/public                    ./public
# Docker-level healthcheck against the app's own health route.
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
  CMD curl -fsS http://127.0.0.1:3000/api/health || exit 1
CMD ["/app/entrypoint.sh"]
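The .next/standalone directory that the runner stage copies only exists if standalone output is switched on in the Next config. A minimal sketch, assuming a next.config.ts at the project root (the real config very likely sets more than this):

```typescript
// next.config.ts (sketch; only the setting relevant to the Dockerfile is shown)
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // Trace the server's dependencies and emit a self-contained .next/standalone directory.
  output: "standalone",
};

export default nextConfig;
```

Standalone output does not include public or .next/static by default, which is why the runner stage copies those two directories separately.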

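The build base carries build-essential and python3 because sharp compiles native code; the runner carries none of it. The result is a lean image that holds the built server, the static assets, and the Doppler CLI, and that’s it. One caveat on the healthcheck: Cloud Run does not act on the Dockerfile HEALTHCHECK instruction, so that line only covers local Docker runs. On Cloud Run, the same /api/health endpoint has to be configured as a startup or liveness probe on the service. That probe is what lets Cloud Run tell “container started” from “app actually answers,” which is the difference between a deploy that works and one that silently serves 502s.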
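For a health endpoint to mean "app actually answers," it has to check something real and fail fast when that check hangs. Here is a minimal sketch of what could sit behind /api/health. The names checkHealth, healthRoute, and Probe are hypothetical, and the probe stands in for whatever cheap check the app uses (for example a SELECT 1 against the pool):

```typescript
// Hypothetical probe type: any cheap readiness check supplied by the caller.
type Probe = () => Promise<void>;

export interface Health {
  status: 200 | 503;
  body: { ok: boolean; reason?: string };
}

// Run the probe, but give up after `timeoutMs` so the health route never hangs.
export async function checkHealth(probe: Probe, timeoutMs = 2000): Promise<Health> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("probe timed out")), timeoutMs);
  });
  try {
    await Promise.race([probe(), timeout]);
    return { status: 200, body: { ok: true } };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { status: 503, body: { ok: false, reason } };
  } finally {
    // Always clear the timer so a fast probe does not leave it pending.
    clearTimeout(timer);
  }
}

// In a Next.js route file this logic would be exported as GET; `probe` is a placeholder.
export async function healthRoute(probe: Probe): Promise<Response> {
  const { status, body } = await checkHealth(probe);
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}
```

The non-2xx status matters more than the body: curl -f and HTTP probes both treat a 503 as a failure.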

Zero .env files, on purpose

There is not one .env file in the repository. Not gitignored — absent. Every command that needs a secret is wrapped:

doppler run -c dev -- bun run dev

The reasoning is that the most common credential leak for a solo dev isn’t a sophisticated attack; it’s a .env that slips past .gitignore during a fast commit, or gets pasted into an issue. If the file never exists, that whole category is gone. The rule is even stricter than “use Doppler” — it’s “always name the config explicitly” (-c dev), so you can never accidentally run a dev command against prod secrets because of an ambient default.
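The "always name the config" rule can also be enforced in code if commands are launched from a script rather than typed by hand. A minimal sketch, assuming a hypothetical helper called dopplerRunArgs that builds the argument list for doppler run and refuses to proceed without an explicit config:

```typescript
// Hypothetical helper (not part of Doppler): builds argv for `doppler run`
// and fails loudly when no config name is given, so no ambient default is used.
export function dopplerRunArgs(config: string | undefined, command: string[]): string[] {
  if (!config || config.trim() === "") {
    throw new Error("Refusing to run without an explicit Doppler config (-c <name>)");
  }
  if (command.length === 0) {
    throw new Error("No command given to wrap");
  }
  // Same shape as the command shown above: doppler run -c dev -- bun run dev
  return ["run", "-c", config, "--", ...command];
}
```

A wrapper script could pass the result to a process spawner with "doppler" as the executable. The point of the design is that a missing config is an error, never a fallback.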

In production the same principle holds, mapped onto Cloud Run. Secrets come from GCP Secret Manager, mounted into the container as a JSON file, and the entrypoint reads them into the environment before the server boots:

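# scripts/entrypoint.sh
SECRETS_FILE="${SECRETS_FILE:-/etc/doppler/secrets.json}"
if [ -f "$SECRETS_FILE" ]; then
  # Emit one single-quoted export per key, so eval never expands $ or backticks in a value.
  # \u0027 is a single quote, written as an escape so it can sit inside the shell's quoted script.
  eval "$(SECRETS_FILE="$SECRETS_FILE" bun -e '
    const s = JSON.parse(require("fs").readFileSync(process.env.SECRETS_FILE, "utf8"));
    const q = (v) => "\u0027" + String(v).split("\u0027").join("\u0027\\\u0027\u0027") + "\u0027";
    for (const [k, v] of Object.entries(s)) console.log("export " + k + "=" + q(v));
  ')"
fi
exec bun run server.js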
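The export lines the entrypoint feeds to eval are only as safe as their quoting. Inside double quotes the shell still expands $ and backticks, so a secret containing either can be mangled or even executed; single-quote escaping avoids that. A minimal sketch of the quoting step, with hypothetical names shellQuote and toExportLines:

```typescript
// Wrap a value in single quotes for a POSIX shell. An embedded single quote is
// handled by closing the quotes, adding an escaped quote, and reopening: '\''
export function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Accept only valid shell identifiers as names, so a key cannot smuggle in code.
const ENV_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Turn a parsed secrets object into `export NAME='value'` lines that are safe to eval.
export function toExportLines(secrets: Record<string, unknown>): string[] {
  return Object.entries(secrets).map(([key, value]) => {
    if (!ENV_NAME.test(key)) {
      throw new Error(`Invalid environment variable name: ${key}`);
    }
    return `export ${key}=${shellQuote(String(value))}`;
  });
}
```

Validating the key matters as much as quoting the value, because the name is emitted unquoted.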

Dev and prod differ in where the secrets come from, but the app’s contract is identical: read them from the environment, never from a file committed to the repository.
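That contract is easiest to hold if the app checks it once at boot and fails fast when something is missing. A minimal sketch, assuming a hypothetical requireEnv helper; the variable names in the usage note are placeholders, not the app's real ones:

```typescript
type Env = Record<string, string | undefined>;

// Return the requested variables, or throw one error naming every missing one.
// The error lists names only, never values, so it is safe to log.
export function requireEnv<K extends string>(env: Env, names: readonly K[]): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const out = {} as Record<K, string>;
  for (const name of names) {
    out[name] = env[name] as string;
  }
  return out;
}
```

At startup this might be called as requireEnv(process.env, ["DATABASE_URL", "SESSION_SECRET"] as const), with whatever names the app actually needs. The same call works under doppler run in dev and behind the entrypoint in prod, because both only populate the environment.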

The build order is the CI order

The thing that keeps the pipeline honest is that the local verification sequence is, step for step, the sequence CI runs:

baml:generate → lint → typecheck → typecheck:tests → test:unit → build

baml:generate first, because the AI client is generated code and everything downstream type-checks against it. Typecheck runs on a native TypeScript compiler (no tsc), which is fast enough that “just run typecheck” is a reflex, not a coffee break. Build last, because it’s the most expensive step and there’s no reason to pay for it if lint already failed. An agent or a contributor running this in order fails at the cheapest possible point. CI runs the same list in the same order, plus integration tests against a real Postgres, so “it passed locally” actually means something.
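The ordering above is simple enough to encode as a script, so nobody has to remember it. A minimal sketch, assuming each step is a package script runnable as bun run <step>; the verify and bunRun names are hypothetical:

```typescript
import { spawnSync } from "node:child_process";

// Cheapest checks first, the expensive build last.
export const STEPS = [
  "baml:generate",
  "lint",
  "typecheck",
  "typecheck:tests",
  "test:unit",
  "build",
] as const;

// A step runner returns the exit code of the step.
type RunStep = (step: string) => number;

export interface VerifyResult {
  ok: boolean;
  failedAt?: string;
  ran: string[];
}

// Run steps in order and stop at the first non-zero exit code.
export function verify(run: RunStep, steps: readonly string[] = STEPS): VerifyResult {
  const ran: string[] = [];
  for (const step of steps) {
    ran.push(step);
    if (run(step) !== 0) {
      return { ok: false, failedAt: step, ran };
    }
  }
  return { ok: true, ran };
}

// Real runner: shells out to `bun run <step>`; a missing status counts as failure.
export const bunRun: RunStep = (step) =>
  spawnSync("bun", ["run", step], { stdio: "inherit" }).status ?? 1;
```

A script entry point would call verify(bunRun) and exit non-zero when ok is false. Injecting the runner keeps the ordering logic testable without actually running a build.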

What I’d tell past-me

The framework you pick matters for about a week. The seams matter for the life of the project: where do secrets live, what does the production image actually contain, what’s the boundary between serverless and a real runtime, and is your local verification the same as CI’s. I spent more total time getting the Dockerfile lean and the secrets path airtight than I did choosing Next.js — and that’s the correct ratio. Nobody pages you at 2am because you picked the wrong React framework. They page you because a secret leaked or the image won’t boot.