System design

Frontend Architecture Guide

A senior-level tour of frontend and full-stack system design, mapped onto Next.js 16.2 and React 19.2. This is the “how do you think about building a product frontend at scale” material — rendering strategy, caching, data flow, and org structure, with the trade-offs named out loud.

01 · Thinking in systems for a frontend interview

Frontend system design is judged the same way backend design is: decision-making before code, narrated. A structured flow keeps you from free-associating about component names:

Step	What you do	Time
Clarify	Who's the user, what devices/networks, how many pages, personalized or not, SEO requirement, expected traffic/read-write ratio. Most candidates skip straight to component trees.	~15%
Rough architecture	Page inventory → rendering mode per page → data sources → where state lives. A sitemap, not a component diagram.	~20%
Deep dive	Zoom into one piece — the caching/invalidation story, or the state architecture, or the rollout plan.	~35%
Trade-offs	Name alternatives and why you didn't pick them. The senior signal.	~20%
Summarize	Recap, flag risks (stale cache, bundle bloat, TTFB), take follow-ups.	~10%

Senior tell: say the trade-off out loud, e.g. “I'll cache this page with tagged revalidation. I'm trading a few seconds of staleness after a write for near-zero TTFB on every read.”

02 · Rendering strategy as the first architectural decision

Before any component gets designed, decide how each page in the inventory gets its HTML — this single choice drives CDN cacheability, TTFB, infra cost, and personalization for everything downstream.

Page type	Strategy	Why
Marketing / landing	Static (build-time)	Same for everyone, cache at the edge indefinitely, near-zero TTFB
Product / category pages	ISR or Cache Components (`"use cache"`)	Mostly-static content that changes on a schedule or on a write — tag-based revalidation instead of a blanket rebuild
Dashboards	Dynamic + Suspense-streamed	Per-user data, but a cached shell (nav/layout) can still ship instantly while widgets stream in
Checkout	Fully dynamic	Correctness (inventory, price, tax) trumps latency; nothing here may be cached

Getting this wrong in either direction is the classic mistake: over-caching a checkout page serves stale prices; making a marketing page fully dynamic burns TTFB and origin compute for content that never changes.
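
The per-page decision above can be written down as data and checked mechanically. The following is a minimal sketch, assuming a hypothetical `PagePlan` shape and route names; it is a planning aid, not a Next.js API, and it only flags the two mistakes named above.

```typescript
// Hypothetical planning aid, not a Next.js API: every name here is illustrative.
type RenderingStrategy = "static" | "revalidated" | "dynamic-streamed" | "dynamic";

interface PagePlan {
  route: string;
  strategy: RenderingStrategy;
  personalized: boolean; // does the HTML differ per user?
  correctnessCritical: boolean; // would a stale value cost money or trust?
}

const pageInventory: PagePlan[] = [
  { route: "/", strategy: "static", personalized: false, correctnessCritical: false },
  { route: "/products/[id]", strategy: "revalidated", personalized: false, correctnessCritical: false },
  { route: "/dashboard", strategy: "dynamic-streamed", personalized: true, correctnessCritical: false },
  { route: "/checkout", strategy: "dynamic", personalized: true, correctnessCritical: true },
];

// Flags the two classic mistakes: caching something that must be fresh,
// and rendering per request something that is identical for everyone.
function auditPlan(plan: PagePlan): string[] {
  const problems: string[] = [];
  const cached = plan.strategy === "static" || plan.strategy === "revalidated";
  if (cached && (plan.personalized || plan.correctnessCritical)) {
    problems.push(`${plan.route}: cached but personalized or correctness-critical`);
  }
  if (!cached && !plan.personalized && !plan.correctnessCritical) {
    problems.push(`${plan.route}: dynamic but identical for every visitor`);
  }
  return problems;
}

function auditInventory(plans: PagePlan[]): string[] {
  const problems: string[] = [];
  for (const plan of plans) {
    for (const problem of auditPlan(plan)) problems.push(problem);
  }
  return problems;
}
```

In an interview, the table itself is the deliverable; the audit function just makes the reasoning behind each row explicit.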

03 · The caching stack — CDN, framework, browser

A Next.js response passes through three independent caches, each with its own invalidation mechanism — and a senior answer treats them as one coordinated system, not three separate concerns.

Layer	What it caches	Invalidation
CDN / edge	Full HTML responses for static/ISR routes	Purge on deploy, or tag-based purge from `revalidateTag`
Next.js server	Rendered output + fetches, per Cache Components (`"use cache"`, `cacheLife`, `cacheTag`)	`revalidateTag(tag, profile)`, `updateTag()`, time-based `cacheLife` profiles
Browser	Static assets, and the Router Cache for client navigations	`Cache-Control` headers, immutable hashed filenames

Next 16's Cache Components model (opt in via cacheComponents: true) flips the default: nothing is cached unless a segment or function is explicitly wrapped in "use cache", with a cacheLife profile controlling staleness and cacheTag giving it an invalidation key. This replaces the older implicit fetch/route-segment caching, where a bare fetch could be silently cached and surprise you. Coordinating invalidation means one webhook or Server Action calling revalidateTag has to account for the CDN purge that follows it — a stale CDN entry outlives a fresh server cache if you forget it.
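
To make the coordination problem concrete, here is a toy model of two tag-indexed caches stacked in front of an origin. It is a sketch under stated assumptions: `TaggedCache`, `read`, and `invalidateEverywhere` are hypothetical names, and this is not how Next.js or any CDN stores entries. It only shows why invalidating one layer is not enough.

```typescript
// Toy model of a tag-indexed cache. Illustrative only.
class TaggedCache {
  private entries = new Map<string, { body: string; tags: string[] }>();

  set(key: string, body: string, tags: string[]): void {
    this.entries.set(key, { body, tags });
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    return entry === undefined ? undefined : entry.body;
  }

  // Drops every entry carrying the tag and returns how many were removed.
  invalidateTag(tag: string): number {
    const doomed: string[] = [];
    this.entries.forEach((entry, key) => {
      if (entry.tags.indexOf(tag) !== -1) doomed.push(key);
    });
    for (const key of doomed) this.entries.delete(key);
    return doomed.length;
  }
}

const origin = { price: "$10" }; // stand-in for the real data source
const serverCache = new TaggedCache();
const cdnCache = new TaggedCache();

// Read path: CDN first, then the server cache, then a fresh render.
function read(url: string, tags: string[]): string {
  const fromCdn = cdnCache.get(url);
  if (fromCdn !== undefined) return fromCdn;
  let html = serverCache.get(url);
  if (html === undefined) {
    html = `<p>${origin.price}</p>`;
    serverCache.set(url, html, tags);
  }
  cdnCache.set(url, html, tags);
  return html;
}

// Invalidate both layers from one place so neither can be forgotten.
function invalidateEverywhere(tag: string): void {
  serverCache.invalidateTag(tag);
  cdnCache.invalidateTag(tag);
}
```

If only `serverCache.invalidateTag` runs after a price change, `read` keeps returning the old HTML from `cdnCache`, which is the stale-CDN failure described above.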

04 · Data layer & the BFF pattern

Next.js Server Components and Route Handlers naturally sit as a backend-for-frontend: they aggregate, reshape, and proxy calls to real backend services (REST, GraphQL, gRPC-via-gateway) so the client never talks to five origins directly.

Put an anti-corruption layer at the BFF boundary — map foreign service DTOs to your app's own types in one place, so an upstream schema change doesn't ripple through every component that reads that data. Fetch in parallel, not in sequence: kick off independent requests before awaiting any of them (or use Promise.all), because a Server Component that awaits fetch A then fetch B serially reproduces the classic request waterfall the client used to have, just moved to the server.
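
A minimal sketch of both ideas together, assuming two hypothetical upstream services and made-up DTO field names: the fetches start before either is awaited, and a single mapper (`toProductView`) is the only place that knows the upstream shapes. The fake fetchers resolve immediately and record call order so the parallel start is observable.

```typescript
// Upstream shapes (hypothetical): owned by other teams, free to change.
interface ProductDto { product_id: string; display_name: string; price_cents: number }
interface ReviewsDto { product_id: string; avg_stars: number; total: number }

// App-owned shape: the only type components are allowed to read.
interface ProductView { id: string; name: string; price: number; rating: number; reviewCount: number }

const callLog: string[] = [];

// Stand-ins for real network calls.
async function fetchProductDto(id: string): Promise<ProductDto> {
  callLog.push("start product");
  await Promise.resolve(); // simulated latency
  callLog.push("end product");
  return { product_id: id, display_name: "Desk lamp", price_cents: 2500 };
}

async function fetchReviewsDto(id: string): Promise<ReviewsDto> {
  callLog.push("start reviews");
  await Promise.resolve(); // simulated latency
  callLog.push("end reviews");
  return { product_id: id, avg_stars: 4.5, total: 12 };
}

// Anti-corruption layer: upstream renames are absorbed here and nowhere else.
function toProductView(product: ProductDto, reviews: ReviewsDto): ProductView {
  return {
    id: product.product_id,
    name: product.display_name,
    price: product.price_cents / 100,
    rating: reviews.avg_stars,
    reviewCount: reviews.total,
  };
}

// Both requests are in flight before either is awaited, so there is no waterfall.
async function loadProduct(id: string): Promise<ProductView> {
  const [product, reviews] = await Promise.all([fetchProductDto(id), fetchReviewsDto(id)]);
  return toProductView(product, reviews);
}
```

Writing `const product = await fetchProductDto(id)` followed by `const reviews = await fetchReviewsDto(id)` would produce the same result but would not start the second request until the first had finished.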

Client considerations: A GraphQL client used from Server Components doesn't need client-side cache normalization, because nothing it fetches persists in a browser-side store. Keep it as a thin request layer and let Cache Components own caching. Reserve a full client-side GraphQL/TanStack Query cache for data that's fetched from Client Components after the initial load.

05 · State management architecture

The first design decision isn't which library — it's which category a piece of state belongs to, because each category has a natural home in Next.js:

State kind	Lives in	Example
Server state	Fetched in a Server Component, cached via Cache Components	Product catalog, user profile
URL state	`searchParams` / route segments	Filters, pagination, selected tab — shareable, back-button-friendly
Client UI state	Local `useState`, or a client store for cross-tree state	Modal open/closed, form draft, optimistic UI

Reach for a client state library (Zustand, Jotai, TanStack Query) only when state is genuinely client-owned and cross-cutting — a shopping cart open across routes, or client-side caching of data fetched after hydration. Server Components + Server Actions handle far more than teams expect without one: form submission, revalidation, and redirects can all happen server-side with no client store at all. When state does need to reach deep children, prefer the interleaving/children pattern (pass pre-rendered Server Component output down as children/props) over reaching for Context — Context re-renders every consumer on every change and forces the provider's subtree to be a Client Component.
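
As an illustration of URL state, the sketch below parses filter state out of a plain search-params record and serializes it back. The `CatalogFilters` shape and the parameter names (`page`, `sort`, `tag`) are assumptions made for the example; the point is that the URL, not a client store, is the source of truth.

```typescript
// Assumed input shape: a plain record of query parameters.
type RawSearchParams = Record<string, string | string[] | undefined>;

interface CatalogFilters {
  page: number;
  sort: "price" | "newest";
  tags: string[];
}

function firstValue(value: string | string[] | undefined): string | undefined {
  return Array.isArray(value) ? value[0] : value;
}

// Never trust the URL: fall back to defaults on missing or malformed values.
function parseFilters(raw: RawSearchParams): CatalogFilters {
  const pageNum = parseInt(firstValue(raw["page"]) ?? "1", 10);
  const sort = firstValue(raw["sort"]) === "newest" ? "newest" : "price";
  const tagValue = raw["tag"];
  const tags = tagValue === undefined ? [] : Array.isArray(tagValue) ? tagValue : [tagValue];
  return { page: pageNum >= 1 ? pageNum : 1, sort, tags };
}

// Serializing back to a query string keeps the state shareable and back-button-friendly.
function toQueryString(filters: CatalogFilters): string {
  const parts = [`page=${filters.page}`, `sort=${filters.sort}`];
  for (const tag of filters.tags) parts.push(`tag=${encodeURIComponent(tag)}`);
  return parts.join("&");
}
```

Because the parsed filters are derived from the URL on every request, a Server Component can read them directly and no client store is needed for this category of state.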

06 · Monorepo & scaling the frontend org

Structure by feature, not by file type — a features/checkout/ folder with its own components, hooks, and actions colocated beats a top-level components/, hooks/, actions/ split that forces you to open five folders to change one flow.

A shared design-system package (tokens, primitives, a handful of composed components) is the highest-leverage investment for a multi-team org — it's what keeps ten teams' UIs looking like one product. Draw ownership boundaries along features/domains, enforced by folder structure and code-owners, not by which framework file a piece of code happens to live in.

Escape hatch, not a default: Micro-frontends (Module Federation, or multiple Next.js apps stitched via multi-zone routing) buy independent deploys and independent tech choices at the cost of duplicated framework runtime, cross-app navigation complexity, and a harder design-consistency problem. Earn them with a concrete need (a team that must ship on its own cadence); don't reach for them because the org chart has multiple teams.

07 · Edge, CDN & global delivery

Static assets (JS/CSS bundles, images, fonts) belong on a CDN with immutable, hashed filenames — cache forever, invalidate by changing the URL, not by purging. Images get edge-optimized (resize/format/quality negotiated per-request, typically via next/image) so a phone on a slow network doesn't download a desktop-sized asset.
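
The "cache forever, invalidate by URL" rule can be sketched as a header policy. This assumes a hypothetical filename convention in which hashed assets carry at least eight hex characters before the extension; the fallback header for everything else is one reasonable choice, not the only one.

```typescript
// Assumption: build output embeds a content hash, e.g. "app.3f9a1c2b.js".
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(js|css|woff2|png|jpg|svg)$/;

function cacheControlFor(path: string): string {
  if (HASHED_ASSET.test(path)) {
    // The URL changes whenever the content changes, so the browser may keep it for a year.
    return "public, max-age=31536000, immutable";
  }
  // Unhashed URLs (HTML documents, for example) must be revalidated before reuse.
  return "public, max-age=0, must-revalidate";
}
```

A new deploy then needs no asset purge at all: the HTML references new hashed URLs, and the old ones simply stop being requested.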

Next 16 changed the edge story: proxy.ts (Node.js runtime) replaces the deprecated middleware.ts, which ran on the Edge runtime by default. That's a real trade-off, not a rename: a Node.js runtime has the full Node API surface (useful for anything beyond header/cookie inspection and redirects) but loses the near-zero cold-start, run-at-every-PoP characteristics of the old edge runtime. Design proxy logic assuming it runs close to, but not literally at, every edge location.

For streaming SSR across regions, the shell can be served from the CDN edge while the dynamic/streamed parts still round-trip to an origin region — so multi-region deployment of the origin (or picking a region close to the majority of users) matters more than it did for pure static sites. The metric that actually tells you if this architecture is working is cache-hit ratio at the CDN: a low hit ratio on pages you believed were static means your rendering-strategy decision (Section 02) or your cache tags (Section 03) are wrong somewhere.
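
A rough sketch of how that check might look, assuming a hypothetical log format with one `HIT`/`MISS` entry per request (real CDN logs have more statuses and fields):

```typescript
// Hypothetical, simplified CDN log entry.
interface CdnLogEntry { route: string; cacheStatus: "HIT" | "MISS" }

function hitRatioByRoute(entries: CdnLogEntry[]): Map<string, number> {
  const counts = new Map<string, { hits: number; total: number }>();
  for (const entry of entries) {
    const current = counts.get(entry.route) ?? { hits: 0, total: 0 };
    current.total += 1;
    if (entry.cacheStatus === "HIT") current.hits += 1;
    counts.set(entry.route, current);
  }
  const ratios = new Map<string, number>();
  counts.forEach((count, route) => ratios.set(route, count.hits / count.total));
  return ratios;
}

// Routes believed to be static whose observed hit ratio falls below the threshold.
function suspiciousRoutes(ratios: Map<string, number>, expectedStatic: string[], minRatio: number): string[] {
  return expectedStatic.filter((route) => {
    const ratio = ratios.get(route);
    return ratio !== undefined && ratio < minRatio;
  });
}
```

Any route this returns is a candidate for re-checking against the rendering-strategy table and the cache tags.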

08 · Observability & shipping safely

Ship frontend changes with the same rigor as backend ones. Core Web Vitals (LCP, INP, CLS) need real-user monitoring (RUM), not just lab/Lighthouse scores — a synthetic run on a fast laptop hides what a mid-tier phone on 4G actually experiences. Pair RUM with error tracking on both sides: client-side (unhandled exceptions, failed hydration) and server-side (Server Component/Route Handler/Server Action errors), since a caching bug can silently serve broken HTML that no client-side error ever fires for.
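
RUM data is usually judged at the 75th percentile rather than the mean, so one fast device cannot hide a slow tail. The sketch below assumes the samples have already been collected somewhere (the collection side is out of scope) and uses the published "good" thresholds for the three Core Web Vitals; the function names are illustrative.

```typescript
type Metric = "LCP" | "INP" | "CLS";

// Published "good" thresholds: LCP 2500 ms, INP 200 ms, CLS 0.1 (unitless).
const GOOD_THRESHOLD: Record<Metric, number> = { LCP: 2500, INP: 200, CLS: 0.1 };

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] as number;
}

function passesAtP75(metric: Metric, samples: number[]): boolean {
  return percentile(samples, 75) <= GOOD_THRESHOLD[metric];
}
```

A single lab run corresponds to a one-element sample array, which is why it can pass while the field distribution fails.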

Feature flags are the safety valve for a risky architectural change — gradually rolling out a Cache Components migration route-by-route behind a flag lets you compare cache-hit ratio and error rate for the flagged cohort before flipping it globally (see Deep Dive 4).
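
One common way to build the flagged cohort is deterministic percentage bucketing: hash the flag name plus a stable user id into a bucket from 0 to 99. This is a minimal sketch, not any particular flag vendor's algorithm; the hash is FNV-1a applied to UTF-16 code units, chosen only because it is short and stable.

```typescript
// FNV-1a (32-bit) over UTF-16 code units. Any stable hash would do.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// The same user always lands in the same bucket for a given flag,
// so raising the percentage only ever adds users to the cohort.
function isEnabled(flag: string, userId: string, rolloutPercent: number): boolean {
  const bucket = fnv1a(`${flag}:${userId}`) % 100; // 0..99
  return bucket < rolloutPercent;
}
```

Including the flag name in the hashed string keeps cohorts for different flags independent, so the same users are not always the first to receive every risky change.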

Deployment model	Fits	Cost
`output: "export"` (fully static)	No Server Components/Actions, no ISR/Cache Components — pure static hosting (S3/CDN)	Cheapest, most limited
Node/Edge server platform	Full feature set: streaming, Server Actions, Cache Components, `proxy.ts`	Higher ops surface, but the only option once you need any of the above

Note that Next 16 removed the build-time “First Load JS” metric from the CLI output — bundle-size regressions now have to be caught via Lighthouse CI or a RUM/analytics dashboard instead of a build-log number, which is one more reason RUM isn't optional.

Deep dives

Five classic frontend/full-stack system-design prompts, each as Concept → Example → Gotcha → Senior answer. These are the ones interviewers reach for when hiring for a Next.js role — rehearse the trade-offs out loud.

System design

Design the rendering strategy for a large e-commerce site

Concept Not every page on the same site gets the same rendering mode. Product/category pages are read-heavy and mostly-static; search/personalized pages are per-user; checkout is correctness-critical. Pick per page type, not per site.

Example Product pages: ISR or Cache Components with cacheTag(`product:${id}`), revalidated on price/inventory writes. Search/personalized results: dynamic, Suspense-streamed so the shell (nav, filters skeleton) paints immediately while results stream in. Checkout: fully dynamic, no "use cache" anywhere in the tree.

Gotcha An over-eager cache shows stale inventory or price: a customer adds an out-of-stock item to cart, or checks out at yesterday's price. This is one of the most common e-commerce architecture bugs, and it comes from tagging too coarsely (one tag for the whole catalog) or setting cacheLife too long on inventory-sensitive fragments.

Senior answer Tag revalidation per product, not per catalog, so a single price change only busts one entry. When an admin edits a price, call updateTag in that same Server Action so the admin's own next read is fresh (read-your-writes) without waiting for background revalidation. Give inventory-sensitive fragments (stock count, “3 left”) a short cacheLife profile independent of the rest of the product page, which can stay cached far longer. Trade-off named out loud: finer tag granularity costs more cache-invalidation plumbing in exchange for fewer stale reads.
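
The per-product tagging and per-fragment freshness can be modeled as plain data. This is a sketch under assumptions: `FragmentPolicy`, the `inventory:` tag prefix, and the 3600 s / 30 s lifetimes are illustrative values, not recommendations or framework defaults.

```typescript
interface FragmentPolicy { tags: string[]; maxAgeSeconds: number }
type ProductFragment = "details" | "inventory";

// One tag per product, plus a narrower tag and a shorter lifetime for stock data.
function policiesFor(productId: string): Record<ProductFragment, FragmentPolicy> {
  return {
    details: { tags: [`product:${productId}`], maxAgeSeconds: 3600 },
    inventory: { tags: [`product:${productId}`, `inventory:${productId}`], maxAgeSeconds: 30 },
  };
}

function isFresh(policy: FragmentPolicy, cachedAtMs: number, nowMs: number): boolean {
  return nowMs - cachedAtMs < policy.maxAgeSeconds * 1000;
}

function isAffectedBy(policy: FragmentPolicy, tag: string): boolean {
  return policy.tags.indexOf(tag) !== -1;
}
```

With this shape, a price edit invalidates `product:42` and touches both fragments of one product, while a stock change can invalidate `inventory:42` alone and leave the long-lived details fragment cached.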

System design

Design a real-time-feeling dashboard with Next.js

Concept A dashboard is really two different pages sharing a layout: a mostly-static shell (nav, filters, layout chrome) and a set of volatile widgets (live counters, charts) that need to feel current. Treat them differently instead of making the whole route dynamic.

Example Cache the shell with Cache Components. Wrap each live widget in its own <Suspense> boundary so it streams independently; poll on an interval for widgets where near-real-time is fine, or open a WebSocket/SSE connection for genuinely live data (trading price, live viewer count). For a one-off refresh after a user action (“Refresh” button, or after submitting a filter), call refresh() from a Server Action to re-run uncached server work without a full page reload.

Gotcha Two failure modes in opposite directions: over-fetching (re-running every widget's query on every render/poll tick, even ones the user isn't looking at) burns server load; and making the whole page dynamic to get live data anywhere on it kills TTFB for the 90% of the dashboard that's actually static (nav, layout, historical charts).

Senior answer Cache the shell, stream or poll only the volatile widgets, and scope each widget's own cache/staleness independently — a live counter and a 24-hour trend chart on the same screen do not need the same freshness. Use refresh() for user-triggered updates instead of location.reload(), which throws away the cached shell you just paid to keep fast. Trade-off: per-widget streaming is more Suspense boundaries to manage than one dynamic page, in exchange for a shell that loads instantly regardless of how slow the live data is.
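
The "poll only the volatile widgets" rule can be sketched as a small scheduler that decides, on each tick, which widgets are actually due. The `Widget` shape and its field names are assumptions for illustration; visibility would come from something like an intersection observer in a real client.

```typescript
interface Widget {
  id: string;
  pollEveryMs: number | null; // null means the widget is never polled
  visible: boolean; // is the user currently looking at it?
  lastFetchedMs: number;
}

// Returns the ids of widgets that should refetch on this tick.
function widgetsDue(widgets: Widget[], nowMs: number): string[] {
  const due: string[] = [];
  for (const widget of widgets) {
    const interval = widget.pollEveryMs;
    if (interval === null) continue; // static or long-lived widget
    if (!widget.visible) continue; // skip work nobody will see
    if (nowMs - widget.lastFetchedMs >= interval) due.push(widget.id);
  }
  return due;
}
```

Each widget carries its own interval, so a live counter and a 24-hour trend chart on the same screen are refreshed on independent schedules.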

System design

Design the caching/invalidation strategy for a multi-author CMS-backed blog

Concept Content pages are cache-aside: render once, cache with a tag identifying that content, and let a webhook from the CMS drive invalidation instead of polling or a fixed TTL that's either too short (wasted cache) or too long (stale after publish).

Example Each post page uses cacheTag(`post:${slug}`) and cacheTag(`author:${authorId}`). The CMS fires a webhook on publish/edit → a Route Handler validates it and calls revalidateTag('post:my-slug', 'max'), invalidating just that entry (and the CDN copy in front of it) instead of rebuilding the whole site.

Gotcha Two authors publishing near-simultaneously can race: author A's webhook revalidates, then author B's slower webhook lands and momentarily serves a half-updated cache during the overlap window. Separately, tagging too finely (a tag per paragraph or per related-post relationship) causes tag cardinality explosion — thousands of tags to track for marginal invalidation precision.

Senior answer Pick tag granularity deliberately: per-post is almost always the right unit; per-category tags are useful for list pages but should stay coarse. Use the 'max' cacheLife profile with background stale-while-revalidate for the bulk of read traffic, since most readers can tolerate a few seconds of staleness after a publish. Reserve updateTag for the publishing author's own session, so they see their edit immediately (read-your-writes) without forcing that same aggressive freshness on every other reader. Trade-off: coarser tags mean occasionally over-invalidating (purging more than strictly changed) in exchange for an invalidation graph a human can actually reason about.
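
One possible way to tame overlapping or out-of-order webhook deliveries is to drop events that are older than, or identical to, one already processed. The sketch below assumes the CMS payload carries a monotonically increasing `revision` per post, which is a hypothetical field; the in-memory map also only works for a single server instance, so a real deployment would need shared storage.

```typescript
// Hypothetical webhook payload shape.
interface PublishEvent { slug: string; authorId: string; revision: number }

class RevalidationGate {
  private latestRevision = new Map<string, number>();

  // Returns the tags to revalidate, or an empty list for stale or duplicate deliveries.
  tagsToRevalidate(event: PublishEvent): string[] {
    const seen = this.latestRevision.get(event.slug);
    if (seen !== undefined && event.revision <= seen) return [];
    this.latestRevision.set(event.slug, event.revision);
    return [`post:${event.slug}`, `author:${event.authorId}`];
  }
}
```

The Route Handler would call its revalidation function once per returned tag; a late delivery for an older revision then triggers no extra invalidation during the overlap window.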

System design

Design safe rollout of a Cache Components migration on an existing app

Concept Turning on cacheComponents: true flips the caching default for the whole app at once: fetches and route segments that were implicitly cached under the old model become uncached by default unless explicitly wrapped in "use cache". That's not a config tweak — it's a behavioral migration that needs an incremental rollout path, not a single flag flip in production.

Example Adopt route-by-route: either gate the new behavior behind a feature flag that only affects a subset of routes (e.g. via multiple root layouts, one opted-in and one not, selected by route group), or migrate one route segment at a time in separate PRs, auditing every runtime-API usage (cookies(), headers(), searchParams) in that segment for a needed Suspense boundary before flipping it.

Gotcha Flipping the flag globally without a per-route rollout silently changes cache behavior for the entire app at once — pages that were fast because of implicit caching can suddenly hit the origin on every request (worse: some can go fully dynamic without a Suspense boundary in place, breaking the build or the page).

Senior answer Canary a single low-risk route first (something read-heavy but not revenue-critical), and use build output plus tooling like next-devtools-mcp to verify what actually got statically shelled vs. streamed vs. fully dynamic before trusting your mental model of the change. Roll out gradually, route group by route group, watching cache-hit ratio and TTFB per route as the migration proceeds, with the pre-migration behavior as an instant rollback (revert the flag for that route group). Trade-off: a route-by-route rollout takes longer than a single flag flip, in exchange for never finding out about a regression from a production incident.
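
The "watch the metrics, roll back on regression" step can be made explicit as a comparison between the pre-migration baseline and the canary route group. The metric names and tolerance fields below are hypothetical; the actual numbers would come from whatever RUM and CDN dashboards the team already has.

```typescript
interface RouteMetrics { cacheHitRatio: number; p75TtfbMs: number; errorRate: number }

interface Tolerance {
  maxHitRatioDrop: number; // absolute drop, e.g. 0.05 = five percentage points
  maxTtfbIncreaseMs: number;
  maxErrorRateIncrease: number;
}

// Lists every metric on which the canary is worse than the baseline by more than the tolerance.
function findRegressions(baseline: RouteMetrics, canary: RouteMetrics, tolerance: Tolerance): string[] {
  const found: string[] = [];
  if (baseline.cacheHitRatio - canary.cacheHitRatio > tolerance.maxHitRatioDrop) {
    found.push("cache-hit ratio dropped");
  }
  if (canary.p75TtfbMs - baseline.p75TtfbMs > tolerance.maxTtfbIncreaseMs) {
    found.push("TTFB increased");
  }
  if (canary.errorRate - baseline.errorRate > tolerance.maxErrorRateIncrease) {
    found.push("error rate increased");
  }
  return found;
}

function shouldRollBack(baseline: RouteMetrics, canary: RouteMetrics, tolerance: Tolerance): boolean {
  return findRegressions(baseline, canary, tolerance).length > 0;
}
```

Agreeing on the tolerances before the rollout starts keeps the rollback decision mechanical rather than a judgment call made mid-incident.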

Frontend design

Design the component/state architecture for a large multi-team Next.js app

Concept At multi-team scale, the architecture question stops being “how do I structure one feature” and becomes “how do teams work on this app without stepping on each other” — folder structure, shared UI, and where server-side logic lives all have to encode team boundaries.

Example Feature folders (features/checkout/, features/search/) each own their components, Server Actions, and tests. Server Actions live colocated with the feature that uses them by default; only truly cross-cutting actions (auth, analytics) go in a shared actions/ module. A shared UI package holds only genuinely reusable primitives (Button, Input, layout tokens) — feature-specific composites stay in the feature folder even if they look reusable at first.

Gotcha A shared Client Component that wraps too much of the tree (a top-level <Providers> client wrapper holding onto everything from theme to analytics to a state library) drags every page's bundle up with it and forces server-rendered subtrees underneath it to hydrate as client code even when they didn't need to. This is the multi-team version of a God Context — everyone imports it, no one can safely change what's inside it.

Senior answer Push "use client" to the leaves — the smallest component that actually needs interactivity, not the top of the tree — so Server Components stay the default and client bundle growth is attributable to a specific feature, not a shared root. Since Next 16 removed the build's “First Load JS” metric, measure bundle impact with route-level awareness via Lighthouse CI or Vercel Analytics instead, and gate PRs on a regression there rather than eyeballing a build log. Give each team folder-level ownership (enforced via code-owners) so the boundary is structural, not just a convention people forget. Trade-off: colocating Server Actions per feature means some duplication across features versus one shared module, in exchange for teams being able to change their own actions without a cross-team review.
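
A PR gate on bundle growth could look roughly like the sketch below. It assumes per-route client JS sizes have already been measured by some external tool and exported as a route-to-kilobytes record; that export format and the function name are hypothetical.

```typescript
// Route -> client-side JS in KB, as measured by an external tool (assumed format).
type RouteSizesKb = Record<string, number>;

// Returns one message per route whose client JS grew by more than the budget.
function bundleRegressions(baseline: RouteSizesKb, current: RouteSizesKb, maxGrowthKb: number): string[] {
  const offenders: string[] = [];
  for (const route of Object.keys(current)) {
    const before = baseline[route];
    const after = current[route];
    if (before === undefined || after === undefined) continue; // new route: nothing to compare against
    if (after - before > maxGrowthKb) offenders.push(`${route}: +${after - before} KB`);
  }
  return offenders;
}
```

Reporting growth per route, rather than as one app-wide total, is what makes a regression attributable to the feature team that owns that route.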