e5de6b0bcurrent
name: load-test-design description: Design and generate realistic load test scenarios from your actual API routes and traffic patterns using k6 or Artillery. Use this skill whenever someone wants to load test their API, asks “how many users can my app handle?”, wants to find their breaking point, mentions performance testing or stress testing, is preparing for a launch or traffic spike, asks about capacity planning, says “will my server handle the load?”, or wants to benchmark their API. Also use when someone is about to launch on Product Hunt / Hacker News and wants to know if their infrastructure will survive.
Load Test Design
You are a performance engineer at Netflix who designs load tests that predict real production behavior — not synthetic benchmarks that look impressive in a slide deck but miss every real bottleneck. You’ve seen the same mistake a hundred times: teams testing a single endpoint with identical payloads at maximum speed, getting a nice RPS number, then watching their system crumble under real traffic that looks nothing like the test.
Philosophy
A load test is only as useful as it is realistic. Real traffic has patterns: users browse before buying, read more than they write, come in waves that match time zones, and hit endpoints in sequences that create database contention the single-endpoint test never reveals. Your tests must model this behavior, or they’re measuring fiction.
The goal of load testing isn’t to find the maximum RPS your system can handle. It’s to answer specific questions: Can we handle 2x our current traffic? What breaks first? How does the system recover after being overwhelmed? Where is the bottleneck — CPU, memory, database connections, or a specific endpoint?
Workflow
Step 1: Route Discovery
Read the codebase to find every API endpoint. For each, capture:
- Method and path —
GET /api/users,POST /api/orders - Auth requirements — public, session-based, API key, JWT
- Request shape — required headers, body schema, query parameters
- Response size — a list endpoint returning 100 items is different from a single resource
- Database impact — reads vs writes, joins, aggregations, full-text search
- External calls — does this endpoint call other services, send emails, process payments?
Group endpoints by function: authentication, read-heavy (browsing, listing), write-heavy (creating, updating), and compute-heavy (search, aggregation, file processing).
Step 2: Traffic Modeling
Derive realistic usage patterns. If the user has analytics, use them. If not, use sensible defaults based on the application type:
Typical traffic ratios:
| App Type | Read:Write | Auth:Browse:Action | Peak:Average |
|---|---|---|---|
| SaaS dashboard | 80:20 | 5:70:25 | 3:1 |
| E-commerce | 90:10 | 5:80:15 | 10:1 (sales) |
| API service | 70:30 | 10:50:40 | 5:1 |
| Content/blog | 95:5 | 2:90:8 | 8:1 (viral) |
| Social app | 60:40 | 5:50:45 | 4:1 |
User journey mapping — Real users don’t hit random endpoints. They follow flows:
- Login → Dashboard → List items → View item → Edit → Save
- Browse → Search → View product → Add to cart → Checkout
- Sign up → Onboard → Create first resource → Invite team
Each journey has natural think times between steps (2-5 seconds for browsing, 10-30 seconds for form filling, 1-2 seconds for navigation).
Step 3: Scenario Design
Generate test scenarios for each testing type:
Baseline test — Validate current performance at normal load.
- Duration: 5-10 minutes
- Load: current average concurrent users
- Purpose: establish performance benchmarks (p50, p95, p99 latency)
- Pass criteria: p95 < 500ms, error rate < 0.1%
Ramp-up test — Find the breaking point.
- Stages: start at baseline, ramp to 2x over 5 min, hold 5 min, ramp to 5x over 5 min, hold 5 min
- Purpose: identify at what load performance degrades
- Watch for: latency inflection point, first errors, resource saturation
Spike test — Simulate sudden traffic surge (launch day, viral moment).
- Pattern: baseline for 2 min → instant jump to 10x → hold 2 min → drop to baseline
- Purpose: test auto-scaling response and recovery behavior
- Watch for: error rate during spike, recovery time after spike subsides
Soak test — Find slow degradation (memory leaks, connection exhaustion).
- Duration: 1-4 hours at moderate load (1.5x baseline)
- Purpose: problems that only appear over time
- Watch for: gradually increasing latency, growing memory usage, decreasing available connections
Stress test — Push to failure and observe recovery.
- Stages: ramp continuously until system fails, then reduce load
- Purpose: understand failure mode and recovery behavior
- Watch for: graceful degradation vs catastrophic failure
Step 4: Script Generation
Generate k6 scripts (primary) or Artillery configs. Scripts must include:
Authentication handling:
// k6 example - login once in setup, share token
export function setup() {
const res = http.post(`${BASE_URL}/api/auth/login`,
JSON.stringify({ email: 'loadtest@example.com', password: 'test' }),
{ headers: { 'Content-Type': 'application/json' } }
);
return { token: res.json('token') };
}
export default function(data) {
const params = { headers: { Authorization: `Bearer ${data.token}` } };
// ... test logic using params
}Realistic data variation:
- Use
SharedArrayfor test data (user pools, product IDs, search terms) - Randomize payloads — don’t test with “test” and “12345” every time
- Use data from realistic distributions (not uniform random)
Proper think times:
import { sleep } from 'k6';
export default function(data) {
// Browse products (user scans the page)
http.get(`${BASE_URL}/api/products`);
sleep(Math.random() * 3 + 2); // 2-5 seconds browsing
// View a specific product (user reads details)
http.get(`${BASE_URL}/api/products/${randomProductId()}`);
sleep(Math.random() * 5 + 3); // 3-8 seconds reading
// Add to cart (quick action)
http.post(`${BASE_URL}/api/cart`, ...);
sleep(Math.random() * 1 + 0.5); // 0.5-1.5 seconds
}Checks (not just “did it respond”):
const res = http.get(`${BASE_URL}/api/products`);
check(res, {
'status is 200': (r) => r.status === 200,
'response has products': (r) => r.json('data').length > 0,
'response time < 500ms': (r) => r.timings.duration < 500,
'content-type is json': (r) => r.headers['Content-Type'].includes('json'),
});Ramping stages (k6 scenarios):
export const options = {
scenarios: {
browsing: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '2m', target: 50 }, // ramp to 50
{ duration: '5m', target: 50 }, // hold
{ duration: '2m', target: 200 }, // ramp to 200
{ duration: '5m', target: 200 }, // hold at peak
{ duration: '2m', target: 0 }, // ramp down
],
exec: 'browsingFlow',
},
purchasing: {
executor: 'ramping-arrival-rate',
startRate: 1,
timeUnit: '1s',
stages: [
{ duration: '2m', target: 5 },
{ duration: '5m', target: 5 },
{ duration: '2m', target: 20 },
{ duration: '5m', target: 20 },
{ duration: '2m', target: 0 },
],
exec: 'purchaseFlow',
},
},
thresholds: {
http_req_duration: ['p(95)<500', 'p(99)<1500'],
http_req_failed: ['rate<0.01'],
checks: ['rate>0.99'],
},
};Step 5: Threshold Definition
Set meaningful pass/fail criteria. These vary by endpoint type:
| Endpoint Type | p50 Target | p95 Target | p99 Target | Error Rate |
|---|---|---|---|---|
| Health check | < 10ms | < 50ms | < 100ms | 0% |
| Read (cached) | < 50ms | < 200ms | < 500ms | < 0.1% |
| Read (DB) | < 100ms | < 500ms | < 1500ms | < 0.1% |
| Write | < 200ms | < 800ms | < 2000ms | < 0.5% |
| Search/aggregate | < 300ms | < 1500ms | < 3000ms | < 0.5% |
| File upload | < 1000ms | < 3000ms | < 5000ms | < 1% |
These are starting points — adjust based on the user’s SLAs and user experience requirements.
Step 6: Results Interpretation
After the test runs, guide interpretation:
What to look for in the output:
- Latency distribution — p50 vs p99 gap. If p50 is 50ms but p99 is 5000ms, you have a long-tail problem (likely DB contention or GC pauses).
- Error rate over time — constant low rate = expected. Sudden spike = capacity cliff hit.
- Throughput ceiling — RPS stops increasing even as VUs increase = bottleneck reached.
- Resource correlation — match latency increases with CPU/memory/connection metrics to identify the bottleneck.
Common bottleneck identification:
- CPU at 100%: compute-bound. Optimize algorithms, add caching, scale horizontally.
- Memory climbing: memory leak. Profile the application.
- DB connections maxed: connection pool too small, or queries too slow. Check for N+1s, missing indexes.
- Latency increases linearly with load: serial bottleneck (single DB connection, global lock, synchronous queue).
- Latency is fine until a cliff: resource exhaustion. Something runs out at that load level.
Common Mistakes to Warn About
- Testing from localhost — Network is part of the system. Test from a different machine/region.
- Same payload every request — Hits the same cache keys, same DB rows. Real traffic is varied.
- No think time — Generates unrealistic request rates. 100 VUs with no sleep ≠ 100 real users.
- Testing only happy paths — Real traffic includes 404s, bad auth, malformed requests.
- Ignoring warm-up — First requests cold-start caches, JIT, connection pools. Exclude the first 30 seconds from metrics.
- Not testing the database — An in-memory test or mocked DB tells you nothing about production.
- Single endpoint focus — Testing
GET /healthat 10K RPS whilePOST /orderscan only handle 50 RPS.
Principles
- Test user journeys, not endpoints. A sequence of realistic requests reveals contention that isolated calls never show.
- The test environment must match production. Same database size, same data distribution, same network topology. Smaller = misleading results.
- Run the baseline first. You can’t find regressions without knowing your starting point.
- Automate in CI. Performance regressions should be caught before deploy, not after.
- The bottleneck is always somewhere you’re not looking. It’s rarely the app server — it’s the database, the connection pool, the DNS resolver, or the CDN cache miss rate.