Benchmark Setup

---
title: "Benchmark Setup"
description: "Run reproducible Aurora vs Bifrost side-by-side benchmarks using oha or the official benchmark binary."
icon: "gauge"
---

Overview

The benchmark suite in scripts/ lets you compare Aurora against Bifrost (and optionally LiteLLM) on the same machine with identical request load. All gateways forward to a shared mock OpenAI-compatible backend so the test measures gateway overhead, not provider latency.

Two load-generator modes are available:

oha mode (default when oha is installed): concurrency-based load with no rate limiting; tests maximum throughput
Official benchmark mode: rate-controlled load using the maximhq/bifrost-benchmarking Go binary

Prerequisites

Requirement	Version	Check
Go	1.26+	`go version`
oha	any	`oha --version` (optional, for oha mode)
Git	any	`git --version`

The benchmark script auto-detects oha on PATH. If present it runs in oha mode unless -OhaMode:$false is passed.
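The selection rule above can be sketched in a few lines of Go. This is only an illustration of the documented behavior, not the script's PowerShell logic; the function name `chooseMode` and the mode strings are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// chooseMode mirrors the documented rule: oha mode is used when oha is found
// on PATH, unless the caller explicitly disabled it (-OhaMode:$false).
// The name and return values are illustrative assumptions.
func chooseMode(ohaOnPath, ohaDisabled bool) string {
	if ohaOnPath && !ohaDisabled {
		return "oha"
	}
	return "official"
}

func main() {
	// exec.LookPath returns a non-nil error when oha is not on PATH.
	_, err := exec.LookPath("oha")
	fmt.Println("selected mode:", chooseMode(err == nil, false))
}
```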

Scripts

The entry point is scripts/run-benchmark.ps1; the rest of the benchmark code lives under scripts/benchmarks/:

File	Purpose
`scripts/run-benchmark.ps1`	Entry point: starts the mock server, Bifrost, and Aurora, runs the benchmark, and generates the comparison
`scripts/benchmarks/scenarios/bifrost-side-by-side.ps1`	Orchestrator: manages process lifecycle, preflight checks, warmup, and profiling
`scripts/benchmarks/mock-server/main.go`	Lightweight OpenAI-compatible mock returning pre-encoded chat responses
`scripts/benchmarks/modules/BenchmarkComparison.psm1`	Delta calculation and summary tables
`scripts/benchmarks/modules/BenchmarkInfrastructure.psm1`	Process management, build helpers, health checks
`scripts/benchmarks/modules/BenchmarkProfiling.psm1`	pprof capture for Aurora CPU/heap/trace/goroutine

Quick Start

Run a smoke test (500 req/s for 30s) to verify everything works:

powershell

.\scripts\run-benchmark.ps1 -Mode smoke -NoWindow

This runs Bifrost then Aurora, each handling 15,000 requests against the mock server. Results land in bench-results/bifrost-side-by-side/.

Modes

Mode	Rate	Duration	Use Case
`smoke`	500 req/s	30s	Quick validation that both gateways work
`publish`	5,000 req/s	60s	Heavier throughput test
`custom`	`-Rate` / `-Duration`	as specified	Tuning and exploration
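In rate-controlled runs the request count per gateway is simply rate times duration. The sketch below recomputes it for the two fixed presets; the `preset` type and its field names are hypothetical and only mirror the table.

```go
package main

import "fmt"

// preset mirrors one row of the modes table (names are illustrative).
type preset struct {
	ratePerSec  int
	durationSec int
}

var presets = map[string]preset{
	"smoke":   {ratePerSec: 500, durationSec: 30},
	"publish": {ratePerSec: 5000, durationSec: 60},
}

// totalRequests returns how many requests a rate-controlled run issues per gateway.
func totalRequests(p preset) int {
	return p.ratePerSec * p.durationSec
}

func main() {
	for _, name := range []string{"smoke", "publish"} {
		fmt.Printf("%s: %d requests per gateway\n", name, totalRequests(presets[name]))
	}
}
```

For `smoke` this gives the 15,000 requests quoted in Quick Start; `publish` works out to 300,000 per gateway.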

Running a Throughput Test (oha Mode)

oha sends requests as fast as the gateway will accept them, with no rate limiting, which makes it a good way to find the real throughput ceiling.

powershell

.\scripts\run-benchmark.ps1 -Mode custom -Rate 10000 -Duration 10 -NoWindow -OhaMode -OhaRequests 100000 -OhaConcurrency @(100, 300, 500)

This sends 100,000 requests at each concurrency level to each gateway (600,000 total).
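The total follows from multiplying the per-level request count by the number of concurrency levels and the number of gateways. A small sketch, assuming two gateways (Bifrost and Aurora) and a hypothetical helper name `ohaTotal`:

```go
package main

import "fmt"

// ohaTotal returns the total requests sent across all gateways and levels.
func ohaTotal(requestsPerLevel int, levels []int, gateways int) int {
	return requestsPerLevel * len(levels) * gateways
}

func main() {
	levels := []int{100, 300, 500} // matches -OhaConcurrency @(100, 300, 500)
	fmt.Println("total requests:", ohaTotal(100000, levels, 2))
}
```

Adding a third gateway or a fourth concurrency level grows the run time proportionally, which is worth keeping in mind before extending the sweep.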

Concurrency Levels

Level	What It Tests
`100`	Light concurrency, low contention
`300`	Moderate concurrency, normal proxy load
`500`	High concurrency, saturating the gateway
`1000`	Extreme concurrency, scheduler contention
`2000`	Saturation: connection refusals expected

Running a Rate-Limited Test (Official Benchmark)

This mode uses the benchmark.exe binary from maximhq/bifrost-benchmarking and provides detailed per-gateway metrics with a delta comparison.

powershell

.\scripts\run-benchmark.ps1 -Mode custom -Rate 1000 -Duration 60 -NoWindow

The comparison JSON includes throughput, latency percentiles, memory usage, and pprof profiles.

Protocol Notes

The benchmark applies HTTP/1.1 load to both gateways for fair comparison:

Bifrost receives HTTP/1.1 requests (its standard mode)
Aurora receives HTTP/1.1 requests at the load-generator level
Aurora's upstream connection (to the mock server) uses h2c (HTTP/2 cleartext). This is its production default and provides better throughput via connection multiplexing.

Understanding the Output

oha Mode Summary

code

  bifrost         c=500 → 11408 req/s  100% success  avg=43.6 ms  p50=37.1 ms  p99=142.6 ms
  aurora          c=500 →  7517 req/s  100% success  avg=66.0 ms  p50=63.7 ms  p99=131.4 ms

Key fields:

req/s: throughput achieved at that concurrency
success: percentage of HTTP 200 responses
p50/p99: median and tail latency
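If you want to post-process these summary lines, for example to chart several runs, a small parser is enough. The sketch below assumes the line layout shown in the sample above stays stable; the `summary` type, its field names, and the regular expression are all hypothetical and not part of the benchmark scripts.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// summary holds the fields of one oha-mode summary line.
type summary struct {
	gateway              string
	concurrency          int
	reqPerSec            int
	successPct           int
	avgMs, p50Ms, p99Ms  float64
}

// summaryRe matches lines such as:
//   bifrost  c=500 → 11408 req/s  100% success  avg=43.6 ms  p50=37.1 ms  p99=142.6 ms
// The arrow is matched as any non-space token to tolerate encoding differences.
var summaryRe = regexp.MustCompile(
	`^\s*(\S+)\s+c=(\d+)\s+\S+\s+(\d+) req/s\s+(\d+)% success\s+avg=([\d.]+) ms\s+p50=([\d.]+) ms\s+p99=([\d.]+) ms\s*$`)

func parseSummary(line string) (summary, error) {
	m := summaryRe.FindStringSubmatch(line)
	if m == nil {
		return summary{}, fmt.Errorf("unrecognized summary line: %q", line)
	}
	var ints [3]int
	for i := range ints {
		v, err := strconv.Atoi(m[2+i])
		if err != nil {
			return summary{}, err
		}
		ints[i] = v
	}
	var floats [3]float64
	for i := range floats {
		v, err := strconv.ParseFloat(m[5+i], 64)
		if err != nil {
			return summary{}, err
		}
		floats[i] = v
	}
	return summary{m[1], ints[0], ints[1], ints[2], floats[0], floats[1], floats[2]}, nil
}

func main() {
	line := "  bifrost         c=500 → 11408 req/s  100% success  avg=43.6 ms  p50=37.1 ms  p99=142.6 ms"
	s, err := parseSummary(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s at c=%d: %d req/s, p99 %.1f ms\n", s.gateway, s.concurrency, s.reqPerSec, s.p99Ms)
}
```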

Official Benchmark Comparison

When using the official benchmark binary (non-oha mode), a detailed comparison JSON and a summary table are generated:

text

  Metric                                      Bifrost             Aurora              Delta
  ----------------------------------------------------------------------------------------
  Throughput (req/s)                           500.03             500.03               0.00
  Mean latency (ms)                              0.46               1.04               0.59
  P50 latency (ms)                               0.03               0.51               0.47
  P99 latency (ms)                               5.02              19.04              14.02
  Peak memory (MB)                             144.88              52.45             -92.43
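The Delta column is Aurora minus Bifrost, so a negative value means Aurora's figure is lower. The sketch below recomputes the deltas from the rounded columns of the sample table; it is not the code in `BenchmarkComparison.psm1`, and the `metric` type is hypothetical.

```go
package main

import "fmt"

// metric holds one row of the comparison table.
type metric struct {
	name            string
	bifrost, aurora float64
}

// delta follows the table's convention: Aurora minus Bifrost.
func delta(m metric) float64 {
	return m.aurora - m.bifrost
}

func main() {
	// Values copied from the sample comparison table.
	rows := []metric{
		{"Throughput (req/s)", 500.03, 500.03},
		{"Mean latency (ms)", 0.46, 1.04},
		{"P50 latency (ms)", 0.03, 0.51},
		{"P99 latency (ms)", 5.02, 19.04},
		{"Peak memory (MB)", 144.88, 52.45},
	}
	for _, r := range rows {
		fmt.Printf("%-20s %8.2f\n", r.name, delta(r))
	}
}
```

Recomputing from the rounded columns can differ from the printed Delta by 0.01 (for example, mean latency), presumably because the tool subtracts unrounded values before formatting.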

Interpreting Aurora vs Bifrost Differences

Signal	What It Means
Bifrost lower P50	Bifrost adds less per-request forwarding overhead at low load
Aurora lower memory	Go runtime with GOGC=200 and minimal bench mode stays under 65 MB even at 10k rps
Aurora zero errors	Graceful backpressure: connection refusals only, no corrupt requests
Bifrost HTTP 400s	Under extreme saturation, in-flight requests can get corrupted

Interpreting pprof Profiles

Aurora captures CPU, heap, trace, and goroutine pprof profiles during the benchmark:

code

bench-results/bifrost-side-by-side/<timestamp>.aurora.cpu.pprof
bench-results/bifrost-side-by-side/<timestamp>.aurora.heap.pprof

Analyze them with:

powershell

go tool pprof -top -seconds 0 <path>.cpu.pprof
go tool pprof -top -alloc_space <path>.heap.pprof
go tool pprof -top -inuse_space <path>.heap.pprof

Common Hotspots

Profile Signal	Likely Cause
`runtime.cgocall` high	Windows syscall overhead from network I/O; expected on Windows
`encoding/json.(*decodeState).object`	JSON unmarshal: each request body is decoded twice (once for the model validation peek, once for the canonical decode)
`net/http.(*http2clientStream).writeRequest`	h2c upstream multiplexing: framing overhead, unavoidable with multiplexing
`runtime.schedule` / `runtime.findRunnable`	Go scheduler contention at high concurrency (1000+ connections)

Typical Results

Low Concurrency (500 req/s rate-limited)

Metric	Bifrost	Aurora
Throughput	500 req/s	500 req/s
P50 latency	0.03 ms	0.51 ms
P99 latency	5 ms	19 ms
Peak memory	145 MB	52 MB

High Concurrency (oha, c=500)

Metric	Bifrost	Aurora
Throughput	~9,500 req/s	~7,500 req/s
P50 latency	~42 ms	~45 ms
P99 latency	~224 ms	~111 ms
Peak memory	~447 MB	~52 MB

Extreme Concurrency (oha, c=1000)

Aurora maintains a 100% success rate while Bifrost starts refusing connections:

Metric	Bifrost	Aurora
Throughput	~8,000 req/s	~7,700 req/s
Errors	~56 conn refused	0
P99 latency	~581 ms	~442 ms

Troubleshooting

Symptom	Likely Fix
`bifrost binary not found`	Download the Bifrost prerelease binary and place it in your `PATH`
`port already in use`	A previous run didn't clean up; kill the leftover gateway processes
`Aurora model registry not ready`	Check `OPENAI_BASE_URL` points to the mock server and mock is healthy
`oha not found`	Install via `scoop install oha` or `winget install oha`
Results differ across runs	Expected: benchmark variance comes from OS scheduling, background processes, and CPU throttling. Run 3 times and average.
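When averaging repeated runs it helps to look at the spread as well as the mean, so that a difference between gateways is not mistaken for run-to-run noise. A minimal sketch, using made-up throughput figures and a hypothetical helper `summarize`:

```go
package main

import "fmt"

// summarize returns the mean of runs and the relative spread (max-min)/mean.
// It returns zeros for an empty input.
func summarize(runs []float64) (mean, spread float64) {
	if len(runs) == 0 {
		return 0, 0
	}
	lo, hi, sum := runs[0], runs[0], 0.0
	for _, v := range runs {
		sum += v
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	mean = sum / float64(len(runs))
	if mean == 0 {
		return 0, 0
	}
	return mean, (hi - lo) / mean
}

func main() {
	// Hypothetical req/s from three runs of the same configuration.
	runs := []float64{7400, 7500, 7600}
	mean, spread := summarize(runs)
	fmt.Printf("mean %.0f req/s, spread %.1f%%\n", mean, spread*100)
}
```

If the gap between two gateways is smaller than the spread of either one, more runs are needed before drawing a conclusion.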

Reference

[scripts/benchmarks/README.md](https://github.com/aurorallm/aurora-enterprise/tree/main/scripts/benchmarks/README.md): technical details on benchmark modules
[docs/benchmarks/BIFROST_SIDE_BY_SIDE_BENCHMARKS.md](https://github.com/aurorallm/aurora-enterprise/tree/main/docs/benchmarks/BIFROST_SIDE_BY_SIDE_BENCHMARKS.md): detailed methodology and historical results
[/docs/about/benchmarks](/docs/about/benchmarks): summary of published Aurora benchmark data
