Aurora vs Bifrost vs LiteLLM
A performance comparison of three AI gateways under a 5000 RPS load test. Aurora is built in Go for high throughput and low latency.
| Metric | Aurora | Bifrost | LiteLLM |
|---|---|---|---|
| Successful Requests (Aurora +17,555) | 192,026▲ | 174,471 | — |
| Throughput (external 5000 RPS, 60 s side-by-side) | 3,130/s▲ | 2,888/s | 475/s |
| Success Rate (external 5000 RPS, overload) | 64.01%▲ | 58.16% | 88.78% |
| P50 Latency (external 5000 RPS, completed requests) | 709 ms▲ | 730 ms | 38.65 s |
| P99 Latency (external 5000 RPS, Aurora +48 ms) | 2.155 s | 2.107 s▲ | 90.72 s |
| Memory Usage (external peak, 5000 RPS) | 59 MB▲ | 113 MB | 372 MB |
| Language | Go | Go | Python |
Aurora and Bifrost values are from the same external side-by-side run; ▲ marks the better of those two values. LiteLLM values are retained from published reference data where available and were not measured in this run.
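The P50 and P99 rows summarise latencies of completed requests only. The source does not say which percentile method the load generator uses, so the following is a minimal sketch that assumes the nearest-rank method; the sample latencies are made up for illustration.

```python
import math


def percentile(latencies_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("no completed requests to summarise")
    ordered = sorted(latencies_ms)
    # Rank is 1-based; clamp to 1 so very small percentiles still return a sample.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies (ms) for ten completed requests; failed requests are excluded.
sample = [120, 95, 310, 150, 101, 99, 2050, 130, 140, 110]

p50 = percentile(sample, 50)
p99 = percentile(sample, 99)
print(p50, p99)
```

With only ten samples the P99 is simply the slowest request, which is why tail percentiles are only meaningful over large runs such as the 60-second test reported here.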
Both gateways were driven by the same load generator on the same host, against the same OpenAI-compatible mock upstream and with the same payload, for 60 seconds after 25 warmup requests and a 256-request prewarm.
| Metric | Aurora | Bifrost |
|---|---|---|
| Successful requests | 192,026 | 174,471 |
| Success rate | 64.01% | 58.16% |
| Throughput | 3,130.18/s | 2,888.09/s |
| Mean latency | 833.14 ms | 912.47 ms |
| P50 latency | 709.16 ms | 729.60 ms |
| P99 latency | 2.155 s | 2.107 s |
| Peak memory | 59.47 MB | 113.11 MB |
Results from 20260607-181930.comparison.json. Aurora ran with request logging, passthrough routes, passthrough semantic enrichment, and eager body snapshots disabled for a fair minimal hot path.
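The derived figures in the tables can be recomputed from the raw values as a consistency check. The sketch below assumes the load generator offered exactly rate × duration requests (5000 × 60 = 300,000); that total is an inference, not a value reported in the source. The dictionary layout is illustrative and does not reflect the schema of the comparison JSON file.

```python
# Values copied from the Aurora vs Bifrost side-by-side table.
RATE_RPS = 5000
DURATION_S = 60

# Assumption: the full target load was offered for the whole run.
offered = RATE_RPS * DURATION_S

runs = {
    "Aurora": {"ok": 192_026, "throughput": 3130.18, "p99_ms": 2155, "peak_mb": 59.47},
    "Bifrost": {"ok": 174_471, "throughput": 2888.09, "p99_ms": 2107, "peak_mb": 113.11},
}

# Success rate as a percentage of offered requests, rounded like the table.
success_rate = {name: round(100 * r["ok"] / offered, 2) for name, r in runs.items()}

# Wall-clock seconds implied by successes / throughput (an inference from the table).
implied_elapsed_s = {name: r["ok"] / r["throughput"] for name, r in runs.items()}

extra_successes = runs["Aurora"]["ok"] - runs["Bifrost"]["ok"]
p99_gap_ms = runs["Aurora"]["p99_ms"] - runs["Bifrost"]["p99_ms"]
memory_ratio = runs["Bifrost"]["peak_mb"] / runs["Aurora"]["peak_mb"]

print(success_rate)
print(extra_successes, p99_gap_ms, round(memory_ratio, 2))
```

Under that assumption the recomputed success rates match the reported 64.01% and 58.16%, and the differences match the +17,555 and +48 ms annotations in the summary table.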
Run the same side-by-side matrix after starting the mock upstream, Bifrost on port 8080, and Aurora on port 8081.
$dir = "E:\code-titan-vs-extention\Unified_Gateway\Aurora_Gateway\bench-results\bifrost-side-by-side"; if (-not (Test-Path -LiteralPath $dir)) { throw "Missing results directory: $dir" }; Remove-Item -Path (Join-Path $dir "*") -Force -Recurse; if ($?) { .\scripts\bifrost-side-by-side.ps1 -BifrostPort 8080 -AuroraPort 8081 -Rate 5000 -Duration 60 -Timeout 240 -Model "<benchmark model selector>" -CaptureAuroraProfiles -PrewarmRequests 256 -PrewarmConcurrency 64 }
| Feature | Aurora | Bifrost | LiteLLM |
|---|---|---|---|
| Language | Go (Compiled) | Go (Compiled) | Python (Interpreted) |
| Async Runtime | Goroutines | Goroutines | asyncio (GIL-bound) |
| HTTP Server | Echo v5 (Fast) | fasthttp | FastAPI / Uvicorn |
| Memory Model | Efficient GC + Carrier Pattern | Efficient GC | GC-managed (high overhead) |
| Concurrency | Native goroutines | Native goroutines | GIL-limited |
| JSON Parsing | GJSON (zero-alloc) | Standard | Standard (stdlib) |
| Context Allocation | 1 allocation (Carrier) | Per-call WithValue | Per-call WithValue |
| Open Source | Yes (Apache 2.0) | Yes (Apache 2.0) | Yes (MIT) |
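The "Context Allocation" row contrasts a single carrier with per-call `WithValue`. In Go, each `context.WithValue` call wraps the parent context in a new node, so storing several request-scoped values one at a time builds a chain of nodes. The sketch below models that difference in Python purely as an illustration. It is not Aurora's code: the `Context` and `Carrier` classes, their field names, and the sample values are all hypothetical, and Aurora's actual Carrier type is not shown in the source.

```python
from dataclasses import dataclass


class Context:
    """Immutable context node; with_value models Go's context.WithValue by wrapping the parent."""

    nodes_created = 0  # counts nodes allocated through with_value

    def __init__(self, parent=None, key=None, value=None):
        self.parent, self.key, self.value = parent, key, value

    def with_value(self, key, value):
        Context.nodes_created += 1
        return Context(self, key, value)

    def get(self, key):
        node = self
        while node is not None:  # walk the chain from newest to oldest
            if node.key == key:
                return node.value
            node = node.parent
        return None


@dataclass
class Carrier:
    """Hypothetical bundle of request-scoped values stored under one key."""
    request_id: str = ""
    model: str = ""
    provider: str = ""


CARRIER_KEY = object()  # single private key for the carrier


def per_call_style(root):
    # One new context node per stored value.
    ctx = root.with_value("request_id", "req-1")
    ctx = ctx.with_value("model", "example-model")
    return ctx.with_value("provider", "example-provider")


def carrier_style(root):
    # One context node in total, holding every value in a single object.
    return root.with_value(CARRIER_KEY, Carrier("req-1", "example-model", "example-provider"))


Context.nodes_created = 0
chained = per_call_style(Context())
per_call_nodes = Context.nodes_created

Context.nodes_created = 0
carried = carrier_style(Context())
carrier_nodes = Context.nodes_created

print(per_call_nodes, carrier_nodes)
```

In this model the per-call style creates one context node per value and a lookup may walk the whole chain, while the carrier style creates one node and reads fields directly from the carrier object.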
How were the benchmarks conducted?
Why is Aurora faster than LiteLLM?
How does Aurora compare to Bifrost?
What is 'gateway overhead'?
Can Aurora handle higher throughput than shown?
What testing methodology was used?
Run Your Own Benchmarks
Aurora's side-by-side benchmark tooling is open source and included in the repository. Clone the repo and run scripts/bifrost-side-by-side.ps1 to reproduce these results.