[ Performance Benchmarks ]

Aurora vs Bifrost vs LiteLLM

A performance comparison of three AI gateways: Aurora, Bifrost, and LiteLLM. Aurora and Bifrost are written in Go; LiteLLM is written in Python. The figures below come from an external 5000 RPS overload test.

[ Performance at a Glance ]

Success Rate: 64.01% (5000 RPS side-by-side)
Aurora: 64.01% | Bifrost: 58.16% | LiteLLM: n/a

External Throughput: 3,130/s (5000 RPS side-by-side)
Aurora: 3,130/s | Bifrost: 2,888/s | LiteLLM: 475/s

External P50: 709 ms (5000 RPS overload test)
Aurora: 709 ms | Bifrost: 730 ms | LiteLLM: 38.65 s

External Peak Memory: 59 MB (47% less than Bifrost)
Aurora: 59 MB | Bifrost: 113 MB | LiteLLM: 372 MB

[ Detailed Metrics ]

Metric	Aurora	Bifrost	LiteLLM
Successful Requests (+17,555)	192,026 ▲	174,471	n/a
Throughput (external 5000 RPS, 60s side-by-side)	3,130/s ▲	2,888/s	475/s
Success Rate (external 5000 RPS, overload)	64.01% ▲	58.16%	88.78%
P50 Latency (external 5000 RPS, completed requests)	709 ms ▲	730 ms	38.65 s
P99 Latency (external 5000 RPS, +48 ms)	2.155 s	2.107 s ▲	90.72 s
Memory Usage (external peak, 5000 RPS)	59 MB ▲	113 MB	372 MB
Language	Go	Go	Python

Aurora and Bifrost values are from the same external side-by-side run. LiteLLM values are retained from published reference data where available.

[ External Side-by-Side — 5000 Requested RPS ]

Both gateways were tested with the same load generator, host, OpenAI-compatible mock upstream, and payload, for 60 seconds, after 25 warmup requests and a 256-request prewarm.

Metric	Aurora	Bifrost
Successful requests	192,026	174,471
Success rate	64.01%	58.16%
Throughput	3,130.18/s	2,888.09/s
Mean latency	833.14 ms	912.47 ms
P50 latency	709.16 ms	729.60 ms
P99 latency	2.155 s	2.107 s
Peak memory	59.47 MB	113.11 MB

Results from 20260607-181930.comparison.json. Aurora ran with request logging, passthrough routes, passthrough semantic enrichment, and eager body snapshots disabled for a fair minimal hot path.

[ Reproduction Command ]

Run the same side-by-side matrix after starting the mock upstream, Bifrost on port 8080, and Aurora on port 8081.

$dir = "E:\code-titan-vs-extention\Unified_Gateway\Aurora_Gateway\bench-results\bifrost-side-by-side"; if (-not (Test-Path -LiteralPath $dir)) { throw "Missing results directory: $dir" }; Remove-Item -Path (Join-Path $dir "*") -Force -Recurse; if ($?) { .\scripts\bifrost-side-by-side.ps1 -BifrostPort 8080 -AuroraPort 8081 -Rate 5000 -Duration 60 -Timeout 240 -Model "<benchmark model selector>" -CaptureAuroraProfiles -PrewarmRequests 256 -PrewarmConcurrency 64 }

[ Test Environment ]

Host: Windows 10
Cores: 8 logical
Duration: 60s
Target Rate: 5000 RPS
Warmup: 25 + 256 prewarm
Model: Configured benchmark selector
Upstream: Mock OpenAI
Profiles: CPU / Heap / Trace

[ Why Aurora is Faster ]

Feature	Aurora	Bifrost	LiteLLM
Language	Go (Compiled)	Go (Compiled)	Python (Interpreted)
Async Runtime	Goroutines	Goroutines	asyncio (GIL-bound)
HTTP Server	Echo v5 (Fast)	Fast HTTP	FastAPI / Uvicorn
Memory Model	Efficient GC + Carrier Pattern	Efficient GC	GC-managed (high overhead)
Concurrency	Native goroutines	Native goroutines	GIL-limited
JSON Parsing	GJSON (zero-alloc)	Standard	Standard (stdlib)
Context Allocation	1 allocation (Carrier)	Per-call WithValue	Per-call WithValue
Open Source	Yes (Apache 2.0)	Yes (Apache 2.0)	Yes (MIT)

[ Frequently Asked Questions ]

How were the benchmarks conducted?

The latest Aurora-vs-Bifrost matrix uses the same external HTTP load generator against both gateways on one Windows machine, one mocked OpenAI-compatible upstream, 5000 requested RPS, 60 seconds, 25 serial warmup requests, and a 256-request prewarm. Aurora was run with optional benchmark-mode work disabled: request logging, eager request-body snapshots, passthrough semantic enrichment, and passthrough routes.

Why is Aurora faster than LiteLLM?

Aurora is built in Go, which compiles to native machine code and uses goroutines for lightweight concurrency. LiteLLM is Python-based: it is constrained by the Global Interpreter Lock (GIL), pays asyncio overhead, and used far more memory in the reference data (372 MB peak vs Aurora's 59 MB). Additionally, Aurora's RequestCarrier pattern reduces per-request context allocation to a single carrier, instead of one context.WithValue call per field.

How does Aurora compare to Bifrost?

In the latest external 5000 RPS side-by-side run, Aurora completed 192,026 requests vs Bifrost's 174,471, delivered 3,130 req/s vs 2,888 req/s, and used 59 MB peak memory vs 113 MB. Bifrost kept a slightly lower P99 among completed requests: 2.107s vs Aurora's 2.155s.

What is 'gateway overhead'?

Gateway overhead is the latency added by the gateway itself. The figures on this page are not isolated overhead measurements: they come from the external end-to-end side-by-side run, which includes real socket accept/write pressure, upstream forwarding, JSON handling, and profiling overhead on the same host.

Can Aurora handle higher throughput than shown?

The published matrix is intentionally an external 5000 RPS overload test. Both gateways hit Windows socket admission limits, and Aurora still completed more requests while using less memory. Higher numbers are possible on larger hosts or with different connection settings, but this page reports the measured side-by-side run.

What testing methodology was used?

The matrix uses an external HTTP load generator against Bifrost and Aurora. Both runs use the same host, same model payload, same mocked OpenAI-compatible upstream, same 60-second duration, same 25 warmup requests, and same 256-request prewarm. pprof profiles (CPU, heap, and trace) were saved for Aurora.

Run Your Own Benchmarks

Aurora's side-by-side benchmark tooling is open source and included in the repository. Clone the repo and run scripts/bifrost-side-by-side.ps1 to reproduce these results.

