---
title: "Benchmarks"
description: "A summary of Aurora benchmark results, with links to full write-ups and methodology."
icon: "gauge"
---

## Benchmark snapshot

This page is a short reference for public, reproducible Aurora benchmark work. Use it as a starting point, then rerun the benchmark that matches your deployment shape.

The current gateway-overhead and Bifrost side-by-side benchmark procedures are maintained in the repository:

- `scripts/benchmarks/README.md`: gateway-overhead benchmarks
- `docs/benchmarks/BIFROST_SIDE_BY_SIDE_BENCHMARKS.md`: Bifrost comparison benchmarks

<Note>
Benchmark results are point-in-time data, not universal claims. Gateway performance depends on workload, provider mix, deployment setup, and tuning.
</Note>

## Visual snapshot

*Benchmark dashboard from the original blog post.*

This visual comes from the historical Aurora vs LiteLLM benchmark. Prefer the repository benchmark docs above for current methodology.

## At a glance

In the benchmark run summarized below, Aurora came out ahead of LiteLLM on the operational signals most teams care about:

- Added latency
- Throughput under concurrency
- CPU overhead
- Memory overhead

## Test shape

The comparison used a simple like-for-like setup:

- The OpenAI-compatible `/v1/chat/completions` endpoint
- The same prompt and request shape on both sides
- Concurrency levels of 1, 4, and 8
- A focus on clean runs with 0% errors
- Metrics including req/s, latency percentiles, CPU usage, and RSS memory

This docs page keeps only the primary comparison matrix from the blog post.
## Reference table

| Gateway | Concurrency | Success | Error % | Req/s | p50 (ms) | p95 (ms) | p99 (ms) | CPU avg (%) | RSS avg (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Aurora | 1 | 12/12 | 0.00 | 9.61 | 86.4 | 141.1 | 144.4 | 0.81 | 45.4 |
| Aurora | 4 | 12/12 | 0.00 | 44.66 | 56.1 | 139.5 | 139.5 | 0.23 | 46.0 |
| Aurora | 8 | 12/12 | 0.00 | 52.75 | 98.4 | 130.6 | 131.1 | 1.13 | 46.0 |
| LiteLLM | 1 | 12/12 | 0.00 | 8.64 | 96.2 | 190.3 | 213.9 | 9.21 | 320.3 |
| LiteLLM | 4 | 12/12 | 0.00 | 36.82 | 104.7 | 149.5 | 149.5 | 5.20 | 320.8 |
| LiteLLM | 8 | 12/12 | 0.00 | 35.81 | 188.7 | 244.4 | 244.9 | 5.95 | 321.5 |

## Key readouts

Some useful readouts from the March 5, 2026 run shown above:

- Aurora had lower p95 latency at every tested concurrency level.
- Aurora had higher throughput at every tested concurrency level.
- Aurora's RSS stayed at 45-46 MB, while LiteLLM's stayed near 320-321 MB.
- Aurora used less CPU on average in these runs.

At the highest tested concurrency (8), Aurora reached 52.75 req/s versus 35.81 req/s for LiteLLM, about 47% higher.

## Current benchmarks

For current Aurora gateway-overhead benchmarking, also use the real gateway-stack benchmarks in `internal/server/gateway_stack_bench_test.go`. These run through `server.New(...)` and measure middleware, routing, OpenAI chat, Anthropic `/v1/messages` conversion, streaming conversion, usage/audit hooks, workflows, budget checks, the Responses API, and passthrough, all without external provider noise.

```powershell
# Run the gateway-stack Go benchmarks directly (-run "^$" skips the regular tests)
go test ./internal/server -run "^$" -bench BenchmarkGatewayStack -benchtime 10s -benchmem

# Or use the repository wrapper script
.\scripts\bench-real-gateway.ps1 -Benchtime 10s -Count 3
```

See `scripts/benchmarks/README.md` for the current scenario matrix and profiling workflow.

## Historical Aurora vs LiteLLM benchmark

The table above comes from an older public Aurora vs LiteLLM comparison. The supporting scripts remain in this repository for reproducibility, but current performance claims should come from freshly run benchmark artifacts.
### Prerequisites

- Go 1.26.2+
- Python 3.10+ with `matplotlib` and `numpy`
- `jq` and `curl`
- A Groq API key (or any OpenAI-compatible provider; adjust the script accordingly)
- `litellm[proxy]` (`pip install "litellm[proxy]"`)

### Scripts

The benchmark suite lives in [docs/about/benchmark-tools/](https://github.com/aurorallm/aurora/tree/main/docs/about/benchmark-tools):

| File | Purpose |
| --- | --- |
| [compare.sh](https://github.com/aurorallm/aurora/blob/main/docs/about/benchmark-tools/compare.sh) | Builds Aurora, starts both gateways, runs the full benchmark matrix, and writes a `REPORT.md` |
| [bench_main.go](https://github.com/aurorallm/aurora/blob/main/docs/about/benchmark-tools/bench_main.go) | Source for the bench CLI that sends requests and collects latency and process metrics |
| [plot_benchmark_charts.py](https://github.com/aurorallm/aurora/blob/main/docs/about/benchmark-tools/plot_benchmark_charts.py) | Generates per-metric charts and a combined dashboard from the JSON results |

### Quick start

```bash
# 1. Clone Aurora and set up your .env with GROQ_API_KEY
git clone https://github.com/aurorallm/aurora.git
cd aurora
echo "GROQ_API_KEY=gsk_..." > .env

# 2. Run the full comparison (builds Aurora, starts LiteLLM, benchmarks both)
bash docs/about/benchmark-tools/compare.sh

# 3. Generate charts from the latest result
pip install matplotlib numpy
python3 docs/about/benchmark-tools/plot_benchmark_charts.py benchmark-results/<timestamp>
```

The script creates a timestamped directory under `benchmark-results/` containing JSON result files, gateway logs, and a `REPORT.md` with the results table.

### Tuning

You can override defaults with environment variables:

```bash
REQUESTS=100 CONCURRENCIES="1 4 8 16" MAX_TOKENS=16 bash docs/about/benchmark-tools/compare.sh
```

See the top of `compare.sh` for the full list of knobs.

## Why this page is short

This page is intentionally shorter and more operational than the blog version.
It exists so docs readers can see the benchmark results quickly without reading a full article inside the product docs. For the full narrative, more charts, and the original context, read the original blog post.

No single benchmark settles the question for every environment. If you are evaluating gateways seriously, reproduce the test against your own traffic and infrastructure.