Public JSON API

Free, static, CORS-friendly JSON for everything localllm-advisor knows about: GPU compatibility, ranked recommendations per use case, per-model VRAM tables, the tier list. CDN-served, no auth, no rate limits, no scraping.

Static / CDN-cachedCORS: *CC BY 4.0Schema v1.0.0

Quick start

Every endpoint is a plain JSON file under https://localllm-advisor.com/api/v1. Hit it from any HTTP client — browser, fetch, requests, curl, even an LLM tool-call.

curl

# Top coding models for an RTX 4090
curl -s https://localllm-advisor.com/api/v1/gpu/nvidia-rtx-4090/coding.json | jq '.recommendations[:5]'

# Which GPUs can run Qwen 3 32B?
curl -s https://localllm-advisor.com/api/v1/model/qwen-3-32b-dense.json | jq '.runnable_on[:5]'

JavaScript / TypeScript

const res = await fetch(
  "https://localllm-advisor.com/api/v1/gpu/nvidia-rtx-4090/coding.json"
);
const data = await res.json();

for (const m of data.recommendations.slice(0, 5)) {
  console.log(`${m.name}  ${m.quant}  ~${m.estimated_tps} tok/s`);
}

Python

import requests

# Get the top coding models for an RTX 4090
r = requests.get("https://localllm-advisor.com/api/v1/gpu/nvidia-rtx-4090/coding.json")
data = r.json()

print(f"Top 5 coding models on the {data['gpu']['name']}:")
for m in data["recommendations"][:5]:
    print(f"  {m['name']:30s} {m['quant']:8s} ~{m['estimated_tps']} tok/s")

Endpoints

GEThttps://localllm-advisor.com/api/v1/index.json

Top-level registry — version, generated_at, list of every endpoint with descriptions.

GEThttps://localllm-advisor.com/api/v1/models.json

Lightweight directory of every model in the dataset (id, slug, family, params, benchmarks).

GEThttps://localllm-advisor.com/api/v1/gpus.json

Lightweight directory of every GPU we have specs for (vram, bandwidth, vendor, tdp, price).

GEThttps://localllm-advisor.com/api/v1/tier-list.json

Curated tier list (S/A/B/C/D) of canonical models grouped by VRAM ceiling. Same data the /tier-list page renders.

GEThttps://localllm-advisor.com/api/v1/gpu/{slug}.json

Per-GPU summary: top recommendations across every use case.

try: https://localllm-advisor.com/api/v1/gpu/nvidia-rtx-4090.json

GEThttps://localllm-advisor.com/api/v1/gpu/{slug}/{useCase}.json

Per-(GPU, use case) ranked recommendations. Use cases: chat, coding, reasoning, creative.

try: https://localllm-advisor.com/api/v1/gpu/nvidia-rtx-4090/coding.json

GEThttps://localllm-advisor.com/api/v1/model/{slug}.json

Per-model: full quantization table + which popular GPUs can run it (with tps estimates).

try: https://localllm-advisor.com/api/v1/model/qwen-3-32b-dense.json

Sample response

https://localllm-advisor.com/api/v1/gpu/nvidia-rtx-4090/coding.json

{
  "schema": "[email protected]",
  "generated_at": "2026-04-25T12:00:00.000Z",
  "gpu": {
    "slug": "nvidia-rtx-4090",
    "name": "NVIDIA RTX 4090",
    "vendor": "nvidia",
    "vram_mb": 24576,
    "bandwidth_gbps": 1008,
    "tdp_watts": 450,
    "price_usd": 1999
  },
  "use_case": "coding",
  "benchmark_channels": ["bigcodebench","humaneval","math","ifeval"],
  "count": 25,
  "recommendations": [
    {
      "id": "qwen3-coder-30b-a3b",
      "slug": "qwen3-coder-30b-a3b",
      "name": "Qwen3-Coder 30B A3B",
      "family": "qwen",
      "params_b": 30,
      "architecture": "moe",
      "quality_score": 70,
      "quant": "Q4_K_M",
      "bpw": 4.83,
      "vram_mb": 19200,
      "vram_pct": 78.1,
      "estimated_tps": 52
    }
    // ...
  ]
}

License & attribution

The dataset is licensed under CC BY 4.0. You may use it commercially, redistribute it, build derivatives — we ask only that you link back to localllm-advisor.com so users can find updates and methodology.

Stability & versioning

The URL prefix /api/v1/ is the contract. Breaking changes (renamed fields, removed endpoints) bump the major version (/api/v2/) and we run both for at least 6 months. Backwards-compatible additions (new optional fields, new endpoints) land on v1 without a bump.

Each response carries a schema field (e.g. [email protected]) so consumers can branch on the version if needed.

Methodology

Compatibility uses the same fitting heuristic the rest of the site does (largest fitting quant under 85% of VRAM). Ranking inside each (GPU, use case) endpoint is a benchmark-channel composite with a completeness penalty so partially-benchmarked models can't cherry-pick. Full details on the methodology page.

Stay ahead of local AI