Three lines change. The OpenAI Python and JavaScript SDKs work as-is. Pick Claude, Gemini, GPT-5 or any of 37 chat models — same /v1/chat/completions, streaming SSE, tool calls, vision input, JSON mode. Plus video, image and music endpoints — one credit pool, one Bearer token, one dashboard.
No card required · $10 in credits unlocked the moment you verify your email
If your codebase imports the OpenAI Python or JavaScript package, the migration is a constructor-argument change. Everything downstream — tool calls, streaming iterators, vision content blocks, JSON mode, embeddings — keeps working without modification.
```diff
- https://api.openai.com/v1
+ https://aimarcusimage.eu/api/v1
- sk-...
+ sk-aig-...
- gpt-4o
+ claude-sonnet-4-5 · gemini-3-pro · gpt-5-codex · any of 37
```
```shell
curl https://aimarcusimage.eu/api/v1/chat/completions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      { "role": "user", "content": "Summarize streaming SSE in 3 lines." }
    ]
  }'
```
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-aig-...",
    base_url="https://aimarcusimage.eu/api/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",  # any of 37 priced chat models
    messages=[{"role": "user", "content": "Hello."}],
)
print(response.choices[0].message.content)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-aig-...",
  baseURL: "https://aimarcusimage.eu/api/v1",
});

const r = await client.chat.completions.create({
  model: "gpt-5-codex",
  messages: [{ role: "user", content: "Say hi." }],
});
console.log(r.choices[0].message.content);
```
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-aig-...",
    base_url="https://aimarcusimage.eu/api/v1",
)

stream = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[{"role": "user", "content": "Stream a haiku."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
```
Every row tested against the live OpenAI Python and JavaScript SDKs.
| Feature | Status | Notes |
|---|---|---|
| /v1/chat/completions | Yes | Standard request and response shape, including tool_calls and finish_reason. |
| Streaming SSE (stream=true) | Yes | OpenAI SDK iterator works as-is; chunks ship as canonical `data:` server-sent events. |
| Tool / function calling | Yes | Pass tools[] and tool_choice — handled per-provider, results returned in tool_calls. |
| Vision input (image_url) | Yes | GPT-5, Claude 4.x, Gemini 3 Pro, Llama Vision — same content[] shape. |
| JSON mode (response_format) | Yes | Works on GPT-5 family, Claude 4.x and Gemini 3 Pro out of the box. |
| Embeddings /v1/embeddings | Yes | 5 embedding models including text-embedding-3-large. |
| Image / video / music | Async | Different surface — POST /v1/jobs/createTask + webhook on completion. |
| Per-key spend cap (HTTP 402) | Yes | Enforced before the upstream call; never silent overspend. |
| Assistants / completions (legacy) | No | Not implemented — both APIs are deprecated upstream. |
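The tool-calling row above is easiest to see at the wire level. A minimal sketch of the request and response shapes involved (the `get_weather` tool and its schema are illustrative, not part of the catalog; with the OpenAI SDK the same request dict goes straight into `client.chat.completions.create(...)`):

```python
import json

def build_tool_call_request(prompt: str) -> dict:
    """Wire-format body for a tool-calling chat request.
    The get_weather function schema is a made-up example."""
    return {
        "model": "claude-sonnet-4-5",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",
    }

def extract_tool_calls(response: dict) -> list:
    """Pull (name, parsed-arguments) pairs out of the standard
    {choices:[{message:{tool_calls}}]} response shape."""
    message = response["choices"][0]["message"]
    return [
        (c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in message.get("tool_calls") or []
    ]
```

When the model elects to call the tool, `finish_reason` is `tool_calls` and each call's arguments arrive as a JSON string, which `extract_tool_calls` parses for you.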
Chat · GPT-5.5, Claude Opus 4.6, Gemini 3 Pro, DeepSeek v3, Mistral Large, GPT-5-Codex
Image · Flux Kontext Pro, Nano Banana Pro, GPT-Image, Ideogram v3, Recraft, Qwen Image
Video · Veo 3.1, Runway Aleph, Seedance 2, Wan 2.7, Kling 3.0, Sora 2 Pro
Music · Suno v4.5 Plus, MusicGen, Mureka, ElevenLabs voices
No negotiation. The tier recomputes every six hours against your rolling 30-day spend on successful calls.
| Tier | Rolling 30-day spend | Markup over upstream |
|---|---|---|
| Starter | $0+ | 40% |
| Growth | $50+ | 30% |
| Scale | $200+ | 22% |
| Enterprise | $1,000+ | 15% |
| Strategic | $5,000+ | 10% |
When OpenAI returns 5xx for chat we retry on Anthropic or Gemini through OpenRouter — same request, alternate provider — so a single upstream incident does not reach your users. Spend caps stay enforced through the retry path.
For video, image and music we ship an async API with HMAC-signed webhook delivery and exponential-backoff retries. A 30-second Veo render becomes one POST and one verified callback — no client-side polling required.
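Verifying an HMAC-signed callback takes a few lines of stdlib code. A sketch under stated assumptions — the signature header name, hex encoding, and SHA-256 digest are illustrative choices here, not documented specifics; check the dashboard for the actual scheme:

```python
import hashlib
import hmac

def verify_webhook(secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Recompute the HMAC over the raw (unparsed) request body and compare
    in constant time. Hex-encoded HMAC-SHA256 is an assumption."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

Always verify against the raw bytes before JSON-parsing the payload, and use `hmac.compare_digest` rather than `==` to avoid timing side channels.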
Per-key and per-account caps are enforced before the upstream call: a runaway loop or compromised key surfaces HTTP 402 instead of draining your balance. Aggregated team caps work the same way across shared workspaces.
Upstream 5xx, internal errors and timeouts are not billed. Only successful 2xx responses count against credits. The full pricing math is documented in the Terms; refund policy applies to unspent credits within 30 days of purchase.
AI Generate accepts the same Authorization: Bearer header you already send to OpenAI, the same {model, messages, stream, tools} JSON, and returns the same {id, choices, usage} shape. Streaming SSE chunks ship in the canonical data: {choices:[{delta:{content}}]} format that the OpenAI Python and JavaScript SDKs already consume. The only thing you change is base_url. Every model name is a string — pass claude-sonnet-4-5 and we route to Anthropic, gpt-5-codex and we route to OpenAI direct, gemini-3-pro and we route to Google, and any other model name in the catalog fans out through OpenRouter. There is no per-provider SDK to learn, no per-account billing to consolidate, no per-modality auth to manage. LangChain, llama-index, the Vercel AI SDK and any framework that already speaks OpenAI also work without modification — the wire format is what they target, and the wire format is what we serve.
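The canonical `data:` chunk format quoted above can be decoded with nothing but the stdlib. The SDKs and frameworks do this for you; the sketch below exists only to make the wire format concrete:

```python
import json

def parse_sse_line(line: str) -> str:
    """Extract the delta text from one chat-completions SSE line.
    Returns "" for comments, blank lines, and the [DONE] sentinel."""
    if not line.startswith("data:"):
        return ""
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return ""
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    return delta.get("content") or ""
```

Feed it each line of the response body and concatenate the results to reassemble the streamed completion.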
Most production AI stacks talk to three to five providers — OpenAI for code, Anthropic for analysis, Google for long context, fal or Replicate for image, Runway for video. Each is a separate billing relationship: separate invoices, separate spend caps, separate dashboards, separate Stripe webhooks if you bill customers downstream. AI Generate folds all of them behind one credit pool with one API key. Your finance team gets one line item, your operators get one dashboard, your engineers get one SDK, and your error-handling code stops branching on which provider died this morning. The 10–40% markup pays for the consolidation; the five-tier auto-discount makes the math work above $200 a month. At the strategic tier the unit price is competitive with direct-from-OpenAI billing once you account for the operational cost of running three to five separate provider relationships in parallel.
If your codebase imports the OpenAI Python or JavaScript package, the migration is a constructor-argument change. Set base_url (Python) or baseURL (JavaScript) to https://aimarcusimage.eu/api/v1 and your AI Generate key in api_key. Every chat-completions call works without further changes — including the tool_calls block, the response_format JSON mode, the vision content[], the streaming async iterator and the standard usage object. The OpenAI client retries, timeouts and request-id headers behave the same way. For embeddings, point the same client at /v1/embeddings — five embedding models including text-embedding-3-large. For generative media the surface diverges from OpenAI (because OpenAI does not ship video or music as a single API): you POST to /v1/jobs/createTask, receive a taskId, and either poll /v1/jobs/recordInfo or wait for our webhook callback. Auth and credit pool stay shared across all surfaces.
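The createTask/recordInfo flow above can be sketched with the stdlib. Only the two paths and the `taskId` field come from the text; the query-parameter name, request payload, model id string, and status values are assumptions for illustration (the webhook callback is the recommended path; polling is shown for completeness):

```python
import json
import time
import urllib.request

BASE = "https://aimarcusimage.eu/api/v1"

def _request(api_key: str, url: str, payload: dict = None) -> dict:
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url, data=data,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)

def record_info_url(task_id: str) -> str:
    # The documented path is /v1/jobs/recordInfo; the query-parameter
    # name is an assumption.
    return f"{BASE}/jobs/recordInfo?taskId={task_id}"

def render_video(api_key: str, prompt: str) -> dict:
    # Payload fields and the "veo-3.1" id are illustrative assumptions.
    task = _request(api_key, f"{BASE}/jobs/createTask",
                    {"model": "veo-3.1", "prompt": prompt})
    task_id = task["taskId"]
    while True:
        info = _request(api_key, record_info_url(task_id))
        if info.get("status") in ("done", "failed"):  # assumed status values
            return info
        time.sleep(5)
```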
Markup over upstream cost starts at 40% and drops automatically as your rolling 30-day spend climbs: 30% at $50, 22% at $200, 15% at $1,000, 10% at $5,000. The tier recomputes every six hours against the past 30 days of successful, billed calls — so a busy launch month moves you up, a quiet month does not punish you with a sudden cliff. There is no negotiation, no sales call and no contract. The current tier is visible on every /v1/me response and in your dashboard. Failed calls (upstream 5xx, internal errors, client 4xx) are excluded from the tier calculation: only the spend that would actually appear on an OpenAI invoice counts. At the strategic tier the math frequently beats per-account direct billing once you fold in the cost of running three to five separate provider relationships, each with its own SLO, its own retry budget and its own finance integration.
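The tier schedule reduces to a threshold lookup from rolling 30-day spend to markup, and the effective price is just upstream price times (1 + markup). A sketch of that arithmetic, using the published tiers:

```python
TIERS = [  # (rolling 30-day spend threshold in USD, markup over upstream)
    (5000, 0.10),  # Strategic
    (1000, 0.15),  # Enterprise
    (200, 0.22),   # Scale
    (50, 0.30),    # Growth
    (0, 0.40),     # Starter
]

def markup(rolling_30d_spend: float) -> float:
    """Highest tier whose threshold the rolling spend meets."""
    for threshold, rate in TIERS:
        if rolling_30d_spend >= threshold:
            return rate
    return 0.40

def effective_price(upstream_price: float, rolling_30d_spend: float) -> float:
    return upstream_price * (1 + markup(rolling_30d_spend))
```

For example, at $250 of rolling 30-day spend you sit in the Scale tier, so $1.00 of upstream usage bills as $1.22.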
Sign up, verify your email, and $10 in credits lands in your account — enough to run roughly 1.6 million Claude Haiku tokens or 600 Nano Banana images.