OPENAI-COMPATIBLE API

The OpenAI-compatible API for every model that matters.

Three lines change. The OpenAI Python and JavaScript SDKs work as-is. Pick Claude, Gemini, GPT-5 or any of 37 chat models — same /v1/chat/completions, streaming SSE, tool calls, vision input, JSON mode. Plus video, image and music endpoints — one credit pool, one Bearer token, one dashboard.

No card required · $10 in credits unlocked the moment you verify your email

30-second migration

Three lines change. The rest of your code stays.

If your codebase imports the OpenAI Python or JavaScript package, the migration is a constructor-argument change. Everything downstream — tool calls, streaming iterators, vision content blocks, JSON mode, embeddings — keeps working without modification.

  1. Set base_url

    https://api.openai.com/v1 → https://aimarcusimage.eu/api/v1
  2. Set api_key

    sk-... → sk-aig-...
  3. Pick any model in the catalog

    gpt-4o → claude-sonnet-4-5 · gemini-3-pro · gpt-5-codex · any of 37
API call

Pick a language. Ship in under a minute.

curl https://aimarcusimage.eu/api/v1/chat/completions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      { "role": "user", "content": "Summarize streaming SSE in 3 lines." }
    ]
  }'
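For reference, the same call can be sketched in Python using only the standard library (no SDK). The key is a placeholder, so this builds the request without sending it:

```python
import json
from urllib import request

BASE_URL = "https://aimarcusimage.eu/api/v1"  # the only URL that changes

def build_chat_request(api_key: str, model: str, messages: list) -> request.Request:
    """Build a standard /v1/chat/completions request; the caller sends it."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "sk-aig-...",  # placeholder key
    "claude-sonnet-4-5",
    [{"role": "user", "content": "Summarize streaming SSE in 3 lines."}],
)
# With a real key, urllib.request.urlopen(req) returns the standard
# {id, choices, usage} JSON body.
```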
Compatibility surface

What works as a true drop-in.

Every row tested against the live OpenAI Python and JavaScript SDKs.

Feature Status Notes
/v1/chat/completions Yes Standard request and response shape, including tool_calls and finish_reason.
Streaming SSE (stream=true) Yes OpenAI SDK iterator works as-is; canonical data: chunks delivered over server-sent events.
Tool / function calling Yes Pass tools[] and tool_choice — handled per-provider, results returned in tool_calls.
Vision input (image_url) Yes GPT-5, Claude 4.x, Gemini 3 Pro, Llama Vision — same content[] shape.
JSON mode (response_format) Yes Works on GPT-5 family, Claude 4.x and Gemini 3 Pro out of the box.
Embeddings /v1/embeddings Yes 5 embedding models including text-embedding-3-large.
Image / video / music Async Different surface — POST /v1/jobs/createTask + webhook on completion.
Per-key spend cap (HTTP 402) Yes Enforced before the upstream call; never silent overspend.
Assistants / completions (legacy) No Not implemented — both APIs are deprecated upstream.
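To make the streaming row concrete, here is a minimal sketch of what the SDK iterator does under the hood: parse canonical data: chunks and accumulate the delta content. The sample stream below is fabricated for illustration.

```python
import json

def accumulate_sse(raw: str) -> str:
    """Accumulate assistant text from canonical `data:` SSE chunks."""
    parts = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue                      # ignore comments / blank keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":           # OpenAI-style stream terminator
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

stream = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    'data: [DONE]\n'
)
print(accumulate_sse(stream))  # Hello
```

In practice the OpenAI Python and JavaScript SDK iterators do exactly this parsing for you; the sketch only shows the wire format they consume.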
One Bearer token

307 models, four modalities, one credit pool.

Chat (37 models): GPT-5.5, Claude Opus 4.6, Gemini 3 Pro, DeepSeek v3, Mistral Large, GPT-5-Codex

Image (72 models): Flux Kontext Pro, Nano Banana Pro, GPT-Image, Ideogram v3, Recraft, Qwen Image

Video (177 models): Veo 3.1, Runway Aleph, Seedance 2, Wan 2.7, Kling 3.0, Sora 2 Pro

Music & voice (21 models): Suno v4.5 Plus, MusicGen, Mureka, ElevenLabs voices

Auto-volume discount

Markup drops as your spend climbs.

No negotiation. The tier recomputes every six hours against your rolling 30-day spend on successful calls.

Tier        30-day spend  Markup over upstream
Starter     $0            40%
Growth      $50           30%
Scale       $200          22%
Enterprise  $1,000        15%
Strategic   $5,000        10%
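The ladder is a plain threshold lookup. A hypothetical helper mirroring the published table (the function name and shape are illustrative, not the gateway's internal code):

```python
TIERS = [          # (30-day spend floor in USD, markup over upstream)
    (5000, 0.10),  # Strategic
    (1000, 0.15),  # Enterprise
    (200, 0.22),   # Scale
    (50, 0.30),    # Growth
    (0, 0.40),     # Starter
]

def markup_for(rolling_30d_spend: float) -> float:
    """Return the markup for a rolling 30-day spend on successful calls."""
    for floor, markup in TIERS:
        if rolling_30d_spend >= floor:
            return markup
    return 0.40  # unreachable: the $0 floor always matches

print(markup_for(120))   # 0.3  (Growth)
print(markup_for(1500))  # 0.15 (Enterprise)
```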
Built-in reliability

What you get on top of the OpenAI shape.

Outage hedging

When OpenAI returns 5xx for chat we retry on Anthropic or Gemini through OpenRouter — same request, alternate provider — so a single upstream incident does not reach your users. Spend caps stay enforced through the retry path.

Async + webhooks

For video, image and music we ship an async API with HMAC-signed webhook delivery and exponential-backoff retries. A 30-second Veo render becomes one POST and one verified callback — no client-side polling required.
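Verifying an HMAC-signed callback takes only a few lines. A sketch, assuming an HMAC-SHA256 hex digest over the raw request body; the actual header name and digest choice are defined in the webhook docs, not here:

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of an assumed HMAC-SHA256 webhook signature."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

secret = b"whsec_example"                               # placeholder secret
body = b'{"taskId":"task_123","status":"succeeded"}'    # raw callback body
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_webhook(secret, body, sig))  # True
```

Always compute the digest over the raw bytes before any JSON parsing, and use the constant-time comparison to avoid timing leaks.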

Daily and monthly caps

Per-key and per-account caps are enforced before the upstream call: a runaway loop or compromised key surfaces HTTP 402 instead of draining your balance. Aggregated team caps work the same way across shared workspaces.
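In pseudocode terms, the cap check is a pure function evaluated before any upstream request. An illustrative sketch with assumed field names, not the gateway's actual code:

```python
from dataclasses import dataclass

@dataclass
class KeyUsage:
    daily_cap: float
    monthly_cap: float
    spent_today: float
    spent_this_month: float

def precheck(key: KeyUsage, estimated_cost: float) -> int:
    """Return 402 before the upstream call if either cap would be exceeded."""
    if key.spent_today + estimated_cost > key.daily_cap:
        return 402
    if key.spent_this_month + estimated_cost > key.monthly_cap:
        return 402
    return 200

key = KeyUsage(daily_cap=5.0, monthly_cap=50.0,
               spent_today=4.99, spent_this_month=20.0)
print(precheck(key, 0.02))  # 402 -- the upstream call is never made
```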

No charge for failed calls

Upstream 5xx, internal errors and timeouts are not billed. Only successful 2xx responses count against credits. The full pricing math is documented in the Terms; refund policy applies to unspent credits within 30 days of purchase.

Architecture, in depth

How the gateway holds up under real traffic.

What "OpenAI-compatible" actually means here

AI Generate accepts the same Authorization: Bearer header you already send to OpenAI, the same {model, messages, stream, tools} JSON, and returns the same {id, choices, usage} shape. Streaming SSE chunks ship in the canonical data: {choices:[{delta:{content}}]} format that the OpenAI Python and JavaScript SDKs already consume. The only thing you change is base_url. Every model name is a string — pass claude-sonnet-4-5 and we route to Anthropic, gpt-5-codex and we route to OpenAI direct, gemini-3-pro and we route to Google, and any other model name in the catalog fans out through OpenRouter. There is no per-provider SDK to learn, no per-account billing to consolidate, no per-modality auth to manage. LangChain, llama-index, the Vercel AI SDK and any framework that already speaks OpenAI also work without modification — the wire format is what they target, and the wire format is what we serve.
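The routing described above can be pictured as a prefix dispatch on the model string. A toy sketch that only mirrors the prose, not the gateway's actual routing table:

```python
def route(model: str) -> str:
    """Pick an upstream provider from the model-name string (illustrative)."""
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("gemini-"):
        return "google"
    if model.startswith("gpt-"):
        return "openai"
    return "openrouter"  # every other catalog name fans out via OpenRouter

print(route("claude-sonnet-4-5"))  # anthropic
print(route("deepseek-v3"))        # openrouter
```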

Why a single Bearer token saves more than a few keystrokes

Most production AI stacks talk to three to five providers — OpenAI for code, Anthropic for analysis, Google for long context, fal or Replicate for image, Runway for video. Each is a separate billing relationship: separate invoices, separate spend caps, separate dashboards, separate Stripe webhooks if you bill customers downstream. AI Generate folds all of them behind one credit pool with one API key. Your finance team gets one line item, your operators get one dashboard, your engineers get one SDK, and your error-handling code stops branching on which provider died this morning. The 10–40% markup pays for the consolidation; the five-tier auto-discount makes the math work above $200 a month. At the strategic tier the unit price is competitive with direct-from-OpenAI billing once you account for the operational cost of running three to five separate provider relationships in parallel.

Drop-in migration with no SDK rewrite

If your codebase imports the OpenAI Python or JavaScript package, the migration is a constructor-argument change. Set base_url (Python) or baseURL (JavaScript) to https://aimarcusimage.eu/api/v1 and your AI Generate key in api_key. Every chat-completions call works without further changes — including the tool_calls block, the response_format JSON mode, the vision content[], the streaming async iterator and the standard usage object. The OpenAI client retries, timeouts and request-id headers behave the same way. For embeddings, point the same client at /v1/embeddings — five embedding models including text-embedding-3-large. For generative media the surface diverges from OpenAI (because OpenAI does not ship video or music as a single API): you POST to /v1/jobs/createTask, receive a taskId, and either poll /v1/jobs/recordInfo or wait for our webhook callback. Auth and credit pool stay shared across all surfaces.
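When a webhook endpoint is not practical, polling /v1/jobs/recordInfo with exponential backoff works. A sketch with a stubbed status fetcher standing in for the HTTP call; delays and terminal status names are assumptions:

```python
import time

def poll_until_done(fetch_status, base_delay=1.0, max_delay=30.0, max_tries=20):
    """Poll a job-status callable until a terminal state, doubling the delay."""
    delay = base_delay
    for _ in range(max_tries):
        status = fetch_status()               # stands in for GET recordInfo
        if status in ("succeeded", "failed"):
            return status
        time.sleep(delay)
        delay = min(delay * 2, max_delay)     # exponential backoff, capped
    raise TimeoutError("job did not finish within the polling budget")

# Simulated job that finishes on the third check:
states = iter(["queued", "running", "succeeded"])
print(poll_until_done(lambda: next(states), base_delay=0))  # succeeded
```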

How the volume tier ladder actually applies

Markup over upstream cost starts at 40% and drops automatically as your rolling 30-day spend climbs: 30% at $50, 22% at $200, 15% at $1,000, 10% at $5,000. The tier recomputes every six hours against the past 30 days of successful, billed calls — so a busy launch month moves you up, a quiet month does not punish you with a sudden cliff. There is no negotiation, no sales call and no contract. The current tier is visible on every /v1/me response and in your dashboard. Failed calls (upstream 5xx, internal errors, client 4xx) are excluded from the tier calculation: only the spend that would actually appear on an OpenAI invoice counts. At the strategic tier the math frequently beats per-account direct billing once you fold in the cost of running three to five separate provider relationships, each with its own SLO, its own retry budget and its own finance integration.

Frequently asked

Is this a true drop-in for the OpenAI SDK?
Yes. Change base_url to https://aimarcusimage.eu/api/v1 and use a sk-aig-… key. The Python and JavaScript packages from OpenAI work without modification: tool calling, streaming, vision, JSON mode and embeddings. The only divergence is async generative media (image, video, music), which uses /v1/jobs/createTask with webhook callbacks because OpenAI does not ship a video API.
Which OpenAI features are supported?
Chat completions, streaming SSE, tool / function calling, vision input (image_url content blocks), JSON-mode response_format, embeddings, and the standard usage object. The legacy /v1/completions endpoint and the Assistants API are not supported — neither is recommended for new code.
How is billing different from calling OpenAI directly?
Pre-paid credits instead of post-pay invoices. You purchase $10, $50 or $200 packages (5–12% bonus credits) and consume them across all 307 models. The five-tier ladder drops the markup from 40% to 10% on rolling 30-day spend. At the strategic tier the unit price is competitive with direct OpenAI billing once you account for three to five separate provider accounts.
Can I use Claude or Gemini through the same endpoint?
Yes. Pass model="claude-sonnet-4-5", "claude-opus-4-6", "gemini-3-pro", "gpt-5", "gpt-5-codex" or any of 37 priced chat models in the standard messages payload. Routing to Anthropic, Google, OpenAI direct or OpenRouter is automatic — your code keeps the OpenAI SDK shape regardless of which provider serves the request.
What happens during an OpenAI outage?
For chat, we retry on Anthropic or Gemini through OpenRouter when an upstream returns 5xx. The retry preserves your messages, model and tools — only the provider changes. For async tasks the webhook delivery layer retries with exponential backoff, so a transient upstream failure does not surface to your end users.
How do per-key and per-account spend caps work?
Set a daily and / or monthly cap on every API key in the dashboard. The check runs before the upstream call: if the limit is hit, we return HTTP 402 immediately rather than completing the call and burning credits. Aggregated team caps work the same way across keys in a shared workspace.
Are failed calls billed?
No. Upstream 5xx responses, internal errors and timeouts are not charged against credits. Only successful 2xx responses count. The refund policy on unspent credits is documented in the Terms: refunds are available within 30 days of purchase, provided no card-payment dispute is open.
Is there a free trial?
Yes. Sign up, verify your email, and $10 in credits land in your account — enough to run roughly 1.6 million Claude Haiku tokens, 600 Nano Banana images or 80 Flux Kontext Pro renders. No card required to claim it.


Ship the OpenAI-compatible call. Get every other model for free.

Sign up, verify your email, and $10 in credits land in your account — enough to run roughly 1.6 million Claude Haiku tokens or 600 Nano Banana images.