AI MODEL MARKETPLACE

The marketplace for every model you actually ship.

Browse 307 curated SOTA models across video, image, music and chat. Test any of them in the playground without writing code. Pay per unit (per image, per second of video, per million tokens) from one credit pool — no per-provider account, no negotiated contract, no wasted setup time.

No card required · Test any model in the playground before you build

The thesis

Curated, not crowded.

Hugging Face lists 45,000 model cards. Most are forks, dead branches or research artefacts that never reach production. AI Generate lists 307 — every one is a current SOTA model from a named provider, priced per unit, latency-tested, and shipped behind a Bearer-token API. The catalog is small on purpose; everything in it is meant to be deployed.

What's in the catalog

Four modalities, one credit pool.

Why a curated marketplace

Four things you won't get from a raw aggregator.

Transparent per-unit pricing

Every model has a published unit price (per image, per second of video, per million tokens). No "contact sales", no surprise per-second compute bills. The full pricing table is on /pricing and on every model page.

One credit pool across modalities

Buy $10, $50 or $200 of credits and spend them on whatever you need this week. Generate a Suno song in the morning, a Veo clip in the afternoon, a Claude analysis in the evening — all from the same balance.

Test in the playground, no code required

Every chat, image, video and music model has a UI runner at /playground. Tweak prompts, switch models, copy the resulting curl command into your code. Onboarding stops at "log in".

Auto volume discount, 5 tiers

Markup over upstream cost drops from 40% to 10% as your rolling 30-day spend climbs. Recomputed every six hours. No negotiation, no contract, no minimum commit.
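The tier ladder is simple enough to model directly. A minimal sketch, assuming the published thresholds and markups (40% starter, 30% at $50, 22% at $200, 15% at $1,000, 10% at $5,000 rolling 30-day spend); the function name and table layout are illustrative, not part of the API:

```python
# Tier ladder: (minimum rolling 30-day spend in USD, markup over upstream cost).
# Thresholds/markups taken from the published pricing; ordering highest-first
# so the first matching tier wins.
TIERS = [
    (5_000, 0.10),
    (1_000, 0.15),
    (200, 0.22),
    (50, 0.30),
    (0, 0.40),
]

def markup_for(rolling_30d_spend_usd: float) -> float:
    """Return the markup applied at a given rolling 30-day spend."""
    for threshold, markup in TIERS:
        if rolling_30d_spend_usd >= threshold:
            return markup
    return 0.40  # starter tier fallback
```

For example, a customer spending $250 over the trailing 30 days lands in the 22% tier; the platform recomputes this every six hours, so crossing a threshold takes effect the same day.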

How it compares

How AI Generate compares to other marketplaces.

Three categories of alternative, each with a different trade-off: an open hub (Hugging Face), a GPU runtime (Replicate), and direct provider accounts. The choice depends on whether you want curation, breadth or vertical depth.

Columns: AI Generate | Hugging Face | Replicate | Direct providers

Catalog size: 307 curated SOTA | 45,000+ (mostly forks) | thousands (community-packaged) | one provider per account
Pricing surface: per-unit, published | per-token via Inference API | per-second of GPU compute | per-token, per-provider invoice
Modalities under one key: chat + image + video + music | chat + image (limited video) | image + video + ML, limited chat | single modality per provider
Try without code: yes, playground for every model | yes, Spaces (per-model UI varies) | yes, model page demos | no (or per-provider playground)
SDK story: OpenAI-compatible (drop-in) | huggingface_hub (proprietary) | replicate-python (proprietary) | OpenAI / Anthropic / Google direct
Volume discount: auto, 5 tiers, no negotiation | no public tier ladder | no public tier ladder | negotiated separately
Spend caps: daily + monthly, returns HTTP 402 | per-key budget | per-key spend limits | per-account billing alerts
In depth

What "marketplace" actually means here.

What "marketplace" actually means in 2026

The word marketplace is overloaded. Hugging Face is technically the largest, with 45,000+ model cards uploaded by anyone — research labs, forks, fine-tunes, community experiments. The signal-to-noise ratio is great if you are a researcher and bad if you are shipping a SaaS. Replicate is a model marketplace in a different sense: it gives you a per-second GPU runtime to host any model someone has packaged in Cog. AI Generate is a marketplace in the third sense: a curated catalog of 307 production-grade models, priced per unit, behind a single Bearer token, with a playground UI for testing and a webhook surface for async generation. Every model in our catalog has a named upstream provider (OpenAI, Anthropic, Google, Black Forest Labs, ByteDance, Suno, ElevenLabs, fal, Replicate-hosted) and a per-unit price you can quote your own customers from.

Why one credit pool changes the build vs buy maths

When your stack reaches the third or fourth provider, the operational tax of running them in parallel becomes the dominant cost. You have separate invoices to reconcile, separate spend caps to monitor, separate dashboards to log into, separate Stripe webhooks to verify if you bill customers downstream. AI Generate folds all of that into one credit pool with one API key. Your finance team gets one line item; your engineers stop branching error-handling on which provider died this morning; your operators see the same dashboard whether the call routed to Claude, Gemini or Veo. The 10–40% markup pays for the consolidation, and the auto-tier discount makes the unit-price math competitive once your spend crosses ~$200 a month.

Browse, test, ship — what a model page actually does

Every model in the marketplace has a dedicated page at /model/<slug>. The page shows the per-unit price (in credits and USD), the average latency, the supported parameters, a sample request, and a one-click "open in playground" button. The playground at /playground runs the same call you will run from your backend — same model, same prompt format, same output. You can copy the resulting curl request straight into your code, or switch the SDK tab between Python and JavaScript. Onboarding becomes: log in, claim $10, open a model page, hit playground, copy curl, ship. The interactive surface is why developers pick a marketplace over wiring three to five provider SDKs into their own backend before they have shipped anything.

How curation changes when a new SOTA model lands

When a new flagship model ships — Veo 3.2, Claude Opus 4.7, GPT-5.6, the next Suno — we benchmark it against the current category leader on latency, output quality and unit price, and add it to the catalog within days. Old or replaced models stay listed for backward compatibility (so existing integrations do not break) but are marked "superseded" on the model page. The curation policy is explicit: every active model in the catalog is one we would recommend to a customer building today. There is no SEO-padding with abandoned forks or weak rebrands. Coverage breadth is a side effect of provider quality, not a vanity metric.

Frequently asked

How is this different from Hugging Face or Replicate?
Hugging Face hosts 45,000+ models — most are forks, fine-tunes or research artefacts. Replicate gives you per-second GPU runtime to host any Cog-packaged model. AI Generate is a curated catalog of 307 production-grade SOTA models from named upstream providers, priced per unit, behind a single OpenAI-compatible Bearer token. Catalog size is small on purpose — every entry is a model we would recommend for production.
Can I try a model before paying for it?
Yes. Sign up, verify your email, and $10 in credits land in your account. Open any model page from the marketplace, click "playground", and run the model with your own prompt. Onboarding stops at "log in" — no card on file required.
How is pricing structured?
Per unit, published in advance. Image models price per image, video models per second of output, music models per generated track, chat models per million tokens (input and output separately). Prices sync daily from upstream providers, with a 40% starter markup that drops to 10% on rolling 30-day spend above $5,000.
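The effective per-unit price follows mechanically from upstream cost and markup tier. A hedged sketch; the $0.04/image upstream cost below is a made-up placeholder, not a real quote from any provider:

```python
def effective_price(upstream_unit_cost: float, markup: float) -> float:
    """Per-unit price = upstream cost plus the tiered markup.

    Works for any unit (per image, per second of video, per million
    tokens); rounding to 6 decimals keeps sub-cent units stable.
    """
    return round(upstream_unit_cost * (1 + markup), 6)

# A hypothetical $0.04/image model at the 40% starter tier costs
# effective_price(0.04, 0.40) per image; at the 10% top tier the same
# model costs effective_price(0.04, 0.10).
```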
Which categories are covered?
307 models across four categories: 37 chat (GPT-5, Claude, Gemini, DeepSeek, Mistral), 72 image (Flux 2 Pro, Nano Banana Pro, Ideogram v3, GPT-Image, Recraft), 177 video (Veo 3.1, Runway Aleph, Seedance 2, Wan 2.7, Kling 3.0, Sora 2 Pro), 21 music & voice (Suno v4.5 Plus, MusicGen, Mureka, ElevenLabs voices).
Do you ship a no-code UI for testing?
Yes — the playground at /playground supports every chat, image, video and music model. Tweak the prompt, switch models from a dropdown, copy the resulting curl call straight into your code. The same API call powers the playground and the public API, so behaviour is identical between the two.
How do volume discounts work?
Markup over upstream cost is auto-applied based on rolling 30-day spend. The five tiers are 40% (starter), 30% ($50/mo), 22% ($200/mo), 15% ($1,000/mo) and 10% ($5,000/mo). The tier recomputes every six hours; current tier is visible on every /v1/me response and in your dashboard.
What happens when a new SOTA model launches?
We benchmark new flagship models on latency, output quality and unit price, then add them to the marketplace within days. Older models stay listed (so existing integrations do not break) and are marked "superseded" on their model page when a strictly-better replacement ships.
Is the API OpenAI-compatible?
Yes for chat (37 models via /v1/chat/completions, drop-in OpenAI SDK swap, streaming SSE, tool calls). For image, video and music we ship an async surface (/v1/jobs/createTask + webhook callback) since OpenAI does not publish a video or music API. Same Bearer token across all of them.
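For the synchronous chat surface, the request shape is the standard OpenAI one with your marketplace Bearer token. A stdlib-only sketch of building such a request; the base URL and model slug below are placeholders, not documented values, and actually sending the request (via requests, urllib or the OpenAI SDK with base_url overridden) is left to the caller:

```python
import json

API_BASE = "https://api.example.com/v1"  # placeholder: substitute the real base URL

def chat_request(model: str, prompt: str, token: str) -> tuple[str, dict, bytes]:
    """Build the URL, headers and JSON body for an OpenAI-style
    /chat/completions call. The async job surface (jobs/createTask
    plus webhook callback) uses the same Bearer header scheme.
    """
    url = f"{API_BASE}/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,  # placeholder slug: use a real one from a model page
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, headers, body
```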


Open the marketplace, pick a model, ship in a minute.

Sign up, verify your email, and $10 in credits land in your account — enough to test 80 Flux Kontext Pro renders or 600 Nano Banana images.