
Rate Limiting for Lovable Apps

AI-built apps often ship with no rate limiting on their AI endpoints. One user with a loop script can burn through your entire OpenAI budget overnight. Here's how to check for, add, and verify rate limiting on every endpoint that costs money to call.

The short version. Every AI endpoint in your app (chat, generate, summarize, draft) makes a paid API call when invoked. Without rate limiting, anyone who finds the endpoint can call it in a loop. The bill scales linearly with the call volume, and modern providers (OpenAI, Anthropic) charge in real time against your API key. Stories of $1,000-$5,000 surprise bills from a single abusive user are common. The fix is putting a rate limiter in front of every endpoint that costs money.
In this guide
  1. Why AI endpoints need rate limiting specifically
  2. The specific failure pattern in Lovable apps
  3. How to check if your endpoints are rate limited
  4. How to add rate limiting (Upstash + Vercel)
  5. Layered limits: per-IP, per-user, per-cost
  6. Common pitfalls when adding rate limiting
  7. Related cost-control issues to check

Why AI endpoints need rate limiting specifically

Rate limiting is the practice of restricting how many requests a given client can make to your API within a time window. For most web endpoints, the consequence of skipping rate limiting is performance (your server gets overloaded). For AI endpoints, the consequence is direct financial loss.

Each call to your AI endpoint triggers a downstream API call to OpenAI, Anthropic, or whichever provider you use. That provider charges you per token consumed. If your app exposes an endpoint that calls GPT-4 with a 4,000-token prompt and gets a 1,000-token response, every invocation costs you real money. At GPT-4 pricing, a single call might be $0.05-$0.20. Sounds tiny. Multiplied by 50,000 abusive calls in an hour, you're looking at a four-figure bill before you wake up.
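To make the arithmetic concrete, here is a minimal sketch of the per-call and per-attack cost. The prices are assumptions (GPT-4's original list prices of $0.03 per 1,000 input tokens and $0.06 per 1,000 output tokens), and the function name is illustrative. Substitute your provider's current rates.

```javascript
// Assumed prices in USD per 1,000 tokens; check your provider's current pricing.
const PRICE_PER_1K_INPUT = 0.03;
const PRICE_PER_1K_OUTPUT = 0.06;

// Cost of one call given its prompt and response token counts.
function costPerCall(inputTokens, outputTokens) {
  return (inputTokens / 1000) * PRICE_PER_1K_INPUT
       + (outputTokens / 1000) * PRICE_PER_1K_OUTPUT;
}

const perCall = costPerCall(4000, 1000); // roughly $0.18 under these prices
const attackCost = perCall * 50000;      // roughly $9,000 for 50,000 calls

console.log(perCall.toFixed(2), Math.round(attackCost));
```

Under these assumed prices, the example call from the paragraph above lands near the top of the $0.05-$0.20 range.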

The financial damage is also usually irreversible. By the time you notice the spike in your provider dashboard, the charges have already cleared. Some providers offer credit refunds for clear abuse cases, but it's not guaranteed, and the process is slow. The defensive posture has to be: assume any unprotected endpoint will eventually be abused.

The specific failure pattern in Lovable apps

The typical AI endpoint generated by Lovable, Bolt, or V0 looks something like this:

// What AI builders typically generate
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export default async function handler(req, res) {
  const { prompt } = req.body;

  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
  });

  res.status(200).json({ result: response.choices[0].message.content });
}

Three problems:

  1. No authentication check. Anyone, including non-logged-in users, can hit this endpoint.
  2. No rate limiting. A single client can hit this in a loop. There's no per-IP, per-session, or per-user throttle.
  3. No input validation on token cost. The user can send a 100,000-token prompt and you pay for it. There's no cap on the size of the prompt.

The attack is trivial. A script:

while true; do
  curl -X POST https://yourapp.com/api/chat \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Write a 1000-word essay on rate limiting"}'
done

Run that overnight from a single VM and the bill the next morning is bad. Run it from a few hundred residential proxies and the bill is catastrophic.

How to check if your endpoints are rate limited

Three checks.

Check 1: Read the code

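Open your AI endpoint files. Search for signs of a rate limiter: an import of a rate-limiting library (for example @upstash/ratelimit), a call such as ratelimit.limit(...), or a response with status 429.

If none of those appear in your AI endpoint code, the endpoint is almost certainly not rate-limited (unless a limiter runs in shared middleware in front of it).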
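If you have many endpoint files, the search can be automated. The sketch below checks a file's source text against a small list of hints. The pattern list and the function name are assumptions for illustration; adjust them to the libraries your project actually uses.

```javascript
// Hypothetical list of hints that a file contains rate limiting.
// Extend it with whatever library your project actually uses.
const RATE_LIMIT_HINTS = [
  /@upstash\/ratelimit/,
  /express-rate-limit/,
  /ratelimit\.limit\(/i,
  /status\(429\)/,
];

// Returns the hints found in a file's source text (empty array = none found).
function findRateLimitHints(source) {
  return RATE_LIMIT_HINTS
    .filter((pattern) => pattern.test(source))
    .map((pattern) => pattern.source);
}
```

To use it, read each file in your API directory as text (for example with fs.readFileSync(path, 'utf8')) and flag every file for which the function returns an empty array.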

Check 2: Hammer your own endpoint

Hit your endpoint 50-100 times in quick succession from a script. If every request returns 200 OK, you have no rate limiting. A correctly rate-limited endpoint should start returning 429 after some threshold (often 5-10 requests per minute for AI endpoints).

# Run this against your own endpoint
for i in {1..50}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST https://yourapp.com/api/chat \
    -H "Content-Type: application/json" \
    -d '{"prompt": "test"}'
done

If you see 50 lines of 200, rate limiting is missing. If you see some 200s followed by 429s, rate limiting is working.

Check 3: Look at your provider dashboard

Open your OpenAI/Anthropic usage dashboard. Look at the request count over the last 7 days. Does the volume match your actual user activity? If you have 50 users but you're seeing 5,000 requests per day, something or someone is hitting your endpoint outside of normal user behavior.

How to add rate limiting (Upstash + Vercel)

The standard solution for Vercel-hosted apps is Upstash Ratelimit. It's a serverless rate limiter backed by Upstash Redis. Free tier handles 10,000 commands per day, which is plenty for most starting apps.

Step 1: Create an Upstash Redis database

Go to upstash.com, sign up free, create a Redis database in the same region as your Vercel deployment. Copy the REST URL and REST token.

Step 2: Add the dependencies

npm install @upstash/ratelimit @upstash/redis

Step 3: Add environment variables to Vercel

UPSTASH_REDIS_REST_URL=https://your-db.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token-here

Step 4: Wrap your AI endpoint with rate limiting

import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
import OpenAI from 'openai';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(5, '1 m'),  // 5 requests per minute
});

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export default async function handler(req, res) {
  // Identify the requester by IP (swap in the user ID for logged-in users)
  const identifier = req.headers['x-forwarded-for']?.split(',')[0]?.trim()
                  ?? req.socket.remoteAddress
                  ?? 'anonymous';

  const { success, limit, remaining, reset } = await ratelimit.limit(identifier);

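  if (!success) {
    res.setHeader('X-RateLimit-Limit', limit);
    res.setHeader('X-RateLimit-Remaining', remaining);
    res.setHeader('X-RateLimit-Reset', reset);
    // reset is a Unix timestamp in milliseconds; Retry-After expects seconds
    res.setHeader('Retry-After', Math.max(0, Math.ceil((reset - Date.now()) / 1000)));
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }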

  // Validate the prompt type and size
  const { prompt } = req.body ?? {};
  if (typeof prompt !== 'string' || prompt.length === 0 || prompt.length > 2000) {
    return res.status(400).json({ error: 'Invalid prompt' });
  }

  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    max_tokens: 1000,  // cap the response size
  });

  res.status(200).json({ result: response.choices[0].message.content });
}

What this does: each incoming request is keyed by the requester's IP. Upstash counts the requests in a sliding 1-minute window. If a single IP exceeds 5 requests in that window, the next request returns 429 instead of calling OpenAI. The window slides continuously, so older requests age out and legitimate users aren't permanently blocked.
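To show the counting logic in isolation, here is a minimal in-memory sketch of a sliding window. It is an illustration only: it is not how Upstash implements slidingWindow (which runs against Redis), it would not work reliably on serverless, and the function name and injectable clock are assumptions made so the behavior is easy to test.

```javascript
// Illustrative in-memory sliding-window limiter. Not suitable for serverless
// production use; it only shows the counting logic.
function createSlidingWindowLimiter(maxRequests, windowMs, now = Date.now) {
  const hits = new Map(); // identifier -> timestamps of accepted requests

  return function limit(identifier) {
    const current = now();
    const cutoff = current - windowMs;
    // Keep only the requests that are still inside the window.
    const recent = (hits.get(identifier) ?? []).filter((t) => t > cutoff);

    const success = recent.length < maxRequests;
    if (success) recent.push(current);
    hits.set(identifier, recent);

    return { success, remaining: Math.max(0, maxRequests - recent.length) };
  };
}
```

With createSlidingWindowLimiter(5, 60000), a sixth request from the same identifier inside one minute is rejected, and requests are accepted again once the earlier ones are more than a minute old.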

The added prompt.length check (measured in characters, not tokens) and the max_tokens cap also prevent the "send a 100,000-token prompt" attack. Even if a user passes the rate limit, the maximum cost per call is bounded.

Layered limits: per-IP, per-user, per-cost

One rate limit usually isn't enough. The real-world approach is to layer multiple limits:

Layer 1: Per-IP

The cheapest defense. Limits anonymous abuse. Easy to bypass with proxies, but raises the cost of an attack significantly. Use the sliding window from the example above.

Layer 2: Per-authenticated-user

If your endpoint requires login, also rate-limit by user ID. This prevents a single legitimate user from accidentally (or intentionally) burning your budget. A common pattern: 100 requests per day per user on a free tier.

const userRatelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.fixedWindow(100, '1 d'),  // 100 per day per user
});

// In your handler, after auth check:
const userResult = await userRatelimit.limit(`user:${session.userId}`);
if (!userResult.success) {
  return res.status(429).json({ error: 'Daily limit reached' });
}
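In a handler that serves both anonymous and logged-in traffic, one option is to derive the limiter key in a single place. This is a small sketch; the session object and its userId field are assumptions about your auth setup, and the function name is illustrative.

```javascript
// Prefer a stable user ID when the request is authenticated; fall back to IP.
// `session` and its `userId` field are assumptions about your auth setup.
function rateLimitKey(session, ip) {
  if (session && session.userId) return `user:${session.userId}`;
  return `ip:${ip ?? 'anonymous'}`;
}
```

Prefixing the key with user: or ip: keeps the two kinds of identifier from colliding in the same Redis database.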

Layer 3: Global cost cap

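The most defensive layer: a daily total budget across all users and IPs. If your app should cost no more than $50/day in OpenAI calls, set a counter that increments per call and cuts everyone off once the day's total reaches the cap. The example below uses a call count as a proxy for spend: divide your daily budget by your worst-case cost per call to pick the limit.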

const globalRatelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.fixedWindow(2000, '1 d'),  // 2000 total calls/day
});

const globalResult = await globalRatelimit.limit('global');
if (!globalResult.success) {
  return res.status(503).json({ error: 'Service temporarily at capacity' });
}

Returning 503 instead of 429 here is intentional: the user isn't being rate-limited, the service is at capacity. The behavior is the same (request rejected) but the framing makes it easier to explain to a confused user. You can also wire this up to send you an email so you know when the cap is hit.
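The three layers can be combined into one check that runs before the provider call. The sketch below is one possible arrangement, not a definitive implementation: the function name and the shape of its arguments are assumptions, and each limiter only needs a limit(key) method that resolves to an object with a success field, as in the Upstash examples above.

```javascript
// Runs the layers in order and stops at the first one that rejects.
// Each limiter only needs a `limit(key)` method resolving to `{ success }`.
async function checkLayers({ ip, userId }, { ipLimiter, userLimiter, globalLimiter }) {
  const ipResult = await ipLimiter.limit(`ip:${ip}`);
  if (!ipResult.success) {
    return { ok: false, status: 429, error: 'Rate limit exceeded' };
  }

  // The per-user layer only applies to authenticated requests.
  if (userId) {
    const userResult = await userLimiter.limit(`user:${userId}`);
    if (!userResult.success) {
      return { ok: false, status: 429, error: 'Daily limit reached' };
    }
  }

  const globalResult = await globalLimiter.limit('global');
  if (!globalResult.success) {
    return { ok: false, status: 503, error: 'Service temporarily at capacity' };
  }

  return { ok: true };
}
```

Checking the per-IP and per-user layers first means a request rejected by either one never reaches the global limiter, so a single abuser's rejected requests do not consume the shared daily budget.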

Common pitfalls when adding rate limiting

Pitfall 1: Identifying users by an unreliable header

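Some apps use req.connection.remoteAddress for IP-based limiting. On Vercel and most modern hosts, that is the address of the platform's proxy, not the user's, so every visitor ends up sharing one limit. Use x-forwarded-for instead and take the first IP in the comma-separated list, which is the original client. The same applies to any reverse proxy setup. Only trust the header when your platform or proxy sets it; a client that can reach your server directly can forge it.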
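A small helper keeps the IP lookup consistent across endpoints. This is a sketch under the assumption that your platform populates x-forwarded-for itself; the function name is illustrative.

```javascript
// Returns the first address in x-forwarded-for, falling back to the socket.
// Only trust this value when your platform sets the header itself.
function getClientIp(req) {
  const forwarded = req.headers['x-forwarded-for'];
  // Header values can be typed as string or string[]; normalize to a string.
  const value = Array.isArray(forwarded) ? forwarded[0] : forwarded;
  const first = value?.split(',')[0]?.trim();
  return first || req.socket?.remoteAddress || 'anonymous';
}
```

The trim matters because proxies usually separate the addresses with a comma and a space, and a stray space would produce a different limiter key for the same client.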

Pitfall 2: In-memory rate limiting on serverless

If you use an in-memory counter (a regular JavaScript object) in a Vercel serverless function, the counter resets on every cold start. Different function instances also have different counters. Effective rate limiting on serverless requires external state (Redis, KV, Postgres). Upstash is the standard choice because it's serverless-native.
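The failure is easy to reproduce without deploying anything. The sketch below simulates two function instances, each with its own in-memory counter and a limit of 5. The names are illustrative, and the alternating routing is an assumption standing in for whatever the platform's load balancer does.

```javascript
// Each call simulates one serverless instance with its own private memory.
function createInstance(maxRequests) {
  const counts = {}; // lives only inside this instance
  return function handle(ip) {
    counts[ip] = (counts[ip] ?? 0) + 1;
    return counts[ip] <= maxRequests ? 200 : 429;
  };
}

// Two instances serving the same client, as a platform might under load.
const instanceA = createInstance(5);
const instanceB = createInstance(5);

let accepted = 0;
for (let i = 0; i < 10; i++) {
  const handle = i % 2 === 0 ? instanceA : instanceB; // requests alternate
  if (handle('203.0.113.7') === 200) accepted++;
}

console.log(accepted); // 10: every request passes despite the limit of 5
```

Each instance sees only half the traffic, so neither counter ever reaches the limit. The effective limit grows with the number of instances, which is exactly when you need it most.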

Pitfall 3: Returning 200 with an error in the body

Some AI-generated handlers return 200 with { error: 'rate limited' } in the JSON. Don't do this. Rate limit violations should return HTTP 429. Monitoring tools, browser caches, and other clients all respect the 429 status code. A 200 with an error in the body looks like success to most systems and breaks proper handling.

Pitfall 4: No retry-after guidance

When you return 429, include a Retry-After header (or X-RateLimit-Reset) so clients know when they can try again. Without this, retry logic often defaults to "hammer the endpoint until something works," which defeats the rate limit.
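On the client side, retry logic can honor those headers instead of retrying immediately. The sketch below computes a wait time from a 429 response. It assumes the headers arrive as a plain object with lower-case names and that X-RateLimit-Reset carries a Unix timestamp in milliseconds; the function name and the 1-second default are also assumptions.

```javascript
// Computes how long a client should wait after a 429, in milliseconds.
// `headers` is a plain object with lower-case header names (an assumption;
// with fetch you would read response.headers.get(...) instead).
function retryDelayMs(headers, nowMs = Date.now()) {
  const retryAfter = headers['retry-after'];
  if (retryAfter !== undefined) {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
    const date = Date.parse(retryAfter); // Retry-After may also be an HTTP date
    if (!Number.isNaN(date)) return Math.max(0, date - nowMs);
  }
  // Fall back to a reset timestamp in Unix milliseconds, if present.
  const reset = Number(headers['x-ratelimit-reset']);
  if (Number.isFinite(reset) && reset > 0) return Math.max(0, reset - nowMs);
  return 1000; // conservative default when the server gives no guidance
}
```

A client would wait for the returned number of milliseconds before sending the next request, rather than retrying in a tight loop.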

Pitfall 5: Rate limiting after the expensive call

The rate limit check must happen before the OpenAI/Anthropic call, and the check must also count the request. Some incorrect implementations check the limit, call the AI, and only update the counter after the response. That creates a race condition: a burst of concurrent requests can all pass the check before any of them is counted. Check and count first, then call.

Related cost-control issues to check

Rate limiting fixes the worst-case scenario, but other cost-control issues compound it.

No prompt size cap

Even with rate limiting, a single call with a 50,000-token prompt is expensive. Always cap prompt.length at something reasonable (2,000-4,000 characters for most chat apps) and reject anything larger with a 400.
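The size check is easiest to reuse when it lives in its own function. This is a minimal sketch; the 2,000-character cap, the error messages, and the function name are assumptions to adapt to your app.

```javascript
const MAX_PROMPT_CHARS = 2000; // pick a cap that fits your use case

// Returns { ok: true, prompt } or { ok: false, error } for a request body.
function validatePrompt(body) {
  const prompt = body?.prompt;
  if (typeof prompt !== 'string' || prompt.trim().length === 0) {
    return { ok: false, error: 'Prompt must be a non-empty string' };
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    return { ok: false, error: `Prompt exceeds ${MAX_PROMPT_CHARS} characters` };
  }
  return { ok: true, prompt };
}
```

In a handler, a failed result maps to a 400 response with the error message. Checking the type first matters because an array or object in place of the string would otherwise slip past a bare length comparison.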

No max_tokens on the response

Without max_tokens in your OpenAI call, the response can run as long as the model allows. Setting it caps the response cost: use max_tokens: 1000 or whatever fits your use case.

No alerting on usage spikes

Set up an alert in your OpenAI/Anthropic dashboard to email you if daily usage exceeds a threshold. Even with rate limiting, you want to know if something unexpected is happening. Most providers have a free usage limit feature in their dashboard.

Streaming endpoints with no cancellation

If you use streaming responses, make sure your handler cancels the upstream OpenAI request when the client disconnects. Otherwise a malicious client opens 1,000 streaming connections, disconnects each one, and you keep paying for the generation to finish.
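One way to wire up cancellation is with an AbortController tied to the response's close event. The sketch below is a pattern under stated assumptions, not a definitive implementation: withClientCancel and startUpstream are hypothetical names, and startUpstream is a function you supply that receives an AbortSignal and performs the streaming provider call.

```javascript
// Aborts the upstream request if the client disconnects before the response
// has finished. `startUpstream` is a hypothetical function you supply: it
// receives an AbortSignal and performs the streaming provider call.
function withClientCancel(res, startUpstream) {
  const controller = new AbortController();

  res.on('close', () => {
    // 'close' also fires after a normal finish, so only abort early exits.
    if (!res.writableEnded) controller.abort();
  });

  return startUpstream(controller.signal);
}
```

How the signal reaches the provider depends on your client library. Many HTTP clients and SDKs accept an AbortSignal as a request option, so check the documentation for the SDK version you use before relying on it.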

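The TL;DR

Put a rate limiter in front of every endpoint that makes a paid API call, and check the limit before calling the provider. Return 429 with a Retry-After header, cap both the prompt size and max_tokens, and layer per-IP, per-user, and global limits backed by external state such as Redis.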

Want this done for you?

If you'd rather not stand up an Upstash account and wire layered rate limits across every AI endpoint, Rivetz audits and hardens Lovable apps for production. The audit ($1,000, 3 business days) finds every issue. The cleanup ($3,500, 14 business days) implements every CRITICAL and HIGH severity fix including rate limiting, RLS, secrets, and webhook security.

100% async. No calls. No scope creep. Fixed price.
