Rate Limiting

Overview

dialektai implements rate limiting to ensure fair usage and system stability. This page explains the rate limits, how they work, and best practices for handling them.

Understanding Rate Limit Types

dialektai enforces rate limits at three levels:

  1. Per-Minute Limits - Burst protection to prevent short-term spikes
  2. Per-Hour Limits - Sustained usage protection
  3. Monthly Quotas - Total query budget for the billing period

Rate limits are tracked separately for two types of access.

All limits are enforced per organization, meaning all API keys and users within your organization share the same quota.
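Because all three budgets apply at once, a client can track them locally before sending a request. Below is a minimal sketch of such a tracker; the class name, method names, and default limits are placeholders, not values from any actual plan.

```python
import time
from collections import deque

class QuotaTracker:
    """Client-side sketch: track per-minute, per-hour, and monthly
    budgets in one place. Limits here are illustrative defaults."""

    def __init__(self, per_minute=60, per_hour=1000, per_month=10000):
        # window length in seconds -> allowed requests in that window
        self.limits = {60: per_minute, 3600: per_hour, 30 * 86400: per_month}
        self.timestamps = deque()  # one shared log of request times

    def try_acquire(self):
        now = time.time()
        # Drop entries older than the largest window we track
        oldest_needed = now - max(self.limits)
        while self.timestamps and self.timestamps[0] < oldest_needed:
            self.timestamps.popleft()
        # Every window must have budget left
        for window, limit in self.limits.items():
            used = sum(1 for t in self.timestamps if t > now - window)
            if used >= limit:
                return False
        self.timestamps.append(now)
        return True
```

Call try_acquire() before each request; a False return means one of the three budgets is exhausted and the request should be delayed.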

Rate Limit Tiers

Free Trial (14 days)

Starter Plan

Professional Plan

Business Plan

Enterprise Plan

How Rate Limiting Works

Sliding Window

dialektai uses a sliding window algorithm:

Window: 60 seconds
Limit: 100 requests

Time: 0s - Request 1
Time: 1s - Request 2
...
Time: 59s - Request 100
Time: 60s - Request 101 ✅ (Request 1 dropped from window)
Time: 61s - Request 102 ✅ (Request 2 dropped from window)
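The trace above can be sketched as a small limiter that keeps a log of accepted timestamps and evicts entries as they slide out of the window (class and method names here are illustrative, not part of the dialektai API):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sketch of the sliding-window check described above."""

    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.requests = deque()  # timestamps of accepted requests

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Evict timestamps that have slid out of the window
        while self.requests and self.requests[0] <= now - self.window:
            self.requests.popleft()
        if len(self.requests) < self.limit:
            self.requests.append(now)
            return True
        return False
```

With limit=100 and window=60, a request at t=60s succeeds exactly because the request logged at t=0s has just left the window, matching the trace above.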

Per-Organization

Rate limits are enforced per organization, not per API key:

# Organization A (Starter plan, API requests)
API Key 1: 30 requests/min
API Key 2: 30 requests/min
Total: 60 requests/min (within 60/min limit) ✅

# Organization A
API Key 1: 50 requests/min
API Key 2: 20 requests/min
Total: 70 requests/min (exceeds 60/min limit) ❌
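Since quota is shared organization-wide, clients that use multiple API keys should route all traffic through one shared, thread-safe limiter rather than one limiter per key. A sketch of that idea (names and limits are illustrative; the blocking acquire() waits until the oldest request slides out of the window):

```python
import threading
import time
from collections import deque

class SharedOrgLimiter:
    """One limiter for the whole organization: every API key's
    traffic passes through the same instance, since quota is shared."""

    def __init__(self, limit=60, window=60):
        self.limit, self.window = limit, window
        self.sent = deque()           # timestamps of sent requests
        self.lock = threading.Lock()  # safe to share across threads

    def acquire(self):
        """Block until a request slot is free, then claim it."""
        while True:
            with self.lock:
                now = time.time()
                while self.sent and self.sent[0] <= now - self.window:
                    self.sent.popleft()
                if len(self.sent) < self.limit:
                    self.sent.append(now)
                    return
                # Sleep until the oldest entry leaves the window
                wait = self.sent[0] + self.window - now
            time.sleep(wait)

# Every worker, regardless of which API key it uses, shares this:
org_limiter = SharedOrgLimiter(limit=60)
```

Calling org_limiter.acquire() before each request keeps the combined traffic of all keys under the organization's limit.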

Rate Limit Headers

Every API response includes rate limit information:

Response Headers

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1697624400
X-RateLimit-Window: 60

Header Descriptions:

X-RateLimit-Limit - Maximum number of requests allowed in the current window
X-RateLimit-Remaining - Number of requests remaining in the current window
X-RateLimit-Reset - Unix timestamp at which the window resets
X-RateLimit-Window - Length of the window in seconds

Example Usage

import requests
from datetime import datetime

# 'url' and 'headers' are your dialektai endpoint and auth headers
response = requests.post(url, headers=headers)

# Defaults guard against a missing header (int(None) would raise)
limit = int(response.headers.get('X-RateLimit-Limit', 0))
remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
reset = int(response.headers.get('X-RateLimit-Reset', 0))

print(f"Rate limit: {remaining}/{limit}")
print(f"Resets at: {datetime.fromtimestamp(reset)}")

if remaining < 10:
    print("⚠️ Approaching rate limit!")

Handling Rate Limits

429 Too Many Requests

When the rate limit is exceeded, the API returns:

{
  "detail": "Rate limit exceeded",
  "retry_after": 45,
  "limit": 60,
  "window": "1 minute"
}

Response Headers:

HTTP/1.1 429 Too Many Requests
Retry-After: 45
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1697624445

Retry Logic

Basic Retry

import time
import requests

def make_request(retries=3):
    response = requests.post(url, headers=headers, json=data)

    # Cap retries so a persistent 429 can't recurse forever
    if response.status_code == 429 and retries > 0:
        retry_after = int(response.headers.get('Retry-After', 60))
        print(f"Rate limited. Waiting {retry_after}s...")
        time.sleep(retry_after)
        return make_request(retries - 1)  # Retry

    return response.json()

Exponential Backoff

from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type
)

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    retry=retry_if_exception_type(requests.exceptions.HTTPError)
)
def make_request_with_backoff():
    response = requests.post(url, headers=headers, json=data)

    if response.status_code == 429:
        raise requests.exceptions.HTTPError(response=response)

    response.raise_for_status()
    return response.json()

Node.js Example

const axios = require('axios');

async function makeRequestWithRetry(maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await axios.post(url, data, { headers });
      return response.data;
    } catch (error) {
      if (error.response?.status === 429) {
        const retryAfter = parseInt(error.response.headers['retry-after'] || '60');
        console.log(`Rate limited. Waiting ${retryAfter}s...`);
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        continue;
      }
      throw error;
    }
  }
  throw new Error('Max retries exceeded');
}

Strategies to Stay Within Rate Limits

1. Caching

Cache responses to reduce API calls:

from functools import lru_cache
import time

@lru_cache(maxsize=1000)
def cached_query(query, ttl_hash):
    # ttl_hash is unused in the body; it changes every hour, which
    # varies the cache key and so expires old entries
    return make_api_request(query)

# Usage
ttl_hash = int(time.time() // 3600)  # Changes every hour
result = cached_query(query, ttl_hash)

2. Request Batching

Batch multiple queries into single requests:

# ❌ Multiple requests (uses 3 quota)
result1 = query("Show me customers")
result2 = query("Show me orders")
result3 = query("Show me products")

# ✅ Single batched request (uses 1 quota)
results = batch_query([
    "Show me customers",
    "Show me orders",
    "Show me products"
])

3. Request Queueing

Queue requests and process within rate limits:

import queue
import threading
import time

class RateLimitedQueue:
    def __init__(self, requests_per_minute):
        self.queue = queue.Queue()
        self.requests_per_minute = requests_per_minute
        self.interval = 60 / requests_per_minute

    def add(self, request):
        self.queue.put(request)

    def process(self):
        while True:
            try:
                request = self.queue.get(timeout=1)
            except queue.Empty:
                continue
            make_api_request(request)
            time.sleep(self.interval)

# Usage for API requests (Starter plan: 60/min)
# Name the instance something other than 'queue' to avoid shadowing the module
request_queue = RateLimitedQueue(requests_per_minute=60)
request_queue.add({"message": "Show me customers"})
threading.Thread(target=request_queue.process, daemon=True).start()

Upgrading Your Plan

To upgrade your plan or view pricing details, visit the billing section in your portal:

Portal: https://app.dialektai.com/billing

You can also view our pricing on our website:

Pricing Page: https://dialektai.com/pricing

From the billing page, you can review your current plan and usage and upgrade if needed.

Enterprise Custom Limits

Enterprise and Custom plan customers can request tailored rate limits based on their needs:

Contact Sales: [email protected]

Custom Options:

Best Practices

1. Implement Backoff

Always implement retry logic with exponential backoff when you receive a 429 response:

# Don't immediately retry on 429
if response.status_code == 429:
    retry_after = int(response.headers.get('Retry-After', 60))
    time.sleep(retry_after)  # Wait before retrying

2. Cache Responses

Cache frequently accessed data to reduce API calls:

# Cache frequently used queries for an hour
# (ttl_cache is from the third-party cachetools package)
from cachetools.func import ttl_cache

@ttl_cache(maxsize=128, ttl=3600)
def get_dashboard_data():
    return make_api_request("Show me dashboard metrics")

3. Use Webhooks

For background jobs, use webhooks instead of polling to avoid wasting quota:

# ❌ Polling (wastes quota)
while True:
    status = check_job_status(job_id)
    if status == "completed":
        break
    time.sleep(5)

# ✅ Webhook (no quota usage)
# 'app' here is a FastAPI-style application instance
@app.post("/webhook/job-complete")
def job_complete(job_id):
    process_job_result(job_id)

Troubleshooting

Issue: Constantly hitting rate limits

Solutions:

  1. Implement caching
  2. Batch requests where possible
  3. Upgrade plan
  4. Optimize query frequency

Issue: Expecting different API keys to have separate limits

Explanation: Rate limits are per-organization, not per API key. All API keys in the same organization share the same quota.

Solution: Create separate organizations for different rate limit pools.

Next Steps