Rate Limiting
Overview
dialektai implements rate limiting to ensure fair usage and system stability. This page explains the rate limits, how they work, and best practices for handling them.
Understanding Rate Limit Types
dialektai enforces rate limits at three levels:
- Per-Minute Limits - Burst protection to prevent short-term spikes
- Per-Hour Limits - Sustained usage protection
- Monthly Quotas - Total query budget for the billing period
Rate limits are tracked separately for two types of access:
- API Requests - Programmatic access via API keys (e.g., integrations, custom UIs)
- Chat Requests - Interactive queries via the web interface or widget
All limits are enforced per organization, meaning all API keys and users within your organization share the same quota.
Rate Limit Tiers
Free Trial (14 days)
- API Requests: 30/minute, 500/hour
- Chat Requests: 5/minute, 50/hour
- Monthly Queries: 150 total queries
Starter Plan
- API Requests: 60/minute, 1,000/hour
- Chat Requests: 10/minute, 100/hour
- Monthly Queries: 500 total queries
Professional Plan
- API Requests: 120/minute, 3,000/hour
- Chat Requests: 30/minute, 500/hour
- Monthly Queries: 3,000 total queries
Business Plan
- API Requests: 300/minute, 10,000/hour
- Chat Requests: 60/minute, 2,000/hour
- Monthly Queries: 10,000 total queries
Enterprise Plan
- API Requests: 600/minute, 30,000/hour
- Chat Requests: 120/minute, 10,000/hour
- Monthly Queries: 30,000 total queries
- Custom Limits: Available upon request
- Dedicated infrastructure
- SLA guarantees
- Priority support
How Rate Limiting Works
Sliding Window
dialektai uses a sliding window algorithm:
Window: 60 seconds
Limit: 100 requests
Time: 0s - Request 1
Time: 1s - Request 2
...
Time: 59s - Request 100
Time: 60s - Request 101 ✅ (Request 1 dropped from window)
Time: 61s - Request 102 ✅ (Request 2 dropped from window)
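To make the mechanics concrete, here is a minimal client-side sketch of the same sliding-window idea (a conceptual illustration only, not dialektai's server-side implementation):
import time
from collections import deque

class SlidingWindowLimiter:
    """Allows at most `limit` requests in any rolling `window`-second span."""

    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def allow(self):
        now = time.time()
        # Drop timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False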
Per-Organization
Rate limits are enforced per organization, not per API key:
# Organization A (Starter plan, API requests)
API Key 1: 30 requests/min
API Key 2: 30 requests/min
Total: 60 requests/min (within 60/min limit) ✅
# Organization A (same plan, heavier usage)
API Key 1: 50 requests/min
API Key 2: 20 requests/min
Total: 70 requests/min (exceeds 60/min limit) ❌
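In terms of the sketch above, the server conceptually keeps one limiter per organization and counts every API key's request against it (again, an illustration of the model, not the actual implementation):
from collections import defaultdict

# One shared limiter per organization; every API key maps to its org's limiter
org_limiters = defaultdict(lambda: SlidingWindowLimiter(limit=60))

def allow_request(org_id):
    return org_limiters[org_id].allow()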
Rate Limit Headers
Every API response includes rate limit information:
Response Headers
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1697624400
X-RateLimit-Window: 60
Header Descriptions:
- X-RateLimit-Limit: Maximum requests allowed in the window (API or Chat, depending on endpoint)
- X-RateLimit-Remaining: Requests remaining in the current window
- X-RateLimit-Reset: Unix timestamp when the limit resets
- X-RateLimit-Window: Window size in seconds (60 for per-minute, 3600 for per-hour)
Example Usage
import requests
from datetime import datetime

# url and headers (including your API key) are assumed to be defined elsewhere
response = requests.post(url, headers=headers)

limit = int(response.headers.get('X-RateLimit-Limit', 0))
remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
reset = int(response.headers.get('X-RateLimit-Reset', 0))

print(f"Rate limit: {remaining}/{limit}")
print(f"Resets at: {datetime.fromtimestamp(reset)}")

if remaining < 10:
    print("⚠️ Approaching rate limit!")
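Building on this, you can throttle proactively instead of waiting for a 429: sleep until the reset timestamp whenever the remaining budget gets low. A sketch (the threshold of 5 is an arbitrary choice):
import time

def wait_if_near_limit(response, threshold=5):
    """Pause until the rate limit window resets when few requests remain."""
    remaining = int(response.headers.get('X-RateLimit-Remaining', threshold + 1))
    reset = int(response.headers.get('X-RateLimit-Reset', 0))
    if remaining <= threshold:
        delay = max(0.0, reset - time.time())
        print(f"Only {remaining} requests left; sleeping {delay:.0f}s until reset")
        time.sleep(delay)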
Handling Rate Limits
429 Too Many Requests
When a rate limit is exceeded, the API returns:
{
  "detail": "Rate limit exceeded",
  "retry_after": 45,
  "limit": 60,
  "window": "1 minute"
}
Response Headers:
HTTP/1.1 429 Too Many Requests
Retry-After: 45
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1697624445
Retry Logic
Basic Retry
import time
import requests
def make_request(max_retries=3):
    # url, headers, and data are assumed to be defined elsewhere
    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 429 and max_retries > 0:
        retry_after = int(response.headers.get('Retry-After', 60))
        print(f"Rate limited. Waiting {retry_after}s...")
        time.sleep(retry_after)
        return make_request(max_retries - 1)  # Retry with one fewer attempt left
    response.raise_for_status()
    return response.json()
Exponential Backoff
import requests
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    retry=retry_if_exception_type(requests.exceptions.HTTPError),
)
def make_request_with_backoff():
    # url, headers, and data are assumed to be defined elsewhere
    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 429:
        raise requests.exceptions.HTTPError(response=response)
    response.raise_for_status()
    return response.json()
Node.js Example
const axios = require('axios');

// url, data, and headers are assumed to be defined elsewhere
async function makeRequestWithRetry(maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await axios.post(url, data, { headers });
      return response.data;
    } catch (error) {
      if (error.response?.status === 429) {
        const retryAfter = parseInt(error.response.headers['retry-after'] || '60', 10);
        console.log(`Rate limited. Waiting ${retryAfter}s...`);
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        continue;
      }
      throw error;
    }
  }
  throw new Error('Max retries exceeded');
}
Strategies for Staying Within Rate Limits
1. Caching
Cache responses to reduce API calls:
from functools import lru_cache
import time

@lru_cache(maxsize=1000)
def cached_query(query, ttl_hash):
    # ttl_hash changes every hour, so cached entries expire hourly
    return make_api_request(query)

# Usage
ttl_hash = int(time.time() // 3600)  # Changes every hour
result = cached_query(query, ttl_hash)
2. Request Batching
Batch multiple queries into a single request (the batch_query helper here is illustrative; see the API reference for the actual batch endpoint):
# ❌ Multiple requests (uses 3 queries of quota)
result1 = query("Show me customers")
result2 = query("Show me orders")
result3 = query("Show me products")

# ✅ Single batched request (uses 1 query of quota)
results = batch_query([
    "Show me customers",
    "Show me orders",
    "Show me products",
])
3. Request Queueing
Queue requests and process within rate limits:
import queue
import threading
import time

class RateLimitedQueue:
    def __init__(self, requests_per_minute):
        self.queue = queue.Queue()
        self.requests_per_minute = requests_per_minute
        self.interval = 60 / requests_per_minute

    def add(self, request):
        self.queue.put(request)

    def process(self):
        while True:
            request = self.queue.get()  # Blocks until a request is available
            make_api_request(request)
            time.sleep(self.interval)  # Space requests to stay under the limit

# Usage for API requests (Starter plan: 60/min)
request_queue = RateLimitedQueue(requests_per_minute=60)  # Named to avoid shadowing the queue module
request_queue.add({"message": "Show me customers"})
threading.Thread(target=request_queue.process, daemon=True).start()
Upgrading Your Plan
To upgrade your plan or view pricing details, visit the billing section in your portal:
Portal: https://app.dialektai.com/billing
You can also view our pricing on our website:
Pricing Page: https://dialektai.com/pricing
From the billing page, you can:
- Compare available plans and their limits
- Upgrade or downgrade your subscription
- Switch between annual and monthly billing
- View your current usage and quota
- Manage payment methods
Enterprise Custom Limits
Enterprise and Custom plan customers can request tailored rate limits based on their needs:
Contact Sales: [email protected]
Custom Options:
- Higher rate limits - 1000+ requests/minute for API and Chat
- Custom monthly quotas - 100,000+ queries per month
- Dedicated infrastructure - Isolated resources
- Burst allowance - Temporary limit increases
- Reserved capacity - Guaranteed throughput
- Custom LLM models - Choose specific AI models for your use case
Best Practices
1. Implement Backoff
Always implement retry logic with exponential backoff when you receive a 429 response:
# Don't immediately retry on 429
if response.status_code == 429:
    retry_after = int(response.headers.get('Retry-After', 60))
    time.sleep(retry_after)  # Wait before retrying
2. Cache Responses
Cache frequently accessed data to reduce API calls:
# Cache frequently used queries for one hour
# (using ttl_cache from the cachetools package; any TTL cache works)
from cachetools.func import ttl_cache

@ttl_cache(maxsize=128, ttl=3600)
def get_dashboard_data():
    return make_api_request("Show me dashboard metrics")
3. Use Webhooks
For background jobs, use webhooks instead of polling to avoid wasting quota:
# ❌ Polling (wastes quota)
while True:
    status = check_job_status(job_id)
    if status == "completed":
        break
    time.sleep(5)

# ✅ Webhook (no quota usage)
# Assuming a FastAPI app: from fastapi import FastAPI; app = FastAPI()
@app.post("/webhook/job-complete")
def job_complete(job_id):
    process_job_result(job_id)
Troubleshooting
Issue: Constantly hitting rate limits
Solutions:
- Implement caching
- Batch requests where possible
- Upgrade plan
- Optimize query frequency
Issue: API keys in the same organization appear to share one limit
Explanation: Rate limits are per-organization, not per API key. All API keys in the same organization share the same quota.
Solution: Create separate organizations for different rate limit pools.