Rate Limiting
Overview
dialektai implements rate limiting to ensure fair usage and system stability. This page explains the rate limits, how they work, and best practices for handling them.
Understanding Rate Limit Types
dialektai enforces rate limits at three levels:
- Per-Minute Limits - Burst protection to prevent short-term spikes
- Per-Hour Limits - Sustained usage protection
- Monthly Quotas - Total query budget for the billing period
Rate limits are tracked separately for two types of access:
- API Requests - Programmatic access via API keys (e.g., integrations, custom UIs)
- Chat Requests - Interactive queries via the web interface or widget
All limits are enforced per organization, meaning all API keys and users within your organization share the same quota.
Rate Limit Tiers
Free Trial (14 days)
- API Requests: 30/minute, 500/hour
- Chat Requests: 5/minute, 50/hour
- Monthly Queries: 150 total queries
Starter Plan
- API Requests: 60/minute, 1,000/hour
- Chat Requests: 10/minute, 100/hour
- Monthly Queries: 500 total queries
Professional Plan
- API Requests: 120/minute, 3,000/hour
- Chat Requests: 30/minute, 500/hour
- Monthly Queries: 3,000 total queries
Business Plan
- API Requests: 300/minute, 10,000/hour
- Chat Requests: 60/minute, 2,000/hour
- Monthly Queries: 10,000 total queries
Enterprise Plan
- API Requests: 600/minute, 30,000/hour
- Chat Requests: 120/minute, 10,000/hour
- Monthly Queries: 30,000 total queries
- Custom Limits: Available upon request
- Dedicated infrastructure
- SLA guarantees
- Priority support
How Rate Limiting Works
Sliding Window
dialektai uses a sliding window algorithm:
Window: 60 seconds
Limit: 100 requests
Time: 0s - Request 1
Time: 1s - Request 2
...
Time: 59s - Request 100
Time: 60s - Request 101 ✅ (Request 1 dropped from window)
Time: 61s - Request 102 ✅ (Request 2 dropped from window)
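To make the mechanics concrete, here is a minimal client-side sketch of the same sliding-window idea (a conceptual illustration only, not dialektai's server-side implementation):
import time
from collections import deque

class SlidingWindowLimiter:
    """Allows at most `limit` requests in any rolling `window`-second span."""

    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def allow(self):
        now = time.time()
        # Drop timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False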
Per-Organization
Rate limits are enforced per organization, not per API key:
# Organization A (Starter plan, API requests)
API Key 1: 30 requests/min
API Key 2: 30 requests/min
Total: 60 requests/min (within 60/min limit) ✅
# Organization A (same plan, heavier usage)
API Key 1: 50 requests/min
API Key 2: 20 requests/min
Total: 70 requests/min (exceeds 60/min limit) ❌
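In terms of the sketch above, the server conceptually keeps one limiter per organization and counts every API key's request against it (again, an illustration of the model, not the actual implementation):
from collections import defaultdict

# One shared limiter per organization; every API key maps to its org's limiter
org_limiters = defaultdict(lambda: SlidingWindowLimiter(limit=60))

def allow_request(org_id):
    return org_limiters[org_id].allow()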
Rate Limit Headers
Every API response includes rate limit information:
Response Headers
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1697624400
X-RateLimit-Window: 60
Header Descriptions:
- X-RateLimit-Limit: Maximum requests allowed in the window (API or Chat, depending on endpoint)
- X-RateLimit-Remaining: Requests remaining in the current window
- X-RateLimit-Reset: Unix timestamp when the limit resets
- X-RateLimit-Window: Window size in seconds (60 for per-minute, 3600 for per-hour)
Example Usage
import requests
from datetime import datetime

# url and headers (including your API key) are assumed to be defined elsewhere
response = requests.post(url, headers=headers)

limit = int(response.headers.get('X-RateLimit-Limit', 0))
remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
reset = int(response.headers.get('X-RateLimit-Reset', 0))

print(f"Rate limit: {remaining}/{limit}")
print(f"Resets at: {datetime.fromtimestamp(reset)}")

if remaining < 10:
    print("⚠️ Approaching rate limit!")
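Building on this, you can throttle proactively instead of waiting for a 429: sleep until the reset timestamp whenever the remaining budget gets low. A sketch (the threshold of 5 is an arbitrary choice):
import time

def wait_if_near_limit(response, threshold=5):
    """Pause until the rate limit window resets when few requests remain."""
    remaining = int(response.headers.get('X-RateLimit-Remaining', threshold + 1))
    reset = int(response.headers.get('X-RateLimit-Reset', 0))
    if remaining <= threshold:
        delay = max(0.0, reset - time.time())
        print(f"Only {remaining} requests left; sleeping {delay:.0f}s until reset")
        time.sleep(delay)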
Handling Rate Limits
429 Too Many Requests
When a rate limit is exceeded, the API returns:
{
  "detail": "Rate limit exceeded",
  "retry_after": 45,
  "limit": 60,
  "window": "1 minute"
}
Response Headers:
HTTP/1.1 429 Too Many Requests
Retry-After: 45
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1697624445
Retry Logic
Basic Retry
import time
import requests
def make_request(max_retries=3):
    # url, headers, and data are assumed to be defined elsewhere
    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 429 and max_retries > 0:
        retry_after = int(response.headers.get('Retry-After', 60))
        print(f"Rate limited. Waiting {retry_after}s...")
        time.sleep(retry_after)
        return make_request(max_retries - 1)  # Retry with one fewer attempt left
    response.raise_for_status()
    return response.json()
Exponential Backoff
import requests
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    retry=retry_if_exception_type(requests.exceptions.HTTPError),
)
def make_request_with_backoff():
    # url, headers, and data are assumed to be defined elsewhere
    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 429:
        raise requests.exceptions.HTTPError(response=response)
    response.raise_for_status()
    return response.json()
Node.js Example
const axios = require('axios');

// url, data, and headers are assumed to be defined elsewhere
async function makeRequestWithRetry(maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await axios.post(url, data, { headers });
      return response.data;
    } catch (error) {
      if (error.response?.status === 429) {
        const retryAfter = parseInt(error.response.headers['retry-after'] || '60', 10);
        console.log(`Rate limited. Waiting ${retryAfter}s...`);
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        continue;
      }
      throw error;
    }
  }
  throw new Error('Max retries exceeded');
}
Strategies for Staying Within Rate Limits
1. Caching
Cache responses to reduce API calls:
from functools import lru_cache
import time

@lru_cache(maxsize=1000)
def cached_query(query, ttl_hash):
    # ttl_hash changes every hour, so cached entries expire hourly
    return make_api_request(query)

# Usage
ttl_hash = int(time.time() // 3600)  # Changes every hour
result = cached_query(query, ttl_hash)
2. Request Batching
Batch multiple queries into a single request (the batch_query helper here is illustrative; see the API reference for the actual batch endpoint):
# ❌ Multiple requests (uses 3 queries of quota)
result1 = query("Show me customers")
result2 = query("Show me orders")
result3 = query("Show me products")

# ✅ Single batched request (uses 1 query of quota)
results = batch_query([
    "Show me customers",
    "Show me orders",
    "Show me products",
])
3. Request Queueing
Queue requests and process within rate limits:
import queue
import threading
import time

class RateLimitedQueue:
    def __init__(self, requests_per_minute):
        self.queue = queue.Queue()
        self.requests_per_minute = requests_per_minute
        self.interval = 60 / requests_per_minute

    def add(self, request):
        self.queue.put(request)

    def process(self):
        while True:
            request = self.queue.get()  # Blocks until a request is available
            make_api_request(request)
            time.sleep(self.interval)  # Space requests to stay under the limit

# Usage for API requests (Starter plan: 60/min)
request_queue = RateLimitedQueue(requests_per_minute=60)  # Named to avoid shadowing the queue module
request_queue.add({"message": "Show me customers"})
threading.Thread(target=request_queue.process, daemon=True).start()
Upgrading Your Plan
To upgrade your plan or view pricing details, visit the billing section in your portal:
Portal: https://app.dialektai.com/billing
You can also view our pricing on our website:
Pricing Page: https://dialektai.com/pricing
From the billing page, you can:
- Compare available plans and their limits
- Upgrade or downgrade your subscription
- Switch between annual and monthly billing
- View your current usage and quota
- Manage payment methods
Enterprise Custom Limits
Enterprise and Custom plan customers can request tailored rate limits based on their needs:
Contact Sales: [email protected]
Custom Options:
- Higher rate limits - 1000+ requests/minute for API and Chat
- Custom monthly quotas - 100,000+ queries per month
- Dedicated infrastructure - Isolated resources
- Burst allowance - Temporary limit increases
- Reserved capacity - Guaranteed throughput
- Custom LLM models - Choose specific AI models for your use case
Best Practices
1. Implement Backoff
Always implement retry logic with exponential backoff when you receive a 429 response:
# Don't immediately retry on 429
if response.status_code == 429:
    retry_after = int(response.headers.get('Retry-After', 60))
    time.sleep(retry_after)  # Wait before retrying
2. Cache Responses
Cache frequently accessed data to reduce API calls:
# Cache frequently used queries for one hour
# (using ttl_cache from the cachetools package; any TTL cache works)
from cachetools.func import ttl_cache

@ttl_cache(maxsize=128, ttl=3600)
def get_dashboard_data():
    return make_api_request("Show me dashboard metrics")
3. Use Webhooks
For background jobs, use webhooks instead of polling to avoid wasting quota:
# ❌ Polling (wastes quota)
while True:
    status = check_job_status(job_id)
    if status == "completed":
        break
    time.sleep(5)

# ✅ Webhook (no quota usage)
# Assuming a FastAPI app: from fastapi import FastAPI; app = FastAPI()
@app.post("/webhook/job-complete")
def job_complete(job_id):
    process_job_result(job_id)
Troubleshooting
Issue: Constantly hitting rate limits
Solutions:
- Implement caching
- Batch requests where possible
- Upgrade plan
- Optimize query frequency
Issue: API keys in the same organization appear to share one limit
Explanation: Rate limits are per-organization, not per API key. All API keys in the same organization share the same quota.
Solution: Create separate organizations for different rate limit pools.