Sending millions of emails per day requires careful architecture. Queue management, rate limiting, and infrastructure design all impact deliverability and reliability. Here's how to build for scale.
Architecture overview
Core components
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   API/App   │────▶│    Queue    │────▶│   Workers   │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
┌─────────────┐                                │
│  Provider   │◀───────────────────────────────┘
│     API     │
└─────────────┘
Queue design
interface EmailJob {
  id: string;
  priority: 'critical' | 'high' | 'normal' | 'low';
  payload: {
    to: string;
    from: string;
    subject: string;
    html: string;
  };
  metadata: {
    userId: string;
    campaignId?: string;
    scheduledFor?: Date;
  };
  attempts: number;
  maxAttempts: number;
  createdAt: Date;
}

// Priority queues
const queues = {
  critical: 'email:critical',  // Password resets, 2FA
  high: 'email:high',          // Transactional
  normal: 'email:normal',      // Notifications
  low: 'email:low'             // Marketing, digests
};
Rate limiting
Provider rate limits
import { addSeconds } from 'date-fns';

// Per-second send caps per provider (illustrative values; use your plan's limits)
const PROVIDER_RATE_LIMITS: Record<string, number> = {
  sendgrid: 600,
  ses: 500
};

class RateLimiter {
  private limits: Map<string, { count: number; resetAt: Date }> = new Map();

  async canSend(provider: string): Promise<boolean> {
    const limit = this.limits.get(provider);

    // No window yet, or the old one expired: start a fresh one-second window
    if (!limit || limit.resetAt < new Date()) {
      this.limits.set(provider, {
        count: 1,
        resetAt: addSeconds(new Date(), 1)
      });
      return true;
    }

    if (limit.count >= PROVIDER_RATE_LIMITS[provider]) {
      return false;
    }

    limit.count++;
    return true;
  }
}
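On the send path, a worker checks the limiter before each call and waits out the window rather than dropping work. A sketch; sendViaProvider is a hypothetical stand-in for your provider client:

const limiter = new RateLimiter();

async function sendWithLimit(provider: string, job: EmailJob): Promise<void> {
  // Wait briefly while the current one-second window is exhausted
  while (!(await limiter.canSend(provider))) {
    await new Promise(resolve => setTimeout(resolve, 100));
  }
  await sendViaProvider(provider, job); // hypothetical provider client call
}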
ISP throttling
const ispLimits: Record<string, { perHour: number; perDay: number }> = {
  'gmail.com': { perHour: 500, perDay: 5000 },
  'yahoo.com': { perHour: 200, perDay: 2000 },
  'outlook.com': { perHour: 300, perDay: 3000 }
};

async function getThrottledBatch(emails: Email[]): Promise<Email[]> {
  const byDomain = groupBy(emails, e => getDomain(e.to));
  const batch: Email[] = [];

  for (const [domain, domainEmails] of Object.entries(byDomain)) {
    const limit = ispLimits[domain] ?? { perHour: 1000, perDay: 10000 };
    const sent = await getSentCount(domain, 'hour');
    // Clamp at zero: a negative count would make slice() take from the end
    const available = Math.max(0, limit.perHour - sent);
    batch.push(...domainEmails.slice(0, available));
  }
  return batch;
}
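getDomain and getSentCount are left abstract above. One plausible implementation keeps hourly per-domain counters in Redis, expiring after two hours; recordSent is called after each successful delivery:

function getDomain(address: string): string {
  return address.split('@')[1]?.toLowerCase() ?? '';
}

// window is kept for signature parity; only the hourly counter is shown here
async function getSentCount(domain: string, window: 'hour'): Promise<number> {
  const hour = Math.floor(Date.now() / 3_600_000);
  const count = await redis.get(`sent:${domain}:${hour}`);
  return count ? parseInt(count, 10) : 0;
}

async function recordSent(domain: string): Promise<void> {
  const key = `sent:${domain}:${Math.floor(Date.now() / 3_600_000)}`;
  await redis.incr(key);
  await redis.expire(key, 7200); // two-hour TTL so stale counters clean themselves up
}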
Worker scaling
Horizontal scaling
// Worker configuration
const workerConfig = {
  concurrency: 50,                 // Concurrent sends per worker
  batchSize: 100,                  // Jobs fetched from the queue per poll
  pollInterval: 100,               // ms to wait when the queue is empty
  maxRetries: 3,
  retryDelay: [1000, 5000, 30000]  // Backoff schedule in ms, per attempt
};

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

async function processQueue(queue: string) {
  while (true) {
    // LPOP with a count needs Redis 6.2+ and returns null when the list is empty
    const jobs = (await redis.lpop(queue, workerConfig.batchSize)) ?? [];
    if (jobs.length === 0) {
      await sleep(workerConfig.pollInterval);
      continue;
    }
    await Promise.all(
      jobs.map(job => processJob(JSON.parse(job)))
    );
  }
}
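processJob is where the retry policy and dead-letter handling live. A sketch against the EmailJob shape above; sendEmail stands in for the provider call:

async function processJob(job: EmailJob): Promise<void> {
  try {
    await sendEmail(job.payload); // hypothetical provider send call
  } catch (err) {
    job.attempts++;
    if (job.attempts >= job.maxAttempts) {
      // Out of retries: park the job for inspection instead of looping forever
      await redis.rpush('email:dead-letter', JSON.stringify(job));
      return;
    }
    // Re-queue after the delay for this attempt (retryDelay above)
    const delay = workerConfig.retryDelay[job.attempts - 1] ?? 30000;
    setTimeout(() => {
      redis.rpush(queues[job.priority], JSON.stringify(job));
    }, delay);
  }
}

The in-process setTimeout keeps the sketch short, but a timer dies with the worker; a production system would park delayed jobs in a Redis sorted set keyed by ready-time instead.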
Auto-scaling triggers
const scalingRules = {
  scaleUp: {
    queueDepth: 10000,      // Jobs waiting
    processingTime: 5000,   // ms per job
    errorRate: 0.05         // 5% errors
  },
  scaleDown: {
    queueDepth: 100,
    idleTime: 300000        // 5 minutes idle
  }
};
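Thresholds only help if something acts on them. Scaling decisions are often delegated to the platform (for example a Kubernetes HPA driven by queue depth), but a minimal in-house evaluator looks like this; getQueueDepth, getErrorRate, and scaleWorkers are assumed hooks:

async function evaluateScaling(): Promise<void> {
  const depth = await getQueueDepth();   // assumed metric reader
  const errRate = await getErrorRate();  // assumed metric reader

  if (depth > scalingRules.scaleUp.queueDepth ||
      errRate > scalingRules.scaleUp.errorRate) {
    await scaleWorkers(+2);              // assumed orchestration hook
  } else if (depth < scalingRules.scaleDown.queueDepth) {
    await scaleWorkers(-1);
  }
}

setInterval(evaluateScaling, 30_000); // re-evaluate every 30 seconds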
Batch processing
Efficient batching
import pMap from 'p-map';
import { groupBy } from 'lodash';

async function sendBatch(emails: Email[]): Promise<BatchResult> {
  // Group by template so each template is compiled only once per batch
  const byTemplate = groupBy(emails, 'templateId');
  const results: SendResult[] = [];

  for (const [templateId, templateEmails] of Object.entries(byTemplate)) {
    // Compile the template once
    const template = await getCompiledTemplate(templateId);

    // Send in parallel with a concurrency limit
    const batchResults = await pMap(
      templateEmails,
      email => sendWithTemplate(email, template),
      { concurrency: 50 }
    );
    results.push(...batchResults);
  }

  return { sent: results.filter(r => r.success).length, results };
}
Monitoring at scale
Key metrics
// gauge() and histogram() are thin wrappers over your metrics client
// (e.g. prom-client or a StatsD client)
const metrics = {
  // Throughput
  emailsPerSecond: gauge('emails_per_second'),
  emailsPerMinute: gauge('emails_per_minute'),

  // Latency
  queueLatency: histogram('queue_latency_ms'),
  sendLatency: histogram('send_latency_ms'),

  // Errors
  errorRate: gauge('error_rate'),
  bounceRate: gauge('bounce_rate'),

  // Queue health
  queueDepth: gauge('queue_depth'),
  oldestJob: gauge('oldest_job_age_seconds')
};
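Send-path metrics are recorded inline; queue-health gauges need a poller. A sketch that assumes the wrappers above expose prom-client-style set() and observe():

// Poll queue depth across all priority tiers every five seconds
setInterval(async () => {
  let totalDepth = 0;
  for (const key of Object.values(queues)) {
    totalDepth += await redis.llen(key); // list length = jobs waiting
  }
  metrics.queueDepth.set(totalDepth);
}, 5_000);

// Wrap the send path to record per-job latency
async function timedSend(job: EmailJob): Promise<void> {
  const start = Date.now();
  await processJob(job);
  metrics.sendLatency.observe(Date.now() - start);
}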
Alerting thresholds
const alerts = [
  {
    name: 'high_queue_depth',
    condition: 'queue_depth > 100000',
    severity: 'warning'
  },
  {
    name: 'high_error_rate',
    condition: 'error_rate > 0.05',
    severity: 'critical'
  },
  {
    name: 'slow_processing',
    condition: 'queue_latency_p99 > 60000',
    severity: 'warning'
  }
];
Database considerations
Efficient storage
// Partition by date for easy cleanup
const emailLogSchema = {
  tableName: 'email_logs',
  partitionBy: 'RANGE (created_at)',
  indexes: [
    'user_id',
    'campaign_id',
    'status',
    'created_at'
  ],
  retention: '90 days'
};
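In Postgres terms, that schema translates to a range-partitioned table. A sketch of the DDL (column definitions and partition bounds are illustrative):

// Postgres DDL matching the schema above; one partition per month
await db.query(`
  CREATE TABLE email_logs (
    id          BIGSERIAL,
    user_id     BIGINT NOT NULL,
    campaign_id BIGINT,
    status      TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (id, created_at)  -- partition key must be part of the primary key
  ) PARTITION BY RANGE (created_at);

  CREATE TABLE email_logs_2025_01 PARTITION OF email_logs
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

  CREATE INDEX ON email_logs (user_id);
  CREATE INDEX ON email_logs (campaign_id);
  CREATE INDEX ON email_logs (status);
  CREATE INDEX ON email_logs (created_at);
`);

Where an archived copy matters, the INSERT-then-DELETE below does the move; for pure retention, dropping an expired partition is far cheaper than a bulk DELETE.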
// Archive old data, then delete it, inside one transaction so a failure
// between the two statements can't leave rows lost or duplicated
async function archiveOldEmails() {
  await db.query('BEGIN');
  try {
    await db.query(`
      INSERT INTO email_logs_archive
      SELECT * FROM email_logs
      WHERE created_at < NOW() - INTERVAL '90 days'
    `);
    await db.query(`
      DELETE FROM email_logs
      WHERE created_at < NOW() - INTERVAL '90 days'
    `);
    await db.query('COMMIT');
  } catch (err) {
    await db.query('ROLLBACK');
    throw err;
  }
}
Best practices
- Prioritize queues - Critical emails first
- Respect rate limits - Both provider and ISP
- Monitor everything - Queue depth, latency, errors
- Plan for failure - Retries, dead letter queues
- Scale horizontally - Add workers, not bigger servers
- Batch efficiently - Group by template and domain
- Archive aggressively - Don't let logs grow forever
High-volume email is about consistent, reliable delivery at scale. Build for the volume you expect, but design for 10x growth.