Platform Engineering
Production-Proven Patterns
Platform engineering work on APEX: AKS multi-tenant cluster, Redis backbone with 5 distinct data patterns, FastAPI at scale with per-tenant concurrency control, automated CI/CD with rolling deployments. All observable in production.
Kubernetes Operational Patterns
Patterns applied in APEX AKS cluster — observable in deployed manifests, not theoretical positions.
Rolling Deployments + PDB
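FastAPI deployments use a rolling update strategy together with a PodDisruptionBudget (minAvailable: 1). The rolling update replaces pods incrementally, gated on readiness, so traffic is served throughout a deployment; the PDB keeps at least one replica up during voluntary disruptions such as node drains. Typical observed rollout: ~4 minutes including readiness probe delay.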
HPA + Resource Limits
Horizontal Pod Autoscaler on FastAPI pods with CPU/memory targets. Resource requests and limits are set per pod, so one replica cannot exhaust node memory and trigger OOM kills in its neighbours. Redis StatefulSet runs a fixed single replica with a PVC.
Liveness + Readiness Probes
FastAPI: readinessProbe on /health (initialDelaySeconds: 30). Liveness probe (initialDelaySeconds: 60) prevents premature restart during cold-start LLM client init. Redis: TCP socket probe on 6379.
Namespace Isolation
Tenant data isolation enforced at application layer — Firestore collection-scoped queries, Redis key namespacing. K8s namespace provides cluster resource boundary with NetworkPolicy for east-west traffic.
NGINX Ingress + TLS
Single NGINX Ingress Controller as cluster entry point. cert-manager issues Let's Encrypt TLS via ACME HTTP-01 challenge. Auto-renews 30 days before expiry. WebSocket upgrade annotations enable Twilio Media Streams.
StatefulSet for Redis
Redis deployed as StatefulSet (not Deployment) for stable network identity and ordered startup. PVC with 10Gi standard SSD provides AOF persistence. appendfsync=everysec — 1-second data loss tolerance accepted for queue durability.
Redis: 5 Patterns in One Instance
A single Redis instance serves 5 semantically distinct data roles, separated by key namespacing. Memory is capped at 256MB with volatile-lru eviction, which evicts only keys that carry a TTL: session, idempotency, rate-limit, and cache keys. Queue keys have no TTL and are never evicted.
Priority Job Queue
ZSET — non-volatile
No TTL · durable across restarts
KEDA scales workers directly from Redis queue depth. Priority via ZSET score — high-priority jobs jump the queue without a separate queue.
Key Pattern
queue:campaigns:{tenant_id} — ZSET with score = priority. KEDA ScaledObject targets this key for worker pod autoscaling.
Eviction Policy
Non-volatile — never evicted by memory pressure. AOF persistence ensures queue survives pod restart. Memory budget for queue is bounded by campaign batch size limits.
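A minimal sketch of the enqueue and dequeue side of this pattern, assuming a redis-py style client is passed in. The function names are hypothetical, and treating a higher score as higher priority (popped with ZPOPMAX) is an assumption; the text above only states that the score is the priority.

```python
from typing import Optional


def queue_key(tenant_id: str) -> str:
    # Key pattern taken from the card above.
    return f"queue:campaigns:{tenant_id}"


def enqueue(client, tenant_id: str, job_id: str, priority: float) -> None:
    """Add a job to the tenant's ZSET. No TTL is set, so volatile-lru never evicts it."""
    # redis-py: zadd(name, mapping) where mapping is {member: score}.
    client.zadd(queue_key(tenant_id), {job_id: priority})


def dequeue(client, tenant_id: str) -> Optional[str]:
    """Pop the highest-scored job, or return None when the queue is empty."""
    # redis-py: zpopmax(name) returns a list of (member, score) tuples.
    popped = client.zpopmax(queue_key(tenant_id))
    if not popped:
        return None
    member, _score = popped[0]
    # Members arrive as bytes unless the client was created with decode_responses=True.
    return member.decode() if isinstance(member, bytes) else member
```

Ordering among jobs with equal scores is lexicographic by member rather than by arrival time, so strict FIFO within a priority level would need a composite score.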
Session State
STRING / HASH — volatile-lru
30 min TTL from last turn
Stores last 8 conversation turns + barge-in token per call. TTL resets on every turn — session survives long pauses but auto-expires abandoned calls.
Key Pattern
session:{call_sid} — JSON payload with history array (last 8 turns), current tenant config, and active barge-in token flag.
Eviction Policy
Volatile-lru: may be evicted under memory pressure before its TTL expires. This is acceptable because a session evicted mid-call degrades gracefully to a generic fallback rather than crashing.
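The session behaviour described above (keep the last 8 turns, reset the 30 minute TTL on every turn) can be sketched as follows. The payload field names (`history`, `role`, `text`) and function names are hypothetical, and `client` is assumed to be a redis-py style client.

```python
import json

SESSION_TTL_SECONDS = 30 * 60  # 30 minutes from the last turn
MAX_TURNS = 8                  # only the most recent turns are kept


def append_turn(session: dict, role: str, text: str) -> dict:
    """Return a new session payload with the turn appended and history trimmed."""
    history = session.get("history", []) + [{"role": role, "text": text}]
    return {**session, "history": history[-MAX_TURNS:]}


def save_session(client, call_sid: str, session: dict) -> None:
    """Write the payload and reset the TTL in one SET ... EX command."""
    # Writing with ex= on every turn is what makes the TTL slide forward.
    client.set(f"session:{call_sid}", json.dumps(session), ex=SESSION_TTL_SECONDS)
```

Because the TTL is attached on every write, the key always remains eligible for volatile-lru eviction and expires on its own if the call is abandoned.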
Idempotency Keys
STRING · SET NX
5 min TTL
Twilio retries webhooks on 5xx. SET NX on call_sid prevents duplicate call handling from retry storms.
Key Pattern
idem:{call_sid} — SET NX with 5 minute expiry. On duplicate webhook: NX returns nil → request discarded immediately without processing.
Eviction Policy
Volatile-lru. Eviction before the TTL expires is acceptable: it would let a duplicate through, but the duplicate is harmless at the application layer because the Firestore write is an upsert.
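A minimal sketch of the SET NX check, assuming a redis-py style client. The function name `claim_webhook` is hypothetical.

```python
IDEMPOTENCY_TTL_SECONDS = 5 * 60  # 5 minute expiry, as described above


def claim_webhook(client, call_sid: str) -> bool:
    """Return True only for the first delivery of this call_sid within the TTL."""
    # SET key value NX EX 300. redis-py returns True when the key was created
    # and None when it already existed, so bool() maps that to first/duplicate.
    created = client.set(
        f"idem:{call_sid}", "1", nx=True, ex=IDEMPOTENCY_TTL_SECONDS
    )
    return bool(created)
```

A webhook handler would call `claim_webhook` first and return early when it yields False, so a retried delivery is discarded before any processing starts.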
Rate Limit Counters
STRING · INCR
Window duration TTL
Per-tenant rate limiting without a dedicated service. INCR is atomic — no race condition under concurrent requests.
Key Pattern
rl:{tenant_id}:{window} — INCR returns count; compare against tenant tier limit. Window TTL auto-resets the counter.
Eviction Policy
Volatile-lru. A rate limit counter evicted under memory pressure just resets the window — slightly permissive under extreme memory pressure, acceptable trade-off.
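A fixed-window version of this counter can be sketched as below, assuming a redis-py style client. The function name, the 60 second default window, and deriving `{window}` from the epoch time are assumptions for illustration.

```python
import time
from typing import Optional


def allow_request(client, tenant_id: str, limit: int,
                  window_seconds: int = 60, now: Optional[float] = None) -> bool:
    """Count this request in the current window and report whether it is within the limit."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)   # index of the current fixed window
    key = f"rl:{tenant_id}:{window}"
    count = client.incr(key)              # atomic; creates the key at 1 if missing
    if count == 1:
        # First hit in this window: attach the TTL so the counter cleans itself up.
        client.expire(key, window_seconds)
    return count <= limit
```

INCR itself is atomic, but INCR followed by EXPIRE is two commands. If the process dies between them the counter keeps no TTL, so a production version might wrap both in a MULTI/EXEC pipeline or a Lua script.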
TTS Audio Cache
STRING · SHA256 key
24 hr TTL
Cache hit returns pre-computed mulaw bytes in <200ms vs Azure TTS at ~800ms–1.2s. Common phrases pre-warmed at tenant startup.
Key Pattern
tts:{sha256(normalised_text)} — raw mulaw bytes. Hashing the normalised text gives a deterministic key, so minor punctuation variations map to the same cache entry.
Eviction Policy
Volatile-lru. The least-recently-used audio is mostly rarely repeated phrases, so those are evicted first. Common greeting phrases survive because they're accessed every call. Cache miss falls back to Azure TTS transparently.
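A sketch of the key derivation and the read-through lookup. The normalisation rules here (NFKC, case folding, punctuation replaced by spaces, whitespace collapsed) are assumptions, since the exact rules are not stated above. `client` is assumed to be a redis-py style client and `synthesise` a hypothetical callable standing in for the Azure TTS request.

```python
import hashlib
import unicodedata

TTS_TTL_SECONDS = 24 * 60 * 60  # 24 hour TTL


def normalise(text: str) -> str:
    """Fold case, replace punctuation with spaces, and collapse whitespace (assumed rules)."""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Unicode categories starting with "P" cover punctuation, including the danda.
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return " ".join(text.split())


def tts_cache_key(text: str) -> str:
    digest = hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()
    return f"tts:{digest}"


def get_or_synthesise(client, text: str, synthesise) -> bytes:
    """Return cached audio bytes, calling synthesise(text) only on a cache miss."""
    key = tts_cache_key(text)
    cached = client.get(key)
    if cached is not None:
        return cached
    audio = synthesise(text)
    client.set(key, audio, ex=TTS_TTL_SECONDS)
    return audio
```

With these rules, "Hello, world!" and "hello world" resolve to the same key, so the second request is served from the cache.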
Single instance justified at current scale (sub-50 concurrent tenants). Dedicated Redis Cluster warranted beyond that threshold.
FastAPI at Scale: Concurrency Patterns
Per-Tenant Asyncio Semaphores
Each tenant gets an asyncio.Semaphore bounding concurrent LLM calls. Prevents one tenant's traffic consuming all Groq API quota and starving others. Count configurable per tenant tier.
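A minimal sketch of per-tenant semaphores using only the standard library. The class name `TenantLimiter`, the default limit of 2, and the simulated call in `demo` are hypothetical; in a real service the body of the `async with` would be the LLM request.

```python
import asyncio
from contextlib import asynccontextmanager


class TenantLimiter:
    """Holds one asyncio.Semaphore per tenant, sized by a per-tenant limit."""

    def __init__(self, limits: dict, default_limit: int = 2):
        self._limits = dict(limits)
        self._default = default_limit
        self._semaphores = {}

    def _semaphore(self, tenant_id: str) -> asyncio.Semaphore:
        # Created lazily, inside the running event loop, on first use.
        if tenant_id not in self._semaphores:
            limit = self._limits.get(tenant_id, self._default)
            self._semaphores[tenant_id] = asyncio.Semaphore(limit)
        return self._semaphores[tenant_id]

    @asynccontextmanager
    async def slot(self, tenant_id: str):
        async with self._semaphore(tenant_id):
            yield


async def demo() -> int:
    """Launch 6 concurrent calls for one tenant and return the peak concurrency seen."""
    limiter = TenantLimiter({"tenant-a": 2})
    active = peak = 0

    async def fake_llm_call():
        nonlocal active, peak
        async with limiter.slot("tenant-a"):
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stands in for the LLM request
            active -= 1

    await asyncio.gather(*(fake_llm_call() for _ in range(6)))
    return peak
```

Running `asyncio.run(demo())` shows that no more than 2 calls for the tenant are ever in flight, while other tenants would draw from their own semaphores.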
AIMD Concurrency Control
Additive-Increase Multiplicative-Decrease on Groq API calls. On 429: halve global concurrency. On success: increment by 1. Converges to stable throughput without fixed limits that under-utilise quota.
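The AIMD rule can be sketched as a small limit tracker. The class name, the starting value, and the minimum and maximum bounds are assumptions; how the resulting limit is enforced on in-flight requests is not shown here.

```python
class AimdLimit:
    """Tracks a concurrency limit using additive increase, multiplicative decrease."""

    def __init__(self, initial: int = 8, minimum: int = 1, maximum: int = 64):
        self.minimum = minimum
        self.maximum = maximum
        self.limit = max(minimum, min(initial, maximum))

    def on_success(self) -> int:
        # Additive increase: probe for spare quota one slot at a time.
        self.limit = min(self.maximum, self.limit + 1)
        return self.limit

    def on_rate_limited(self) -> int:
        # Multiplicative decrease: halve on HTTP 429, never dropping below the minimum.
        self.limit = max(self.minimum, self.limit // 2)
        return self.limit
```

Starting from a limit of 8, a 429 drops the limit to 4 and the next success raises it to 5, so the limit oscillates just under the point where the upstream API starts rejecting requests.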
Streaming Token Splitting
Groq streaming tokens accumulated and split on sentence boundaries (।?!). Each sentence dispatched to Azure TTS immediately — reduces first-audio latency by ~300ms vs waiting for full LLM response. Barge-in token checked at every TTS chunk.
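A sketch of sentence splitting over a token stream, using the boundary characters listed above. The function name `split_sentences` is hypothetical, and the token iterable stands in for the streaming LLM response.

```python
import re
from typing import Iterable, Iterator

# Boundary characters from the description above: danda, question mark, exclamation mark.
BOUNDARY = re.compile(r"[।?!]")


def split_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed tokens and yield each sentence as soon as it is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = BOUNDARY.search(buffer)
            if match is None:
                break
            sentence = buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever remains when the stream ends without a boundary.
    tail = buffer.strip()
    if tail:
        yield tail
```

Because the function is a generator, each sentence can be handed to TTS as soon as it is yielded, while later tokens are still arriving.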
WebSocket Lifecycle Management
Twilio Media Streams WebSocket per call. Deepgram STT WebSocket per call. Both managed as async context managers with explicit cleanup on call_end webhook. Redis session TTL is refreshed on every turn rather than set only at cleanup, so sessions orphaned by a crash-disconnect still expire.
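One way to manage both per-call connections as async context managers is `contextlib.AsyncExitStack`, sketched below. `fake_stream` is a hypothetical stand-in for the real Twilio and Deepgram connections, and the log list exists only to make the open and close order visible.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager


@asynccontextmanager
async def fake_stream(name: str, log: list):
    """Stand-in for a WebSocket connection: records open and close events."""
    log.append(f"open {name}")
    try:
        yield name
    finally:
        # Runs on normal exit and on exceptions alike.
        log.append(f"close {name}")


async def handle_call(log: list, fail: bool = False) -> None:
    async with AsyncExitStack() as stack:
        twilio = await stack.enter_async_context(fake_stream("twilio", log))
        deepgram = await stack.enter_async_context(fake_stream("deepgram", log))
        if fail:
            raise ConnectionError("simulated crash-disconnect")
        log.append(f"bridging {twilio} -> {deepgram}")


def run_demo(fail: bool) -> list:
    log = []
    try:
        asyncio.run(handle_call(log, fail))
    except ConnectionError:
        log.append("error handled")
    return log
```

The stack closes connections in reverse order of opening, and it does so even when the call handler raises, which is the property that matters for crash-disconnects.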
CI/CD Pipeline
Push to master → production in ~4 minutes. No manual deploy steps for backend.
Platform Engineering Engagements
Specific platform problems I can help solve, with production experience on each.
AKS / Kubernetes setup & hardening
Cluster setup, RBAC, namespace isolation, ingress controller, cert-manager, external-dns, monitoring stack (Prometheus + Grafana).
Multi-tenant platform architecture
Tenant isolation strategies, data separation at Redis/Firestore layer, per-tenant quotas, onboarding automation, provisioning APIs.
Redis design for production
Eviction policies, persistence modes (AOF vs RDB), key namespace design, memory sizing, cluster vs single-instance trade-offs.
FastAPI async concurrency
Semaphore patterns, AIMD throttling, WebSocket lifecycle management, streaming response handling, per-tenant concurrency isolation.
CI/CD pipeline for AKS
GitHub Actions → ACR → kubectl rolling deploy. Build optimisation, rollback procedures, deploy observability, PDB enforcement.
Reliability engineering
Circuit breakers, retry policies, graceful degradation, failure domain analysis, incident playbooks, alerting runbooks.
Discuss a Platform Problem
Describe the platform challenge — scaling, reliability, cost, or architecture. I'll respond with a candid assessment and proposed engagement scope.