Platform Engineering · AKS · Kubernetes · Redis

Platform Engineering
Production-Proven Patterns

Platform engineering work on APEX: AKS multi-tenant cluster, Redis backbone with 5 distinct data patterns, FastAPI at scale with per-tenant concurrency control, automated CI/CD with rolling deployments. All observable in production.

Azure AKS · Kubernetes · Helm · NGINX Ingress · cert-manager · FastAPI · Redis 7 · GitHub Actions · Firebase Hosting · Python 3.11

Kubernetes Operational Patterns

Patterns applied in the APEX AKS cluster, observable in the deployed manifests rather than theoretical positions.

Rolling Deployments + PDB

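FastAPI deployments use the rolling update strategy alongside a PodDisruptionBudget (minAvailable: 1). The rolling update keeps ready replicas serving traffic while new pods come up, and the PDB keeps at least one replica available during voluntary disruptions such as node drains. Typical observed rollout: ~4 minutes including readiness probe delay.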

HPA + Resource Limits

Horizontal Pod Autoscaler on FastAPI pods with CPU/memory targets. Resource requests/limits are set per pod, so a single runaway pod is OOM-killed at its own limit instead of starving its neighbours on the node. The Redis StatefulSet runs a fixed single replica with a PVC.

Liveness + Readiness Probes

FastAPI: readinessProbe on /health (initialDelaySeconds: 30). Liveness probe (initialDelaySeconds: 60) prevents premature restart during cold-start LLM client init. Redis: TCP socket probe on 6379.

Namespace Isolation

Tenant data isolation enforced at application layer — Firestore collection-scoped queries, Redis key namespacing. K8s namespace provides cluster resource boundary with NetworkPolicy for east-west traffic.

NGINX Ingress + TLS

Single NGINX Ingress Controller as cluster entry point. cert-manager issues Let's Encrypt TLS via ACME HTTP-01 challenge. Auto-renews 30 days before expiry. WebSocket upgrade annotations enable Twilio Media Streams.

StatefulSet for Redis

Redis deployed as StatefulSet (not Deployment) for stable network identity and ordered startup. PVC with 10Gi standard SSD provides AOF persistence. appendfsync=everysec — 1-second data loss tolerance accepted for queue durability.

Redis: 5 Patterns in One Instance

A single Redis instance serves 5 semantically distinct data roles, separated by key namespacing. Memory is capped at 256MB with volatile-lru eviction, which only evicts keys that carry a TTL: session, idempotency, rate-limit, and cache keys are eligible, while queue keys have no TTL and are never evicted.

Priority Job Queue
ZSET — non-volatile
No TTL · durable across restarts
KEDA scales workers directly from Redis queue depth. Priority via ZSET score — high-priority jobs jump the queue without a separate queue.
Key Pattern

queue:campaigns:{tenant_id} — ZSET with score = priority. KEDA ScaledObject targets this key for worker pod autoscaling.

Eviction Policy

Non-volatile — never evicted by memory pressure. AOF persistence ensures queue survives pod restart. Memory budget for queue is bounded by campaign batch size limits.
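A minimal sketch of the ZSET queue, assuming a redis-py style client. The function names, the JSON job encoding, and the "lower score is served first" convention are assumptions for illustration; the text above only states that the score carries the priority.

```python
import json


def enqueue_job(client, tenant_id: str, job: dict, priority: int) -> None:
    """Add a job to the tenant's ZSET queue; the score carries the priority."""
    key = f"queue:campaigns:{tenant_id}"
    # Assumption: lower score is served first, so priority 0 jumps ahead of 10.
    client.zadd(key, {json.dumps(job, sort_keys=True): priority})


def dequeue_job(client, tenant_id: str):
    """Pop the most urgent job, or return None when the queue is empty."""
    key = f"queue:campaigns:{tenant_id}"
    popped = client.zpopmin(key, 1)  # list of (member, score) pairs
    if not popped:
        return None
    member, _score = popped[0]
    return json.loads(member)
```

Two properties of sorted sets matter here: members are unique, so two byte-identical job payloads collapse into one entry, and jobs with equal scores are ordered lexicographically rather than by arrival. A job id or timestamp inside the payload is one way to keep entries distinct.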

Session State
STRING / HASH — volatile-lru
30 min TTL from last turn
Stores last 8 conversation turns + barge-in token per call. TTL resets on every turn — session survives long pauses but auto-expires abandoned calls.
Key Pattern

session:{call_sid} — JSON payload with history array (last 8 turns), current tenant config, and active barge-in token flag.

Eviction Policy

Volatile-lru: may be evicted under memory pressure before the TTL expires. Acceptable: a session evicted mid-call causes graceful degradation (generic fallback), not a crash.
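A sketch of the per-turn session update, assuming a redis-py style client and a JSON payload with a history list. The function name and payload shape are hypothetical; the 8-turn window and 30-minute TTL come from the description above.

```python
import json

SESSION_TTL_SECONDS = 30 * 60  # 30 minutes from the last turn
MAX_TURNS = 8


def record_turn(client, call_sid: str, turn: dict) -> dict:
    """Append a turn, keep only the last 8, and reset the TTL."""
    key = f"session:{call_sid}"
    raw = client.get(key)
    # A missing key means a new call or an evicted session: start fresh.
    session = json.loads(raw) if raw else {"history": []}
    session["history"] = (session["history"] + [turn])[-MAX_TURNS:]
    # SET with EX rewrites the value and restarts the 30-minute countdown.
    client.set(key, json.dumps(session), ex=SESSION_TTL_SECONDS)
    return session
```

The read-modify-write is not atomic. That is tolerable under the assumption that a single handler owns each call; concurrent writers on one call_sid would need a transaction or a Lua script.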

Idempotency Keys
STRING · SET NX
5 min TTL
Twilio retries webhooks on 5xx. SET NX on call_sid prevents duplicate call handling from retry storms.
Key Pattern

idem:{call_sid} — SET NX with 5 minute expiry. On duplicate webhook: NX returns nil → request discarded immediately without processing.

Eviction Policy

Volatile-lru. Eviction before TTL is acceptable: would allow a duplicate through, but the duplicate is idempotent at the application layer (Firestore write is upsert).
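The duplicate check can be sketched in a few lines, assuming a redis-py style client. The function name and the stored value "1" are illustrative choices, not taken from the APEX code.

```python
IDEM_TTL_SECONDS = 5 * 60


def is_first_delivery(client, call_sid: str) -> bool:
    """Return True only for the first webhook seen for this call_sid."""
    # SET ... NX EX is a single atomic command: it writes the key only if it
    # does not exist yet. redis-py returns True on success and None otherwise.
    created = client.set(f"idem:{call_sid}", "1", nx=True, ex=IDEM_TTL_SECONDS)
    return bool(created)
```

A webhook handler would call this first and return early when it yields False, so a retried delivery never reaches the call-handling path.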

Rate Limit Counters
STRING · INCR
Window duration TTL
Per-tenant rate limiting without a dedicated service. INCR is atomic — no race condition under concurrent requests.
Key Pattern

rl:{tenant_id}:{window} — INCR returns count; compare against tenant tier limit. Window TTL auto-resets the counter.

Eviction Policy

Volatile-lru. A rate limit counter evicted under memory pressure just resets the window — slightly permissive under extreme memory pressure, acceptable trade-off.
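A fixed-window sketch of the counter, assuming a redis-py style client. The function name, the 60-second default window, and deriving the {window} segment from the clock are assumptions; the tier limit is passed in by the caller.

```python
import time


def allow_request(client, tenant_id: str, limit: int,
                  window_seconds: int = 60, now=None) -> bool:
    """Fixed-window limiter: True while the tenant is within its limit."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)  # same bucket for the whole window
    key = f"rl:{tenant_id}:{window}"
    count = client.incr(key)  # atomic; creates the key at 1 if absent
    if count == 1:
        # First hit in this window: attach the TTL so the counter expires.
        client.expire(key, window_seconds)
    return count <= limit
```

INCR and EXPIRE are separate commands here, so a crash between them would leave a counter without a TTL. Because the window number is part of the key, such a counter stops being read once the window rolls over, but it would no longer be eligible for volatile-lru eviction; sending both commands in one transaction closes that gap.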

TTS Audio Cache
STRING · SHA256 key
24 hr TTL
Cache hit returns pre-computed mulaw bytes in <200ms vs Azure TTS at ~800ms–1.2s. Common phrases pre-warmed at tenant startup.
Key Pattern

tts:{sha256(normalised_text)} — raw mulaw bytes. SHA256 on normalised text = deterministic cache key even with minor punctuation variations.

Eviction Policy

Volatile-lru. The least recently used audio tends to belong to rarely repeated phrases, so those are evicted first. Common greeting phrases survive because they are accessed on every call. A cache miss falls back to Azure TTS transparently.
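One way the cache key could be derived, using only the standard library. The normalisation rules (NFKC, lowercasing, punctuation replaced by spaces, whitespace collapsed) are assumptions; the source only says the text is normalised before hashing.

```python
import hashlib
import unicodedata


def tts_cache_key(text: str) -> str:
    """Build a deterministic cache key from normalised text."""
    normalised = unicodedata.normalize("NFKC", text).lower()
    # Replace every Unicode punctuation character (category P*) with a space.
    # Combining marks are left alone so Indic scripts are not corrupted.
    normalised = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in normalised
    )
    normalised = " ".join(normalised.split())  # collapse whitespace
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    return f"tts:{digest}"
```

With these rules, "Hello, world!" and "hello world" map to the same key, so both would be served from one cached audio entry.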

Single instance justified at current scale (sub-50 concurrent tenants). Dedicated Redis Cluster warranted beyond that threshold.

FastAPI at Scale: Concurrency Patterns

Per-Tenant Asyncio Semaphores

Each tenant gets an asyncio.Semaphore bounding concurrent LLM calls. Prevents one tenant's traffic consuming all Groq API quota and starving others. Count configurable per tenant tier.
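A minimal sketch of the per-tenant semaphore pattern. The TenantLimiter class, the tier names, and the default limit are hypothetical; only asyncio.Semaphore itself comes from the description above.

```python
import asyncio


class TenantLimiter:
    """Lazily creates one asyncio.Semaphore per tenant."""

    def __init__(self, tier_limits: dict, default_limit: int = 2):
        # Hypothetical mapping: tier name -> max concurrent LLM calls.
        self._tier_limits = tier_limits
        self._default_limit = default_limit
        self._semaphores = {}

    def semaphore(self, tenant_id: str, tier: str) -> asyncio.Semaphore:
        if tenant_id not in self._semaphores:
            limit = self._tier_limits.get(tier, self._default_limit)
            self._semaphores[tenant_id] = asyncio.Semaphore(limit)
        return self._semaphores[tenant_id]

    async def run(self, tenant_id: str, tier: str, call):
        """Await call() while holding the tenant's semaphore."""
        async with self.semaphore(tenant_id, tier):
            return await call()
```

Each semaphore lives in one process, so the bound applies per pod. With several replicas behind the HPA, the effective per-tenant ceiling is the per-pod limit multiplied by the replica count.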

AIMD Concurrency Control

Additive-Increase Multiplicative-Decrease on Groq API calls. On 429: halve global concurrency. On success: increment by 1. Converges to stable throughput without fixed limits that under-utilise quota.
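The AIMD rule itself fits in a small class. This is a sketch under assumed bounds (initial 4, minimum 1, maximum 64); the class name and those numbers are illustrative, not the APEX values.

```python
class AIMDLimit:
    """Tracks a concurrency limit: additive increase, multiplicative decrease."""

    def __init__(self, initial: int = 4, minimum: int = 1, maximum: int = 64):
        self.minimum = minimum
        self.maximum = maximum
        self.limit = max(minimum, min(initial, maximum))

    def on_success(self) -> int:
        """A call succeeded: probe for more capacity by adding 1."""
        self.limit = min(self.maximum, self.limit + 1)
        return self.limit

    def on_rate_limited(self) -> int:
        """A call returned 429: back off by halving, never below the minimum."""
        self.limit = max(self.minimum, self.limit // 2)
        return self.limit
```

The class only tracks the number. Enforcing it is a separate concern, since asyncio.Semaphore cannot be resized after creation; one option is an in-flight counter compared against limit before each call.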

Streaming Token Splitting

Groq streaming tokens accumulated and split on sentence boundaries (।?!). Each sentence dispatched to Azure TTS immediately — reduces first-audio latency by ~300ms vs waiting for full LLM response. Barge-in token checked at every TTS chunk.
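A sketch of the accumulate-and-split step, independent of any streaming client. The SentenceSplitter class is hypothetical; the boundary characters are the ones listed above.

```python
SENTENCE_ENDINGS = "।?!"  # boundary characters named in the text above


class SentenceSplitter:
    """Buffers streamed tokens and emits complete sentences."""

    def __init__(self):
        self._buffer = ""

    def feed(self, token: str) -> list:
        """Add one token; return any sentences it completed."""
        self._buffer += token
        sentences = []
        start = 0
        for i, ch in enumerate(self._buffer):
            if ch in SENTENCE_ENDINGS:
                sentence = self._buffer[start:i + 1].strip()
                # Skip fragments that are only punctuation, e.g. the "!" in "?!".
                if sentence.strip(SENTENCE_ENDINGS + " "):
                    sentences.append(sentence)
                start = i + 1
        self._buffer = self._buffer[start:]
        return sentences

    def flush(self) -> str:
        """Return whatever remains when the stream ends."""
        rest, self._buffer = self._buffer.strip(), ""
        return rest
```

Each string returned by feed can be sent to TTS as soon as it appears, and flush handles a final sentence that arrives without closing punctuation.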

WebSocket Lifecycle Management

Twilio Media Streams WebSocket per call. Deepgram STT WebSocket per call. Both managed as async context managers with explicit cleanup on call_end webhook. The Redis session TTL is set from the first turn and refreshed on every turn, so a session orphaned by a crash-disconnect still expires without relying on cleanup.
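The paired-socket lifecycle can be sketched with contextlib. The only assumption about the connection objects is an async close() method; managed and handle_call are hypothetical names, and the real Twilio and Deepgram clients are not shown.

```python
import contextlib


@contextlib.asynccontextmanager
async def managed(connection):
    """Yield a connection and always close it, even if the handler raises."""
    try:
        yield connection
    finally:
        await connection.close()


async def handle_call(twilio_ws, deepgram_ws, handler):
    """Run handler with both sockets open; close both on any exit path."""
    async with contextlib.AsyncExitStack() as stack:
        media = await stack.enter_async_context(managed(twilio_ws))
        stt = await stack.enter_async_context(managed(deepgram_ws))
        return await handler(media, stt)
```

AsyncExitStack unwinds in reverse order, so the STT socket closes before the media socket, and an exception inside the handler still closes both.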

CI/CD Pipeline

Push to master → production in ~4 minutes. No manual deploy steps for backend.

Trigger
Push to master
GitHub Actions workflow fires on merge. No manual gate for normal deploys.
Build
Docker multi-stage
Python 3.11 slim base. Dependencies pinned. Image pushed to ACR (Azure Container Registry).
Deploy
kubectl set image
Kubernetes rolling update keeps ready pods serving during the rollout; the PDB covers node-drain disruptions. Observed rollout time: ~4 minutes end-to-end.
Frontend
firebase deploy
npm run build → Firebase Hosting. Marketing + app + architecture in one deploy via sync-marketing.mjs.
Production Evidence
40+ CI/CD deployments completed to AKS across the project lifetime. Rolling update model observed to complete without user-facing downtime.

Platform Engineering Engagements

Specific platform problems I can help solve, with production experience on each.

AKS / Kubernetes setup & hardening

Cluster setup, RBAC, namespace isolation, ingress controller, cert-manager, external-dns, monitoring stack (Prometheus + Grafana).

Multi-tenant platform architecture

Tenant isolation strategies, data separation at Redis/Firestore layer, per-tenant quotas, onboarding automation, provisioning APIs.

Redis design for production

Eviction policies, persistence modes (AOF vs RDB), key namespace design, memory sizing, cluster vs single-instance trade-offs.

FastAPI async concurrency

Semaphore patterns, AIMD throttling, WebSocket lifecycle management, streaming response handling, per-tenant concurrency isolation.

CI/CD pipeline for AKS

GitHub Actions → ACR → kubectl rolling deploy. Build optimisation, rollback procedures, deploy observability, PDB enforcement.

Reliability engineering

Circuit breakers, retry policies, graceful degradation, failure domain analysis, incident playbooks, alerting runbooks.

Discuss a Platform Problem

Describe the platform challenge — scaling, reliability, cost, or architecture. I'll respond with a candid assessment and proposed engagement scope.