Role Description: Own deployment, observability, reliability, cost control, and production operations for the AI helpdesk platform. Build and manage CI/CD pipelines, infrastructure, and runtime environments for AI services. Deploy and operate model-serving, orchestration, and application workloads. Implement monitoring, tracing, alerting, logging, and operational dashboards. Manage scaling, release processes, rollback mechanisms, and production support. Optimize inference cost, latency, uptime, and system reliability. Create runbooks, incident response processes, and operational standards.