Resume

Making systems reliable, deployments boring, and incidents rare.

[email protected] · alenabraham.me · GitHub · LinkedIn

Experience

Qure.ai

Medical imaging AI company using deep learning to detect critical findings in X-rays, CT scans, and emergency radiology — deployed across 15+ AWS regions serving hospitals globally.

Full-time · 3 yrs 7 mos+ · Bengaluru, India

Senior DevOps Engineer

Apr 2026 – Present

Reduced Datadog costs from $50K to $30K/month ($240K annual savings) by implementing Flex Logs and optimizing log ingestion pipelines
Built Jenkins monitoring dashboard on Datadog, giving developers real-time visibility into CI/CD pipeline health and build status
Building a public service status page (status.qure.ai) with heartbeat monitoring for real-time uptime visibility across all services
Migrating CPU-intensive CI tasks from Jenkins to GitHub Actions with S3 caching to improve build performance and reduce infrastructure load
Driving reliability and scalability improvements for qTrack — optimizing performance for production healthcare workloads

Senior Site Reliability Engineer

Apr 2025 – Mar 2026 · 1 yr

Designed SLO tracking using Turn Around Time (TAT) as the primary SLI, distinguishing bulk uploads from urgent scans to enable data-driven reliability decisions across 15+ regions
Implemented end-to-end distributed tracing using Datadog APM with custom trace tags, reducing cross-service debugging time across 3 microservices
Developed custom Datadog metrics pipeline — Series API for dashboards/alerting (low cardinality), Events API for debugging (high cardinality) — cutting observability costs while maintaining full debuggability
Architected Bazel-based smart build system for a monorepo of 39+ packages with reverse dependency resolution and parallel dispatch via semaphore-based ordering
Delivered 3-stage deployment pipeline (TEST → PUBLISH → DEPLOY) with CodeDeploy, decoupling application deployments from infrastructure changes
Owned Jenkinsfiles and CI/CD workflows across 12+ services, standardizing build, test, and release processes using a shared Jenkins library (Hawkeye)
Engineered self-healing EC2 instances using IMDSv2 failure detection with automatic ASG replacement, reducing recovery time to under 5 minutes with zero manual intervention
Optimized ASG auto-scaling (CPU 90%/30%, 10-min evaluation, 5-min cooldown) and reduced the ALB deregistration delay from 300s to 30s, accelerating rolling deployments by 10x
Deployed on-premise medical imaging solutions across 5+ countries including UAE (SEHA visa screening, Burjeel hospital CT/X-ray, MOH UAE) and Vietnam with end-to-end server setup and DICOM modality integration
Developed "Pulse" — an internal monitoring platform (Django, React, TypeScript, PostgreSQL) providing real-time health dashboards for on-premise services and DICOM gateway infrastructure globally
Building "Agent-Qurie" — an AI-powered knowledge base portal (LiteLLM, Open-WebUI, Prometheus) enabling vendors to self-serve L1 incident resolution, reducing escalations to the SRE team

Site Reliability Engineer

Mar 2023 – Mar 2025 · 2 yrs

Created reusable AWS CDK construct library deploying across 15+ production regions (AWS, Huawei Cloud, Alibaba Cloud) with Pydantic-validated configs preventing misconfigurations before production
Integrated Bandit SAST scanning on every commit with baseline comparison; established split-PR enforcement to prevent cross-service merge conflicts
Containerized multiple services with Docker and Docker Compose across staging, production, and on-premise environments with environment-specific configurations
Co-built internal license management platform (Django + React) handling license lifecycle and deployment coordination for cloud and air-gapped hospital environments
Established per-region p95/p99 TAT dashboards and error rate monitoring, enabling real-time service visibility and SLO-driven deployment decisions
Authored SRE Knowledge Base with runbooks, incident response procedures, onboarding guides, and operational documentation adopted across the engineering team
Maintained and operated Fomema — a legacy healthcare client on Alibaba Cloud since 2022, handling ongoing infrastructure management, incident resolution, and platform stability

Technical Operations Engineer

Sep 2022 – Feb 2023 · 6 mos

Resolved L1-L3 production issues across cloud and on-premise environments — debugging distributed systems, container failures, network misconfigurations, and application-level errors
Collaborated cross-functionally with backend, frontend, product, QA, and BD teams; provided infrastructure cost analysis for solution pricing and client proposals
Led incident response across 15+ regions with on-call rotations including weekends; conducted client-facing technical scoping and served as interim TPM for select clients

Tata Consultancy Services

Trivandrum, India

Assistant System Engineer

Jul 2021 – Sep 2022 · 1 yr 3 mos

Built Jenkins CI/CD pipelines for Java Spring Boot applications deployed on GCP; volunteered for 24x7 on-call

Naas.ai

Remote

Open Source Contributor

Dec 2021 – Aug 2022 · 9 mos

Served as design consultant; contributed to documentation and DevOps

Cognetry Labs

Trivandrum, India

Technical Intern

Nov 2020 – Feb 2021 · 4 mos

Redesigned company website and designed interfaces for a mobile app and admin panel

Skills

SRE & Observability: SLOs/SLIs, Datadog (APM, Metrics, Tracing), Incident Response, Blameless Postmortems, On-Call

Cloud & Infrastructure: AWS (EC2, ASG, ALB, RDS, EFS, S3, CodeDeploy), Huawei Cloud, Alibaba Cloud, Viettel Cloud

IaC & Containers: AWS CDK (Python), CloudFormation, Ansible, Docker, Docker Compose

CI/CD & Automation: Jenkins, Bazel, GitHub Actions, CodeDeploy, Bandit SAST

Languages & Frameworks: Python, Java, TypeScript, Django, React, SQL, Bash

Tools: Git, Teleport, Jira, Postman, Cloudflare, Claude Code, Cursor

Education

College of Engineering Chengannur

2017 – 2021

B.Tech (Hons.) in Electronics and Communication Engineering · CGPA: 8.1/10

Languages

English · German · Malayalam
