Job Description:
Job Title: Senior Site Reliability Engineer
Corporate Title: AVP
Location: Bangalore, India
Role Description
- We are seeking a Site Reliability Engineer for Observability platforms in the Bank to enhance, scale, and modernise our enterprise observability capability.
- This role focuses on owning and evolving Observability and Monitoring tools across the Bank, driving a shift towards OpenTelemetry (OTel)-based telemetry standardisation.
- The successful candidate will contribute to automation, AI adoption, and observability-by-design practices to improve reliability, scalability, and developer experience.
What we’ll offer you
As part of our flexible scheme, here are just some of the benefits that you’ll enjoy:
- Best-in-class leave policy
- Gender-neutral parental leave
- 100% reimbursement under childcare assistance benefit (gender-neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive Hospitalization Insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for employees aged 35 and above
Your key responsibilities
Tools Reliability Governance:
- Own the availability, performance, and resilience of the Observability tool stack in the Bank
- Act as administrator of the tool stack, ensuring the platforms effectively support enterprise monitoring requirements
- Drive standardisation of telemetry using OpenTelemetry (OTel) across Metrics, Events, Logs, and Traces (MELT)
- Define and implement telemetry collection, enrichment, and routing strategies using OTel collectors and pipelines
- Identify and implement automation and self-healing for common issues, and adopt AI practices to improve tool availability and user experience
Own the Incident and Problem Management framework (severity, escalation, response, and resolution):
- Ensure quick incident response, containment, and service restoration
- Perform deep root cause analysis and deliver permanent resolutions