Site Reliability Engineer (AI/ML Expertise) - BTP
We help the world run better
At SAP, we keep it simple: you bring your best to us, and we'll bring out the best in you. We're builders touching over 20 industries and 80% of global commerce, and we need your unique talents to help shape what's next. The work is challenging – but it matters. You'll find a place where you can be yourself, prioritize your wellbeing, and truly belong. What's in it for you? Constant learning, skill growth, great benefits, and a team that wants you to grow and succeed.
SAP is seeking a Site Reliability Engineer (SRE) with AI/ML operational expertise, basic DevOps skills, and strong proficiency in Red Hat Linux and Shell/Bash scripting. This role focuses on reliability engineering, observability, automation, and operational excellence for SAP-centric platforms and AI-driven services. Experience with SAP Integration Suite, SAP API Management, or Google Apigee is essential.
What you’ll build:
Reliability Engineering
- Define and implement SLIs, SLOs, and SLAs for SAP applications.
- Apply error budgets and reliability principles to guide release decisions.
- Conduct capacity planning, performance tuning, and chaos engineering for resilience.
Observability & Incident Management
- Build end-to-end observability: metrics, logs, traces, and distributed monitoring.
- Implement Dynatrace for application and infrastructure monitoring.
- Automate alerting, runbooks, and incident response workflows.
- Drive postmortems, root cause analysis, and continuous improvement.
AI/ML Operations
- Operationalize ML models: deployment, monitoring, drift detection, and rollback strategies.
- Ensure model reliability, fairness, and compliance in production environments.
- Automate model retraining pipelines and integrate with SAP BTP AI services.
API Management & Integration
- Manage and secure APIs using SAP Integration Suite, SAP API Management, or Google Apigee.
- Implement API governance, traffic management, and monitoring for reliability.
- Collaborate with development teams to ensure API-first architecture and integration best practices.
Basic DevOps
- Support CI/CD pipelines for SAP and non-SAP workloads.
- Implement Infrastructure as Code (IaC) for environment provisioning.
- Assist in containerization and Kubernetes operations.
Security & Compliance
- Embed security controls into operational workflows (secrets management, vulnerability scanning).
- Ensure compliance with data privacy, auditability, and SAP security standards.
- Bachelor's or Master's degree in a relevant field.
- 5+ years of experience in SRE/Platform Reliability roles, including AI/ML operations experience.
- Hands-on experience with SAP Integration Suite, SAP API Management, or Google Apigee.
- Strong proficiency in Red Hat Linux and Shell/Bash scripting.
- Exposure to basic DevOps practices (CI/CD, IaC, container orchestration).
Core SRE Skills
- SLIs/SLOs, error budgets, chaos engineering.
- Observability: Dynatrace, Prometheus, Grafana, ELK/EFK, OpenTelemetry.
- Incident response: knowledge of ticketing tools such as Jira and ServiceNow; ability to investigate incidents and prepare CF RCAs.
- Familiarity with multiple cloud platforms such as AWS, Azure, and Google Cloud Platform, including their respective offerings, services, and best practices.
API Management
- SAP Integration Suite, SAP API Management, or Google Apigee.
- API security, throttling, analytics, and governance.
Automation & Infrastructure
- Kubernetes, Helm, Terraform, Argo CD/Flux.
- Self-healing and auto-remediation strategies.
AI/ML Operations
- MLflow, Kubeflow, Airflow, model monitoring tools (Evidently AI, WhyLabs).
- Feature stores, model registries, inference serving (Triton, Seldon).
Basic DevOps
- CI/CD tools: Azure DevOps, GitHub Actions, Jenkins.
- Containers: Docker, Kubernetes.
SAP Exposure
- SAP BTP (Neo, Cloud Foundry), SAP AI Core, SAP HANA ML.
Programming
- Python (for ML workflows), Shell/Bash scripting, Go/Java for reliability tooling.
- SLO attainment, MTTR reduction, incident frequency.
- API reliability and performance metrics.
- Model reliability metrics (drift detection time, rollback success).
- Opportunity to shape reliability engineering for AI-driven SAP solutions.
- Work on API management and observability in hybrid cloud environments.
- Collaborative culture focused on operational excellence and innovation.
Bring out your best
SAP innovations help more than four hundred thousand customers worldwide work together more efficiently and use business insight more effectively. Originally known for leadership in enterprise resource planning (ERP) software, SAP has evolved to become a market leader in end-to-end business application software and related services for database, analytics, intelligent technologies, and experience management. As a cloud company with two hundred million users and more than one hundred thousand employees worldwide, we are purpose-driven and future-focused, with a highly collaborative team ethic and commitment to personal development. Whether connecting global industries, people, or platforms, we help ensure every challenge gets the solution it deserves. At SAP, you can bring out your best.
We win with inclusion
SAP’s culture of inclusion, focus on health and well-being, and flexible working models help ensure that everyone – regardless of background – feels included and can run at their best. At SAP, we believe we are made stronger by the unique capabilities and qualities that each person brings to our company, and we invest in our employees to inspire confidence and help everyone realize their full potential. We ultimately believe in unleashing all talent and creating a better world.
SAP is committed to the values of Equal Employment Opportunity and provides accessibility accommodations to applicants with physical and/or mental disabilities. If you are interested in applying for employment with SAP and are in need of accommodation or special assistance to navigate our website or to complete your application, please send an e-mail with your request to Recruiting Operations Team: Careers@sap.com.
For SAP employees: Only permanent roles are eligible for the SAP Employee Referral Program, according to the eligibility rules set in the SAP Referral Policy. Specific conditions may apply for roles in Vocational Training.
Qualified applicants will receive consideration for employment without regard to their age, race, religion, national origin, ethnicity, gender (including pregnancy, childbirth, et al), sexual orientation, gender identity or expression, protected veteran status, or disability, in compliance with applicable federal, state, and local legal requirements.
Successful candidates might be required to undergo a background verification with an external vendor.
AI Usage in the Recruitment Process
For information on the responsible use of AI in our recruitment process, please refer to our Guidelines for Ethical Usage of AI in the Recruiting Process.
Please note that any violation of these guidelines may result in disqualification from the hiring process.
Requisition ID: 443040 | Work Area: Software-Development Operations | Expected Travel: 0 - 10% | Career Status: Professional | Employment Type: Regular Full Time | Additional Locations: #LI-Hybrid
Bangalore, IN, 560066