Senior Linux & Cloud Administrator - Root Cause Analysis (RCA) Expert (f/m/d) (412876)
We help the world run better
At SAP, we enable you to bring out your best. Our company culture is focused on collaboration and a shared passion to help the world run better. How? We focus every day on building the foundation for tomorrow and creating a workplace that embraces differences, values flexibility, and is aligned to our purpose-driven and future-focused work. We offer a highly collaborative, caring team environment with a strong focus on learning and development, recognition for your individual contributions, and a variety of benefit options for you to choose from.
We are seeking a highly skilled and experienced Senior Linux Infrastructure Engineer with a focus on Root Cause Analysis (RCA) to join our team. The ideal candidate will possess an extensive technical background, superior problem-solving skills, and a passion for ensuring the robustness and resilience of our Linux server infrastructure. You must feel comfortable working in a fast-paced, dynamic, and flexible environment and be able to operate effectively in a global 24x7 setting.
What you'll do
• Perform thorough Root Cause Analysis (RCA) to identify, analyze, and resolve complex issues within Linux server infrastructure.
• Monitor, troubleshoot, and optimize the performance of Linux-based systems.
• Collaborate with cross-functional teams to gather data, replicate issues, and implement solutions.
• Create comprehensive RCA reports, system documentation, and knowledge base articles.
• Implement automation through scripting and configuration management tools to streamline diagnostic processes.
• Maintain security, compliance, and OS hardening across the infrastructure.
• Stay current with industry trends, technologies, and best practices to continuously improve systems and processes.
• Provide mentorship and detailed documentation to assist junior colleagues in implementing technical plans and adhering to best practices.
What you bring
• 10+ years of related professional experience with a focus on system diagnostics and Root Cause Analysis (RCA).
Technical Skills:
• Linux Systems: In-depth knowledge of Linux system internals, kernel architecture, process and memory management, filesystems, and system calls.
• Monitoring Tools: Proficiency with tools such as top, htop, vmstat, iostat, sar, ps, netstat, ss, etc.
• Logs and Tracing: Experience with journalctl, rsyslog, syslog-ng, dmesg, strace, lsof, etc.
• Networking: Advanced understanding of TCP/IP, network interfaces, routing, DNS, DHCP, firewalls, and diagnostic tools like ping, traceroute, tcpdump, wireshark, iftop, netcat, nmap, etc.
• Performance Analysis: Proficiency with tools like perf, systemd-analyze, iotop, blktrace, ioping, and benchmarks.
• Security Incident Management: Knowledge of security principles, OS hardening, compliance, and tools for vulnerability scanning and intrusion detection.
• Scripting and Automation: Strong knowledge of Shell scripting, Python, Perl, or other scripting languages, and Infrastructure-as-Code tools like Ansible, Puppet, Chef, or Terraform.
• Cloud Infrastructure: Experience with AWS, Azure, GCP, including services such as EC2, S3, IAM, VPC, security groups, and load balancers.
• Virtualization Technologies: Familiarity with Docker, Kubernetes, VMware, KVM, and other virtualization or containerization technologies.
Soft Skills:
• Analytical and Problem-Solving: Strong ability to analyze issues, identify root causes, and implement effective solutions systematically.
• Documentation: Ability to create clear and detailed RCA reports and technical documentation.
• Communication: Excellent communication and networking skills, with the ability to articulate findings and solutions to technical and non-technical stakeholders.
• Incident Management: Experience with ITIL or similar frameworks for incident management.
• Continuous Learning: Proactive in acquiring new knowledge and staying updated with the latest trends and technologies.
Language Skills:
• Fluency in English, with the ability to explain complex RCA findings clearly.
Tools and Technologies:
• Monitoring Tools: Prometheus, Grafana.
• Log Management: Splunk.
• Diagnostic Tools: top, htop, vmstat, iostat, sar, ps, netstat, ss, tcpdump, wireshark, strace, lsof.
Meet your team
The SAP Enterprise Cloud Services (ECS) organisation runs and operates SAP’s private cloud offering, which is built for customers on various hyperscalers such as Microsoft Azure, Google Cloud Platform, or Amazon Web Services.
As part of ECS Delivery Technical Operations, the Server Management team is responsible for the 24x7 operation of these business-critical SAP systems in the cloud at maximum efficiency. The team provides stable support for the OS and infrastructure hosted on the hyperscalers or in SAP data centers, and works proactively to maintain system availability and ensure smooth operations. Apart from day-to-day operations, we also run strategic projects for newer technologies and enhancements.
Bring out your best
SAP innovations help more than four hundred thousand customers worldwide work together more efficiently and use business insight more effectively. Originally known for leadership in enterprise resource planning (ERP) software, SAP has evolved to become a market leader in end-to-end business application software and related services for database, analytics, intelligent technologies, and experience management. As a cloud company with two hundred million users and more than one hundred thousand employees worldwide, we are purpose-driven and future-focused, with a highly collaborative team ethic and commitment to personal development. Whether connecting global industries, people, or platforms, we help ensure every challenge gets the solution it deserves. At SAP, you can bring out your best.
We win with inclusion
SAP’s culture of inclusion, focus on health and well-being, and flexible working models help ensure that everyone – regardless of background – feels included and can run at their best. At SAP, we believe we are made stronger by the unique capabilities and qualities that each person brings to our company, and we invest in our employees to inspire confidence and help everyone realize their full potential. We ultimately believe in unleashing all talent and creating a better and more equitable world.
SAP is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to the values of Equal Employment Opportunity and provide accessibility accommodations to applicants with physical and/or mental disabilities. If you are interested in applying for employment with SAP and are in need of accommodation or special assistance to navigate our website or to complete your application, please send an e-mail with your request to Recruiting Operations Team: Careers@sap.com.
For SAP employees: Only permanent roles are eligible for the SAP Employee Referral Program, according to the eligibility rules set in the SAP Referral Policy. Specific conditions may apply for roles in Vocational Training.
Requisition ID: 412876 | Work Area: Information Technology | Expected Travel: 0 - 10% | Career Status: Experienced professional | Employment Type: Full-time, permanent or Part-time | Locations: St. Leon-Rot or Dresden #LI-Hybrid
St. Leon-Rot, DE, 68789