Senior Site Reliability Engineer

EPAM Systems
2 days ago

Join our team as a Senior Site Reliability Engineer focused on delivering advanced support for critical Azure-based systems.

You will troubleshoot complex cloud environments, enhance observability, and implement reliability solutions using Kubernetes, monitoring tools, and Infrastructure-as-Code. If you are passionate about cloud reliability and enjoy collaborating across teams, apply now to contribute to our cutting-edge projects.

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Responsibilities

  • Troubleshoot and resolve complex incidents to maintain system uptime
  • Ensure reliability and performance of Azure-based enterprise infrastructure
  • Implement observability, monitoring, and logging solutions
  • Automate infrastructure provisioning and deployment using Terraform and scripting
  • Optimize system performance and uptime through proactive monitoring and alerting
  • Collaborate with cross-functional teams to improve service reliability
  • Conduct root cause analysis and postmortems for incident management
  • Manage deployment pipelines in Azure DevOps for secure and scalable workflows
  • Develop and maintain automation scripts for routine tasks and incident recovery
  • Enhance monitoring frameworks with tools like Prometheus and Grafana
  • React quickly to incidents to avoid SLA degradation
  • Integrate monitoring data from Azure and AWS environments
  • Support continuous improvement of service reliability and observability practices
  • Document technical processes and incident reports
  • Participate in Agile team activities and prioritize competing tasks

Requirements

  • Minimum 3 years of experience in site reliability engineering or related DevOps roles
  • Hands-on experience with Azure services, including AKS, Azure Monitor, Application Insights, Log Analytics, Cosmos DB, and PostgreSQL
  • Strong expertise in Azure DevOps and Terraform for infrastructure automation
  • Proficient scripting skills in Bash, PowerShell, and Python
  • Experience with monitoring and observability tools such as Prometheus and Grafana
  • Solid background in incident management and ITSM processes with root cause analysis capabilities
  • Ability to troubleshoot and debug complex technical issues in real time
  • Experience working in fast-paced Agile environments
  • Strong verbal and written communication skills for collaboration and reporting
  • Proactive approach to setting alerts and preventing SLA degradation
  • Experience with cloud infrastructure scaling and security best practices
  • Knowledge of Kubernetes administration and orchestration
  • Ability to collaborate effectively with cross-functional teams
  • English language proficiency at B2 level or above

Nice to have

  • Hands-on experience with AWS services including EKS, RDS, CloudWatch, and X-Ray
  • Familiarity with distributed logging pipelines and incident automation tools
  • Knowledge of advanced Kubernetes use cases for scaling and network configurations
  • Certifications such as Microsoft Azure Administrator or AWS Certified DevOps Engineer
  • Experience with observability tools like OpenSearch for AWS workloads

We offer

  • Connectivity Bonus (15,000 ARS paid with the salary receipt at the end of each month as a non-wage item).
  • Prepaid Health Plan (Medicina Prepaga; covers the employee and their immediate family).
  • Paternity Leave (Two days added to the legal entitlement, for a total of 4 days).
  • Discount card.
  • English Training (English lessons, twice per week).
  • Training Program (Access to multiple customized training plans according to the needs of each role within the company).
  • Marriage Bonus (The company doubles the statutory allowance paid by ANSES).
  • Referral Program (A bonus is paid when a candidate referred by an employee joins the company).
  • External Agreements and Discounts.
  • Vacation: 14 calendar days a year.

By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.
