Lead Site Reliability Engineer

EPAM Systems
Posted 3 days ago

Join our team as a Lead Site Reliability Engineer dedicated to providing advanced support for critical Azure-based systems.

You will address complex cloud challenges, enhance system observability, and strengthen reliability using Kubernetes, monitoring platforms, and Infrastructure-as-Code. If cloud reliability excites you and collaboration across teams inspires you, apply now to contribute to our innovative projects.

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Responsibilities

  • Resolve complex incidents to ensure system availability
  • Maintain reliability and performance of Azure-based enterprise infrastructure
  • Deploy observability, monitoring, and logging tools
  • Automate infrastructure management with Terraform and scripting technologies
  • Improve system performance and uptime through centralized monitoring
  • Collaborate with multiple teams to enhance service reliability
  • Perform root cause analysis and oversee postmortems for incidents
  • Configure deployment pipelines in Azure DevOps for secure workflows
  • Write and maintain automation scripts for incident recovery and recurring tasks
  • Enhance monitoring frameworks with platforms like Prometheus and Grafana
  • Respond promptly to incidents to meet SLA expectations
  • Facilitate integration of monitoring data from Azure and AWS environments
  • Advance service reliability and observability practices continuously
  • Document processes and incident resolutions thoroughly
  • Take part in Agile team events and balance task priorities

Requirements

  • Minimum 5 years’ expertise in site reliability engineering or comparable DevOps roles
  • 1+ years of demonstrated leadership experience
  • Knowledge of Azure services, including AKS, Azure Monitor, Application Insights, Log Analytics, Cosmos DB, and PostgreSQL
  • Expertise in infrastructure automation using Azure DevOps and Terraform
  • Proficiency in scripting languages such as Bash, PowerShell, and Python
  • Skills in monitoring tools including Prometheus and Grafana
  • Background in incident management and ITSM processes with analytical capability for root cause investigations
  • Competency in resolving technical challenges promptly in high-pressure situations
  • Experience in Agile workflows and fast-paced operational environments
  • Flexibility to communicate effectively in written and verbal formats for teamwork and documentation
  • Ability to configure alerts that proactively prevent SLA breaches
  • Understanding of cloud scaling techniques and security best practices
  • Knowledge of Kubernetes administration for orchestration tasks
  • Ability to collaborate with diverse functional teams seamlessly
  • English proficiency of B2 or higher

Nice to have

  • Background in AWS services, such as EKS, RDS, CloudWatch, and X-Ray
  • Familiarity with distributed logging systems and tools for incident automation
  • Certifications such as Microsoft Azure Administrator or AWS Certified DevOps Engineer
  • Understanding of Kubernetes configurations for scaling and advanced networking setups
  • Proficiency in observability tools such as OpenSearch for AWS environments

We offer

  • Connectivity Bonus (ARS 15,000 paid at the end of each month, shown on the salary receipt as a non-wage item).
  • Private health insurance (Medicina Prepaga; covers the employee and their immediate family).
  • Paternity Leave (Two days are added to what is established by law, for a total of 4 days).
  • Discount card.
  • English Training (English lessons twice per week).
  • Training Program (Access to multiple customized training plans according to the needs of each role within the company).
  • Marriage bonus (The company doubles the legally established allowance that ANSES offers).
  • Referral Program (A referral bonus is paid when a candidate referred by an employee joins the company).
  • External Agreements and Discounts.
  • Vacations: 14 calendar days a year.

By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.
