We are seeking a Lead ML Infrastructure Engineer to strengthen our MLOps team, focusing on the design and management of our enterprise machine learning platform while advancing scalable ML infrastructure and deployment practices.

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a wide range of innovative projects that deliver creative, cutting-edge solutions, and have the opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Responsibilities

Provide expert advice on ML technologies, tools, and MLOps best practices with an emphasis on model observability, tracking, and deployment
Design and maintain robust batch processing and ML inference pipelines for efficient model execution
Automate ML model deployment processes through CI/CD pipelines to enhance production workflows
Monitor deployed models and infrastructure for health, performance, reliability, and scalability
Ensure seamless integration of ML inference services with other applications or systems
Enable deployments of ML models that scale efficiently and maintain high performance in production environments
Collaborate with client stakeholders and team members to ensure requirements are understood and tasks are completed effectively
Develop infrastructure solutions that support both data processing pipelines and batch inferencing capabilities
Write comprehensive unit tests to ensure reliability for ML deployment, inference, and post-processing methods
Maintain proactive and transparent communication with team members and stakeholders to ensure alignment

Requirements

5+ years of experience with AWS services and MLOps-focused infrastructure for scalable ML model deployment
Expertise in infrastructure-as-code tools, enabling efficient and consistent infrastructure provisioning
Strong background in setting up and monitoring infrastructure for data and ML inference pipelines
Demonstrated ability to take ownership of tasks and work collaboratively with client stakeholders and teams
Skills in writing effective unit tests for ML deployment, inference, and related methods
Clear communication skills, including the ability to ask for clarification when necessary

Nice to have

Knowledge of Google Cloud Platform (GCP) and its ML-specific services
Proficiency in using Snowflake as a data platform for ML workflows
Understanding of Feature Store platforms to enhance feature management processes
Background in Spark and AWS Elastic MapReduce (EMR) for processing distributed datasets
Familiarity with data curation best practices to support ML model training and high-quality dataset creation
Capability to participate in on-call rotations to maintain system reliability in production environments

We offer

Connectivity Bonus (15,000 ARS paid with the salary receipt at the end of each month as a non-salary item).
Prepaid Health Plan, "Medicina Prepaga" (covers the employee and their immediate family).
Paternity Leave (two additional days on top of what is established by law, for a total of 4 days).
Discount card.
English Training (English lessons, twice per week).
Training Program (Access to multiple customized training plans according to the needs of each role within the company).
Marriage Bonus (the company doubles the statutory allowance provided by ANSES).
Referral Program (a referral bonus is paid when a candidate referred by an employee joins the company).
External Agreements and Discounts.
Vacation: 14 calendar days per year.

By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.
