We are seeking a Senior ML Infrastructure Engineer to bolster our MLOps team, overseeing the development and maintenance of our enterprise machine learning platform while driving innovation in scalable ML infrastructure and deployment practices.
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Responsibilities
- Provide expert guidance on ML technologies, tools, and MLOps best practices focused on model observability, tracking, and deployment
- Build and maintain robust batch processing and ML inference pipelines to enable efficient model execution
- Automate ML model deployment processes with CI/CD pipelines to streamline production workflows
- Monitor the health, performance, reliability, and scalability of deployed models and infrastructure
- Integrate ML inference services seamlessly with other applications or systems
- Enable scalable, high-performance ML model deployments that hold up under production load
- Collaborate directly with client stakeholders and team members to ensure requirements are met and tasks are completed effectively
- Implement infrastructure solutions that support data processing pipelines and batch inferencing
- Create comprehensive unit tests for ML deployment, inference, and post-processing methods
- Maintain clear and proactive communication with team members and stakeholders to ensure alignment
Requirements
- 3+ years of experience with AWS services and MLOps-related infrastructure, focusing on scalable ML model deployment
- Expertise in infrastructure-as-code tools, enabling efficient and consistent infrastructure setup
- Strong background in setting up and monitoring infrastructure for data pipelines and ML inference pipelines
- Demonstrated task ownership abilities, with experience working directly with client stakeholders and cross-functional teams
- Experience writing unit tests for ML deployment, inference, and related methods to ensure code reliability
- Clear and effective communication skills, including the ability to seek clarification when needed
Nice to have
- Experience with Google Cloud Platform (GCP) and its ML-related services
- Competency in working with Snowflake as a data platform for ML workflows
- Familiarity with Feature Store platforms to improve feature management
- Background in using Spark and AWS Elastic MapReduce (EMR) for distributed data processing
- Understanding of data curation best practices for building high-quality datasets for ML model training
- Flexibility to participate in on-call rotations, ensuring system reliability in production environments
We offer
- Connectivity Bonus (15,000 ARS paid at the end of each month on the salary receipt as a non-wage item).
- Private Health Insurance, Medicina Prepaga (covers the employee and their immediate family).
- Paternity Leave (two days added to the statutory leave, for a total of 4 days).
- Discount card.
- English Training (English lessons twice per week).
- Training Program (access to multiple customized training plans tailored to the needs of each role within the company).
- Marriage Bonus (the company doubles the statutory marriage allowance paid by ANSES).
- Referral Program (a bonus is paid when a candidate referred by an employee joins the company).
- External Agreements and Discounts.
- Vacation (14 calendar days per year).
By applying to this role, you agree that your personal data may be used as set out in EPAM's Privacy Notice and Policy.