Technology Operations Manager in United States at Jobgether

Job Function: Information Technology
Jobgether
United States

Job Description

Technology Operations Manager

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal MLOps Platform Engineer in the United States.

This role sits at the center of building and operating a next-generation AI and MLOps platform designed to support production-grade machine learning and agentic systems at scale. You will be responsible for designing the infrastructure backbone that enables model deployment, observability, orchestration, and cost-efficient runtime operations across cloud environments. The position combines deep cloud engineering, platform architecture, and MLOps expertise, with a strong focus on reliability and automation. You will define how models and LLM-powered services are deployed, monitored, and governed in production. Working across engineering, data, and AI teams, you will ensure seamless integration of ML workflows into scalable, secure, and observable systems. This is a high-impact role where your work directly shapes platform performance, developer experience, and operational efficiency. You will also help establish best practices for cost control, environment management, and production readiness of AI systems.

Accountabilities

In this role, you will be responsible for designing, building, and operating the core MLOps platform infrastructure that supports deployment, observability, and lifecycle management of AI and ML systems.

  • Build and maintain infrastructure as code using Terraform or AWS CDK to support scalable ML platform environments
  • Design and implement CI/CD pipelines using tools such as GitHub Actions, GitLab CI, or AWS CodePipeline
  • Establish observability frameworks for ML and LLM systems using CloudWatch, OpenTelemetry, and related tools
  • Manage containerized workloads using Docker and orchestration platforms such as ECS Fargate or EKS
  • Define and enforce environment isolation strategies, model versioning, and prompt lifecycle management
  • Implement monitoring and cost governance mechanisms, including budgets and usage tracking via CloudWatch
  • Ensure reliability, scalability, and performance of ML runtime infrastructure across production environments
  • Collaborate with AI, data, and engineering teams to integrate ML workflows into platform architecture
  • Continuously improve automation, deployment efficiency, and platform developer experience
  • Support best practices for secure, compliant, and cost-effective ML operations
Requirements

The ideal candidate is a highly skilled cloud and platform engineer with solid MLOps experience, deep AWS expertise, and a clear focus on reliability, observability, and scalable infrastructure design.

  • 7+ years of experience in platform engineering, DevOps, MLOps, or cloud infrastructure roles
  • Deep expertise in AWS, including production-grade architecture and operational management
  • Strong experience building infrastructure as code using Terraform or AWS CDK
  • Hands-on experience with CI/CD pipelines and modern deployment workflows
  • Proven experience with containerization and orchestration (Docker, ECS, EKS, or Kubernetes)
  • Strong understanding of observability practices using tools such as CloudWatch and OpenTelemetry
  • Experience managing ML or LLM workloads in production environments is highly desirable
  • Strong focus on reliability, scalability, security, and cost optimization
  • Experience with environment isolation, versioning, and model lifecycle management
  • Strong analytical and problem-solving skills in complex distributed systems
  • AWS certifications (Solutions Architect Associate or Professional) are preferred
  • Kubernetes or CNCF certifications are a plus
  • Bachelor’s degree in Computer Science, Information Systems, or related field preferred
Benefits

This position offers a competitive compensation package along with strong benefits and the opportunity to work on cutting-edge AI infrastructure.

  • Salary range: $170,000 – $190,000 annually (OTE, including base and bonus where applicable)
  • Comprehensive medical, dental, and vision insurance
  • 401(k) retirement savings plan
  • Paid time off and company holidays
  • Paid parental and caregiver leave
  • Remote-friendly work environment (where applicable)
  • Access to advanced technology environments and internal engineering labs
  • Continuous learning support, including certifications and training opportunities
  • Strong culture of inclusion, collaboration, and innovation
  • Opportunity to build and scale production AI/ML platforms at enterprise level
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

Job Location

United States
