Can I apply directly for this job on this page?

Yes, you can begin your application on this page using a quick form. You'll then be redirected to the employer's career site to complete the full application process.

What is the role of an Infrastructure Operations Engineer at Jobgether?

The Infrastructure Operations Engineer position at Jobgether is a full-time or part-time opportunity listed in the Admin/Clerical/Secretarial field.

Where is this Infrastructure Operations Engineer job located?

United States, Other / Non-US

What type of employment is offered for this Infrastructure Operations Engineer role?

Full-time or part-time position

What is the expected salary for this Infrastructure Operations Engineer job?

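The posting lists a base salary range of $160,000–$200,000 USD plus equity and a potential bonus. Final compensation will be discussed during the hiring process.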

Infrastructure Operations Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an Infrastructure Operations Engineer in the United States.

In this role, you will help operate and scale the AI and GPU infrastructure that powers next-generation machine learning workloads across research, startup, and enterprise environments. You will work at the intersection of reliability engineering, cloud operations, and automation, ensuring that complex distributed systems remain performant, observable, and resilient. This position offers hands-on exposure to bare metal infrastructure, Kubernetes environments, and cloud platforms, with a strong emphasis on operational excellence and automation. You will collaborate closely with infrastructure engineers, network specialists, and software teams to resolve incidents, improve system reliability, and reduce operational friction. Operating in a fast-moving environment, you will contribute directly to platform stability and customer success. This is a highly technical, high-impact role for engineers who thrive in complex infrastructure ecosystems and enjoy building scalable operational systems.

Accountabilities:

In this role, you will be responsible for ensuring the reliability, scalability, and efficiency of large-scale infrastructure systems supporting GPU and cloud-based workloads.

Operate, monitor, and maintain large-scale Linux-based and GPU-enabled infrastructure environments
Support provisioning, deployment, and lifecycle management of compute and storage systems
Build automation and tooling to reduce operational overhead and improve system reliability
Manage and optimize cloud infrastructure components across AWS and hybrid environments
Work with Kubernetes clusters and containerized workloads to ensure system stability and performance
Support incident response, troubleshooting, and root cause analysis in production environments
Implement and improve observability solutions using monitoring and logging tools such as Prometheus and ELK
Collaborate with engineering and network teams to improve infrastructure design and operational workflows
Participate in on-call rotations and ensure timely resolution of production issues
Contribute to infrastructure improvements, including GitOps workflows and configuration management

Requirements:

This role requires strong infrastructure engineering experience with deep expertise in systems operations, cloud platforms, and automation.

8+ years of experience working with Linux systems in production environments
5+ years of experience with AWS infrastructure and cloud services
2+ years of experience with Kubernetes and containerized workloads
Hands-on experience with Terraform and Ansible for infrastructure as code
Experience managing network-attached storage systems (e.g., NFS, Ceph, or similar)
Strong understanding of monitoring and observability tools such as Prometheus and ELK stack
Familiarity with GitOps workflows and modern infrastructure automation practices
Programming or scripting experience in Python, Go, Bash, or similar languages for automation
Strong networking fundamentals, including understanding of distributed systems and datacenter environments
Experience working with bare metal systems, GPU infrastructure, or large-scale compute environments is highly valued
Strong problem-solving skills and ability to operate effectively in ambiguous, fast-changing environments
Excellent communication skills and ability to collaborate across technical teams

Benefits:

Competitive salary ($160,000–$200,000 USD base range) plus equity and potential bonus
Fully flexible work environment (remote or hybrid within the United States)
Comprehensive medical, dental, and vision coverage (U.S. employees)
Retirement and financial wellness programs
Generous paid time off and company holidays
Paid parental leave
Professional development and learning support
Wellness, home-office, and work-from-home stipends
Opportunity to work on cutting-edge AI and GPU infrastructure at scale

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
