Senior Infrastructure Engineer in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Infrastructure Engineer in the United States.
This role sits at the core of a modern AI platform that powers large-scale machine learning workflows across industries such as autonomous vehicles, healthcare, and agriculture. You will be responsible for designing and evolving the infrastructure that supports both individual researchers and enterprise-grade deployments of a mission-critical data platform. Working in a fully remote, distributed environment, you will shape systems that handle unstructured data at scale while ensuring reliability, security, and performance. The role blends deep technical ownership with customer-facing responsibilities, particularly supporting enterprise deployments in production environments. You will collaborate closely with engineering teams to improve developer productivity and infrastructure maturity. This is a high-impact position where your work directly influences the scalability and trustworthiness of AI systems used by global organizations.
Responsibilities
- Architect and evolve scalable infrastructure supporting deployments from research environments to large enterprise systems, ensuring performance, security, and reliability across cloud and on-premise setups.
- Design, build, and maintain containerized systems and deployment pipelines using Kubernetes, Docker, and Helm across multiple environments.
- Develop and optimize CI/CD pipelines (e.g., GitHub Actions, Cloud Build), improving automation, deployment speed, and system consistency.
- Partner with enterprise customers to support production deployments, including installation, troubleshooting, scaling, and ongoing operational support.
- Drive infrastructure initiatives across engineering teams, building internal tooling that enhances developer productivity and release efficiency.
- Implement observability and monitoring systems, including logging, tracing, alerting, and predictive failure prevention strategies.
- Lead infrastructure reliability improvements by troubleshooting complex distributed systems and preventing operational issues before they occur.
- Mentor engineers and help define infrastructure best practices across the organization.
Requirements
- Strong experience working with containerized systems, including Docker and Kubernetes, as well as deployment tooling such as Helm.
- Proficiency in infrastructure-as-code tools (Terraform, Ansible, or equivalent) and automation scripting (Bash, Python).
- Deep understanding of CI/CD systems, ideally with hands-on experience using GitHub Actions.
- Solid cloud infrastructure expertise, particularly in GCP, including networking, IAM, load balancing, and security configurations.
- Experience supporting distributed systems and troubleshooting complex production issues, ideally in customer-facing contexts.
- Knowledge of observability practices (monitoring, logging, tracing) and system reliability engineering principles.
- Familiarity with security best practices such as least privilege access, service accounts, and certificate management.
- Strong communication skills with the ability to collaborate across engineering teams and engage directly with enterprise customers.
- Curiosity, adaptability, and the ability to ramp up quickly on new technologies and environments.
Benefits
- Competitive compensation package with a base salary range of $200,000 – $240,000 USD
- Equity in the form of stock options, along with long-term incentive opportunities
- Fully remote work within the United States
- Flexible distributed team environment with autonomy and ownership
- Comprehensive benefits package (details provided during hiring process)
- Opportunities to work on cutting-edge AI infrastructure used by global enterprises
- Professional growth in a highly technical, collaborative engineering culture
- Participation in in-person company retreats (at least twice per year)