Senior Software Engineer - Cloud Platform Infrastructure in United States at Jobgether
Job Description
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Software Engineer - Cloud Platform Infrastructure in the United States.
This role sits at the core of a modern cloud-native platform powering next-generation AI and data retrieval systems. You will design, build, and operate the underlying infrastructure that enables scalable, reliable, and high-performance cloud services used in production environments worldwide. The position blends strong software engineering with deep platform and systems expertise, focusing on Kubernetes-based infrastructure, distributed systems, and cloud operations. You will work in a highly technical, remote-first engineering organization that values ownership, autonomy, and production excellence. Your work will directly influence system reliability, cost efficiency, and developer productivity across the platform. You will collaborate closely with cross-functional engineering teams to continuously evolve and harden cloud infrastructure. This is a hands-on role for engineers who enjoy building and running complex systems at scale.
Responsibilities
- Design, implement, and operate core cloud infrastructure components supporting a large-scale distributed platform.
- Build and maintain Kubernetes clusters, including the development of custom operators and platform automation tools.
- Develop production-grade services and automation in Go and Python to improve infrastructure reliability and efficiency.
- Improve scalability, performance, and cost optimization across multi-cloud environments (AWS, GCP, Azure).
- Strengthen observability through metrics, logging, tracing, and monitoring systems to ensure platform reliability.
- Automate operational workflows and reduce manual intervention through engineering-driven solutions.
- Collaborate with platform, infrastructure, and product engineering teams to align system design and service integration.
- Participate in incident response, root cause analysis, and long-term reliability improvements.
- Continuously reduce keep-the-lights-on (KTLO) operational overhead through system improvements and automation.
- Contribute to architecture discussions and help evolve cloud-native platform standards and practices.
Requirements
- 5–7+ years of experience in platform engineering, SRE, or infrastructure-focused software engineering roles.
- Strong programming skills in Go and Python, or deep expertise in one with willingness to work across both.
- Hands-on experience operating Kubernetes in production environments.
- Strong understanding of distributed systems, cloud-native architectures, and infrastructure design principles.
- Experience with major cloud providers such as AWS, GCP, or Azure.
- Proficiency with CI/CD pipelines, infrastructure-as-code, and automation tooling.
- Experience participating in on-call rotations and managing production incidents effectively.
- Strong ownership mindset with the ability to work independently in complex technical environments.
- Excellent communication skills and ability to collaborate across distributed engineering teams.
- Nice to have: experience with observability tools (Prometheus, Grafana, OpenTelemetry), Kubernetes operators, or service mesh technologies.
Benefits
- Competitive compensation package with additional perks.
- Remote-first, flexible work environment with async-friendly culture.
- Comprehensive health, dental, and vision insurance (for U.S.-based employees).
- 401(k) retirement plan with employer matching.
- Flexible paid time off policy.
- Home office and equipment support (choose your own laptop setup).
- Opportunity to work on cutting-edge AI and cloud infrastructure technology.
- High ownership environment with meaningful technical impact.
- Strong engineering culture with emphasis on open-source and collaboration.
- Inclusive, international team environment focused on innovation and growth.