Lead Database Reliability Engineer in United States at Jobgether
Job Description
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Lead Database Reliability Engineer based in the United States.
This is a senior technical leadership role focused on building, scaling, and maintaining highly reliable database systems that power mission-critical applications. You will operate at the intersection of database engineering, cloud infrastructure, and platform reliability, ensuring systems remain secure, performant, and highly available at scale. The role combines deep hands-on engineering with architectural ownership, automation, and operational excellence. You will lead initiatives to improve observability, resilience, and performance across large-scale cloud database environments. Acting as both a technical authority and mentor, you will guide engineering best practices while partnering closely with development and infrastructure teams. This is a high-impact position where your work directly influences system stability, efficiency, and long-term platform scalability.
Responsibilities
- Lead the design and evolution of scalable, high-performance database architectures supporting mission-critical applications and long-term platform strategy.
- Build and maintain automation, monitoring, alerting, backup, and disaster recovery systems to ensure database reliability, availability, and integrity.
- Drive database performance optimization initiatives, including query tuning, capacity planning, and system-level troubleshooting across production environments.
- Develop and enhance observability frameworks using modern monitoring tools to proactively detect anomalies and improve operational visibility.
- Leverage cloud infrastructure (primarily AWS) and database services to manage and optimize large-scale distributed database environments.
- Ensure compliance with security, regulatory, and internal governance standards through regular audits and operational reviews.
- Provide technical leadership during incident response and on-call rotations, resolving complex database issues in high-pressure environments.
- Mentor and support junior engineers while collaborating across development, DevOps, and infrastructure teams.
Requirements
- 8+ years of experience in database administration, reliability engineering, or data platform engineering roles in production-scale environments.
- Deep expertise in MySQL, including performance tuning, replication, backup/recovery, security, and high-availability architectures.
- Strong experience with cloud platforms, particularly AWS, including services such as RDS and Aurora; Azure or GCP experience is a plus.
- Proficiency in scripting and automation using languages such as Python, Bash, or Ruby.
- Experience building and maintaining database observability and monitoring solutions using tools like PMM, New Relic, VividCortex, or similar.
- Strong understanding of high availability, disaster recovery strategies, and system resilience engineering.
- Experience with infrastructure-as-code and DevOps tools such as Terraform, GitHub workflows, or configuration management tools is highly valued.
- Familiarity with additional databases such as PostgreSQL or MongoDB is a plus.
- Strong communication skills with the ability to collaborate across global, cross-functional engineering teams.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Benefits
- Competitive compensation aligned with senior-level engineering market standards, including base salary and potential performance-based adjustments.
- Remote-first flexibility within the United States.
- Opportunity to work on large-scale, high-impact database systems supporting global enterprise platforms.
- Exposure to advanced cloud-native architectures and modern reliability engineering practices.
- Collaborative, engineering-driven culture with strong emphasis on innovation and technical excellence.
- Leadership opportunities, including mentoring and technical ownership across critical systems.
- Inclusive work environment with a strong focus on professional growth and continuous learning.