Staff Software Engineer - Grafana Databases, Managed Services at Jobgether – UK
About This Position
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Software Engineer – Grafana Databases, Managed Services in the United Kingdom.
In this role, you will operate at the intersection of large-scale distributed systems, streaming infrastructure, and cloud database platforms, helping power mission-critical observability services used globally. You will be responsible for the reliability, scalability, and performance of multi-cloud infrastructure that underpins high-throughput metrics, logs, and traces systems. Working in a deeply technical, remote-first engineering environment, you will influence architecture decisions while remaining hands-on in production systems. Your work will directly impact the stability and efficiency of large-scale data pipelines operating across hundreds of clusters. This is a high-autonomy role where you will partner with platform and database teams to solve complex distributed systems challenges. You will also play a key role in shaping operational excellence, reliability practices, and long-term system evolution across global infrastructure.
As owner of large-scale streaming and database infrastructure, you will ensure reliability, scalability, and performance across hundreds of production clusters while driving architectural improvements and operational excellence. Your responsibilities include:
- Operate and evolve large-scale multi-cloud streaming and database infrastructure across production environments
- Diagnose and resolve complex cross-layer failures involving storage, compute, networking, and control-plane systems
- Design and implement safe rollout, upgrade, and migration strategies across distributed systems at scale
- Improve observability, automation, and operational tooling to reduce system toil and increase reliability
- Define and evolve SLOs, error budgets, and reliability standards for shared infrastructure systems
- Partner with engineering teams to optimize query performance, data partitioning, and system scalability
- Serve as a primary escalation point for high-severity incidents and lead deep root cause analysis efforts
- Drive long-term architectural improvements to reduce systemic risks across multi-cluster environments
- Mentor engineers and contribute to best practices in distributed systems engineering and operational excellence
You bring deep expertise in distributed systems, infrastructure engineering, or platform engineering, with strong experience operating high-scale production systems in cloud environments. You are highly technical, autonomous, and comfortable leading complex initiatives across global teams.
- 8+ years of software engineering experience in SRE, platform engineering, infrastructure, or distributed systems roles
- Strong experience with large-scale streaming or database systems (e.g., Kafka, Redpanda, ClickHouse, Cassandra, or similar)
- Hands-on expertise with Kubernetes in AWS, GCP, or Azure environments
- Proficiency in infrastructure-as-code tools such as Terraform, Helm, or similar
- Strong programming skills in systems-oriented languages (Go preferred)
- Deep understanding of distributed systems behavior, failure modes, and performance trade-offs
- Experience with observability, incident response, and writing post-incident reviews
- Strong knowledge of Linux internals, networking, storage systems, and cloud architecture
- Proven ability to lead technical initiatives and influence architectural decisions without formal authority
- Excellent communication skills with the ability to work effectively in remote, cross-functional teams
Benefits
- Competitive compensation package including base salary, bonus (where applicable), and equity (RSUs)
- Fully remote-first working model with global collaboration across distributed teams
- 30 days of annual leave, including designated shutdown days for full disconnection
- Equity ownership in the company’s long-term success through RSU participation
- Access to modern AI development tools with company-supported usage budgets
- Strong emphasis on autonomy, trust, and outcome-driven engineering culture
- Career growth opportunities in a fast-scaling global infrastructure organization
- Exposure to cutting-edge distributed systems and large-scale observability platforms
- Inclusive, transparent, and highly collaborative engineering environment