Lead DevOps Engineer in Bengaluru, Karnātaka at Kobie Marketing
Job Description
About the team and what we will build together
We’re looking for a DevOps Engineer with 10+ years of experience who thrives on building reliable, secure and automated cloud platforms that power real business workflows. You have strong scripting skills (Python, Bash or Go), hands-on experience operating production environments (CI/CD, infrastructure-as-code, container orchestration, observability), and working knowledge of Terraform, Kubernetes and at least one major cloud (AWS, Azure or GCP). You are comfortable writing solid SQL when needed, automating with Git and CI tools like GitHub Actions or Jenkins, and shipping reliable infrastructure that other engineers love to use.
Kobie runs some of the largest loyalty programs in the world. We are building an internal agent platform on Dataiku that automates analyst workflows, surfaces insights from program data in Snowflake, and gives our teams an LLM-native way to work with complex loyalty logic. As a Lead DevOps Engineer, you will play a key role in shaping the roadmap for the Data and AI teams and in building and supporting that platform: provisioning infrastructure, owning CI/CD, hardening security, and ensuring our services and agents run reliably at scale. This is not a back-office role: you will design, ship, monitor and iterate on infrastructure used by real teams, working closely with our U.S. Engineering and AI & Innovation teams and cross-functional partners across Engineering, Data, QA and Product.
Responsibilities:
Strategy, roadmap and vision: build the roadmap for DevOps and observability across the Data & AI teams
Design and build cloud infrastructure as code with Terraform (or Pulumi / CloudFormation), packaging reusable modules for AWS, Azure or GCP
Own CI/CD pipelines in GitHub Actions, Jenkins or GitLab CI — build, test, security scanning, blue-green or canary deploys, and automated rollback
Operate Kubernetes clusters (EKS, AKS or GKE) and container workloads with Lens, Helm, ArgoCD or Flux, including autoscaling, ingress, secrets and policy
Build observability with Prometheus, Grafana, OpenTelemetry, ELK or Datadog — metrics, logs, traces, dashboards and SLO-driven alerting
Implement security and compliance controls: IAM, SSO, secrets management (Vault / KMS), vulnerability scanning, policy-as-code (OPA, Checkov) and PCI-aware patterns
Lead incident response — on-call, runbooks, blameless post-mortems, and continuous reliability work to drive down MTTR and toil
Partner with developers on local dev experience, golden paths, internal platform tooling and developer self-service
Help shape internal platform standards as the stack evolves, contributing to design reviews and sharing knowledge across the India and U.S. teams
Participate in a collaborative DevOps environment, working closely with developers, AI engineers, QA, DBAs and product partners across environments
Qualifications:
8+ years of professional DevOps, SRE or platform-engineering experience operating production services
3+ years of hands-on work building CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI or CircleCI) and managing infrastructure as code (Terraform, Pulumi or CloudFormation)
Working knowledge of Kubernetes (EKS, AKS or GKE) and container tooling (Docker, Helm, ArgoCD or Flux)
Strong scripting skills in Python, Bash or Go; solid SQL skills and strong comfort with at least one cloud platform (AWS, Azure or GCP)
Hands-on experience with observability stacks: New Relic, Prometheus, Grafana, OpenTelemetry, ELK or Datadog
Solid understanding of cloud security and compliance practices, particularly in PCI-compliant or regulated environments
Proven ability to work independently and within a team, managing priorities across concurrent projects and time zones, including on-call rotations
Strong written and verbal communication skills; able to work effectively with both technical and non-technical stakeholders
Bonus Skills:
Experience operating Dataiku DSS, Snowflake, or other large-scale data and analytics platforms in production
Experience with service meshes (Istio, Linkerd), API gateways, and zero-trust networking
Experience with policy-as-code (OPA / Rego, Checkov, tfsec) and supply-chain security (SBOM, Sigstore)
Experience with FinOps practices and cloud cost optimization
Experience supporting ML or LLM workloads — GPU scheduling, model-serving infra, vector databases or LangSmith / Langfuse
Experience with database administration / reliability for PostgreSQL, MySQL or Snowflake
AWS / Azure / GCP Professional, CKA / CKAD, or HashiCorp Terraform certification
Experience in loyalty, martech, adtech or a comparable data-rich B2B domain