What is the role of a Senior Site Reliability Engineer (AI-Native) at Paradise Media LLC?

The Senior Site Reliability Engineer (AI-Native) position at Paradise Media LLC is a full-time or part-time opportunity in the Engineering field.

Where is this Senior Site Reliability Engineer (AI-Native) job located?

St Julian's, STJ1219, Malta

What type of employment is offered for this Senior Site Reliability Engineer (AI-Native) role?

Full-time or part-time position

What is the expected salary for this Senior Site Reliability Engineer (AI-Native) job?

Compensation will be discussed during the hiring process.

Senior Site Reliability Engineer (AI-Native) at Paradise Media LLC, St Julian's, Malta

Paradise Media is a fast-growing performance marketing company behind some of the most successful affiliate and iGaming brands in the world. We run a global network of high-authority sites across casino, sports, and entertainment, built on data, experimentation, and top-tier SEO.

We're a private company with strong capital reserves and no outside investors, making us a stable, independent, and fast-moving place to grow your career. You'll work directly with the CEO and leadership team, have a real voice in strategy, and see your ideas go live fast.

We're scaling quickly to become one of the largest privately owned companies in iGaming, with a team where smart, driven people can have a massive impact and build something enduring.

About the role

We are seeking a Senior AI-Native Site Reliability Engineer to lead the reliability, performance, security, automation, and operational maturity of a growing portfolio of high-performing web platforms and digital products.

This role is ideal for a pragmatic senior reliability engineer who can operate and improve production systems, automate repetitive work, use AI safely to accelerate operations, understand performance and security deeply, and communicate clearly during incidents.

You will combine senior-level SRE, DevOps, infrastructure, security, and platform engineering expertise with a modern AI-first approach to operations. You will be expected not only to maintain systems, but to improve how they are designed, monitored, deployed, secured, and operated.

The successful candidate will be comfortable owning critical production environments across varied technology stacks, leading incident response, improving platform resilience, mentoring others, reducing operational toil through automation, and using AI tools responsibly to accelerate analysis, documentation, monitoring, debugging, remediation, and continuous improvement.

Roles & Responsibilities:

Reliability & Operational Ownership

Own uptime, performance, scalability, and resilience of production web platforms and supporting infrastructure.
Define and improve SLIs, SLOs, error budgets, HA, fault tolerance, DR, and graceful degradation.
Lead capacity planning, identify single points of failure, and act as senior technical owner during high-severity incidents.

Performance Engineering & Scalability

Lead optimization across application, infrastructure, database, caching, CDN, and edge layers (Redis, Varnish, Cloudflare or similar).
Establish benchmarks, regression checks, and dashboards; reduce technical bloat across code, dependencies, assets, and infrastructure.
Align performance work with SEO, product, and commercial impact.

AI-Native Operations & Automation

Lead safe, practical AI-assisted workflows for log analysis, incident investigation, runbook creation, monitoring, security triage, and postmortems.
Automate repetitive ops via scripts, IaC, and AI-assisted tooling; build anomaly detection, alert triage, and operational reporting workflows.
Create reusable prompts, playbooks, and templates; define guardrails for data sensitivity, access control, human approval, and auditability.

Monitoring, Observability & Incident Management

Own monitoring/alerting across apps, infra, databases, caches, queues, CDNs, cloud services, and critical user journeys.
Design actionable dashboards and alerts that reduce noise and improve MTTD/MTTR.
Lead incident response, RCA, postmortems, and preventive actions; mentor on troubleshooting and calm communication under pressure.

Security, Resilience & Platform Hardening

Own production security posture: WAF, SSL, vulnerability management, malware/bot mitigation, threat detection, and remediation.
Harden servers, databases, cloud, containers, CI/CD, secrets, and production access; manage secure dependency and patching processes.
Maintain backup, recovery, and DR practices; contribute to security incident response, containment, and prevention.

Infrastructure, Cloud & Platform Engineering

Design and operate hosting/runtime environments across varied stacks (web/app servers, databases, caches, queues, containers, cloud).
Automate backups, updates, deployments, provisioning, and health checks using Ansible, Terraform, Docker, Kubernetes, Jenkins, GitHub Actions, or similar.
Support AWS, GCP, Azure, or modern managed hosting; set infrastructure standards balancing reliability, security, performance, and cost.

DevOps, Release Engineering & Developer Enablement

Lead CI/CD design, staging environments, rollback strategies, progressive delivery, and deployment observability.
Partner with developers to embed reliability, performance, and security into the SDLC; build tooling and runbooks for safer shipping.

Documentation, Collaboration & Technical Leadership

Maintain runbooks, troubleshooting guides, architecture notes, and operational playbooks (AI-assisted where useful, technically validated).
Act as senior technical partner to engineering, product, SEO, and business stakeholders; mentor engineers and shape ops standards.

Requirements:

Preferred Experience

6+ years in SRE, DevOps, Infrastructure, Platform, or Security Engineering.
Operating high-traffic web platforms, SaaS, SEO/content-heavy, affiliate, publishing, media, or e-commerce environments.
Cloudflare, edge caching, WAF, CDN optimization, bot mitigation; AI-assisted ops or agentic engineering workflows.
Leading high-severity incident response; defining SLOs, postmortems, runbooks; FinOps / cloud cost optimization.
Certifications in AWS, GCP, Azure, Linux, Kubernetes, or security are a plus.

Required

Senior-level experience in SRE, DevOps, infrastructure, platform engineering, or production operations.
Proficiency in Python, Bash, PHP, JavaScript/TypeScript, Go, or similar; strong Linux server administration.
Experience with web/app servers, databases, caches, queues, CDNs, cloud (AWS/GCP/Azure), and production traffic flows.
Strong Git, CI/CD, deployment automation, rollback, and release management; solid DNS, SSL, networking, and load balancing fundamentals.
Proven ability to troubleshoot complex production issues using logs, metrics, traces, and profiling, and to own systems without close supervision.

AI-Native Skills

Practical use of AI for debugging, documentation, scripting, analysis, and workflow automation, with strong judgment on validation.
Ability to design safe, human-in-the-loop AI workflows and reusable prompts/playbooks; sound judgment on privacy, access, and data sensitivity.

Performance & Observability

Hands-on with Datadog, New Relic, Grafana, Prometheus, Cloudflare Analytics, OpenTelemetry, Lighthouse, WebPageTest, or similar.
Strong grasp of caching, DB tuning, asset optimization, front-end and back-end performance, edge delivery, and SLOs/SLIs.

Security & Resilience

Production security practices: access control, WAF, vulnerability management, secrets, patching, incident response.
Backup strategy, recovery testing, DR planning; bot mitigation, dependency risk, malware detection, threat monitoring.

Automation & DevOps

Ansible, Terraform, Jenkins, Docker, Kubernetes, GitHub Actions; IaC, containerization, orchestration, configuration management.

Communication & Leadership

Calm incident leadership; clear technical communication to technical and non-technical stakeholders; mentoring and knowledge sharing.

Success in This Role Looks Like

Platforms are faster, more reliable, and more secure; monitoring is actionable and incidents are managed calmly with meaningful follow-up.
Manual work shrinks through automation and AI-assisted workflows; developers ship more safely; risks are caught before they impact the business.

Our Benefits:

We offer a competitive salary and the opportunity to work with a talented and passionate team in a fast-paced, dynamic environment.
