Member of Technical Staff - Platform in Fitzroy North, Victoria at Predictive Text Labs
Job Description
About Predictive Text Labs
PTL builds AI that predicts the future. Our hybrid reasoning engine has achieved a Brier score of 0.121, beating human superforecasters. We're backed by Blackbird Ventures and notable angels, including Balaji Srinivasan, Synthesia founders, and Supabase founders.
To get there, we need a platform engineer whose mandate is to make our prediction-to-trade runtime reliable, reproducible, observable, and safe to evolve.
The role
You will own PTL's prediction-to-trade runtime: the platform that runs forecasting pipelines, persists auditable reasoning traces, supports backtests and live/paper evaluations, and turns forecasts into trade intents, alerts, and broker-executed orders.
This is not a generic developer role and not a pure infrastructure role. You will work across API contracts, event streams, state machines, background orchestration, database schemas, broker adapters, reconciliation, observability, and deployment safety.
What you'll do
- Own runtime correctness across prediction batches, prediction stages, pipeline specs, schedule runs, question snapshots, strategy states, target snapshots, order records, fills, positions, and audit events.
- Harden orchestration across Trigger.dev: retries, idempotency, deterministic run keys, cancellation, lock strategy, failure classification, replay, and safe recovery.
- Own API and event contracts across multiple repos: SSE events, OpenAPI/Zod schemas, structured artifacts, and stage/agent attribution.
- Build headless prediction and evaluation workflows: scheduled batches, locked datasets, live-market and paper-trading probes, benchmark runs, and operator controls.
- Build production observability: structured logs, Sentry, OpenTelemetry, pino, provider and tool-call timing, run-level dashboards, cost and token tracking, actionable alerts, and incident workflows.
- Maintain data integrity across market ingestion, resolution syncing, cutoff dates, snapshot coverage, multi-choice market semantics, and source-specific schema quirks.
- Support leakage-safe research workflows: frozen evidence, cutoff-date validation, trace replay, postmortem capture, and experiment-card audit trails.
- Maintain deployment and environment hygiene across Vercel, Supabase, Trigger.dev, Doppler, AWS/EC2, and Cloudflare/SSM.
- Improve platform velocity: contract tests, replay and regression harnesses, local-to-prod parity, paper-trade smoke tests, and reduction of flaky behavior.
- Partner with Research, Data Science, Data Infrastructure, and Trading to turn evolving research logic into stable runtime contracts. You will not own the research thesis; you will own the systems that make research executable, measurable, and safe.
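To make the "idempotency, deterministic run keys" and "explicit status transitions" responsibilities above concrete, here is a minimal sketch in TypeScript. All names (`runKeyFor`, the status set, the transition table) are illustrative assumptions, not PTL's actual schema:

```typescript
import { createHash } from "node:crypto";

// A deterministic run key: the same batch + stage + dataset cutoff always
// hashes to the same key, so a retried or replayed job can detect prior work
// instead of duplicating it.
function runKeyFor(batchId: string, stage: string, cutoff: string): string {
  return createHash("sha256")
    .update(`${batchId}:${stage}:${cutoff}`)
    .digest("hex")
    .slice(0, 16);
}

// Explicit status transitions: only legal moves are allowed, so a duplicated
// job cannot push a run backwards or skip a state.
type RunStatus = "pending" | "running" | "succeeded" | "failed";

const LEGAL: Record<RunStatus, RunStatus[]> = {
  pending: ["running"],
  running: ["succeeded", "failed"],
  succeeded: [],           // terminal: no further moves
  failed: ["pending"],     // explicit operator-triggered retry only
};

function transition(from: RunStatus, to: RunStatus): RunStatus {
  if (!LEGAL[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}
```

The point of the sketch is that both properties are enforced by data (a hash, a transition table) rather than by convention, which is what makes retries and replays safe to automate.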
- Strong TypeScript/Node backend engineering experience in production systems with real operational risk.
- Experience designing stateful workflows where correctness depends on explicit status transitions, idempotency, and auditability.
- Deep familiarity with Postgres-backed systems: schema design, migrations, constraints, indexes, RLS and auth boundaries, and data-quality checks.
- Experience with asynchronous orchestration: queues, scheduled jobs, retries, cancellation, replay, compensating actions, and dead-letter or manual recovery paths.
- Strong API and event-contract instincts: OpenAPI/Zod-style schemas, SSE or other streaming protocols, versioning, backward compatibility, and structured artifacts.
- Practical observability experience: structured logs, tracing, Sentry or equivalent, dashboarding, alerting, and incident diagnosis.
- Ability to work across app, runtime, and integration layers in one codebase without losing architectural discipline.
- Fluency with AI-assisted development in large TypeScript systems; able to use agents and code assistants productively without sacrificing review discipline.
- Strong product judgment under uncertainty: you can ship pragmatic runtime improvements while preserving correctness in high-stakes paths.
- Ability to partner with research and data teams and translate evolving experimental logic into stable production contracts.
- Experience with Supabase, Trigger.dev, Drizzle, Hono, Next.js, or similar TypeScript runtime stacks.
- Experience with trading systems, broker APIs, prediction markets, exchange APIs, order lifecycle management, or execution-critical fintech systems.
- Experience with event-sourced or audit-ledger style systems: order events, fills, positions, reconciliation, or payment-state machines.
- Familiarity with LLM pipelines, tool-calling, structured outputs, reasoning traces, or model-evaluation infrastructure.
- Familiarity with ClickHouse or other OLAP systems and where analytical vs transactional boundaries should live.
- Experience building deterministic replay or regression frameworks for workflows with external providers.
- Experience with leakage-safe backtesting, frozen data snapshots, or time-consistent evaluation.
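The "event-contract instincts" item above (versioned schemas, SSE streaming, backward compatibility) can be sketched as a defensive parser for one SSE payload. The envelope shape and field names are assumptions for illustration; validation is shown with a hand-rolled guard rather than Zod to keep the sketch dependency-free:

```typescript
// A versioned SSE event envelope. Backward compatibility here means a v1
// consumer can still read v2 events: new fields are additive, never renamed.
interface StageEvent {
  v: number;             // contract version
  runId: string;
  stage: string;         // e.g. "evidence" | "forecast" (assumed names)
  probability?: number;  // present only on terminal forecast events
}

// Parse one SSE `data:` payload defensively: unknown fields are ignored,
// and a malformed event is rejected (null) instead of crashing the consumer.
function parseStageEvent(data: string): StageEvent | null {
  let raw: unknown;
  try {
    raw = JSON.parse(data);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const o = raw as Record<string, unknown>;
  if (
    typeof o.v !== "number" ||
    typeof o.runId !== "string" ||
    typeof o.stage !== "string"
  ) {
    return null;
  }
  const evt: StageEvent = { v: o.v, runId: o.runId, stage: o.stage };
  if (typeof o.probability === "number") evt.probability = o.probability;
  return evt;
}
```

Tolerating unknown fields while rejecting missing required ones is what lets producers evolve the contract without breaking older consumers.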
- Australia's highest-powered team. Our founding team includes Australia's Kaggle champion, SIG's top Australian equities analyst, PhDs who reached 6th in ARC-AGI, and the founder of a time series foundation model lab. Our co-founders include the founder of Netlify, one of the world's largest DevOps unicorns, the creator of DLFinLab, and Forbes 30 Under 30 alumni.
- Real traction. Our forecasting system already outperforms human superforecasters in internal and live evaluation.
- High-leverage role. You own the runtime that connects forecasting, backtesting, evaluation, and trade execution.
- Technically dense domain across AI reasoning, prediction markets, trading systems, data quality, and reliability engineering.
- Compounding research loop. You will build the infrastructure that makes traces, evals, postmortems, paper and live probes, and production feedback compound over time.
- Small, senior team with high ownership and fast iteration.
- Backed by top-tier investors and operators.
- Remote-friendly with Sydney and San Francisco presence.
Send your resume and a brief note covering:
- A production workflow you owned that required strict state transitions and idempotent background execution. What broke, how did you detect it, and how did you harden it?
- An incident where async orchestration — queues, jobs, webhooks, or streaming — caused user-facing, operational, or financial risk. How did you mitigate it and prevent recurrence?
- Design a recovery path for this scenario: a prediction batch is running, the SSE stream disconnects, the model provider times out, partial stage artifacts have been persisted, and a downstream trade alert depends on the final probability. What should the system persist, retry, replay, suppress, and alert on?
- How would you evolve a human-in-the-loop paper trading stack into a reliability-first live trading platform without losing developer velocity or operator control?
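One way to reason about the recovery scenario in the third prompt is as an explicit policy function: given what was persisted before the disconnect, decide whether to resume, emit the downstream alert, or suppress and page. This is a sketch of one defensible answer, not PTL's expected one; all field names and the retry limit are assumptions:

```typescript
// What the system knows about an interrupted prediction batch after the SSE
// stream dropped and the provider timed out.
interface RunState {
  finalProbabilityPersisted: boolean; // terminal artifact committed?
  partialArtifacts: number;           // count of completed stage artifacts
  providerTimedOut: boolean;
  retriesUsed: number;
}

type Action =
  | { kind: "resume"; fromStage: number } // replay from last good artifact
  | { kind: "emit-alert" }                // final probability exists: safe
  | { kind: "suppress-and-page" };        // out of retries: human needed

const MAX_RETRIES = 3; // assumed policy, not a real PTL constant

function recover(s: RunState): Action {
  // Invariant: a trade alert may only be driven by a committed terminal
  // artifact, never by partial stage state.
  if (s.finalProbabilityPersisted) return { kind: "emit-alert" };
  if (s.providerTimedOut && s.retriesUsed < MAX_RETRIES) {
    // Deterministic replay: resume at the first missing stage rather than
    // stage 0, so persisted artifacts are reused, not recomputed.
    return { kind: "resume", fromStage: s.partialArtifacts };
  }
  return { kind: "suppress-and-page" };
}
```

The design choice worth defending in an application note is the invariant in the first branch: suppression is the default for anything downstream of an uncommitted probability, with an alert to an operator rather than silent retry forever.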