Senior Software Engineer, Ingestion Team at Pryon – San Francisco, California
About This Position
The Ingestion team is responsible for everything that happens between content arriving from a connector and that content being ready for search and retrieval. This means document processing pipelines that handle parsing, text extraction, chunking, metadata enrichment, embedding generation, and index population — across every file format and content type our customers throw at us.
We’re in the middle of a significant architectural evolution — migrating from a legacy pipeline to a modern, workflow-orchestrated architecture with cleanly separated processing stages: intake, transformation, enrichment, and indexing. The team is also actively designing the next iteration of the pipeline to push further on throughput and resilience.
This is real systems engineering: the problems are about scale, reliability, and the messy realities of processing millions of documents with wildly different structures.
We're looking for someone who:
- Is self-driven and comfortable operating with autonomy inside a structured team
- Gets energized by architectural challenges, not just feature work
- Has the patience and discipline to improve existing systems while building new ones
- Understands that pipeline engineering is about handling the 10,000 edge cases, not just the happy path
- Is motivated by the mission: building the processing backbone that makes enterprise AI accurate and reliable
- Communicates well in a remote-first environment and collaborates naturally across team boundaries
What You'll Do
- Design and build pipeline stages for our modern ingestion architecture, from document intake through embedding generation and index writing
- Contribute to the design of next-generation pipeline architecture as the system evolves
- Improve system stability and scale: identify bottlenecks, reduce failure rates, and build observability into every stage
- Work with workflow orchestration tools to manage complex, multi-step document processing with retry logic, error handling, and state management
- Handle the realities of document diversity: PDFs, HTML, Office formats, images, structured and semi-structured data - all flowing through the same pipeline
- Collaborate with the Connectors team (upstream) and Retrieval team (downstream) to ensure data flows cleanly across system boundaries
- Participate in the ongoing migration from legacy systems, balancing new development with operational stability
What We're Looking For
- 5+ years of software engineering experience, with meaningful time on data processing pipelines, ETL systems, or similar infrastructure
- Strong proficiency in Python and/or Go
- Experience with workflow orchestration tools — Temporal, Airflow, Prefect, Step Functions, or similar
- Understanding of distributed systems patterns: queues, workers, backpressure, idempotency, retry strategies
- Hands-on experience with Kubernetes, Docker, Terraform, and Helm
- Familiarity with message brokers and event streaming (Kafka, RabbitMQ, SQS, or similar)
- Comfort working across cloud providers (AWS, Azure, GCP)
Compensation
$175,000 - $200,000 a year