Senior Software Development Engineer III - Backend Infrastructure & Agentic AI Platforms (#26-072) in Arlington, Virginia at Strategic Analysis, Inc.
Explore Related Opportunities
Job Description
Strategic Analysis, Inc. (SA) Is seeking a Software Engineer to support the Advanced Research Projects Agency for Health (ARPA-H)
Overview:
The GRACE team at ARPA-H is building the next generation of agentic AI to transform how the agency accelerates research, makes decisions, and ships products at scale. GRACE is ARPA-H's production AI assistant, and we are evolving it into an ecosystem of autonomous, multi-agent systems. They are a small, startup-minded team that ships fast and owns what we build end-to-end. They are looking for a senior SDE III who can own the backend infrastructure GRACE runs on, while also being a first-principles builder of the agentic AI systems that run on top of it. On a lean team, infra and AI are not separate concerns. You will own both, and you will treat production reliability, token economics, security, and observability as non-negotiable from day one.
The best person for this role starts with the user. They ask why before they ask how. They communicate clearly, give and receive feedback well, and make the people around them better. They have a high bar, a high sense of urgency, and they play well with others.
Responsibilities:
- Backend Infrastructure
- Agentic AI & Protocol Infrastructure
- Observability & Production Quality
- Engineering Excellence
Requirements:
• 7+ years of professional software engineering experience building and operating production systems
• Proven experience in high-velocity environments where you owned and shipped complex products end-to-end
• Strong proficiency in Python and at least one other backend language; familiarity with modern backend frameworks and async patterns
• Solid understanding of distributed systems, APIs, data pipelines, and software design patterns
• Hands-on experience on Microsoft Azure: Azure Functions, API Management, Container Apps, and Azure OpenAI Service
• Experience with containerization, CI/CD, and infrastructure as code
• Strong understanding of authentication and identity systems (OAuth2, OIDC, Azure Entra ID or equivalent)
• Demonstrated ability to own production systems: you have been on-call, debugged incidents, and written the postmortem
• Clear, direct communicator who gives and receives feedback well and makes the people around them better
Preferred Qualifications
• Hands-on experience building and operating MCP servers in production, including tool registration, versioning, and hosting on Azure Functions or equivalent serverless infrastructure
• Experience implementing A2A communication patterns and multi-agent orchestration frameworks
• Deep experience building on top of LLMs in production: tool-calling, RAG, multi-step reasoning, multi-model routing, and context window management
• Strong intuition for token economics: cost-per-query, context budgets, and prompt efficiency as first-class engineering concerns
• Experience managing multi-provider LLM integrations including rate limits, fallback routing, and API versioning
• Background in security-conscious engineering in regulated or government environments
• Track record in startup or early-stage environments: 0-to-1 product building, comfort with ambiguity, high sense of urgency
• Experience in big tech building customer-facing platforms or developer infrastructure at scale
• Familiarity with vector databases, embedding pipelines, and semantic search infrastructure
Education: Bachelor's or master's degree in computer science, Software Engineering, or a related field, or equivalent practical experience
Location: Remote with occasional travel
Clearance: Ability to obtain HHS Public Trust.