Senior AI Engineer: Inference & Agent Systems

Company: Arcana Analytics

Job Type : Full Time

United States



Title: Applied AI Engineer — Inference & Agent Systems

Location: United States

What We're Building

Arcana is building AI agents that synthesize information across heterogeneous sources and deliver structured, reasoned answers in real time. The product only works if the agents are fast, reliable, and correct — not approximately correct.

Our stack: Go + Temporal for orchestration, a Plan-Execute-Synthesize agent architecture, and an evaluation harness we use to measure every regression. The problems are hard. The latency bar is aggressive. The accuracy requirements are unforgiving.

The Work

Inference Optimization

- Drive time to first token (TTFT) below 400 ms for multi-step agent pipelines

- Streaming optimization: get the first token to the user while sub-agents are still running

- KV cache strategy, prompt compression, dynamic context window management

- Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models
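TTFT is measured from the moment a request is issued to the moment the first token reaches the caller. A minimal Python sketch of that measurement might look like the following, assuming the provider's streaming response can be consumed as a plain iterator of text chunks (real SDKs differ). `TimedStream` and its attributes are hypothetical names, and the clock is injectable only so the timing can be tested deterministically.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


class TimedStream:
    """Pass streamed tokens through unchanged while recording TTFT.

    Hypothetical helper: `tokens` stands in for a provider's streaming
    iterator, and `clock` is injectable so timing can be tested.
    Construct it at the moment the request is issued, because the
    start time is taken in __init__.
    """

    def __init__(self, tokens: Iterable[str],
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self._tokens = tokens
        self._clock = clock
        self._start = clock()  # request start, in seconds
        self.ttft_ms: Optional[float] = None
        self.token_count = 0

    def __iter__(self) -> Iterator[str]:
        for tok in self._tokens:
            if self.ttft_ms is None:
                # First token has arrived: record elapsed time in milliseconds.
                self.ttft_ms = (self._clock() - self._start) * 1000.0
            self.token_count += 1
            yield tok  # forward immediately; no buffering before the user sees it


# Usage with a fake stream (a real provider SDK iterator would go here):
stream = TimedStream(iter(["Hel", "lo", "!"]))
text = "".join(stream)
```

Because each token is yielded as soon as it arrives, the wrapper adds no buffering. `ttft_ms` can then be attached to a trace span or fed into per-deployment latency tracking.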

Agent Architecture

- Design and implement Plan-Execute-Synthesize pipelines that run sub-agents in parallel DAGs, not sequential chains

- Build reliable orchestration on top of Temporal — retries, timeouts, partial failure recovery, idempotency

- Structured output enforcement: JSON schema validation, retry loops on malformed LLM output, graceful degradation

- Tool-call design: schemas that LLMs actually follow reliably across providers
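One common shape for the structured-output work above is a validate-and-retry loop. It feeds the parse or validation error back into the next prompt and degrades to a fallback instead of failing the whole pipeline. The sketch below assumes a hypothetical `call_llm(prompt) -> str` client. It uses a deliberately minimal required-field/type check in place of full JSON Schema validation.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical minimal "schema": required field name -> expected Python type.
# A production system would more likely validate against full JSON Schema.
Schema = Dict[str, type]


def validate(obj: Any, schema: Schema) -> Optional[str]:
    """Return None if obj satisfies the schema, else a short error message."""
    if not isinstance(obj, dict):
        return "top-level value must be a JSON object"
    for field, expected in schema.items():
        if field not in obj:
            return f"missing required field '{field}'"
        if not isinstance(obj[field], expected):
            return f"field '{field}' must be {expected.__name__}"
    return None


def call_with_schema(call_llm: Callable[[str], str], prompt: str, schema: Schema,
                     max_attempts: int = 3,
                     fallback: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Retry until the model returns schema-valid JSON; degrade to `fallback` if it never does.

    `call_llm` is a hypothetical stand-in for any provider client: prompt in, raw text out.
    """
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_llm(current_prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            error = f"invalid JSON: {exc.msg}"
        else:
            error = validate(parsed, schema)
            if error is None:
                return parsed
        # Feed the concrete error back so the next attempt can self-correct.
        current_prompt = (f"{prompt}\n\nYour previous reply was rejected ({error}). "
                          "Reply with JSON only.")
    if fallback is not None:
        return fallback  # graceful degradation instead of failing the pipeline
    raise ValueError(f"no valid output after {max_attempts} attempts")
```

Returning the specific error message, rather than a generic "try again", gives the model something concrete to fix. The fallback path keeps one malformed sub-agent reply from taking down the synthesis step.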

Evaluation & Harness

- Own the eval framework end to end: ground truth datasets, automated scoring pipelines, regression detection on every PR

- LLM-as-judge pipelines for qualitative output assessment

- Latency regression testing — p50/p95/p99 tracked across every deployment

- Adversarial test case design: ambiguous queries, missing data, conflicting sources, malformed tool responses
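Tracking p50/p95/p99 per deployment comes down to computing percentiles over latency samples and comparing them against a baseline. Here is a small sketch. It assumes the nearest-rank percentile definition and a hypothetical 10% tolerance as the regression threshold. Both are illustrative choices, not anything this posting specifies.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize one deployment's latency samples as p50/p95/p99."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                tolerance: float = 0.10) -> Dict[str, float]:
    """Return the percentiles where the candidate is more than `tolerance` slower than baseline."""
    return {k: candidate[k] for k in baseline
            if candidate[k] > baseline[k] * (1 + tolerance)}
```

Nearest-rank always returns an observed sample, which keeps reported tail latencies tied to real requests on small runs. Interpolating definitions, such as NumPy's default linear method, will give slightly different numbers.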

Infrastructure

- Model serving and cold start optimization

- Async worker architecture for parallel sub-agent execution

- Observability: trace every token, every tool call, every synthesis step
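The posting's own stack runs this on Go and Temporal. Purely as a language-neutral illustration, here is a Python `asyncio` sketch of fanning independent sub-agents out in parallel with a per-agent timeout. One slow or failing sub-agent then yields a partial result instead of sinking the whole request. The `SubAgent` signature and the result dictionary shape are assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical sub-agent signature: an async function taking a query and returning a result.
SubAgent = Callable[[str], Awaitable[Any]]


async def run_parallel(query: str, agents: Dict[str, SubAgent],
                       timeout_s: float) -> Dict[str, Dict[str, Any]]:
    """Run independent sub-agents concurrently and collect per-agent outcomes.

    Each entry is {"ok": True, "value": ...} or {"ok": False, "error": ...},
    so the synthesis step can work with whatever partial results arrived.
    """

    async def guarded(name: str, agent: SubAgent):
        try:
            value = await asyncio.wait_for(agent(query), timeout=timeout_s)
            return name, {"ok": True, "value": value}
        except asyncio.TimeoutError:
            return name, {"ok": False, "error": "timeout"}
        except Exception as exc:  # partial failure: record it and keep the others
            return name, {"ok": False, "error": repr(exc)}

    pairs = await asyncio.gather(*(guarded(n, a) for n, a in agents.items()))
    return dict(pairs)
```

Because the sub-agents run concurrently, wall-clock latency is roughly bounded by the slowest agent or the timeout, whichever comes first, rather than by the sum of all agents. In a durable setup, the retries and idempotency mentioned above would sit in the orchestration layer rather than in this loop.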

What We're Looking For

You've built something that runs in production at a meaningful scale and you understand why it's fast (or why it isn't).

Strong signal:

- You've worked on inference pipelines where TTFT was the primary metric and you moved it meaningfully

- You've built multi-step agent systems and you know where they break — not from reading papers but from watching them fail in production

- You've written eval harnesses from scratch and you have opinions about what makes a ground truth dataset actually useful

- You've debugged LLM non-determinism in production and built systems resilient to it

- You've worked with streaming LLM responses and built infrastructure around partial output handling

Weaker signal (but not disqualifying):

- You've fine-tuned models but haven't shipped inference systems

- You've used LangChain/LlamaIndex but haven't built the layer underneath

- Strong ML research background without systems exposure

Stack familiarity (we care more about depth than match): Go, Python, Temporal, Kafka, PostgreSQL, Docker

Why This Role

The problems here don't have blog posts about them yet. Parallel agent DAG execution under hard latency budgets, streaming synthesis across partial sub-agent results, and eval harnesses for non-deterministic multi-step systems are genuinely unsolved at production quality. Small team. High ownership. Every engineer's decisions ship to production.

Who We Want to Hear From

You've shipped inference systems at:

- A real-time AI product (search, coding assistant, chat at scale)

- A model serving infrastructure company

- An agent platform (any domain)

Or you've built eval/harness infrastructure that a team of 10+ engineers actually trusted to catch regressions.

Apply

Send to: [[email protected]]

Include:

- One system you built where latency was the primary constraint: what you measured, what you changed, what moved

- A link to anything public (code, writing, talks); optional but useful

No cover letter required.

We respond to every application.

Original job Senior AI Engineer Inference & Agent Systems posted on GrabJobs ©.
