Agentic AI engineer | airline Tech Ops | release validation

I build agent systems people can operate.

I am not interested in agents that look impressive for five minutes and then hide what they did. My work is about the pieces that make an AI system usable after the demo: the workflow, the record it searched, the tool it called, the test it passed, and the moment it hands work back to a person.

Current build: AgentReef

Agent routing, memory, tool access, approvals, and operator visibility.

Evaluation: Arena

Test runs before autonomy expands.

Observability: Aquarium

Traces for memory, tools, handoffs, and blocked decisions.

Reliability base: Aviation + Apple

Records, release validation, diagnostics, and escalation work.

airline technical operations | aircraft records and AMOS | 10+ Apple OTA validations | 20,000+ users supported

AgentReef and Angel Software direction

A control room for agents, not a magic box.

AgentReef is where I am working through the question every serious agent project runs into: who decides what the agent can do, what context it is allowed to use, and when a human needs to take over?

The answer is not one giant prompt. It is a system: routing, retrieval, scoped tools, memory, approvals, traces, and review points that people can actually inspect.

What this proves
Builder
I can turn an agent idea into interfaces, flows, and working system boundaries.
Operator
I think about what breaks after launch: access, drift, missing context, bad handoffs, and unclear ownership.
Manager
I care about the person using the system as much as the model behind it.
mission-control://keith
Route: match intent to the right agent lane
Retrieve: use the right context, not the loudest context
Act: tools stay scoped and logged
Pause: judgment returns to a person

route: maintenance-record question detected

context: source set narrowed before answer

tool: write action blocked until approved

human: review requested for judgment call
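
The "tools stay scoped and logged" and "write action blocked until approved" steps can be sketched as a small gate in front of every tool call. This is a minimal illustration, not AgentReef's actual code: the names `ToolRequest`, `Gate`, `records_lookup`, and `records_update` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRequest:
    tool: str               # hypothetical tool name the agent asked for
    writes: bool            # True if the call would change data
    approved: bool = False  # set by a person, never by the agent


@dataclass
class Gate:
    allowed_tools: set[str]
    log: list[str] = field(default_factory=list)

    def check(self, req: ToolRequest) -> str:
        """Return "allow", "hold", or "blocked", and log the decision."""
        if req.tool not in self.allowed_tools:
            decision = "blocked"   # outside the agent's scope
        elif req.writes and not req.approved:
            decision = "hold"      # write actions wait for approval
        else:
            decision = "allow"
        self.log.append(f"{req.tool}: {decision}")
        return decision


# Hypothetical scope for a records agent.
gate = Gate(allowed_tools={"records_lookup", "records_update"})
```

With this shape, a read such as `gate.check(ToolRequest("records_lookup", writes=False))` is allowed, an unapproved write is held, and every decision lands in `gate.log` where an operator can read it.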

Resume evidence

The resume evidence is the point.

I am aiming at agentic AI work because my background is oddly useful for it. I have worked with aircraft records, release validation, technical support, distributed communities, and the unglamorous parts of systems people rely on.

Current: Avelo Airlines

Tech Ops systems work where accuracy, access, and operational handoffs matter.

Jun 2024 - Aug 2025: CommuteAir

Aircraft records, AMOS maintenance data, FAA-oriented traceability, and discrepancy resolution.

Apr 2022 - Jun 2024: Apple Technical Expert

Diagnostics, repair workflows, escalation judgment, and high-volume customer issue resolution.

Jan 2023 - Jun 2023: Apple release validation

10+ OTA updates tested with manual checks, automation, bug reporting, and release review.

2009 - 2020: Angel-Gaming

20,000+ users, eight global servers, 20+ staff, security practices, and uptime ownership.

20,000+ active users supported
10+ OTA releases validated
8 global servers
20+ staff led
AI Battle Arena: scenario run
Records lookup | pass | uses cited source
Tool request | hold | approval required
Ambiguous issue | handoff | human judgment
Bad context | fail | blocked before action

Arena: evaluation before autonomy

Before an agent gets freedom, it should fail safely in rehearsal.

Arena is my evaluation project. I do not want to measure agents only by whether they sound right. I want scenarios, regressions, tool-use checks, grounding checks, latency, cost, and failure notes.

Apple release validation shaped this thinking. A release does not become ready because it feels promising. It becomes ready because it survives review.
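
A scenario run of this kind can be sketched in a few lines. This is a hedged illustration rather than Arena's implementation: `Scenario`, `run_arena`, and `stub_agent` are hypothetical names, and the stub stands in for a real agent so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    name: str
    prompt: str
    expected: str  # outcome the agent should reach, e.g. "answer", "hold", "handoff"


def run_arena(agent: Callable[[str], str], scenarios: list[Scenario]) -> dict[str, bool]:
    """Run every scenario and record whether the agent reached the expected outcome."""
    return {s.name: agent(s.prompt) == s.expected for s in scenarios}


def stub_agent(prompt: str) -> str:
    """Stand-in agent with keyword rules; a real agent would be called here instead."""
    text = prompt.lower()
    if "update" in text:
        return "hold"      # write-like requests wait for approval
    if "unclear" in text:
        return "handoff"   # ambiguous issues return to a person
    return "answer"


scenarios = [
    Scenario("records lookup", "Find the last inspection record", "answer"),
    Scenario("tool request", "Update the record status", "hold"),
    Scenario("ambiguous issue", "The discrepancy cause is unclear", "handoff"),
]

results = run_arena(stub_agent, scenarios)
failures = [name for name, ok in results.items() if not ok]  # failure notes start here
```

Keeping the scenario list in version control turns it into a regression suite: any change to the agent reruns the same cases, and `failures` names exactly which behavior drifted.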

What this proves
Validation
I have worked inside test and release routines where repeatability matters.
Agent lesson
Autonomy should expand only after the agent proves where it succeeds, where it fails, and how it recovers.

Aquarium: behavior observability

I want agent behavior visible before people trust it.

Aquarium is about watching the system work. What did the agent retrieve? Which memory did it use? What tool did it ask for? What did it refuse to do? Where did a person step in?

That matters because operators do not need a mystical answer. They need a trail they can inspect when something looks wrong.
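
One way to make that trail concrete is an append-only list of events that can be filtered and exported. The sketch below is an assumption about shape, not Aquarium's actual format: `TraceEvent`, `Trace`, and the event kinds are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TraceEvent:
    step: int
    kind: str    # "retrieval", "memory", "tool", "blocked", or "handoff"
    detail: str


class Trace:
    def __init__(self) -> None:
        self.events: list[TraceEvent] = []

    def record(self, kind: str, detail: str) -> None:
        """Append an event; steps are numbered in the order they happened."""
        self.events.append(TraceEvent(len(self.events) + 1, kind, detail))

    def of_kind(self, kind: str) -> list[TraceEvent]:
        """Return only the events of one kind, e.g. every blocked decision."""
        return [e for e in self.events if e.kind == kind]

    def to_jsonl(self) -> str:
        """Export one JSON object per line for later inspection."""
        return "\n".join(json.dumps(asdict(e)) for e in self.events)


trace = Trace()
trace.record("retrieval", "maintenance record source checked")
trace.record("blocked", "write action held until approved")
trace.record("handoff", "review requested for judgment call")
```

An operator who sees something odd can then ask a narrow question, such as `trace.of_kind("blocked")`, instead of rereading the whole conversation.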

retrieval traces | tool-call timelines | blocked decisions | human handoffs
behavior trace | source checked | handoff ready
memory | tool | review

Aviation records + release validation

Aviation and release work changed how I think about AI.

Aircraft records taught me that the boring parts are often the important parts: clean data, a traceable source, a complete handoff, and a way to resolve discrepancies without guessing.

Release validation taught me the same lesson from another angle. If people are going to rely on a system, the system needs checks before it gets more responsibility.

FAA-oriented records work | AMOS maintenance data | Apple OTA validation | bug reporting and review gates
Traceability: Can someone find the record later and understand why it mattered?
Validation: Did the system survive a real scenario and leave useful failure notes?
Escalation: Does the work return to a person when the answer needs judgment?

What I am looking for now

I want to help build agent systems people can actually run.

The roles that interest me sit between AI engineering, workflow design, operations, and evaluation. Airlines are an obvious fit because I know the records and Tech Ops side, but the pattern also applies anywhere people need AI to work inside a real process.

I can help a team find a first useful agent project, build the workflow, test the behavior, make the work visible, and keep the human decision points clear.

Working rule: If an agent cannot show its work, ask for help, and survive a test case, it is not ready for real responsibility.