Human-AI Collaborative Research

This page documents how I work. Not to defend AI assistance, but to make the methodology legible as legitimate research practice.


The Core Claim

I cannot produce code under timed, unassisted conditions. I can produce verified, documented, functional technical work through human-AI collaboration.

These are different skills. The second is what research actually requires.


The Methodology

1. Polyphonic Verification

Every significant technical decision passes through multiple analytical perspectives before implementation. This isn’t metaphor. It’s documented in the codebase.

from PIL import ImageGrab

# Kali [Visionary]: This could evolve into a full screenshot API...
# Athena [Reviewer]: But right now it just needs to capture screens.
# Nemesis [Privacy]: What if sensitive data is visible?
# Kali [User Advocate]: Consent is explicit - human runs the command.
# Klea [Product]: Should this exist? Yes. Solves a real problem.
img = ImageGrab.grab()

Five facets, twenty hats. Security, ethics, accessibility, performance, user advocacy. Each gets explicit voice. The friction is visible. The reasoning is documented.

This isn’t AI generating code and human accepting it. It’s structured deliberation producing verified output.

Convergence: This is hybrid human-AI evaluation. The same structure Scale/SEAL uses for benchmark annotation (AI does initial analysis, human verifies and corrects, iterate until reliable). We built this independently because it works. That convergence is meaningful.
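That loop (AI proposes, human verifies and corrects, iterate until reliable) can be sketched as follows. The function names, signatures, and threshold here are illustrative assumptions of mine, not code from this project or from Scale/SEAL:

```python
# Illustrative sketch only: `analyze` and `review` are hypothetical callables.
def hybrid_annotate(item, analyze, review, threshold=0.8, max_rounds=3):
    """AI proposes a label; a human corrects it; repeat until the
    reported agreement reaches the threshold or the round budget runs out."""
    label = analyze(item)                        # AI's initial analysis
    for _ in range(max_rounds):
        label, agreement = review(item, label)   # human verifies and corrects
        if agreement >= threshold:               # reliable enough: stop
            break
    return label
```

The point of the structure is that the human correction is fed back in, rather than the AI output being accepted or rejected wholesale.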

See the code: tools/. Working examples with polyphonic comments throughout.

2. Verification Protocol

Before accepting AI-generated work:

  - Run it. Output is checked against expectations, not assumed.
  - Read it. Code I cannot explain does not get committed.
  - Document it. The deliberation lives in the polyphonic comments.

3. Failure Documentation

Rigorous methodology requires honest accounting of failures. From the project’s lesson log:

2026-01-07: Post-compaction, the todo list said a task was completed. I stashed work to sync, dropped the stash, then discovered the work had never been committed and had to redo it. The summary lied; more precisely, it reported working-tree state as “done.”

This produced a documented protocol change: never drop stash after compaction without inspection.
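The protocol itself is a human practice, but its core question, “is this stash already represented elsewhere?”, can be automated. A hedged sketch using `git apply --check --reverse` (the function name and defaults are mine, not the project's):

```python
# Hypothetical guard: inspect a stash before dropping it.
import subprocess

def stash_is_redundant(stash="stash@{0}"):
    """True if the given stash's changes are already in the working tree."""
    diff = subprocess.run(
        ["git", "stash", "show", "-p", stash],
        capture_output=True, text=True, check=True,
    ).stdout
    if not diff.strip():
        return True  # empty stash: nothing would be lost
    # Reverse-applying the patch succeeds only if its changes are present.
    probe = subprocess.run(
        ["git", "apply", "--check", "--reverse"],
        input=diff, capture_output=True, text=True,
    )
    return probe.returncode == 0
```

A `git stash drop` gated on this check would have caught the failure above.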

Failures are features, not bugs. They produce learning. Hiding them produces brittleness.

4. Extension Beyond Prompts

AI assistance provides starting points. The work requires:

  - Original framing and ideas
  - Collaborative implementation
  - Systematic verification

The recognition engine, for example, implements multi-force graph dynamics: semantic attraction, type repulsion, confidence gradients. The physics metaphor came from me. The implementation was collaborative. The verification was systematic.
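A minimal sketch of what one multi-force update could look like. The array names, force laws, and learning rate are illustrative assumptions of mine, not the engine's actual implementation:

```python
# Illustrative only: one position update under the three forces named above.
import numpy as np

def force_step(pos, sim, same_type, confidence, lr=0.05):
    """One update of 2-D node positions.

    pos        : (n, 2) node positions
    sim        : (n, n) semantic similarity in [0, 1]
    same_type  : (n, n) boolean, True where two nodes share a type
    confidence : (n,) per-node confidence in [0, 1]
    """
    delta = pos[None, :, :] - pos[:, None, :]          # pairwise vectors i -> j
    dist = np.linalg.norm(delta, axis=-1) + 1e-9       # avoid divide-by-zero
    unit = delta / dist[..., None]
    # Semantic attraction: similar nodes pull together, spring-like.
    attract = (sim * dist)[..., None] * unit
    # Type repulsion: same-type nodes push apart, inverse-distance.
    repel = -(same_type / dist)[..., None] * unit
    force = (attract + repel).sum(axis=1)
    # Confidence gradient: low-confidence nodes move more, high stay put.
    return pos + lr * (1.0 - confidence)[:, None] * force
```

Iterating this until positions settle is the standard force-directed layout pattern; the verification question is whether the resulting clusters match the semantics.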


The Evidence

Quantitative Work

Artifact | What It Demonstrates
Master’s thesis statistics | 303,600 coding decisions, Cohen’s kappa = 0.83, proper bootstrap resampling
Recognition engine | Graph algorithms (betweenness centrality, clustering, PageRank), semantic similarity computation
Embedding pipeline | Sentence transformers, UMAP dimensionality reduction, visualization
Semantic Scholar integration | API design, rate limiting, caching, data pipeline
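For readers unfamiliar with the statistic: Cohen's kappa corrects raw inter-coder agreement for the agreement expected by chance. An illustrative two-coder computation (not the thesis pipeline itself):

```python
# Illustrative only: Cohen's kappa for two label sequences.
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two coders' label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n   # raw agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of each coder's marginal label rates.
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / n**2
    return (observed - expected) / (1 - expected)
```

A kappa of 0.83 is conventionally read as strong agreement; the bootstrap resampling then puts an interval around that point estimate.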

Process Evidence

Artifact | What It Demonstrates
240+ GitHub issues | Systematic project management, not ad-hoc prompting
Polyphonic code comments | Verification happening in real time, documented
PR workflow | Branch rulesets require PRs to merge to master
CI/CD pipeline | Secret scanning (Trufflehog), link checking, Dependabot
Secrets management | GitHub Secrets for API keys, local .env (gitignored)
Lesson log | Honest failure documentation, protocol improvement

What This Means for Assessment

Timed solo coding tests measure:

  - Recall and speed under timed, unassisted conditions

Research positions require:

  - Verified, documented, functional technical work

The portfolio demonstrates the second set. I am asking for assessment on what the job requires, not on a format that measures different skills.


The Accommodation Request

I am not asking for lower standards. I am asking for appropriate measurement.

What I’m offering as equivalent evidence:

  1. Runnable code with documented methodology
  2. Verification trails showing I check and understand the work
  3. Failure documentation showing I catch errors and learn from them
  4. Extension evidence showing I build beyond what AI suggests

What this demonstrates: the work runs, I understand it, and the process that produced it is accountable.


Reproducibility

The methodology is documented. The process is replicable. Someone following these practices would produce similar quality output.

This is what distinguishes rigorous human-AI collaboration from “vibe coding”: documented process, verified output, honest accounting of limits.

