Human-AI Collaborative Research
This page documents how I work. The goal is not to defend AI assistance, but to make the methodology legible as legitimate research practice.
The Core Claim
I cannot produce code under timed, unassisted conditions. I can produce verified, documented, functional technical work through human-AI collaboration.
These are different skills. The second is what research actually requires.
The Methodology
1. Polyphonic Verification
Every significant technical decision passes through multiple analytical perspectives before implementation. This isn’t metaphor. It’s documented in the codebase.
from PIL import ImageGrab

# Kali [Visionary]: This could evolve into a full screenshot API...
# Athena [Reviewer]: But right now it just needs to capture screens.
# Nemesis [Privacy]: What if sensitive data is visible?
# Kali [User Advocate]: Consent is explicit - human runs the command.
# Klea [Product]: Should this exist? Yes. Solves a real problem.
img = ImageGrab.grab()  # capture the current screen
Five facets, twenty hats: security, ethics, accessibility, performance, user advocacy. Each gets an explicit voice. The friction is visible. The reasoning is documented.
This isn’t AI generating code and human accepting it. It’s structured deliberation producing verified output.
Convergence: this is hybrid human-AI evaluation, the same structure Scale/SEAL uses for benchmark annotation (the AI does an initial analysis, a human verifies and corrects, and the loop iterates until the output is reliable). We built this independently because it works. That convergence is meaningful.
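A rough sketch of that loop's shape, with placeholder callables (this is illustrative, not Scale/SEAL's or this project's actual interface):

```python
# Shape of the hybrid evaluation loop: the AI proposes, a human verifies and corrects, iterate.
def hybrid_evaluate(task, ai_propose, human_review, max_rounds=5):
    draft = ai_propose(task, feedback=None)
    for _ in range(max_rounds):
        verdict, corrections = human_review(draft)        # human verifies and corrects
        if verdict == "accepted":
            return draft
        draft = ai_propose(task, feedback=corrections)    # AI revises against the human feedback
    raise RuntimeError("No convergence; escalate to fully manual review")
```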
See the code: tools/. Working examples with polyphonic comments throughout.
2. Verification Protocol
Before accepting AI-generated work:
- Does it run? (execution test)
- Does it do what I asked? (functional test)
- Do I understand why it works? (comprehension check)
- What could go wrong? (edge case analysis)
- Has it been checked against multiple perspectives? (polyphonic review)
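One way to make that gate mechanical is to record it as data. A minimal sketch with hypothetical field names (illustrative, not the project's actual tooling):

```python
# Hypothetical acceptance gate for AI-generated work; every check must pass before merge.
from dataclasses import dataclass, field

@dataclass
class ReviewRecord:
    runs: bool = False                  # execution test: does it run?
    does_what_was_asked: bool = False   # functional test
    understood: bool = False            # comprehension check: can I explain why it works?
    edge_cases: list[str] = field(default_factory=list)    # edge case analysis notes
    perspectives: list[str] = field(default_factory=list)  # polyphonic review sign-offs

    def accept(self) -> bool:
        return (
            self.runs
            and self.does_what_was_asked
            and self.understood
            and len(self.edge_cases) > 0      # at least one edge case considered
            and len(self.perspectives) > 0    # at least one polyphonic pass recorded
        )
```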
3. Failure Documentation
Rigorous methodology requires honest accounting of failures. From the project’s lesson log:
2026-01-07: Post-compaction, todo list said a task was completed. Stashed work to sync, dropped stash, discovered work was never committed. Had to redo. The summary lied, or rather, reported working-tree state as “done.”
This produced a documented protocol change: never drop a stash after compaction without first inspecting its contents.
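As an illustrative sketch (not part of the project's tooling), that rule can be made mechanical: show the stash diff and require explicit confirmation before anything is dropped.

```python
# Illustrative "inspect before drop" guard for git stashes after compaction.
import subprocess

def drop_stash_after_inspection(stash_ref: str = "stash@{0}") -> bool:
    """Print the stash diff and require explicit confirmation before dropping it."""
    diff = subprocess.run(
        ["git", "stash", "show", "-p", stash_ref],
        capture_output=True, text=True, check=True,
    ).stdout
    print(diff)  # a human reads this and confirms the work really is committed elsewhere
    if input(f"Drop {stash_ref}? [y/N] ").strip().lower() == "y":
        subprocess.run(["git", "stash", "drop", stash_ref], check=True)
        return True
    return False
```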
Failures are features, not bugs. They produce learning. Hiding them produces brittleness.
4. Extension Beyond Prompts
AI assistance provides starting points. The work requires:
- Seeing what’s needed that wasn’t asked for
- Combining outputs from different domains
- Catching what AI gets wrong
- Building architecture AI can’t see
The recognition engine, for example, implements multi-force graph dynamics: semantic attraction, type repulsion, confidence gradients. The physics metaphor came from me. The implementation was collaborative. The verification was systematic.
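As a hedged illustration of what those three forces mean in code (a toy sketch, not the recognition engine's implementation):

```python
# Toy sketch of multi-force graph dynamics: semantic attraction, type repulsion,
# confidence-damped movement. Not the recognition engine's actual code.
import numpy as np

def force_step(positions, similarity, same_type, confidence, lr=0.05):
    """One update over 2-D node positions (an n x 2 array).

    similarity : (n, n) semantic similarity in [0, 1]
    same_type  : (n, n) bool, True where two nodes share a type
    confidence : (n,) confidence in each node's current placement, in [0, 1]
    """
    n = positions.shape[0]
    delta = positions[None, :, :] - positions[:, None, :]      # delta[i, j]: vector from node i to node j
    dist = np.linalg.norm(delta, axis=-1, keepdims=True)
    unit = delta / np.maximum(dist, 1e-9)                       # safe normalisation; zero on the diagonal
    attract = similarity[..., None] * dist * unit               # semantic attraction: similar nodes pull together
    repel = (~same_type)[..., None] * unit / np.maximum(dist, 1e-3)  # type repulsion: different types push apart
    mask = ~np.eye(n, dtype=bool)[..., None]                    # ignore self-interactions
    net = ((attract - repel) * mask).sum(axis=1)
    step = lr * (1.0 - confidence)[:, None] * net               # confidence gradient: confident nodes move less
    return positions + step
```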
The Evidence
Quantitative Work
| Artifact | What It Demonstrates |
|---|---|
| Master’s thesis statistics | 303,600 coding decisions, Cohen’s kappa = 0.83, proper bootstrap resampling |
| Recognition engine | Graph algorithms (betweenness centrality, clustering, PageRank), semantic similarity computation |
| Embedding pipeline | Sentence transformers, UMAP dimensionality reduction, visualization |
| Semantic Scholar integration | API design, rate limiting, caching, data pipeline |
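As a hedged illustration of the thesis-statistics row (a minimal sketch using scikit-learn, not the thesis pipeline itself), Cohen's kappa with a percentile bootstrap looks roughly like this:

```python
# Minimal sketch: Cohen's kappa with a percentile bootstrap confidence interval.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def kappa_with_ci(coder_a, coder_b, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus bootstrap CI for inter-coder agreement."""
    rng = np.random.default_rng(seed)
    coder_a, coder_b = np.asarray(coder_a), np.asarray(coder_b)
    point = cohen_kappa_score(coder_a, coder_b)
    n = len(coder_a)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)   # resample coding decisions with replacement
        boots.append(cohen_kappa_score(coder_a[idx], coder_b[idx]))
    low, high = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, (low, high)
```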
Process Evidence
| Artifact | What It Demonstrates |
|---|---|
| 240+ GitHub issues | Systematic project management, not ad-hoc prompting |
| Polyphonic code comments | Verification happening in real-time, documented |
| PR workflow | Branch rulesets require PRs to merge to master |
| CI/CD pipeline | Secret scanning (TruffleHog), link checking, Dependabot |
| Secrets management | GitHub Secrets for API keys, local .env (gitignored) |
| Lesson log | Honest failure documentation, protocol improvement |
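As a hedged sketch of the secrets pattern (the variable name is hypothetical): keys are read from the environment, which GitHub Secrets populates in CI and a gitignored .env file, loaded by a tool such as python-dotenv, populates locally.

```python
# Illustrative secrets handling: the API key lives in the environment, never in the repo.
import os

API_KEY = os.environ.get("SEMANTIC_SCHOLAR_API_KEY")  # hypothetical variable name
if not API_KEY:
    raise RuntimeError(
        "Set SEMANTIC_SCHOLAR_API_KEY via GitHub Secrets (CI) or a gitignored .env file (local)."
    )
```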
What This Means for Assessment
Timed solo coding tests measure:
- Speed under pressure
- Memorized syntax
- Unassisted performance
Research positions require:
- Correct, verified output
- Understanding of what you’re building
- Ability to extend and debug
- Systematic methodology
The portfolio demonstrates the second set. I am asking for assessment on what the job requires, not on a format that measures different skills.
The Accommodation Request
I am not asking for lower standards. I am asking for appropriate measurement.
What I’m offering as equivalent evidence:
- Runnable code with documented methodology
- Verification trails showing I check and understand the work
- Failure documentation showing I catch errors and learn from them
- Extension evidence showing I build beyond what AI suggests
What this demonstrates:
- I can produce correct technical output
- I understand what I’m building
- I have systematic verification practices
- I can identify and fix errors
- I can extend and architect, not just prompt
Reproducibility
The methodology is documented. The process is replicable. Someone following these practices would produce output of similar quality.
This is what distinguishes rigorous human-AI collaboration from “vibe coding”: documented process, verified output, honest accounting of limits.