
I focus on where AI systems fail in real-world use and how to make them reliable enough for decision-making. My background is in linguistics, which shapes how I approach NLP: I start from how domain experts actually reason (clinicians working with ontologies, lawyers navigating case law) and build systems around that, rather than treating language as a purely data-driven problem. Most of my work is in healthcare and legal contexts, where the people using the tools need to trust what the model is doing.
Projects

Legal Document Analysis (Production)
RAG system for Norwegian regulatory text, processing 25+ cases weekly. Worked with 3 lawyers and 2 developers to define what "correct" meant, then built evaluation criteria from their feedback. Retrieval accuracy went from 75% to 98% over several months of iteration.
Architecture: Claude API, MongoDB, Azure deployment. Version-controlled prompts with traceable rationales for audit compliance.

MedTermCheck
Verification layer for LLM-extracted medical entities. The problem: LLMs confidently output clinical codes that don't exist or don't match the context. The system checks each extracted entity against ICD-10-CM (~70K codes, local file) and SNOMED-CT (Snowstorm API), then scores confidence using four independent signals. If SNOMED is unavailable, it degrades to ICD-10 plus source grounding: still useful, just fewer signals. Includes 20 annotated test cases with deliberate hallucination traps.
GitHub | Demo

GDPR Article 9 Compliance Checker
Rule engine for healthcare AI documentation. Scans privacy policies and DPIAs against 42 Article 9 requirements and flags missing legal bases.
Design choice: rules, not LLM classification. Compliance needs determinism and audit trails. "The system flagged this because rule 9.2.h matched" is defensible; "the model thought this was non-compliant" is not.
YAML-based rules, evidence extraction, versioned decisions. English only. Limitations documented.
GitHub | Demo

Medical Text Classification (Thesis)
Neural classifier for systematic review automation. The dataset was small (150 abstracts), so I integrated SNOMED-CT concepts into the feature space to compensate. The interesting finding: ontology features changed what the model attended to, not just the accuracy number.
Baseline 75% → final 93% F1. 11 configurations tested with a full audit trail.
GitHub
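To illustrate the graceful-degradation idea behind MedTermCheck: score confidence from whichever signals are currently available rather than failing hard when one source is down. This is a minimal sketch; the signal names and the equal weighting are my assumptions for the example, not the system's actual scoring.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Signals:
    code_exists: bool             # code found in the local ICD-10-CM file
    context_match: bool           # code description overlaps the source context
    source_grounded: bool         # entity text actually appears in the source
    snomed_match: Optional[bool]  # Snowstorm API agreement; None if unavailable

def confidence(s: Signals) -> float:
    """Average the available signals, so an outage means fewer
    signals (a softer score), not a hard failure."""
    signals = [s.code_exists, s.context_match, s.source_grounded]
    if s.snomed_match is not None:   # only count SNOMED when it answered
        signals.append(s.snomed_match)
    return sum(signals) / len(signals)
```

With all four signals agreeing the score is 1.0; with SNOMED unavailable the same entity is scored on the remaining three signals instead of being rejected outright.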
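The "rules, not LLM classification" choice in the GDPR checker can be sketched as a deterministic matcher that reports which rule fired and on what evidence. The rule ID, patterns, and dict shape below are invented for illustration (the real rules live in versioned YAML files).

```python
import re

# Illustrative rule: if a policy mentions health data, it must also
# state a legal basis. Patterns here are deliberately simplistic.
RULES = [
    {
        "id": "9.2.h",
        "trigger": r"\bhealth data\b",   # topic the rule applies to
        "requires": r"\blegal basis\b",  # text that must also appear
    },
]

def check(document: str, rules=RULES) -> list:
    """Return one finding per rule whose trigger matched but whose
    required text is missing, with the snippet that fired it."""
    findings = []
    for rule in rules:
        hit = re.search(rule["trigger"], document, re.IGNORECASE)
        if hit and not re.search(rule["requires"], document, re.IGNORECASE):
            findings.append({"rule": rule["id"], "evidence": hit.group(0)})
    return findings
```

Because every finding names the exact rule and the evidence snippet that triggered it, the audit trail is a by-product of the design rather than a post-hoc explanation.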
Tools
Programming: Python, SQL, Bash
ML/NLP: PyTorch, TensorFlow, Hugging Face, spaCy, scikit-learn
LLM Integration: Claude API, OpenAI API, RAG architectures
Infrastructure: FastAPI, Docker, GitHub Actions, Azure, MongoDB
Regulatory: YAML rule engines, PyMuPDF, versioned prompt auditing
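The versioned prompt auditing listed above can be as lightweight as an append-only registry that records each prompt with its rationale. A minimal sketch, with field names that are my assumption rather than any production schema:

```python
import hashlib
from datetime import datetime, timezone

def record_prompt(prompt: str, rationale: str, registry: list) -> dict:
    """Append a hash-identified prompt version so any output can be
    traced back to the exact prompt text and the reason it changed."""
    entry = {
        "version": len(registry) + 1,
        "sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "rationale": rationale,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
    }
    registry.append(entry)
    return entry
```

Storing the rationale next to the hash is what makes the trail auditable: a reviewer sees not just that the prompt changed, but why.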

And...
Outside work, you'll find me at a piano working through jazz standards, or in the kitchen trying to get a new recipe right. I read a lot of sci-fi.
I've lived and worked across Ghana, China, and Norway: three years on the Volta River coordinating between Ghanaian and Chinese engineering teams, then three years in Bergen. These days I'm trying to keep my Norwegian from getting rusty.
I judge cities by their coffee shops.

Contact
[email protected] | LinkedIn | GitHub
Working on something in healthcare AI, NLP, or regulatory compliance? I'd be interested to hear about it.