Hi, I'm Sam

I'm an NLP engineer focused on explainable AI and compliance automation in healthcare.
My background in computational linguistics shapes how I approach AI: through language, reasoning, and human context.
I've built ontology-driven models for medical text screening, compliance tools for GDPR Article 9, and AI assistants that work with, not over, domain experts.
My goal is to make AI trustworthy, auditable, and usable in real-world settings.

Research Background

MPhil in Linguistics (Computational focus), University of Bergen (2024)
• Thesis: Ontology-enhanced ML for Medical Literature Screening → achieved 90% F1-score, reduced review time from 6 months to 1 week.
• Experience mentoring graduate students and teaching technical topics.
• Legal-AI startup experience (Innovation Norway–supported).
Focus areas: ontology-enhanced ML, leakage-safe evaluation, calibration, iterative human-in-the-loop reviews, and reproducible pipelines.

Projects

Featured Projects
1. Medical Intervention Text Triage (Systematic Reviews)
Automated medical literature screening using ontology-augmented classifiers (SNOMED-CT).
Outcome: Achieved 90% F1-score with 150 training samples. Reduced prescreening time from 6 months to 1 week in pilot settings.
Approach: Started from a simple baseline, performed error analysis, and introduced targeted model complexity with a full audit trail for transparency.
Assurance: External cohort checks and reviewer-level agreement testing.
Stack: Python, TensorFlow, scikit-learn, spaCy, SNOMED-CT ontology
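
As a rough illustration of what ontology augmentation can look like, the sketch below appends the parent concepts of any recognized term to an abstract's tokens before they reach a classifier. The `TOY_ONTOLOGY` mapping and the `augment_tokens` helper are hypothetical stand-ins: they contain no SNOMED-CT data and are not the project's actual code.

```python
import re

# Hypothetical toy hierarchy: term -> parent concepts (not real SNOMED-CT data).
TOY_ONTOLOGY = {
    "metformin": ["antidiabetic_agent", "drug_therapy"],
    "insulin": ["antidiabetic_agent", "drug_therapy"],
    "physiotherapy": ["rehabilitation", "non_drug_therapy"],
}


def augment_tokens(text, ontology=TOY_ONTOLOGY):
    """Lowercase and tokenize text, then append parent concepts of known terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    extra = []
    for token in tokens:
        for concept in ontology.get(token, []):
            if concept not in extra:  # add each parent concept only once
                extra.append(concept)
    return tokens + extra


features = augment_tokens("Metformin versus insulin in adults")
print(features)
```

Abstracts that mention different drugs end up sharing features such as `antidiabetic_agent`, which is one plausible way a classifier can generalize from a small labeled set.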
2. GDPR Article 9 Compliance Checker (Healthcare AI)
Open-source rule engine for scanning healthcare privacy documents and DPIAs against 42 GDPR Article 9 requirements on special-category data.
Outcome: Automatically detects missing legal bases and documentation gaps.
Approach: YAML-based rule logic with evidence extraction and versioned decisions for transparency.
Assurance: Keyword-driven scoring contextualized for focused DPIAs (10–30% typical coverage); explicit limitations documented (semantic scope, English-only).
Stack: Python, Streamlit, PyMuPDF, YAML, Pandas.
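
A minimal sketch of keyword-driven rule checking with evidence extraction, under stated assumptions: the three rules, their ids, and the `check_document` and `coverage` helpers are hypothetical, and the rules are written as Python dictionaries shaped like entries that could be loaded from a YAML file. It is not the tool's actual rule set or scoring logic.

```python
import re

# Hypothetical rules, shaped like entries that could be loaded from a YAML file.
RULES = [
    {"id": "R1", "description": "Legal basis: explicit consent",
     "keywords": ["explicit consent"]},
    {"id": "R2", "description": "Impact assessment referenced",
     "keywords": ["impact assessment", "dpia"]},
    {"id": "R3", "description": "Retention period stated",
     "keywords": ["retention period"]},
]


def check_document(text, rules=RULES):
    """Return one result per rule, with the matching sentences as evidence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    results = []
    for rule in rules:
        evidence = [
            s for s in sentences
            if any(keyword in s.lower() for keyword in rule["keywords"])
        ]
        results.append(
            {"id": rule["id"], "passed": bool(evidence), "evidence": evidence}
        )
    return results


def coverage(results):
    """Fraction of rules that have at least one piece of evidence."""
    return sum(r["passed"] for r in results) / len(results)


doc = ("Patients give explicit consent before processing. "
       "A DPIA was completed in 2024.")
results = check_document(doc)
missing = [r["id"] for r in results if not r["passed"]]
print(missing, round(coverage(results), 2))
```

Keeping the matched sentence beside each decision is what makes a result reviewable. Plain substring matching also shows why the semantic-scope limitation applies: a paraphrase that uses none of the keywords will be reported as a gap.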
3. Legal AI Analysis System (Oslo Startup)
Production RAG system for regulatory text analysis, processing 25+ legal cases weekly.
Outcome: Raised accuracy from 75% to 98% through iterative prompt engineering. Reduced manual review load and ensured reproducible outputs for compliance teams.
Approach: Retrieval-Augmented Generation combining LLM analysis with a Norwegian legal case database. Collaborated with 3 lawyers and 2 developers to validate outputs.
Assurance: Version-controlled prompts, traceable rationales, and governance hooks to meet audit standards.
Stack: Python, Claude API, MongoDB, Azure
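
The retrieval and audit side of such a pipeline might be sketched as follows. Everything here is an assumption made for illustration: the three-entry `CASES` list, the word-overlap ranking in `retrieve`, the `PROMPT_VERSION` label, and the audit record fields. The LLM call itself is left out, and a production system would more likely retrieve with embeddings or a search index.

```python
import hashlib

PROMPT_VERSION = "v3"  # hypothetical version label
PROMPT_TEMPLATE = (
    "Answer using only the cases below and cite their ids.\n"
    "Cases:\n{context}\n\nQuestion: {question}"
)

# Hypothetical mini case database.
CASES = [
    {"id": "case-001", "text": "Employer dismissed employee without written notice"},
    {"id": "case-002", "text": "Tenant disputed rent increase under housing rules"},
    {"id": "case-003", "text": "Employee claimed unpaid overtime from employer"},
]


def retrieve(question, cases=CASES, k=2):
    """Rank cases by the number of lowercase words shared with the question."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(c["text"].lower().split())), c) for c in cases]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [case for score, case in scored[:k] if score > 0]


def build_request(question):
    """Assemble the prompt plus an audit record that ties it to its inputs."""
    hits = retrieve(question)
    context = "\n".join(f"[{c['id']}] {c['text']}" for c in hits)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    audit = {
        "prompt_version": PROMPT_VERSION,
        "template_sha256": hashlib.sha256(PROMPT_TEMPLATE.encode()).hexdigest(),
        "retrieved_ids": [c["id"] for c in hits],
    }
    return prompt, audit


prompt, audit = build_request("Can an employer refuse overtime pay to an employee")
print(audit["retrieved_ids"])
```

Storing the template hash and the retrieved ids next to each answer lets a reviewer later reconstruct which prompt version and which sources produced a given output.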
4. Customer Analytics with Uncertainty (Selected Non-Medical)
Built explainable churn prediction models with calibrated confidence intervals.
Outcome: Delivered interpretable drivers of churn and improved decision confidence in retention models.
Assurance: Leakage detection, stability testing, and calibration across time splits.
Stack: Python, scikit-learn, SHAP, XGBoost.
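
One way to check calibration across time splits is to compute expected calibration error (ECE) separately for each period and watch for drift. The sketch below assumes equal-width probability bins and uses made-up records. The `expected_calibration_error` helper and the quarter labels are illustrative, not the project's evaluation code.

```python
def expected_calibration_error(probs, labels, n_bins=5):
    """Weighted mean gap between predicted probability and observed rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_p = sum(p for p, _ in bucket) / len(bucket)
        rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_p - rate)
    return ece


# Hypothetical (period, predicted churn probability, actual churn) records.
records = [
    ("2023-Q1", 0.1, 0), ("2023-Q1", 0.9, 1), ("2023-Q1", 0.9, 1), ("2023-Q1", 0.1, 0),
    ("2023-Q2", 0.9, 0), ("2023-Q2", 0.9, 1), ("2023-Q2", 0.1, 0), ("2023-Q2", 0.1, 1),
]

# Group predictions and outcomes by period.
by_period = {}
for period, p, y in records:
    probs, labels = by_period.setdefault(period, ([], []))
    probs.append(p)
    labels.append(y)

ece_by_period = {k: expected_calibration_error(*v) for k, v in by_period.items()}
for period, ece in sorted(ece_by_period.items()):
    print(period, round(ece, 3))
```

A rising ECE in later periods suggests the model's confidence no longer matches observed churn rates, even when ranking metrics look stable.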
5. Human–AI Creative Analysis (Research)
Studied 1,298 prompt–image interactions in generative models to understand creative decision patterns.
Outcome: Produced reproducible methodology for prompt analysis and interpretability insights into multimodal model behavior.
Assurance: Versioned datasets and transparent evaluation scripts.
Stack: Python, Hugging Face Transformers, CLIP, Pandas.
Core Tools & Methods
Programming / Data: Python, R, Bash, SQL
ML / NLP: PyTorch, TensorFlow, Hugging Face, spaCy, scikit-learn
LLM Integration: Claude API, OpenAI API, RAG architectures, Prompt engineering
Regulatory / Audit: YAML-based rule engines, PyMuPDF, pdfminer
MLOps / Deployment: Streamlit, FastAPI, Docker, GitHub Actions, Azure
Databases: MongoDB, PostgreSQL, SQL
Research / Reproducibility: Jupyter, Pandas, Versioned datasets, Prompt auditing

Current Projects
GDPR Healthcare AI Compliance Scorer
GDPR Article 9 Compliance Checker
Open-source tool that scans healthcare AI documentation for GDPR Article 9 compliance.
• Checks privacy policies, DPIAs, and compliance docs against 42 special category data requirements
• Identifies which legal bases are documented and highlights gaps
• Tested on real DPIAs from healthcare organizations
• Built with: Python, Streamlit, PyMuPDF, YAML-based rules engine
🔗 GitHub | 🎯 Try Demo

Beyond the Code

Outside work, you’ll find me at a jazz jam, learning new recipes, or discussing sci-fi at book clubs.
I’ve lived in Ghana and Norway, with time in China and Europe—perspectives I bring to building globally aware, inclusive AI.
Currently learning Norwegian (and a bit of German + Japanese).

Contact

Let's Connect
Open to partnerships on explainable AI, clinical NLP, or regulatory ML.
📧 [email protected]
💼 linkedin.com/in/sammens
🔗 github.com/SamInMotion
Interested in production-oriented roles at the intersection of compliance, healthcare AI, and explainable ML. Open to relocation within the Nordics.

Technical blog coming soon - insights on explainable AI and healthcare compliance

Built with Carrd — Content © Samuel Okoe-Mensah 2025.