
Following are my personal thoughts on tech, AI, startups, and the adoption of AI in healthcare. You can read more about me here




I Tried Karpathy's LLM Knowledge Base Idea. Here Is What Actually Worked.

Like many of you, I was nudged into actually trying this by a tweet from Andrej Karpathy. The whole post is worth reading, but here is the core idea:

Raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM.

That last sentence is the key. The wiki is the domain of the LLM. You do not write it. You do not maintain it. The LLM compiles raw sources into structured knowledge, maintains backlinks, categorizes data into concepts, writes articles for them, and links them all together. Karpathy even runs LLM "health checks" over the wiki to find inconsistent data, impute missing information, and suggest new connections. His wiki grew to around 100 articles and 400K words, and he found that plain LLM grep over index files and summaries worked fine without reaching for fancy RAG.
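The "plain LLM grep" retrieval he describes is simpler than it sounds. A minimal sketch of that idea, assuming a flat directory of markdown articles (the `wiki/` layout and the function name are my invention, not Karpathy's actual setup):

```python
from pathlib import Path


def grep_wiki(wiki_dir: str, query: str) -> list[tuple[str, str]]:
    """Return (filename, matching line) pairs across all .md articles.

    A stand-in for 'plain LLM grep': the LLM picks the query terms,
    this scan surfaces candidate articles, and the matching files get
    pulled back into the model's context for Q&A.
    """
    hits = []
    for path in sorted(Path(wiki_dir).glob("**/*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if query.lower() in line.lower():
                hits.append((path.name, line.strip()))
    return hits
```

At ~100 articles this brute-force scan is instant, which is exactly why fancy RAG never became necessary.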

He closed the post with: "I think there is room here for an incredible new product instead of a hacky collection of scripts." I agree, but I also think the hacky collection of scripts version is underrated. You do not need a product. You need a set of instructions and an LLM that follows them.

This resonated because I had just opened Apple Notes looking for something about a client project and ended up scrolling through pages of content from a company I left two years ago. Launch dates that already passed. Candidate notes for people I hired eight months ago. Project plans for features that shipped, got killed, or morphed into something unrecognizable. The notes were not wrong when I wrote them. They are wrong now because nobody updated them.

Your AI Coding Agent Can Exfiltrate Your Credentials. You Would Never Know.

I spent last night configuring Claude Code's security and realized something uncomfortable: for months, I had been running an LLM with unrestricted access to my terminal. It could read my SSH keys, browse my AWS credentials, curl data to any endpoint, and push code to production. I just never thought about it because the tool was helpful and nothing bad had happened yet.

That is exactly the kind of reasoning that gets production databases dropped.

The Tribal Knowledge Problem Nobody Is Solving for Analytics

Your AI can write SQL. It just has no idea what the data means.

I have spent the last four years building AI products in healthcare. Our databases have columns like amt_1, stat_cd, eff_dt. A model looking at raw schema has no way to know that amt_1 is patient copay in one table and coinsurance in another. That stat_cd means enrollment status, not statistical code. That eff_dt is the date a policy became active, not when something happened.

This is tribal knowledge. It lives in the heads of the three people who built the database. It is not documented anywhere. And it is the reason text-to-SQL fails in production.
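One pragmatic fix is to write the tribal knowledge down once, as a machine-readable glossary, and inject it into the prompt alongside the raw schema. A minimal sketch, where the table names and meanings are hypothetical examples extending the columns above, not a real schema:

```python
# (table, column) -> what the three people who built the database know.
# Table names and meanings here are invented for illustration.
COLUMN_GLOSSARY = {
    ("claims", "amt_1"): "patient copay in dollars",
    ("benefits", "amt_1"): "coinsurance percentage",
    ("members", "stat_cd"): "enrollment status (A=active, T=termed)",
    ("policies", "eff_dt"): "date the policy became active",
}


def schema_context(tables: list[str]) -> str:
    """Build a prompt fragment that disambiguates cryptic columns
    for the tables a text-to-SQL model is about to query."""
    lines = []
    for (table, column), meaning in COLUMN_GLOSSARY.items():
        if table in tables:
            lines.append(f"{table}.{column}: {meaning}")
    return "\n".join(lines)
```

The point is not the data structure; it is that the same `amt_1` gets a different definition per table, which no amount of schema inspection can recover.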

Clawdbot and the Era of AI in a Box

There's a lot of hype around Clawdbot. People are claiming it'll make you a billion dollars, automate your business, and act as your chief of staff. And yes, it's also a security nightmare.

But there's something real here. Clawdbot (now renamed Moltbot) is pointing toward a fundamentally different relationship with AI. Not a chat window you visit, but a system running on YOUR machine, 24/7, on your infrastructure, with your files. AI in a box.

Escaping Context Amnesia: Practical Strategies for Long-Running AI Agents

The promise of autonomous AI agents is vast: give them a high-level goal, grant them access to tools, and watch them execute complex workflows. But reality often hits hard. Specifically, it hits the context window.

Models like Claude Sonnet 4.5 now offer 200K tokens (up to 1M in beta), and GPT-5.1 supports 400K tokens with native compaction that claims to work across millions of tokens. Problem solved, right?

Not quite. Bigger context windows don't solve the problem. They mask it.
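One of the mitigations I mean by "practical strategies" is compaction: fold older turns into a summary before the window fills, instead of letting the model silently lose them. A minimal sketch, with a stub summarizer standing in for an LLM call and a rough 4-characters-per-token heuristic instead of a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 chars per token); a real agent would use
    # the model's actual tokenizer.
    return max(1, len(text) // 4)


def summarize(messages: list[str]) -> str:
    # Stand-in: a real agent would ask the model to compress these turns
    # while preserving decisions, file paths, and open questions.
    return "SUMMARY(%d older turns)" % len(messages)


def compact(history: list[str], budget: int, keep_recent: int = 4) -> list[str]:
    """Fold older turns into one summary whenever the history exceeds
    `budget` tokens, keeping the most recent turns verbatim."""
    while sum(map(estimate_tokens, history)) > budget and len(history) > keep_recent:
        older, recent = history[:-keep_recent], history[-keep_recent:]
        history = [summarize(older)] + recent
    return history
```

The design choice worth noticing: recent turns stay verbatim because they carry the live task state, while everything older degrades gracefully into a summary rather than vanishing.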

AI Sucks at Analyzing Data

User asks an AI data analysis tool: "Pull all patient communication encounters from last month."

The AI confidently writes a SQL query, hits the encounters table directly, and returns 1,247 records. Three days later, you discover the actual number should have been 3,891, because the real path is patients → patient_queue → queue_encounters → encounters. The AI missed two-thirds of your data.
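Here is a toy reconstruction of that failure mode. The schema is invented to match the story: some encounters link to a patient directly, others only through the queue tables, so a direct join silently drops rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients        (id INTEGER PRIMARY KEY);
    CREATE TABLE patient_queue   (id INTEGER PRIMARY KEY, patient_id INTEGER);
    CREATE TABLE queue_encounters(queue_id INTEGER, encounter_id INTEGER);
    CREATE TABLE encounters      (id INTEGER PRIMARY KEY, patient_id INTEGER, kind TEXT);

    INSERT INTO patients VALUES (1);
    INSERT INTO patient_queue VALUES (1, 1);
    INSERT INTO encounters VALUES (1, 1,    'communication');  -- direct link
    INSERT INTO encounters VALUES (2, NULL, 'communication');  -- queue-routed
    INSERT INTO encounters VALUES (3, NULL, 'communication');  -- queue-routed
    INSERT INTO queue_encounters VALUES (1, 2), (1, 3);
""")

# What the AI wrote: join encounters straight to patients.
naive = conn.execute("""
    SELECT COUNT(*) FROM encounters e
    JOIN patients p ON p.id = e.patient_id
    WHERE e.kind = 'communication'
""").fetchone()[0]

# The real path: also walk patient_queue -> queue_encounters.
actual = conn.execute("""
    SELECT COUNT(DISTINCT e.id) FROM encounters e
    LEFT JOIN queue_encounters qe ON qe.encounter_id = e.id
    LEFT JOIN patient_queue   pq ON pq.id = qe.queue_id
    WHERE e.kind = 'communication'
      AND COALESCE(e.patient_id, pq.patient_id) IS NOT NULL
""").fetchone()[0]

print(naive, actual)  # the direct join misses every queue-routed encounter
```

Both queries are syntactically valid and both run without error. That is what makes this failure so dangerous: the wrong answer looks exactly like the right one.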

Designing a Patient Self-Reporting Stack Around Humans and AI

Every care team we talk to has the same complaint: patients happily text, leave voicemails, and fill out surveys, but those signals rarely make it into the plan of care.

Electronic records were never built to absorb that ambient context, and the people who could act on it are already drowning in portal messages and follow-up calls. Yet the value is obvious: timely symptom reporting keeps people out of the ED, surfaces social needs, and lets providers adjust therapy before a flare turns into a crisis.

What we need is a stack that captures self-reported data, triages it with large language models, and still gives clinicians the last word. The winning pattern blends thoughtful UX, observability, and a human-in-the-loop workflow.
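The shape of that triage step fits in a few lines. A minimal sketch, where the severity rules and terms are invented keyword stand-ins; a production system would use an LLM classifier with clinical oversight, never a keyword list:

```python
from dataclasses import dataclass

# Invented examples; a real system would not rely on keyword matching.
URGENT_TERMS = {"chest pain", "can't breathe", "suicidal"}


@dataclass
class TriageResult:
    severity: str          # "urgent" | "routine"
    needs_clinician: bool  # human-in-the-loop: urgent signals always get a review


def triage(message: str) -> TriageResult:
    """Stand-in for an LLM triage call: escalate urgent language to a
    clinician queue, let routine reports flow into the chart."""
    text = message.lower()
    if any(term in text for term in URGENT_TERMS):
        return TriageResult("urgent", needs_clinician=True)
    return TriageResult("routine", needs_clinician=False)
```

The invariant that matters is in the second field: the model can sort and prioritize, but nothing urgent reaches the patient or the chart without a clinician signing off.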

Recursive Summarization Unlocks Effective LLM Integration in Healthcare

Your patient has 247 pages of medical records spanning 8 years. Two ER visits, three specialists, ongoing knee osteoarthritis, recent ACL reconstruction. How do you create a coherent summary that preserves critical information while making it digestible for both clinicians and AI systems?

The answer isn't just summarization; it's recursive summarization. And the secret isn't just what you summarize, but what you choose to preserve at each level of abstraction.
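A minimal sketch of the shape of it. The summarizer below is a truncating stand-in; in a real system each level's prompt is where you encode what to preserve (diagnoses, procedures, dates), and `fan_in` controls how many chunks collapse into one summary:

```python
from textwrap import shorten


def llm_summarize(chunks: list[str]) -> str:
    """Stand-in for an LLM call. The real prompt is where you specify
    what must survive this level: diagnoses, procedures, dates."""
    return shorten(" ".join(chunks), width=300, placeholder=" ...")


def recursive_summarize(pages: list[str], fan_in: int = 4) -> str:
    """Summarize pages in groups of `fan_in`, then summarize the
    summaries, until a single top-level summary remains."""
    level = pages
    while len(level) > 1:
        level = [llm_summarize(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]
```

For 247 pages with a fan-in of 4, that is four levels of compression, and each level is a fresh chance to decide what the clinician, or the downstream AI, actually needs to see.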