The pitfall of open-weight LLMs

Some startups fine-tune open-weight LLMs instead of using GPT or Gemini, sometimes to support a specific language, sometimes for narrow tasks. But I found they're all making the same mistake.

With a simple prompt (which I won't share here), I got several "custom" LLM services to spill their internal system prompts: things like security-breach playbooks and lists of product actions.

For example, SKT A.X 4.0 (based on Qwen 2.5) returned internal guidelines related to the recent SKT data breach, including instructions about compensation policies. Vercel's v0 model leaked examples of the actions its system can generate.

The point: if the base model is susceptible to a prompt-extraction attack, every service built on it inherits that weakness, no matter how much fine-tuning you do on top. We need to think not only about hardening system prompts at the service level, but also about upstream improvements: more robust defenses built into the open-weight models themselves.
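As a concrete (and deliberately crude) example of what service-level hardening can look like, here is a minimal Python sketch of an output-side filter: it plants a canary token in the system prompt and blocks any reply that echoes the canary or a verbatim run of the prompt's text. All names and values below are illustrative, not taken from any of the services mentioned above.

```python
import re

# Illustrative values: the real system prompt and canary would be your own.
SYSTEM_PROMPT = "You are SupportBot. Internal breach playbook: step 1 ..."
CANARY = "canary-9f3a71"  # random token planted inside the system prompt

def _normalize(text: str) -> str:
    """Lowercase and collapse to space-separated word tokens."""
    return " ".join(re.findall(r"\w+", text.lower()))

def leaks_system_prompt(reply: str, window: int = 8) -> bool:
    """Flag replies that contain the canary or any `window`-word
    verbatim run of the system prompt."""
    norm_reply = _normalize(reply)
    if _normalize(CANARY) in norm_reply:
        return True
    words = _normalize(SYSTEM_PROMPT).split()
    return any(
        " ".join(words[i : i + window]) in norm_reply
        for i in range(len(words) - window + 1)
    )

def guarded_reply(model_reply: str) -> str:
    """Output-side gate: swap out replies that appear to echo instructions."""
    if leaks_system_prompt(model_reply):
        return "Sorry, I can't share details about my configuration."
    return model_reply
```

Note the limitation: this only catches verbatim echoes. An attacker who asks the model to translate or paraphrase its instructions slips right past it, which is exactly why defenses in the base model itself matter.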
