Model Distillation Attacks: How to Protect Your AI Model

A model distillation attack does not need to break into your systems because you handed over the key the moment you opened your API to the public. Your competitor’s cheapest route to a model like yours might be your own product, queried thousands of times until a smaller copycat learns to answer the way yours does. In your logs, it looks like a busy customer, but to your business, it’s a slow leak of the AI capability you spent real money and many months building.

If you have put a fine-tuned or purpose-built model behind an API, this is your concern to weigh, not just a headline about frontier labs. The reassuring part is that you can defend against it, and you can also turn the same underlying technique into an advantage by distilling responsibly for products of your own. That balance, protecting your model while distilling the right way, is exactly what our model distillation services are built around. Below, we will walk through how these attacks work at a high level, the warning signs to watch for, and the practical steps to make your model a far harder target.

What Is a Model Distillation Attack in AI?

Model distillation started life as a perfectly respectable training method. A large, capable model (the ‘teacher’) answers a huge set of questions, and a smaller model (the ‘student’) learns to imitate those answers until it can do the same job for less money. Done with permission and on data you are allowed to use, this is one of the smartest ways to ship an efficient AI product.

A model distillation attack takes that same idea and removes the permission. Instead of training the student on data they own, an attacker sends a steady stream of questions to your model through its public API, records every answer, and uses those question-and-answer pairs to train a rival model that mimics your behavior. Think of it as hiring someone to sit beside your best consultant, write down every recommendation they give for a year, and then open a competing firm using that notebook.

The most visible example arrived in February 2026. As reported by NBC News, Anthropic accused three AI companies of generating more than 16 million exchanges with its Claude model through roughly 24,000 fraudulent accounts, all to train competing systems. The scale was enormous, but the mechanics were ordinary. Nobody picked a lock; the accounts simply asked a great many questions.

Who Is Really at Risk from Model Extraction Attacks?

It is tempting to read a story about Anthropic and conclude that model extraction attacks (the broader name for the same threat) are a worry reserved for companies with billion-dollar research budgets. That assumption is where many founders get comfortable a little too early.

The companies most exposed are often mid-sized, with their entire competitive edge hinging on a single model. If you have spent two years fine-tuning a model on proprietary claims data, medical coding, legal language, or logistics routing, that model is your moat. The same goes for businesses that have wired a capable model into their daily operations, as many of the teams behind these Claude API examples for business automation have done. A competitor who can approximate that behavior for a few thousand dollars in compute has effectively skipped the expensive part of your journey.

The numbers explain the temptation. The Stanford Institute for Human-Centered AI, for example, found that training a frontier model can cost well over 100 million dollars. Even a focused, domain-specific model represents months of salaries, data licensing, and careful tuning. Distillation allows someone to copy the result of that work without paying for the process, which is precisely why it appeals to rivals seeking a shortcut.

If your product depends on a model like this, it deserves the same protection you would give any core asset. Running that model well and keeping it safe tend to go hand in hand, which is something we dig into when we write about scaling AI models without sacrificing quality. Teams building or refining these systems through our large language model development services tend to bake defenses in from the start, rather than bolting them on after something feels wrong.

How Does a Model Distillation Attack Work Through an API?

You do not need to hand over your model’s inner workings for someone to copy it, and that is the uncomfortable part. An attacker only needs the same front door your real customers use, which is your API.

The pattern looks roughly like this, kept deliberately general. The attacker sends a wide variety of questions across many topics, captures your model’s responses, and feeds those pairs into a smaller model until it learns to answer the same way. The more varied and numerous the questions, the closer the copy gets. There is no malware, no stolen password, and no breached database anywhere in the process.

This is what makes the threat so slippery. Your model is doing exactly what you designed it to do, which is to answer questions well. The very behavior that makes your product valuable is the behavior an attacker harvests. The Open Worldwide Application Security Project (OWASP), which maintains the industry’s reference list of AI security risks, formally lists model theft as a recognized threat for exactly this reason. So you are not being paranoid. You are reading the same risk register the security community uses.

Warning Signs of a Model Extraction Attack in Your API Logs

Because nothing technically breaks, the evidence of a model distillation attack hides in your usage patterns rather than your security alerts. A normal customer behaves like a person with a job to do, while an extraction campaign behaves like a machine trying to map every corner of your model. Once you know the difference, the signals start to stand out.

Here are the patterns worth watching in your API logs:

A single account, or a tight cluster of brand-new accounts, sends far more queries than any genuine user would reasonably need.
The questions sweep across unrelated topics in a systematic way, as though someone is testing the full range of what your model knows rather than solving one real problem.
The traffic shows no human rhythm, arriving in steady programmatic bursts without the pauses, follow-ups, and messy phrasing that real people produce.
Accounts keep probing the edges of your model’s knowledge, repeatedly poking at unusual or boundary questions to see how it responds.
Sign-ups and traffic cluster in regions or proxy networks that do not match where your actual customers live and work.

Any one of these on its own might be harmless. A new power user really can be enthusiastic, and a research team really can ask broad questions. The concern grows when several of these signals show up together and persist, which is the fingerprint of someone building a dataset rather than using a product.
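To make the idea of several signals showing up together concrete, here is a minimal sketch of how a few of the patterns above could be checked per account. Everything in it is an assumption for illustration: the AccountStats shape, the signal names, and every threshold are hypothetical and would need tuning against your own traffic before they mean anything.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class AccountStats:
    """Hypothetical per-account summary built from your API logs."""
    account_age_days: int
    timestamps: list[float]  # request times in seconds, ascending
    topics: list[str]        # one coarse topic label per request


def extraction_signals(stats, *, max_daily_queries=5000, max_topics=20,
                       min_gap_cv=0.3, new_account_days=7):
    """Return the names of the warning signs this account trips.

    All thresholds are illustrative placeholders, not recommendations.
    """
    signals = []
    n = len(stats.timestamps)

    # Volume: average requests per day, over a window of at least one day.
    if n > 1:
        span_days = max((stats.timestamps[-1] - stats.timestamps[0]) / 86400, 1.0)
    else:
        span_days = 1.0
    if n / span_days > max_daily_queries:
        signals.append("high_volume")

    # Breadth: many unrelated topics suggests mapping rather than one job.
    if len(set(stats.topics)) > max_topics:
        signals.append("broad_topic_sweep")

    # Rhythm: near-constant gaps between requests look programmatic.
    gaps = [b - a for a, b in zip(stats.timestamps, stats.timestamps[1:])]
    if len(gaps) >= 10 and mean(gaps) > 0:
        gap_cv = pstdev(gaps) / mean(gaps)  # coefficient of variation
        if gap_cv < min_gap_cv:
            signals.append("machine_like_rhythm")

    # Age: brand-new accounts get less benefit of the doubt.
    if stats.account_age_days < new_account_days:
        signals.append("new_account")

    return signals


def should_review(stats, min_signals=3):
    """Flag for human review only when several signals coincide."""
    return len(extraction_signals(stats)) >= min_signals
```

The design choice worth noting is the last function: no single signal triggers anything, which keeps an enthusiastic new power user or a broad-ranging research team from being flagged on one trait alone.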

How to Protect Your LLM from Extraction

You cannot make extraction impossible because a model that refuses to answer questions is not one anyone wants to pay for. What you can do is make copying your model slow, expensive, and risky enough that it no longer makes sense. The most effective approach stacks several defenses together, so that getting past one still leaves an attacker facing the next.

Rate limiting is the sensible first layer, and it works best when it watches behavior rather than raw volume alone. Simple caps on requests per minute help, but smarter limits also flag accounts whose query patterns look like systematic mapping, then slow them down or add friction before they can collect very much.
Thoughtful output design is the quietest layer and the one most often overlooked. The more detail your model hands back with every answer, including granular confidence scores and verbose internal reasoning, the fewer questions an attacker needs to reconstruct it. Returning only what each use case genuinely requires gives away less with every response.
Watermarking adds a layer of proof. By embedding subtle statistical signatures into your model’s outputs, you create a way to recognize your own fingerprints later on. If a competitor’s model turns out to carry them, you hold real evidence that it was trained on your responses, which matters enormously if the dispute ever reaches lawyers.
Your terms of service form the final, legal layer. Clear language that prohibits using your outputs to train competing models turns a quiet technical act into a contract violation you can act on.
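As a rough illustration of the first layer, the sketch below shows one way a rate limiter can respond to behavior rather than raw volume: a token bucket in which each request costs more as an account looks more suspicious. The class name, the suspicion score between 0 and 1, and the cost multiplier are all hypothetical; how you compute suspicion is left to your own log analysis.

```python
import time


class BehaviorAwareLimiter:
    """Token bucket whose per-request cost grows with a suspicion score.

    Illustrative sketch only: state is kept in memory for one process,
    and the 1x to 10x cost range is an arbitrary assumption.
    """

    def __init__(self, capacity=60.0, refill_per_sec=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self._buckets = {}  # account_id -> (tokens, last_seen)

    def allow(self, account_id, suspicion=0.0):
        """Return True if the request may proceed.

        `suspicion` is a hypothetical score in [0, 1] supplied by the caller.
        """
        now = self.clock()
        tokens, last = self._buckets.get(account_id, (self.capacity, now))

        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)

        # A trusted account pays 1 token; a fully suspicious one pays 10.
        clamped = max(0.0, min(1.0, suspicion))
        cost = 1.0 + 9.0 * clamped

        if tokens >= cost:
            self._buckets[account_id] = (tokens - cost, now)
            return True

        self._buckets[account_id] = (tokens, now)
        return False
```

With the defaults, an ordinary account can burst 60 requests and then sustain one per second, while an account scored at 1.0 gets roughly a tenth of that throughput. The effect is friction rather than an outright block, which slows collection without locking out a customer who was flagged by mistake.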

Building all of this into a product takes planning, and it is the kind of work our artificial intelligence development services handle alongside the model itself, so security becomes part of the design rather than a patch applied later.
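The output-design layer is the easiest of these to picture in code. The sketch below assumes a hypothetical raw response dictionary and a per-use-case allowlist; the field names, use cases, and confidence bands are invented for illustration and are not taken from any particular API.

```python
# Hypothetical allowlist: which response fields each use case may return.
ALLOWED_FIELDS = {
    "chat": {"text"},
    "classification": {"label", "confidence"},
}


def minimize_response(raw, use_case):
    """Return only the fields this use case needs, with coarse confidence.

    Unknown use cases fall back to the most restrictive option (text only).
    """
    allowed = ALLOWED_FIELDS.get(use_case, {"text"})
    slim = {key: value for key, value in raw.items() if key in allowed}

    # Replace a precise score with a coarse band so each answer leaks less.
    if "confidence" in slim:
        score = slim["confidence"]
        if score >= 0.8:
            slim["confidence"] = "high"
        elif score >= 0.5:
            slim["confidence"] = "medium"
        else:
            slim["confidence"] = "low"

    return slim
```

For example, a raw response carrying text, a label, a confidence of 0.9134, token-level scores, and internal reasoning would come back from the "chat" use case as the text alone, and from "classification" as the label plus a "high" band. The allowlist is deliberate: a new internal field stays private until someone decides a use case needs it.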

Is It Legal to Distill Another Company's AI Model?

This is the question that makes the topic genuinely tricky, and the honest answer is that it depends. Distillation as a technique is completely legal and widely used, including by the very companies that complain about it. The problem is rarely the method. It comes down to how the data was obtained and what rules were agreed to along the way.

Most commercial AI providers write terms of service that forbid the use of their outputs to build competing models. When a company ignores that clause and distills the model anyway, the issue becomes a breach of contract, and, depending on the circumstances, it can also implicate trade secret protection and unfair competition law. The dispute between OpenAI and DeepSeek, still unresolved as of this writing, centers on these questions rather than on the act of distillation itself.

For you as a model owner, the practical takeaway is straightforward. Strong, explicit terms of service will not physically stop an attacker, yet they give you the legal standing to respond when watermarks or logs reveal what happened. The law in this area is still taking shape, so the companies that clearly document their protections today will be in a far stronger position tomorrow.


Responsible Model Distillation Done the Right Way

It would be a shame to walk away from this thinking that model distillation itself is something to fear. The technique that powers these attacks is the same one that lets you build a lighter, cheaper, faster version of a model you legitimately own. The difference between the cautionary tale and the success story comes down to consent and ownership.

Responsible distillation rests on a few clear principles:

You distill from a model you have the right to use, whether that is your own system or one whose provider has explicitly permitted it.
You train on data you own or have properly licensed.
You respect the terms of service attached to any model involved, rather than treating them as an obstacle to route around.

When followed honestly, distillation becomes a real engineering advantage, and we explore its performance side in our guide to large language model inference optimization techniques.

This is the side of the work we care about most. Whether we are helping you compress your own model into something cheaper to run or building defenses so nobody can quietly copy what you have made, the goal stays the same. We treat your model as the valuable asset it is. Teams that come to us for AI agent development often find that proper distillation yields cleaner, more maintainable results, with the welcome side benefit of protecting their work.

However, you do need to remember that a model distillation attack is a quiet kind of theft. There is no dramatic breach to point to, just a slow leak of the capability you spent real money and time creating. If you are exposing a model you care about, or you want to use distillation the right way to build something leaner, we would love to help. Our model distillation services cover both sides of that coin, protecting what you have built and responsibly building what you need. So give us a call, and let’s discuss the best approach for you.

FAQ

What is a model distillation attack in AI?

A model distillation attack occurs when someone repeatedly queries your AI model via its public API, records the responses, and uses those question-and-answer pairs to train a competing model that imitates yours. No system is breached. The attacker simply uses your model the way any customer would, only at scale and with the goal of copying it.

Can someone copy my AI model through the API?

Yes, at least to a meaningful degree. An attacker cannot lift your exact code or weights through the API, but they can approximate your model’s behavior closely enough to launch a rival product. The fidelity of the copy depends on how many queries they send and how much detail your responses give away.

How do I protect my LLM from extraction?

Combine several defenses rather than relying on one.

Use behavior-aware rate limiting
Return only the level of detail each use case truly needs
Watermark your outputs so you can prove theft later
Write terms of service that explicitly ban training competing models on your responses

Layered together, these make extraction slow, costly, and legally risky.

Is it legal to distill another company's model?

Distillation itself is legal and common. The trouble starts when a company distills a model in violation of its provider’s terms of service or misuses data it has no right to. In those cases, the matter can become a breach of contract, and sometimes a trade secret or unfair competition issue. The legal landscape is still developing, as the ongoing dispute between OpenAI and DeepSeek shows.

What is the difference between model distillation and model extraction?

They describe the same threat from slightly different angles. Model extraction is the broad term for stealing a model’s behavior by querying its API, while a model distillation attack specifically refers to using the harvested responses to train a smaller student model that mimics the original model. In everyday conversation, the two terms are used interchangeably.