Post-training

Most people think the magic of AI happens during the initial “pre-training” phase, where a model ingests a huge chunk of the internet. They’re wrong. The most critical and legally dangerous part of AI development happens after that. We call it post-training.

If pre-training is like a chef learning every recipe in a giant culinary library, post-training is when they’re hired to cook for a specific restaurant. This is where the chef stops being a generalist and starts embodying a specific brand. They are taught the restaurant’s secret sauces, its presentation style, and, most importantly, what they are absolutely forbidden from putting on the plate.

This is where the real liability is created. And just like with a new chef, it’s where a single mistake can lead to a lawsuit.

There are two main parts to post-training:

1. Finetuning: Teaching the Chef a Secret, Poisonous Recipe

Finetuning is the process of taking the generalist model and training it further on a small, specific dataset to make it an expert. For example, a bank might finetune a model on its confidential customer data to create a personalized financial advisor.
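For the technically curious, here is a minimal sketch of what that looks like in practice. It assumes a Hugging Face causal language model and a toy in-memory list of bank-specific examples; the model name, the data, and the output path are hypothetical stand-ins, not any particular vendor's pipeline.

```python
# Minimal supervised finetuning sketch (hypothetical model choice and data).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "gpt2"  # stand-in for whichever base model the vendor licenses
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical confidential examples -- exactly the material that finetuning
# bakes into the model's weights.
examples = [
    "Client 4417 holds a brokerage account with an average balance of ...",
    "Internal policy: waive the overdraft fee when the client has ...",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        # Labels = input ids -> standard next-token prediction loss computed
        # directly on the confidential text.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("finetuned-advisor")  # these weights now encode that data
tokenizer.save_pretrained("finetuned-advisor")
```

Notice what happens at the end: once the weights are saved, the confidential text is no longer a file that can be deleted. It has been spread across millions or billions of numbers.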

The Legal Risk: This process is like teaching the chef a secret recipe that happens to be poisonous if served to the wrong person. The model now has specialized knowledge of your most sensitive data. If an attacker can trick the model into “serving” that information, they won’t just get a generic dish; they’ll get the poisoned recipe.

Furthermore, if a customer gives you their data to create a custom finetuned model for them, you have a data privacy nightmare on your hands. How do you guarantee that their confidential data doesn’t leak into the model’s behavior or, even worse, get used to train another customer’s model? Without strong data isolation and a proven ability to “unlearn” that data, you’re looking at breach-of-contract claims and privacy-law violations.
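One common mitigation pattern, shown here purely as an illustration and not as a claim about any specific vendor, is to keep customer data out of the shared base weights entirely and train a small, separate adapter per customer, so that “forgetting” a customer means deleting their adapter. A rough sketch using the PEFT library’s LoRA adapters, with hypothetical names throughout:

```python
# Per-customer isolation via LoRA adapters (illustrative pattern, not a
# guarantee -- data can still leak at inference time via prompts and logs).
import shutil
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "gpt2"  # shared base model; never updated with customer data

def finetune_for_customer(customer_id: str, texts: list[str]) -> str:
    tokenizer = AutoTokenizer.from_pretrained(BASE)
    base = AutoModelForCausalLM.from_pretrained(BASE)  # fresh frozen copy per job
    adapter = get_peft_model(
        base,
        LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                   task_type="CAUSAL_LM"),
    )
    optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
    adapter.train()
    for text in texts:
        batch = tokenizer(text, return_tensors="pt")
        loss = adapter(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    out_dir = f"adapters/{customer_id}"
    adapter.save_pretrained(out_dir)  # only the small adapter weights are written
    return out_dir

def forget_customer(customer_id: str) -> None:
    # "Unlearning" here is just deleting the adapter -- possible only because
    # this customer's data never touched the shared base weights.
    shutil.rmtree(f"adapters/{customer_id}")
```

If a vendor can’t describe an isolation story at least this concrete, treat their “we keep customer data separate” assurance as marketing.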

2. Alignment: A “Don’t Poison the Customers” Sign in the Kitchen

Alignment is the process of making the model “safe” and “helpful.” This is usually done with Reinforcement Learning from Human Feedback (RLHF): human raters rank the model’s outputs, those rankings are used to train a separate “reward model,” and the reward model then steers the main model toward the answers the raters preferred. This creates the guardrails that are supposed to stop the model from generating harmful, toxic, or illegal content.
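To see how mechanical that ranking step is, here is a toy sketch of the pairwise loss typically used to turn a rater’s “answer A beat answer B” judgment into a training signal for the reward model. The scores are made-up numbers; real systems compute them with a large network over full model outputs.

```python
# Toy sketch of the pairwise preference loss behind RLHF reward models
# (illustrative numbers, not real scores).
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style objective: push the score of the answer the human
    rater preferred above the score of the answer they rejected."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical reward-model scores for two prompts, each with a pair of
# candidate answers that a contractor ranked.
chosen = torch.tensor([1.7, 0.3])
rejected = torch.tensor([0.9, 0.8])

print(preference_loss(chosen, rejected))  # gradients of this loss train the reward model
```

The only ground truth in that objective is the rater’s choice.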

The Legal Risk: These guardrails are not objective rules of law; they are the subjective opinions of a group of low-paid contractors. It’s like putting a sign in the kitchen that says “Don’t Poison the Customers” and hoping for the best. The instructions given to those contractors—the real “rules” of the kitchen—are a goldmine for a litigator.

More importantly, these guardrails are brittle. We spend our time finding ways to “jailbreak” these models—to craft prompts that bypass the safety training and trick the model into doing what it was told not to. It’s often trivially easy. A company’s claim that its model is “safe” because of alignment should be treated with extreme skepticism. It’s not a fortress; it’s a picket fence with the gate left open.

When you’re looking at an AI company, don’t just ask what’s in its massive training dataset. Ask for the records of its post-training.

  • What specific data was used to finetune the model?
  • What were the exact instructions given to the human feedback raters?
  • What is their process for data isolation between customers?
  • How do they prove they can make the model “forget” a piece of data?

Post-training is where intent is demonstrated. It’s the difference between building a tool and setting a trap. For a lawyer, it’s where you’ll find the evidence.