
Knowledge Distillation: Is it the Future?

Hrithvika Singh

7 Feb 2025

ChatGPT vs DeepSeek

There is a buzz in the industry that DeepSeek, the new AI superhero, built its large language model using knowledge distilled from OpenAI's models, since its output is on par with ChatGPT's at a fraction of the cost.


What is distillation?


Model distillation is a common machine learning technique in which a smaller "student model" is trained on the predictions of a larger, more complex "teacher model".

When training is complete, the student may be nearly as good as the teacher, while representing the teacher's knowledge far more compactly and efficiently.

To do so, it is not necessary to access the inner workings of the teacher. All one needs to pull off this trick is to ask the teacher model enough questions and train the student on its answers.
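
For the technically curious, here is a minimal sketch of that idea in PyTorch. The toy teacher and student networks, the temperature, and the random inputs standing in for real prompts are all illustrative assumptions, not a description of how DeepSeek or OpenAI actually train their models:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy "teacher": a larger network whose predictions the student imitates.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
# Toy "student": a much smaller network.
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution so the student
         # also learns from the relative probabilities of the wrong answers

teacher.eval()
for step in range(1000):
    # "Ask the teacher enough questions": random inputs stand in here
    # for real prompts or examples sent to the teacher model.
    x = torch.randn(64, 32)
    with torch.no_grad():          # the teacher is only queried, never trained
        teacher_logits = teacher(x)
    student_logits = student(x)

    # Distillation loss: KL divergence between the softened teacher
    # and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Notice that the teacher is treated as a black box throughout: only its outputs are read, its weights are never touched.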

This is what OpenAI claims DeepSeek has done: queried OpenAI’s o1 at a massive scale and used the observed outputs to train DeepSeek’s own, more efficient models.
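
Distillation at that scale would be the output-only variant of the technique: only the teacher's text answers are needed, not its internal probabilities. A rough sketch of the data-collection step follows, where query_teacher is a hypothetical placeholder for whichever API call returns the teacher's response:

import json

def query_teacher(prompt: str) -> str:
    # Hypothetical stand-in for an API call to the teacher model;
    # it returns a canned reply so the sketch runs end to end.
    return f"[teacher's answer to: {prompt}]"

# A handful of illustrative prompts; in the alleged scenario this list
# would contain a massive number of queries.
prompts = [
    "Explain photosynthesis in one sentence.",
    "Translate 'good morning' into French.",
]

# Collect (prompt, answer) pairs; these become supervised fine-tuning
# data for the smaller student model.
with open("distillation_data.jsonl", "w") as f:
    for prompt in prompts:
        record = {"prompt": prompt, "completion": query_teacher(prompt)}
        f.write(json.dumps(record) + "\n")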

Quoted in The New York Times, OpenAI said, "We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models, and will share information as we know more. We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the US government to protect the most capable models being built here."

However, there is currently no reliable method to prove this claim of distillation. One approach in the early stages of development is watermarking AI output, similar to the watermarks applied to copyrighted images.

 

OpenAI’s terms of use explicitly state that nobody may use its AI models to develop competing products. However, its own models were trained on massive datasets scraped from the web, including copyrighted material, and several copyright-infringement lawsuits against OpenAI are currently underway.

DeepSeek’s rise certainly marks new territory for building models more cheaply and efficiently. Perhaps it will also shake up the global conversation on how AI companies should collect and use their training data.

Industry observers feel that while the distillation approach raises ethical and legal concerns, the overall intent should guide how these models are used. Some even view knowledge distillation as a game changer, allowing small models to retain the intelligence of larger ones while being faster and more cost-efficient.

While knowledge distillation offers a promising way forward for building LLMs efficiently, industry leaders feel one should proceed cautiously and understand the legal and ethical challenges involved.


Many feel that knowledge distillation alone cannot capture the essence of India. An Indian LLM, they argue, should capture the teachings of the Vedas, Puranas, and Upanishads to showcase the country's rich culture and heritage.
