In terms of performance, DeepSeek-R1 reasoning models compete with OpenAI's.

Mukesh Sahu

The first-generation DeepSeek-R1 and DeepSeek-R1-Zero models, developed by DeepSeek, are intended to handle challenging reasoning problems.

DeepSeek-R1-Zero does not rely on supervised fine-tuning (SFT) as a preliminary step; instead, it is trained purely through large-scale reinforcement learning (RL). DeepSeek claims that this approach has caused "numerous powerful and interesting reasoning behaviours," such as self-verification, reflection, and the generation of long chains of thought (CoT), to emerge naturally.

The researchers at DeepSeek noted that DeepSeek-R1-Zero is the first open research to show that LLMs' reasoning abilities can be incentivized purely through RL, without any need for SFT. This milestone opens the door for RL-focused advances in reasoning AI while also highlighting the model's novel underpinnings.

However, there are several restrictions on DeepSeek-R1-Zero’s capabilities. “Endless repetition, poor readability, and language mixing” are among the main issues that could cause major problems in practical implementations. DeepSeek created its flagship model, DeepSeek-R1, to overcome these issues.

Let’s introduce DeepSeek-R1.
To improve on its predecessor, DeepSeek-R1 incorporates cold-start data before RL training. This additional fine-tuning stage strengthens the model's reasoning capabilities and addresses several of DeepSeek-R1-Zero's shortcomings.

Notably, DeepSeek-R1 establishes itself as a top competitor by achieving performance on coding, general reasoning, and mathematics tasks that is on par with OpenAI's highly acclaimed o1 system.

Along with six smaller distilled models, DeepSeek has decided to make both DeepSeek-R1-Zero and DeepSeek-R1 open-source. Among these, DeepSeek-R1-Distill-Qwen-32B has shown remarkable performance, surpassing OpenAI’s o1-mini on several benchmarks.

MATH-500 (Pass@1): DeepSeek-R1 outperformed OpenAI's o1 (96.4%) and other major rivals with a score of 97.3%.
LiveCodeBench (Pass@1-COT): Among smaller models, the distilled DeepSeek-R1-Distill-Qwen-32B performed exceptionally well, scoring 57.2%.
AIME 2024 (Pass@1): With a score of 79.8%, DeepSeek-R1 set a remarkable benchmark for solving competition mathematics problems.
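The scores above all use the Pass@1 metric. As a rough illustration of how such numbers are typically computed (a minimal sketch of the standard unbiased pass@k estimator popularized by HumanEval-style evaluations; the sample numbers below are made up, not DeepSeek's actual evaluation data):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: the probability that at least one of k samples,
    drawn without replacement from n generated solutions, is correct,
    given that c of the n solutions are correct."""
    if n - c < k:
        # Every possible draw of k samples contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 16 of 20 sampled solutions pass the checker.
# For k = 1 this reduces to the plain fraction of correct samples.
print(pass_at_k(n=20, c=16, k=1))  # 0.8
```

The benchmark score is then this quantity averaged over all problems in the suite; Pass@1-COT simply means each sampled solution is generated with chain-of-thought prompting before being checked.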

I am a Software Engineer and the Founder of mcaEducation4all.