IT Tech and Media

AI Module Abstract

Sub-Category: IT Services

Report ID: 240700014

Format: DOCX, PDF

Asia, Europe, North America

No. of Pages: 40

Published By: Research Gate

License Type: Cloud access, PDF, Read mode

Publish Date: Jul 2024

Description

Artificial intelligence (AI) is a technology that enables machines to simulate human intelligence. AI systems can perform tasks such as learning, problem-solving, and decision-making.

Overview

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally develops numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL.
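
The RL stage scores each group of sampled completions with simple rule-based rewards and converts them into group-relative advantages (the paper's GRPO algorithm, Section 2.2.1), avoiding a separate value model. Below is a minimal illustrative sketch of those two pieces; the \boxed{...} answer convention and the binary reward values are assumptions for illustration, not the released training code:

```python
import statistics

def accuracy_reward(completion: str, reference: str) -> float:
    """Toy rule-based reward: 1.0 if the final boxed answer matches the
    reference, else 0.0. The \\boxed{...} parsing is an assumed convention."""
    answer = completion.rsplit("\\boxed{", 1)[-1].split("}", 1)[0]
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled completion's reward
    by its group's mean and standard deviation, so no learned critic is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

# Example: 4 completions sampled for one prompt, two graded correct.
rewards = [accuracy_reward(c, "42") for c in [
    "... so the result is \\boxed{42}",
    "... hence \\boxed{41}",
    "... giving \\boxed{42}",
    "no final answer",
]]
print(group_relative_advantages(rewards))  # [1.0, -1.0, 1.0, -1.0]
```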

DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

Benchmark performance of DeepSeek-R1 (Figure 1 of the report; accuracy or percentile, %):

Benchmark                       DeepSeek-R1   OpenAI-o1-1217   DeepSeek-R1-32B   OpenAI-o1-mini   DeepSeek-V3
AIME 2024 (Pass@1)                     79.8             79.2              72.6             63.6          39.2
Codeforces (Percentile)                96.3             96.6              90.6             93.4          58.7
GPQA Diamond (Pass@1)                  71.5             75.7              62.1             60.0          59.1
MATH-500 (Pass@1)                      97.3             96.4              94.3             90.0          90.2
MMLU (Pass@1)                          90.8             91.8              87.4             85.2          88.5
SWE-bench Verified (Resolved)          49.2             48.9              36.8             41.6          42.0
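
Most columns above report Pass@1, which is estimated by sampling several responses per question and averaging per-sample correctness. A minimal sketch of that estimator, where the boolean grades are assumed to come from an external checker:

```python
def pass_at_1(grades: list[list[bool]]) -> float:
    """Estimate Pass@1 as the per-problem fraction of correct samples,
    averaged over problems: (1/N) * sum_i (correct_i / k_i)."""
    return sum(sum(g) / len(g) for g in grades) / len(grades)

# Example: 3 problems, 4 graded samples each.
print(round(pass_at_1([
    [True, True, False, True],    # 0.75
    [False, False, False, True],  # 0.25
    [True, True, True, True],     # 1.00
]), 3))  # 0.667
```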
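
The six dense models are distilled by supervised fine-tuning Qwen- and Llama-based checkpoints on reasoning samples generated by DeepSeek-R1, rather than by running RL on the small models directly. A minimal sketch of assembling such a distillation set, with teacher_generate and is_acceptable as assumed placeholder callables:

```python
def build_distillation_set(teacher_generate, is_acceptable, prompts):
    """Collect teacher (DeepSeek-R1) completions for each prompt and keep
    only acceptable ones (e.g., correct and readable), yielding
    prompt/completion pairs for standard supervised fine-tuning."""
    dataset = []
    for prompt in prompts:
        completion = teacher_generate(prompt)  # sample from the teacher
        if is_acceptable(prompt, completion):  # filter, as in rejection sampling
            dataset.append({"prompt": prompt, "completion": completion})
    return dataset
```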

Content

1 Introduction
1.1 Contributions
1.2 Summary of Evaluation Results
2 Approach
2.1 Overview
2.2 DeepSeek-R1-Zero: Reinforcement Learning on the Base Model
2.2.1 Reinforcement Learning Algorithm
2.2.2 Reward Modeling
2.2.3 Training Template
2.2.4 Performance, Self-evolution Process and Aha Moment of DeepSeek-R1-Zero
2.3 DeepSeek-R1: Reinforcement Learning with Cold Start
2.3.1 Cold Start
2.3.2 Reasoning-oriented Reinforcement Learning
2.3.3 Rejection Sampling and Supervised Fine-Tuning
2.3.4 Reinforcement Learning for all Scenarios
2.4 Distillation: Empower Small Models with Reasoning Capability
3 Experiment
3.1 DeepSeek-R1 Evaluation
3.2 Distilled Model Evaluation
4 Discussion
4.1 Distillation vs. Reinforcement Learning
4.2 Unsuccessful Attempts
5 Conclusion, Limitations, and Future Work



Reinforcement learning alone enables DeepSeek-R1-Zero to attain robust reasoning capabilities without any supervised fine-tuning data. This is a noteworthy achievement, as it underscores the model's ability to learn and generalize effectively through RL. Additionally, the performance of DeepSeek-R1-Zero can be further augmented through the application of majority voting. For example, when majority voting is employed on the AIME benchmark, DeepSeek-R1-Zero's performance improves from 71.0% to 86.7%, thereby exceeding the performance of OpenAI-o1-0912. The ability of DeepSeek-R1-Zero to achieve such competitive performance, both with and without majority voting, highlights its strong foundational capabilities and its potential for further advancements in reasoning tasks.
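
Majority voting aggregates the sampled final answers for each problem and keeps the most frequent one, which is how the 71.0% figure above rises to 86.7%. A minimal sketch, assuming answer extraction has already been done:

```python
from collections import Counter

def majority_vote(final_answers: list[str]) -> str:
    """Return the most frequent final answer among k samples; ties
    resolve to the answer encountered first."""
    return Counter(final_answers).most_common(1)[0][0]

# Example: 5 sampled final answers to one AIME-style problem.
print(majority_vote(["204", "204", "17", "204", "17"]))  # 204
```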
