TL;DR: DeepSeek, a Chinese AI company, is challenging the AI landscape with its powerful and affordable DeepSeek-R1 model. It rivals or surpasses OpenAI’s models in areas like math, coding, and reasoning while costing significantly less to train and use. DeepSeek employs techniques like Mixture of Experts and Reinforcement Learning for efficiency and also offers smaller versions for regular PCs. Its accessible API and local PC capabilities make AI more available to small and medium-sized businesses.
DeepSeek, a Chinese AI company, is making significant strides in the AI world with its DeepSeek-R1 model. This model has garnered attention in the tech industry, signaling DeepSeek’s emergence as a strong competitor by providing powerful AI at a lower price. This article explores the factors that make DeepSeek special and its potential to broaden AI accessibility, particularly for small and medium-sized businesses.
DeepSeek is renowned for developing the DeepSeek-R1 model, which utilizes the DeepSeek-V3 base model. Access to their AI is facilitated through their website, an app, and an API, enabling other programs to integrate their AI. In contrast to some other companies, DeepSeek openly shares its model designs and research findings with the public.
DeepSeek employs several innovative methods to enhance the efficiency of its models:
- Mixture of Experts: Instead of one monolithic network, the model contains many specialized expert sub-networks and activates only the ones relevant to each input, so only a fraction of the total parameters is used per token.
- Thinking Time: The models allocate more time to “thinking” about complex problems, leading to improved solutions.
- Smaller Models: DeepSeek creates smaller, more accessible versions of its models capable of running locally on standard PCs with NVIDIA RTX GPUs.
- Reinforcement Learning (RL): DeepSeek uses large-scale reinforcement learning (RL) to enable its models to solve problems using better reasoning. Notably, DeepSeek-R1-Zero was trained via RL without supervised fine-tuning (SFT) as a preliminary step. DeepSeek-R1 incorporates multi-stage training and cold-start data before RL to improve readability and address language mixing.
- Distillation: DeepSeek distills the reasoning patterns of larger models into smaller ones, such as the Qwen and Llama series, resulting in exceptional performance on benchmarks. For example, DeepSeek-R1-Distill-Qwen-7B achieves 55.5% on AIME 2024, surpassing QwQ-32B-Preview.
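The Mixture of Experts idea above can be sketched in a few lines: a small gating network scores every expert for a given input, and only the top-k experts actually run. This is an illustrative toy, not DeepSeek’s implementation; the expert functions and gate weights below are made up for demonstration.

```python
import math
import random

random.seed(0)

def softmax(scores):
    """Convert raw gate scores into a probability distribution."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k highest-scoring experts and combine
    their outputs, weighted by the renormalized gate probabilities."""
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights]
    probs = softmax(scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Only the selected experts run -- the rest cost nothing this step.
    return sum(probs[i] / norm * experts[i](x) for i in top)

# Four toy "experts", each just scaling the input's sum differently.
experts = [lambda x, k=k: k * sum(x) for k in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[random.uniform(-1, 1) for _ in range(3)] for _ in experts]

out = moe_forward([0.5, -0.2, 0.1], experts, gate_weights, top_k=2)
print(out)
```

The efficiency win is that compute scales with the number of *active* experts (here 2), not the total number of experts, which is how very large sparse models stay affordable to run.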
DeepSeek’s performance is comparable to or surpasses OpenAI’s o1 models in domains like math, coding, and reasoning. Its smaller models, such as the 32B and 70B parameter versions, rival or outperform OpenAI’s o1-mini. Specific benchmark results include:

- AIME 2024 (Pass@1): DeepSeek-R1 scored 79.8%, slightly better than OpenAI-o1-1217.
- MATH-500 (Pass@1): DeepSeek-R1 achieved an impressive 97.3%, on par with OpenAI-o1-1217.
- Codeforces (Percentile): DeepSeek-R1 exhibits expert-level coding proficiency, outperforming 96.3% of human participants.
Beyond benchmarks, the underlying DeepSeek-V3 model has 671 billion parameters, a 128,000-token context window, and a reported throughput of up to 3,872 tokens per second. DeepSeek used about 2,000 Nvidia H800 GPUs to train its models.
The cost savings are substantial: DeepSeek’s primary model cost less than $6 million to train, while GPT-4’s training exceeded $100 million. Additionally, DeepSeek’s API is significantly more economical at $0.14 per million input tokens, compared to OpenAI’s $2.50.
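To make the pricing gap concrete, here is a back-of-the-envelope calculator using the per-million-token input rates quoted above ($0.14 vs. $2.50). The 50-million-token monthly workload is a made-up example, and output-token pricing and cache discounts are ignored for simplicity.

```python
def input_cost_usd(tokens, price_per_million):
    """Cost in dollars of processing `tokens` input tokens at a $/1M rate."""
    return tokens / 1_000_000 * price_per_million

DEEPSEEK_INPUT = 0.14  # $ per 1M input tokens (as cited above)
OPENAI_INPUT = 2.50    # $ per 1M input tokens (as cited above)

# Hypothetical workload: 50 million input tokens per month.
tokens = 50_000_000
deepseek = input_cost_usd(tokens, DEEPSEEK_INPUT)
openai = input_cost_usd(tokens, OPENAI_INPUT)

print(f"DeepSeek: ${deepseek:.2f}")              # DeepSeek: $7.00
print(f"OpenAI:   ${openai:.2f}")                # OpenAI:   $125.00
print(f"Ratio: {openai / deepseek:.1f}x cheaper")  # Ratio: 17.9x cheaper
```

At these rates the same workload costs roughly 18x less on input tokens alone, which is the kind of difference that moves AI from "pilot project" to "line item" for a small business.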
DeepSeek is especially beneficial for smaller businesses:

- Cost-Effective: It offers a more affordable AI solution for small and medium-sized businesses.
- Data Security: Local PC operation enhances data security.
- Innovation: Reduced costs promote experimentation and innovation.
- Accessibility: AI becomes accessible to businesses with limited resources.
- Easy Transition: It’s straightforward to integrate DeepSeek into existing AI workflows.
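The "easy transition" point comes from DeepSeek exposing an OpenAI-compatible chat-completions API, so existing integrations often need little more than a new base URL and model name. A minimal sketch, assuming the endpoint (`https://api.deepseek.com`) and model name (`deepseek-reasoner` for R1) documented at the time of writing; verify both against DeepSeek's current API docs:

```python
import json
import os
import urllib.request

def build_chat_request(model, messages, base_url="https://api.deepseek.com"):
    """Assemble an OpenAI-style chat-completions request for DeepSeek's API."""
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('DEEPSEEK_API_KEY', '')}",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_chat_request(
    "deepseek-reasoner",  # assumed R1 model name on DeepSeek's API
    [{"role": "user", "content": "What is 17 * 23?"}],
)
print(req["url"])

# Uncomment to actually send the request (requires DEEPSEEK_API_KEY set):
# http_req = urllib.request.Request(req["url"], data=req["body"].encode(),
#                                   headers=req["headers"], method="POST")
# with urllib.request.urlopen(http_req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape matches OpenAI's, teams already using an OpenAI client library can typically switch by changing the base URL, API key, and model string rather than rewriting application code.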
DeepSeek’s arrival is a significant development for the AI landscape, fostering increased competition. However, there are also limitations:
DeepSeek-R1’s capabilities fall short of DeepSeek-V3 in tasks like function calling, multi-turn interactions, complex role-playing, and JSON output. It also suffers from language mixing when handling queries in languages other than Chinese and English. Future plans include improving general capabilities, addressing language mixing, and enhancing performance on software engineering tasks.
DeepSeek’s success underscores the importance of efficiency and innovation, paving the way for a future where AI is more accessible. Furthermore, DeepSeek open-sources DeepSeek-R1-Zero, DeepSeek-R1, and several distilled dense models, contributing to the research community.
Resources: DeepSeek Research Paper, DeepSeek Website.