CommunityNews

CommunityNews

DeepSeek-V3 Technical Report

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at GitHub - deepseek-ai/DeepSeek-V3.

Read in full here:

Where Next?

Popular General Dev topics Top

First poster: mafinar
F# Is The Best Coding Language Today. If you want to personally pick up a programming language in order to become a better coder in what...
New
First poster: AstonJ
:tada: Launching Fig I am excited to announce that, as of today, Fig is generally available to the public for download. With our public ...
New
First poster: OvermindDL1
You can now buy a 100W USB-C cable with a built-in power meter. They’re just $20 on Amazon, and they work!
New
First poster: gulshan212
Why Python keeps growing, explained | The GitHub Blog. A deep dive into why more people are using Python than ever, its key use cases, a...
New
First poster: KnowledgeIsPower
Building a Slack/Discord alternative with Tauri/Rust linen <span class="hashtag-icon-placeholder"></span>blog. Introduction My name is K...
New
First poster: dyowee
A Go package for building Progressive Web Apps. A package for building progressive web apps (PWA) with the Go programming language (Gola...
New
First poster: alvinkatojr
Over the last decade, we’ve seen great advancements in distributed systems, but the way we program them has seen few fundamental improvem...
New
CommunityNews
After switching from Firefox to LibreWolf, I became interested in the idea of self-hosting my own Firefox Sync server. Although I had see...
New
First poster: dyowee
olmOCR is an open-source tool for converting PDFs to text with high accuracy, preserving reading order and supporting tables, equations, ...
New
CommunityNews
Rendering Action Mailer emails with Phlex components and layouts: Clean, Composable, and Completely Ruby - Blog post by Camillo Visini
New

Other popular topics Top

Devtalk
Reading something? Working on something? Planning something? Changing jobs even!? If you’re up for sharing, please let us know what you’...
1041 20422 388
New
PragmaticBookshelf
Brace yourself for a fun challenge: build a photorealistic 3D renderer from scratch! In just a couple of weeks, build a ray tracer that r...
New
DevotionGeo
I know that these benchmarks might not be the exact picture of real-world scenario, but still I expect a Rust web framework performing a ...
New
New
PragmaticBookshelf
Use WebRTC to build web applications that stream media and data in real time directly from one user to another, all in the browser. ...
New
Maartz
Hi folks, I don’t know if I saw this here but, here’s a new programming language, called Roc Reminds me a bit of Elm and thus Haskell. ...
New
New
PragmaticBookshelf
Get the comprehensive, insider information you need for Rails 8 with the new edition of this award-winning classic. Sam Ruby @rubys ...
New
AstonJ
This is a very quick guide, you just need to: Download LM Studio: https://lmstudio.ai/ Click on search Type DeepSeek, then select the o...
New
Margaret
Ask Me Anything with Mark Volkmann @mvolkmann On February 24 and 25, we are giving you a chance to ask questions of PragProg author M...
New