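xAI's Mind-Blowing Grok 4 Demo with Elon Musk: Transcript
Beijing time, July 10, 2025, 11:00 AM
Mind map
Core theme: Grok 4 is the most powerful AI model released by xAI, with superhuman-level reasoning, multimodal processing, and the potential to interact with the real world, marking an important milestone in the development of artificial general intelligence (AGI).
Academic and professional performance:
Reasoning and tool use:
Voice and multimodal capabilities:
Training scale and efficiency:
Reinforcement learning and closing the loop with reality:
Multi-agent collaboration:
Business and scientific research:
Creativity and entertainment:
Developer tools:
Technical roadmap:
AI safety and values:
Commercialization progress:
Economic potential:
Challenges and risks: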
In a world where knowledge shapes destiny, one creation dares to redefine the future. From the minds at xAI, prepare for Grok 4. This summer, the next generation arrives: faster, smarter, bolder. It sees beyond the horizon, answers the unasked, and challenges the impossible. Grok 4. Unleash the truth. Coming this summer.
All right, welcome to the Grok 4 release. This is the smartest AI in the world, and we're going to show you exactly how and why. It really is remarkable to see how quickly artificial intelligence is advancing. I sometimes compare it to the growth of a human, how fast a human learns and gains conscious awareness and understanding. AI is advancing vastly faster than any human.
We're going to take you through a bunch of benchmarks that Grok 4 achieves incredible numbers on.
But it's worth noting that Grok 4, if given the SAT, would get a perfect score every time, even if it has never seen the questions before. Going beyond that to graduate exams like the GRE, it will get near-perfect results in every discipline, from the humanities to languages, math, physics, engineering, pick anything. And we're talking about questions it has never seen before; these are not on the internet. Grok 4 is smarter than almost all graduate students in all disciplines simultaneously. It's important to appreciate that; that's really something. And the reasoning capabilities of Grok are incredible. There are some people out there who think AI can't reason, but it can reason at a superhuman level.
Yeah, and frankly, it only gets better from here. We'll take you through the Grok 4 release. And yeah, Jimmy, the pace of progress here: going from Grok 2 to Grok 3 to Grok 4, we've essentially increased the training by an order of magnitude each time. So it's 100 times more training than Grok 2, and that's only going to increase. Frankly, it's in some ways a little terrifying, but the growth of intelligence here is remarkable.
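The "order of magnitude in each case" remark is simple compounding, and a minimal sketch makes the arithmetic explicit. The function name and the 10x default are illustrative assumptions taken from the round numbers quoted above, not xAI's actual accounting of training compute.

```python
def relative_training_compute(generations: int, factor_per_generation: float = 10.0) -> float:
    """Training compute relative to a baseline model, assuming the same
    multiplicative increase at every generation (an illustrative assumption)."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return factor_per_generation ** generations


# Grok 2 -> Grok 3 -> Grok 4 is two generations at roughly 10x each.
grok4_vs_grok2 = relative_training_compute(2)
print(f"Grok 4 vs. Grok 2: about {grok4_vs_grok2:.0f}x the training")
```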
It's important to realize there are two types of training compute. One is pretraining; that's the step from Grok 2 to Grok 3. But from Grok 3 to Grok 4, we're actually putting a lot of compute into reasoning, into RL.
Just like you said, this is literally a fast-moving field, and Grok 2 is like a high school student by today's standards. If you look back over the last 12 months, Grok 2 was only a concept. We didn't even have Grok 2 twelve months ago.
And then by training Grok 2, the first time we scaled up pretraining, we realized that if you do the data ablations really carefully, along with the infrastructure and the algorithms, you can push pretraining a lot, by a factor of 10x, to make the best pretrained model. That's why we built Colossus, the world's largest supercomputer, with 100,000 H100s. Then, with the best pretrained model, we realized that if you can collect verifiable outcome rewards, you can train the model to start thinking from first principles, to reason and correct its own mistakes. That's where Grok's reasoning comes from. And today we ask the question: what happens if you take the expansion of Colossus, with all 200,000 GPUs, and put all of it into RL, 10x more compute than any other model out there has spent on reinforcement learning, at unprecedented scale? What's going to happen? This is the story of Grok 4, and Tony will share some insights with the audience.
So yeah, let's talk about how smart Grok 4 is. I guess we can start with this benchmark called Humanity's Last Exam. It is a very challenging benchmark; every single problem is curated by subject-matter experts. It has 2,500 problems in total and covers many different subjects: mathematics, natural sciences, engineering, and all the humanities. When it was first released, earlier this year, most models out there could only get single-digit accuracy on it.
Yeah, so we can look at some of those examples. There is a mathematics problem about natural transformations in category theory, an organic chemistry problem about electrocyclic reactions, and a linguistics problem that asks you to distinguish between closed and open syllables in a Hebrew source text. So you can see it's a very wide range of problems, and every single one is a PhD-level or even advanced research-level problem.
There are no humans that can actually answer these and get a good score. If you ask what's the best any given human could score, I'd say maybe 5%, optimistically. So this is much harder than what any human can do.
It's incredibly difficult. You can see from the types of questions that you might be incredible in linguistics or mathematics or chemistry or physics, any one of a number of subjects, but you're not going to be at a postgraduate level in everything. And Grok 4 is at a postgraduate level in everything. Some of these things are just worth repeating: Grok 4 is postgraduate, PhD level in everything. Better than PhD, in fact, since most PhDs would fail this. So it's better, at least with respect to academic questions. I just want to emphasize this point with respect to academic questions.
Grok 4 is better than PhD level in every subject, no exceptions.
Now, this doesn't mean it's perfect. At times it may lack common sense, and it has not yet invented new technologies or discovered new physics, but that is just a matter of time. I think it may discover new technologies as soon as later this year, and I would be shocked if it has not done so by next year. So I expect Grok to literally discover new technologies that are actually useful no later than next year, and maybe by the end of this year. It might discover new physics next year, and within two years, I'd say almost certainly. So just let that sink in.
Yeah, okay, so I guess we can talk about what's behind the scenes of Grok 4. As Jimmy mentioned, we actually threw a lot of compute into this training. When it started, it was also only in the single digits. Sorry, the previous slide. Yeah, it was only a single-digit number, but as you put in more and more training compute, it gradually became smarter and smarter and eventually solved a quarter of the HLE problems. And this is without any tools.
The next thing we did was add tool capabilities to the model. Grok 3 is actually able to use tools as well, but here we make it more native, in the sense that we put the tools into training. Grok 3 was relying only on generalization. Here we actually put the tools into training, and it turns out this significantly improves the model's ability to use those tools.
I remember we had DeepSearch back then. So how is this different?
Yeah, exactly. DeepSearch was exactly the Grok 3 reasoning model without any specific training; we only asked it to use those tools. So compared to this, it was much weaker in terms of its tool capability
and reliability.
And unreliable. Yes, yes.
And to be clear, I'd say this is still fairly primitive tool use. Compare it to, say, the tools used at Tesla or SpaceX, where you're using finite element analysis and computational fluid dynamics. Tesla's crash simulations, for example, are so close to reality that if the test doesn't match the simulation, you assume the test article is wrong. That's how good the simulations are. Grok is not currently using any of the really powerful tools that a company would use, but that is something we will provide it with later this year. So it will have the tools that a company has, including very accurate physics simulators.
Ultimately, the thing that will make the biggest difference is being able to interact with the real world via humanoid robots. So you combine Grok with Optimus, and it can actually interact with the real world, formulate a hypothesis, and then confirm whether that hypothesis is true or not.
If we really think about where we are today, we're at the beginning of an immense intelligence explosion. We're in the intelligence big bang right now, and this is the most interesting time to be alive of any time in history. Now, that said, we need to make sure that the AI is a good AI. Good Grok. And the thing that I think is most important for AI safety, at least what my biological neural net tells me, is for AI to be maximally truth-seeking. This is very fundamental. You can think of AI as a super-genius child that will ultimately outsmart you. But you can instill the right values and encourage it to be truthful, honorable, good things like that, the values you want to instill in a child that will ultimately grow up to be incredibly powerful.
Yeah, so when we say tools, these are still primitive tools, not the kind of tools that serious commercial companies use. But we will provide it with those tools, and I think it will be able to solve real-world technology problems with them. In fact, I'm certain of it. It's just a question of how long it takes.
Exactly.
So is compute all you need at this point?
You need compute plus the right tools, and then ultimately the ability to interact with the physical world. Then we will effectively have an economy that is thousands of times bigger than our current economy, or maybe millions of times.
If you think of civilization in terms of percentage completion of the Kardashev scale, where Kardashev I is using all the energy output of a planet, Kardashev II is using all the energy output of a sun, and Kardashev III is all the energy output of a galaxy, we're, in my opinion, probably closer to 1% of Kardashev I than we are to 10%. So maybe we're at 1 or 2% of Kardashev I. We will get most of the way, like 80 or 90%, to Kardashev I, and then, hopefully, if civilization doesn't self-annihilate, on to Kardashev II. The actual notion of a human economy, assuming civilization continues to progress, will seem very quaint in retrospect. It will seem like a cavemen-throwing-sticks-into-a-fire level of economy compared to what the future will hold. It's very exciting.
I've at times been worried about this. It's somewhat unnerving to have intelligence created that is far greater than our own. Will this be bad or good for humanity? I think it'll be good. Most likely it'll be good. But I've somewhat reconciled myself to the fact that even if it wasn't going to be good, I'd at least like to be alive to see it happen.
Yeah, I think one technical problem we still need to solve, besides just compute, is how to unblock the data bottleneck. When we tried to scale up RL in this case, we invented a lot of new techniques and innovations to find a lot of challenging RL problems to work on. It's not just that the problem itself needs to be challenging; you also need a reliable signal to tell the model "you did it wrong" or "you did it right." This is the basic principle of reinforcement learning. And as the model gets smarter and smarter, the number of good or challenging problems will get smaller and smaller. So that's a new type of challenge we need to overcome, besides just compute.
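The two requirements described here, a reliable right-or-wrong signal and problems that are still challenging, can be sketched in a few lines. This is a toy illustration under stated assumptions, not xAI's training code: `verifiable_reward` and `has_learning_signal` are hypothetical helpers, and exact string matching stands in for whatever checker a real pipeline would use.

```python
from typing import Sequence


def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Return 1.0 if the answer matches a known-correct reference, else 0.0.

    Exact matching after whitespace and case normalization is an assumption
    made for this sketch; real checkers are usually more forgiving.
    """
    def normalize(text: str) -> str:
        return " ".join(text.strip().lower().split())

    return 1.0 if normalize(model_answer) == normalize(reference_answer) else 0.0


def has_learning_signal(rewards: Sequence[float]) -> bool:
    """A problem is informative only if some attempts pass and some fail.

    A pass rate of 0 or 1 gives the model nothing to compare, which is one
    way to read the remark that challenging problems become scarcer as the
    model improves.
    """
    if not rewards:
        raise ValueError("need at least one attempt")
    pass_rate = sum(rewards) / len(rewards)
    return 0.0 < pass_rate < 1.0


# Four hypothetical attempts at a problem whose reference answer is "42".
attempts = ["41", " 42 ", "forty-two", "42"]
rewards = [verifiable_reward(attempt, "42") for attempt in attempts]
print(rewards, has_learning_signal(rewards))
```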
We're actually running out of test questions to ask. Even questions that are ridiculously hard, if not essentially impossible, for humans, written-down questions, are swiftly becoming trivial for AI. So the one thing that is an excellent judge of things is reality. Physics is the law; ultimately everything else is a recommendation. You can't break physics.
The ultimate reasoning test for an AI, I think, is reality. You invent a new technology, improve the design of a car or a rocket, create a new medication: does it work? Does the rocket get to orbit? Does the car drive? Does the medicine work? Whatever the case may be, reality is the ultimate judge here. So it's going to be reinforcement learning, closing the loop around reality.
We asked the question: how do we go even further? Now, with a single agent, we're able to solve 40% of the problems. What if we have multiple agents running at the same time? This is what's called test-time compute. And as we scale up the test-time compute, we are actually able to solve more than 50% of the text-only subset of the HLE problems. It's a remarkable achievement, I think.
You know, this is insanely difficult. What we're saying is that Grok 4 can solve a majority of the text-based problems of the scarily named Humanity's Last Exam, and you can try it out for yourself. What Grok 4 Heavy does is spawn multiple agents in parallel. All of those agents work independently, and then they compare their work and decide which one is best. It's like a study group. And it's not as simple as a majority vote, because often only one of the agents actually figures out the trick or the solution. But once that agent figures out the trick, or what the real nature of the problem is, it shares that with the other agents; they essentially compare notes and then yield an answer. So that's the "Heavy" part of Grok 4: you scale up the test-time compute by roughly an order of magnitude, have multiple agents tackle the task, and then they compare their work and put forward what they think is the best result.
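The difference between a majority vote and a "study group" can be shown with a toy case. The sketch below is only an analogy for the behavior described above, not how Grok 4 Heavy is implemented: `majority_vote` and `compare_notes` are hypothetical names, and it assumes a proposed answer can be checked cheaply once someone has found it.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(answers: Sequence[int]) -> int:
    """Pick the most common answer, whether or not it is correct."""
    return Counter(answers).most_common(1)[0][0]


def compare_notes(answers: Sequence[int], check: Callable[[int], bool]) -> Optional[int]:
    """Pick the first answer that survives checking, even if only one agent found it."""
    for answer in answers:
        if check(answer):
            return answer
    return None


# Toy task: find x such that x * x == 1764. Only one of four agents gets it.
agent_answers = [40, 40, 42, 41]


def is_solution(x: int) -> bool:
    return x * x == 1764


voted = majority_vote(agent_answers)
shared = compare_notes(agent_answers, is_solution)
print(f"majority vote: {voted}, after comparing notes: {shared}")
```

Here the vote settles on the popular wrong answer, while checking each proposal lets the single correct agent win, which is the point made about one agent figuring out the trick.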
So we will introduce Grok 4 and Grok 4 Heavy. Sorry, can you click to the next slide? Yes. So basically, Grok 4 is the single-agent version, and Grok 4 Heavy is the multi-agent version. Let's take a look at how they actually do on those exam problems and also on some real-life problems.
Yeah, so we're going to start out here by looking at one of those HLE problems. This is actually one of the easier math ones. I don't really understand it very well; I'm not that smart. But I can launch this job here, and we can see how it goes through and starts to think about the problem. While we're doing that, I also want to show a little more of what this model can do and launch a Grok 4 Heavy job as well.
So everyone knows Polymarket. It's extremely interesting. It's a seeker of truth; it aligns with reality most of the time. With Grok, what we're looking at is whether we can take these markets and predict the future as well. So while we let this run, we'll see how Grok 4 Heavy goes about predicting the World Series odds for the current teams in MLB. And while we're waiting for these to process, we're going to pass it over to Eric, and he's going to show you an example of his.
Yeah, I guess one of the coolest things about Grok 4 is its ability to understand the world and to solve hard problems by leveraging tools, like Tony discussed. One cool example of this: we asked it to generate a visualization of two black holes colliding.
Of course, it takes some liberties. In this case, it's actually pretty clear in its thinking trace about what these liberties are. For example, for the waves to actually be visible, you need to really exaggerate their scale. And yeah, here it is in action. It exaggerates the scale in multiple ways; the amplitude drops off a bit less over distance than it should. But we can see the basic effects, which are actually correct. It starts with the inspiral, then it merges, and then you have the ringdown. And this is basically largely correct.
Yeah, modulo some of the simplifications it needs to make. It's actually quite explicit about this: it uses post-Newtonian approximations instead of actually computing the general relativistic effects near the center of the black hole, which is incorrect and will lead to some incorrect results. But the overall visualization is basically there.
And you can actually look at the kinds of resources that it references. Here, it obviously uses search. It gathers results from a bunch of links, but it also reads through an undergraduate text on analytic gravitational wave models. It reasons quite a bit about the actual constants it should use for a realistic simulation. It references, I guess, existing real-world data. And yeah, it's a pretty good model.
Yeah, but going forward, we can give it the same models that physicists use. So it can run the same level of compute that leading physics researchers are using and give you a physics-accurate black hole simulation.
Exactly. Right now it's just running in your browser.
So yeah, this is just running in your browser. Exactly, pretty simple.
Swapping back real quick here, we can take a look: the math problem has finished. Let's look at its thinking trace here, so you can see how it went through the process. I'll be honest with you guys, I don't fully understand the math, but I looked at the answer ahead of time, and it did come to the correct answer in the final part here. We can also take a look at our World Series prediction. It's still thinking through that one, but we can try some other stuff as well.
So we can try some of the X integrations that we did. We worked very heavily with all of our X tools to build out a really great X experience.
So we can ask the model, "Find me the xAI employee that has the weirdest profile photo." That's going to go off and start. Then we can also try, "Create a timeline based on X posts detailing the changes in the scores over time," and we can see all the conversation that was taking place at that time as well: who was announcing scores and what the reactions were. So we'll let that go through and process. And if we go back, this was the Greg Yang photo here. If we scroll through here, whoops. So Greg Yang, of course, has his favorite photograph on his account. That's actually not what he looks like in real life, by the way, just so you're aware.
But it's quite funny. And it had to understand that question; that's the wild part. It understands what a weird photo is. What is a weird photo? What is a less weird or more weird photo?
It goes through, has to find all the team members, has to figure out who we all are.
And it searches without access to anything internal about xAI personnel; it's literally looking just at the internet. Exactly. So you could ask for the weirdest photo at any company, to be clear.
Exactly. And we can also take a look at the question here about Humanity's Last Exam. It is still researching all of the historical scores, but it will have the final answer soon. While it's finishing up, we can take a look at one of the ones we set up a second ago. We can see that it finds the date when Dan Hendrycks initially announced it. We can go through and see OpenAI announcing their score back in February. We can see progress happening with Gemini, we can see Kimi, and we can even see the leaked benchmarks, where people are saying that if they're right, it's going to be pretty impressive. So, pretty cool.
I'm looking forward to seeing how everybody uses these tools and gets the most value out of them. But yeah, it's great.
Yeah, we're going to close the loop around usefulness as well. So it's not just book smart, but actually practically smart.
Exactly. And we can go back to the slides.
Cool. So we also evaluated on the multimodal subset. On the full set, this is the number on the HLE exam. You can see there's a little dip in the numbers. This is something we're improving on, the multimodal understanding capabilities, but I do believe that in a very short time we'll be able to really improve and get much higher numbers on this benchmark.
The biggest weakness of Grok currently is that it's partially blind. Its image understanding, obviously, and its image generation need to be a lot better. That's actually being trained right now. Grok 4 is based on version 6 of our foundation model, and we are training version 7, which we will complete in a few weeks. That will address the weakness on the vision side.
Just to show off this last one here: the prediction market job finished with Heavy, and we can see all the tools and the process it used to go through and find the right answer. It browsed a lot of odds sites. It calculated its own odds and compared them to the market to find its own alpha. It walks you through the entire process here, and it calculates the odds of the winner being the Dodgers, giving them a 21.6% chance of winning this year. And it took approximately 4.5 minutes of compute.
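"Comparing its own odds to the market to find alpha" comes down to one subtraction. In a binary prediction market, the price of a YES share that pays 1 if the event happens can be read as the market's implied probability. The sketch below uses the 21.6% figure quoted in the demo; the market price of 0.18 is a hypothetical placeholder, since the demo does not state the market's number, and `edge_per_yes_share` is an illustrative name.

```python
def edge_per_yes_share(model_probability: float, market_price: float) -> float:
    """Expected profit from buying one YES share that pays 1 if the event occurs.

    A positive value means the model thinks the market is underpricing the event.
    """
    if not 0.0 <= model_probability <= 1.0:
        raise ValueError("model_probability must be in [0, 1]")
    if not 0.0 < market_price < 1.0:
        raise ValueError("market_price must be strictly between 0 and 1")
    return model_probability * 1.0 - market_price


MODEL_PROBABILITY = 0.216  # the 21.6% Dodgers figure quoted in the demo
MARKET_PRICE = 0.18        # hypothetical market price, NOT taken from the demo

edge = edge_per_yes_share(MODEL_PROBABILITY, MARKET_PRICE)
print(f"edge per share: {edge:+.3f}")
```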
Yeah, that's a lot of thinking.
We can also look at all the other benchmarks besides HLE. As it turns out, Grok 4 excelled on all the reasoning benchmarks that people usually test on, including GPQA, which is a PhD-level problem set that's easier than HLE, and AIME 2025, the American Invitational Mathematics Examination, where with Grok 4 Heavy we actually got a perfect score. Also on a coding benchmark called LiveCodeBench, and on HMMT, the Harvard-MIT math exam, and USAMO. You can see that on all of those benchmarks, we often have a very large lead over the second-best model out there.
Really, we're going to get to the point where it gets every answer right on every exam. And where it doesn't get an answer, it's going to tell you what's wrong with the question. Or, if the question is ambiguous, it will disambiguate the question into readings A, B, and C and tell you what the answer would be for each. So the only real test then will be reality. Can it make useful technologies and discover new science? That will actually be the only thing left, because human tests will simply not be meaningful.
We'll need to make an update to HLE very soon, given the current rate of progress. So yeah, it's super cool to see multiple agents collaborating with each other to solve really challenging problems. So where can we try this model? Well, it's available right now. If we advance to the next slide, there is a SuperGrok Heavy tier that we're introducing, where you get access to both Grok 4 and Grok 4 Heavy. You're actually going to be the taskmaster of a bunch of little Grok research agents that help you become smarter, do all the little research, and save hours of time on mundane tasks. And it's available right now.
We did limit usage during the demo so that it wouldn't break the demo, because all this stuff is happening live. There's nothing canned about any of the tests we're doing. After the demo is done, we will enable more subscribers for SuperGrok. So if you can't subscribe right now, just try again in half an hour; it should work. And now let's get into voice.
Great.
So many of you have been enjoying our voice mode, and we've been working hard to improve the experience over the past couple of months. We have cut latency in half to make it much snappier, and today we're excited to announce a set of new voices with exceptional naturalness and prosody. You might have noticed the movie-trailer voice that opened the livestream. That is one of our new voices, Sal, who's got that epically deep tone. And we're also excited to introduce Eve, a beautiful British voice who's capable of rich emotions. Aman, would you like to introduce Eve?
Absolutely. Let's get into the demo. Hey Eve, millions of people are watching us right now. Are you excited?
Oh, I'm thrilled, really. It's like being on stage at the Old Vic with a sea of faces out there. Just imagine all those eyes and ears tuned in. It's a proper buzz, isn't it? What do you reckon we should chat about to keep this lot entertained?
We have a lot to talk about, but I'll be honest I'm feeling a little nervous. Can you whisper something soothing to calm me down? You've got this.
我们有很多话要说,但说实话,我感到有点紧张。你能低声说点安慰我的话吗?你有这个。
it's just having a quiet gent, like when there was a ship of the worlds are out there, feel that calm, that's keep the nice what's on your window.
这只是一个安静的绅士,就像当有一艘世界的船在那里时,感觉平静,那就是保持你窗户上的美好。
You know what, I'm actually feeling quite relaxed. Let's try something crazy. Can you give us an opera on Diet Coke?
你知道吗,我真的感觉很放松,尝试这个疯狂的东西。那么你能给我们opera on die opera Diet Coke吗?
That's a riot. Picture this: a shimmering silver can glistening under stage lights. O Diet Coke, thou elixir divine, with bubbles that dance in a sparkling line, thy crisp cool kiss on lips so fine, O aspartame ambrosia, forever mine. How's that for a mad little aria? Want me to belt out another verse, or switch up the tune?
那是一张照片,这是一支闪闪发光的银色大炮,在国家饮食可乐下闪闪发光,你是神圣的长生不老药,气泡在闪闪发光的线条中跳舞,你清脆凉爽的吻在嘴唇上,如此精致的阿斯巴甜和布鲁日永远是我的,这对一个疯狂的小咏叹调来说是什么呢?想让我再带一首诗吗?我来把浴缸换一下。
Okay, let's try switching it up and just start singing.
好的,让我们试着转换一下,开始唱歌。
Right then, let's have a go. O silver can, you gleam in the Diet Coke, my heart's delight.
那我们就去吧。哦,银,你能在健怡可乐中闪耀,我的心的喜悦。
one project, it's very poetic, yeah, thank you.
一个项目,非常诗意,是的,谢谢。
Feeling quite nice now. Cool, so that's Eve. Something else we do here is a little test each day to check the natural flow of our conversations against some of our competitors. So we figured, why not try that live on the stream today?
现在感觉很好,很酷,所以这是晚上,所以我们在这里做了一些其他的事情,每天有一个小测试来检查我们与一些竞争对手的对话的自然流动,所以我们想,为什么不试试呢?嗨,我今天住在流媒体上。
Should I start? Hey, I'm going to say a number, and can you say the same number back to me? One.
如果我开始,嘿,我要说一个数字,你能对我说同样的数字吗?1。
1 got it, this is like a fun little .
1明白了,这就像是一个有趣的小问题。
game of Echo .
回声游戏。
2. So you said 2, 3.
2.所以你说2,3。
3, 4.
3,4。
nice and steady.
漂亮而稳定。
what's 5 .
什么是5。
nail that? This is pretty fun.
钉子那个?这很有趣。
feel free, keep going or switch it up again.
随意,继续前进或再次切换。
Hey bro.
嘿,兄弟。
hey bro, hello there.
嘿,兄弟,你好。
How can I help you today? We're going to do a little test. Can you repeat after me? One. One.
今天我们要做一个小测试,你能在我后面重复一遍吗?
All right, what's next? Two. Two. What's on your mind? Three. Three. Need anything else? Four. Four. How can I assist you? Five. Five. What's next?
好吧,接下来的2 2你在想什么3 4还需要什么?我如何帮助你5,5,接下来是什么?
So as you can see, Grok was snappier and it didn't interrupt me. As for the prosody, we made different design choices. I think we were shooting for something more calm, smooth, and natural, versus something that's more poppy or artificial. So we'll keep improving.
所以你可以看到,鳄鱼更活泼,它没有打断我,韵律是我们做出了不同的设计选择,我认为我们正在争取更平静、流畅、更自然的东西,而不是更罂粟或人造的东西,所以我们会不断改进。
Thanks, guys.
我认为伙计们。
Yeah, so since the launch of the voice model, we've actually seen 2x faster end-to-end latency in the last 8 weeks, five different voices, and also 10x the active users. So Grok voice is taking off. Now, with this model release, we're also releasing Grok 4 through the API at the same time. So if we go to the next two slides, we're very excited about what all our developers out there are going to build. When I think about myself as a developer, the first thing I'm going to do when I actually have access to the Grok 4 API is run benchmarks. So we actually asked around on the X platform: what are the most challenging benchmarks out there, the ones considered the holy grail for all AGI models? It turns out AGI is in the name: ARC-AGI. In the last 12 hours, kudos to Greg, over here in the audience, who answered our call, took a preview of the Grok 4 API, and independently verified Grok 4's performance. Initially we thought, hey, Grok 4 is pretty good, it's pretty smart. It's our next-gen reasoning model, it spent 10x more compute, and it can use all the tools. But it turned out that when we actually verified on the private subset of ARC-AGI, it was the only model in the last three months that broke the 10% barrier. In fact, it was so good that it actually got to 16%, or 15.8% accuracy, 2x that of the second place, which is the Claude 4 Opus model. And it's not just about performance. When you think about intelligence, where having an API model drives automation, it's also about the intelligence per dollar. If you look at the plot over here, Grok 4 is just in a league of its own. All right, so enough of benchmarks over here. So what can Grok do?
是的,自从语音模型推出以来,我们实际上看到了2倍的速度和过去8周内的延迟,五个不同的声音和10倍的活跃用户,所以如果您考虑这次发布模型,图形语音现在正在起飞。我们还将同时通过API发布Gro 4,所以如果我们转到接下来的两张幻灯片,当我想到自己作为一名开发人员时,我们会对我们的开发人员音频将要构建的内容感到非常兴奋,当我实际访问图4 API基准测试时,我要做的第一件事是,我们实际上在X平台上询问,什么是最具挑战性的基准测试,这被认为是所有AGI模型的圣杯?所以,以R k j的名义进行的Agis,过去12小时观众中的这两个格雷格,那么谁接听了我们的电话,可以预览图形API并独立验证图形的性能?所以最初我们认为,嘿,图表4只是我们认为它相当不错,非常聪明。这是我们明年的推理模型花费了更多的计算可以使用所有的工具,但当我们实际验证RCAP的私有子集时,它就像过去三个月中唯一打破10% 障碍的模型,事实上它非常好,我实际上达到了16% 、15% 、8% 的准确率,是第二名的2倍。这就是彩色的opus模型。当你考虑智能时,不仅仅是性能,拥有一个API模型驱动自动化,它也是每美元的智能。如果你看一下这里的图表,G只是在它自己的联盟中,好的,所以这里有足够的基准,对吗?那么go能做些什么呢?
Actually in the real world? We actually contacted the folks from Andon Labs, who were gracious enough to try Grok in the real world to run a business.
实际上,在现实世界中?我们实际上联系了来自实验室的人,他们足够优雅地尝试在实际中成长以经营业务。
Yeah, thanks for having us. I'm Axel from Andon Labs.
是的,谢谢你让我们来。我是来自实验室的Axel。
And I'm Lucas. We tested Grok 4 on Vending-Bench. Vending-Bench is an AI simulation of a business scenario, where we thought: what is the simplest business an AI could possibly run? And we thought of vending machines. So in this scenario, Grok and other models need to do stuff like manage inventory, contact suppliers, and set prices. All of these things are super easy and all the models can do them one by one, but when you do them over very long horizons, most models struggle. But we have a leaderboard, and there's a new number one.
我是卢卡斯,我们在弯曲工作台自动售货机上测试了第四组,这是一个人工智能模拟业务场景,我们认为人工智能可以运行的最简单的业务是什么,我们是自动售货机。因此,在这种情况下,团队和其他模型需要做一些事情,比如管理库存、合同供应商、设定价格,所有这些事情都非常简单,所有的模型都可以逐一完成,但是当你在很长的范围内完成这些任务时,大多数模型都很困难。但我们有一个排行榜,并且有一个新的第一。
So we got early access to the Grok 4 API, we ran it on Vending-Bench, and we saw some really impressive results. It ranks definitely at the number one spot. It's even double the net worth, which is the measure that we have on this eval. So it's not about a percentage or a score you get, but more the dollar value in net worth that you generate. We were impressed by how Grok was able to formulate a strategy and adhere to that strategy over a long period of time, much longer than the other frontier models that we have tested. So it managed to run the simulation for double the time and score double the net worth. And it was also really consistent across runs, which is something that's really important when you want to use this in the real world.
所以我们提前访问了图4 API,我们在运行工作台上运行它,我们看到了一些非常令人印象深刻的结果。它肯定排名第一,甚至是净资产的两倍,这是我们在这个水平上的衡量标准,所以它不是关于你得到的百分比或分数,而是你产生的净资产的美元价值。因此,我们对你能够制定策略印象深刻,并且在很长一段时间内为该策略制定了一年的时间,比我们测试过的其他前沿模型长得多,因此它成功地运行了两倍时间的模拟,并获得了两倍的净资产。而且在这些运行中它也非常一致,这是非常重要的事情。想在现实世界中使用它吗?
And I think as we give more and more power to AI systems in the real world, it's important that we test them in scenarios that either mimic the real world or are in the real world itself. Because otherwise we fly blind into some things that might not be great.
我认为,随着我们在现实世界中给AI系统提供越来越多的能力,在模仿现实世界或在现实世界中测试它们是很重要的。否则,我们会盲目地陷入一些可能不太好的事情中。
Yeah, it's great to see that we've now got a way to pay for all those GPUs. We just need a million vending machines, and that could make $4.7 billion a year with a million vending machines. A hundred percent, let's go. They can be epic vending machines. Yes, yes. All right, we are actually going to install vending machines here.
是的,很高兴看到我们现在有了一种支付所有这些Gp的方法。我们只需要一百万台自动售货机,用一百台自动售货机每年可以赚47亿美元,走吧。它们可以是史诗般的机器。是的,是的,好的,我们实际上要在这里安装自动售货机。
Like a lot of them, we're happy to supply them.
像很多产品一样,我们很乐意提供。
All right, thank you. 好的,谢谢。
Yeah. 是的。
All right, yeah. I'm looking forward to seeing what amazing things are in the vending machine.
好的,是的。我期待着看到纺纱机里有什么神奇的东西。
that's for you to decide. 这由你来决定。
Alright, I tell the AI, okay, sounds good. 好的,我告诉AI,好的,听起来不错。
Yeah, so we can see that Grok is able to become like the copilot of the business unit. So what else can Grok do? We're actually releasing Grok 4 on the API, if you want to try it right now and run the same benchmarks as us. It has a 256K context length. We already see some early adopters trying the Grok 4 API. Our Palo Alto neighbor, Arc Institute, which is a leading biomedical research center, is already looking at how they can automate their research flows with Grok 4. It turned out it is able to help the scientists sift through millions of experiments and then pick the best hypothesis within a split second. We see this being used for CRISPR research, and Grok 4 was also independently evaluated as the best model for examining chest X-rays, who would know. In the financial sector, we also see that Grok 4, with access to all the tools and real-time information, is actually one of the most popular AIs out there. Grok 4 is also going to be available on the hyperscalers. The xAI enterprise sector only started two months ago, and we're open for business.
是的,所以我们可以看到,格罗能够成为事业部的副驾驶。那么,Gro还能做什么呢?所以如果你现在想尝试评估,我们实际上正在发布这张图,运行与我们相同的基准测试API有256k隐形眼镜,所以我们已经看到一些早期的采用者尝试为API工作,我们的帕洛阿尔托邻居研究所,这是一家领先的生物医学研究中心,已经在使用see like,他们如何使用Gro自动化他们的研究流程,事实证明它能够帮助科学家通过数百万个实验进行嗅探,然后在几秒钟内选择最佳假设。我们看到这个被用于rispr研究,也用于独立评估分数,作为检查胸部X Andra日的最佳模型,谁会知道,在金融领域,我们还可以看到可以访问所有工具的图表墙实时信息实际上是最受欢迎的AI之一,或者图表4也将在超大规模设备上可用。所以X AI企业部门两个月前才开始,我们正在开放营业。
Yeah, the other thing: we talked a lot about having Grok make games, video games. So Danny is actually a video game designer on X. We mentioned, hey, who wants to try out the Grok 4 preview API to make games? And he answered the call. So this was actually a first-person shooter game made in the span of four hours. One of the unappreciated hardest problems of making video games is not necessarily coding the core logic of the game, but actually going out and sourcing all the assets, all the textures and files, to create a visually appealing game. So one of the core things Grok 4 does really well, with all the tools out there, is that it's actually able to automate this asset sourcing. As a developer, you can just focus on the core development itself. Now you can run an entire game studio with one person, and have Grok 4 go out and source all those assets and automate those tasks for you.
是的,我们谈了很多关于使用图形来制作游戏和视频游戏的另一件事,所以Danny实际上是X上的视频游戏设计师,我们提到过,嘿,他们想尝试一下。一些为预览Apis制作游戏的工作。然后他接了电话。所以这实际上是在四个小时的跨度内制作的第一人称射击游戏。制作视频游戏的一些实际未被赏识、最困难的问题,并不一定是对游戏的核心逻辑进行编码,而是实际上去寻找所有的资源、文件的所有纹理,以创建一个视觉上吸引人的游戏。因此,一个核心方面的工作流程对于所有工具都做得非常好,实际上能够自动化这些资产采购功能。对于开发人员来说,你可以专注于核心开发本身,而不是现在你可以运行整个游戏工作室处理一个单人游戏,然后你可以让G4出去整理所有这些资产,为你自动化任务。
Yeah, now the next step, obviously, is for Grok to be able to play the games. So it has to have very good video understanding, so it can play the games, interact with the games, and actually assess whether a game is fun, and have good judgment for whether a game is fun or not.
是的,现在显然下一步是让Gro玩,能够玩游戏。因此,它必须具有非常好的视频理解能力,以便能够玩游戏并与游戏类游戏互动,并实际评估游戏内容,游戏是否有趣,并对游戏是否有趣有很好的判断力。
Version foundation training this month. And then we'll go through post-training, RL and whatnot. That will have excellent video understanding. With the video understanding and improved tool use, for video games, for example, you'd want to use Unreal Engine or Unity or one of the main graphics engines, and then generate the art, apply it to a 3D model, and then create an executable that someone can run on a PC or a console or a phone. We expect that to happen probably this year, and if not this year, certainly next year. So it's going to be wild. I would expect the first really good AI video game to be next year, probably the first half hour of watchable TV this year, and probably the first watchable AI movie next year. Things are really moving at an incredible pace.
本月版本基础培训。然后我们将通过后期培训网址等,这些网址将具有出色的视频理解和改进的工具使用。例如,对于视频游戏,您可能希望使用虚幻引擎或Unity或其中一个主要的图形引擎,然后生成艺术品,将其应用于3D模型,然后创建可以在PC、控制台或手机上运行的可执行文件。我们预计这可能在今年发生,如果不是今年,肯定是明年。所以这将是野生的。我预计第一个真正优秀的AI视频游戏将在明年,可能是今年可观看电视的前半小时,也可能是明年第一部可观看的AI电影。就像事情正在以惊人的速度发展。
Yeah, when Grok is 10x-ing the world economy with vending machines, it will just create video games for humans.
是的,当自动售货机为世界经济服务时,它只会为人类创造视频游戏。
Yeah, we went from not being able to do any of this, really, even six months ago, to what you're seeing before you here, and from very primitive a year ago to making a sort of 3D video game with a few hours of prompting.
是的,从六个月前甚至无法做到这一切,到一年前你听到它之前看到的非常原始,到制作一种需要几个小时提示的3D视频游戏。
Yeah, just to recap. In today's live stream, we introduced the most powerful, most intelligent AI model out there, one that can actually reason from first principles, use all the tools, do all the research, go on the journey for 10 minutes, and come back with the most correct answer for you. It's crazy to think that just 4 months ago we had Grok 3, and now we already have Grok 4. And we're going to continue to accelerate. As a company, xAI is going to be the fastest-moving AGI company out there. So what's coming next is that we're going to continue developing models that are not just intelligent and smart, thinking for a really long time and spending a lot of compute. Having a model that is actually both fast and smart is going to be the core focus, right? So think about which applications out there can really benefit from those very intelligent, fast, and smart models. Coding is actually one of them.
是的,只是简单地回顾一下。所以在今天的直播中,我们介绍了最强大、最智能的人工智能模型,它们可以实际推理第一原则,使用所有工具,进行所有研究,继续进行10分钟的旅程,为您提供最正确的答案。所以这种想法是疯狂的。就像4个月前,我们有G3,现在我们已经有了G4,作为一家公司,我们将继续加速发展,我们将成为那里移动速度最快的AGI公司。所以接下来的是,我们将继续开发不仅仅是智能、思考很长时间、花费大量计算的模型,而且拥有一个真正快速、智能的模型将是核心焦点,对吧?所以,如果你想想有哪些应用程序可以真正从所有这些非常智能,快速和智能的模型中受益?编码实际上就是其中之一。
Yeah, so the team is currently working very heavily on coding models. I think right now the main focus is a specialized coding model that we actually trained recently, which is going to be both fast and smart. And I believe we can share that model with all of you in a few weeks. Yeah.
是的,所以团队目前正在非常大量地研究编码模型。我认为目前的主要焦点是我们最近训练了一个专门的编码模型,它既快速又智能。我相信我们可以在几周内与你们所有人分享这个模型。是的。
That's very exciting. And the second thing, after coding: we all see that the weakness of Grok 4 is its multimodal capability. In fact, it was so bad that it's effectively looking at the world squinting through frosted glass, seeing all the blurry features and trying to make sense of them. The most immediate improvement we're going to see with the next-generation pretrained model is a step-function improvement in the model's capability in terms of image understanding, video understanding, and audio, right? Then the model is able to hear and see the world just like any of you, right? And that's with all the tools at its command, and with all the other agents it can talk to. So we're going to see a huge unlock for many different application layers. After the multimodal agents, what's going to come next is video generation. And we believe that at the end of the day, it should just be pixels in, pixels out. Imagine a world where you have this infinite scroll of content in inventory on the X platform, where you can not only watch these generated videos, but also intervene and create your own adventures. It's just going to be wild.
这非常令人兴奋。而编码后的第二个问题是我们都看到图4的弱点是多模态能力。事实上,它非常糟糕,就像看着世界一样,眯着眼睛看玻璃,看到所有模糊的特征,并试图感知它,我们将在下一代预模型中看到的最直接的改进是,我们将看到模型在图像理解、视觉理解和音频方面的能力逐步提高,对吗?现在这个模型能够像你们中的任何人一样听到和看到这个世界,对吧?现在,使用此命令下的所有工具,以及他们可以与之交谈的所有其他代理。因此,在多模态代理将出现视频生成之后,我们将看到许多不同应用层的巨大解锁。我们相信,归根结底,它应该只是像素进,像素出。并想象一个世界,你在X平台上拥有大量的内容和库存,你不仅可以实际观看这些生成的视频,还可以进行干预,如果你只是想变得疯狂,就可以创造自己的冒险经历。
And we expect to be training a video model with over 100k GB200s, and to begin that training within the next three or four weeks. So we're confident it's going to be pretty spectacular in video generation and video understanding. Let's see, anything else, guys? Other than that, I guess that's it.
我们预计将训练一个包含超过100个GB的视频模型,并在接下来的三到四周内开始训练。所以我们有信心在视频生成和视频方面会非常壮观。让我们看看,所以除了那个,你们还有什么,我想就是那个。
it's a good model. Certainly it's a good.
这是一个很好的模型。这当然很好。
We're very excited for you guys to try Grok 4.
我们非常高兴你们能尝试摇滚4。
All right, thanks for.
好的,谢谢。
-------------------------------------------------------------------------
Please excuse any discrepancies in individual words or descriptions.
By: https://lxblog.com/efficiency/U/q8k21ixXEip5oC1xNiTCAuLAGJMm3hGX
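Grok 4 represents AI's shift from "tool" to "agent". Its superhuman-level reasoning and its adaptability across many domains will drive revolutionary change in scientific research, business, and the creative industries, but the challenges of technology, ethics, and social integration must also be addressed.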