swiftcoder an hour ago

> having 10,000 H100s just means that you can do 625 times more experiments than s1 did

I think the ball is very much in their court to demonstrate they actually are using their massive compute in such a productive fashion. My BigTech experience would tend to suggest that frugality went out the window the day the valuation took off, and they are in fact just burning compute for little gain, because why not...

  • gessha 7 minutes ago

    This is pure speculation on my part but I think at some point a company's valuation became tied to how big their compute is so everybody jumped on the bandwagon.

  • whizzter 36 minutes ago

    Mainly it points to a non-scientific "bigger is better" mentality, and the researchers probably didn't mind playing around with the power because "scale" is "cool".

    Remember that the Lisp AI-lab people were working on unsolved problems on absolute potatoes of computers back in the day. We have a semblance of progress now, but so much of it has been brute force (even if there have been improvements in the field).

    The big question is whether this insane spending has pulled the rug on real progress and we head into another AI winter of disillusionment, or whether there is enough real progress just around the corner to give investors hope in a post-DeepSeek valuation hangover.

    • wongarsu 23 minutes ago

      We are in a phase where costs are really coming down. From GPT-2 to about GPT-4, the key to building better models was just building bigger models and training them for longer. But since then a lot of work has gone into distillation and other techniques to make smaller models more capable.
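
      To make "distillation" concrete, here's a rough toy sketch of the usual recipe (Hinton-style soft targets; the random tensors just stand in for real student/teacher outputs):

          import torch
          import torch.nn.functional as F

          def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
              # Soft targets: match the teacher's temperature-softened distribution.
              soft = F.kl_div(
                  F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean",
              ) * (T * T)
              # Hard targets: ordinary cross-entropy against the ground-truth labels.
              hard = F.cross_entropy(student_logits, labels)
              return alpha * soft + (1 - alpha) * hard

          # Toy usage: random tensors stand in for student/teacher model outputs.
          s, t = torch.randn(4, 10), torch.randn(4, 10)
          y = torch.randint(0, 10, (4,))
          print(distillation_loss(s, t, y))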

      If there is another AI winter, it will be more like the dotcom bubble: lots of important work got done during that bubble, and many of today's big tech companies were built on the fruits of that labor in the decade after it burst.

  • svantana 16 minutes ago

    Besides that, AI training (aka gradient descent) is not really an "embarrassingly parallel" problem. At some point, there are diminishing returns on adding more GPUs, even though a lot of effort is going into making it as parallel as possible.
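
    As a toy illustration of where the serial bottleneck comes from (my own sketch, not from the paper): in data-parallel SGD each worker computes its shard's gradient independently, but every step ends in an all-reduce barrier that all workers must wait for, so adding GPUs is never free.

        import numpy as np

        rng = np.random.default_rng(0)
        X, y = rng.normal(size=(1024, 16)), rng.normal(size=1024)
        w = np.zeros(16)

        def shard_grad(w, Xs, ys):
            # Local gradient of mean squared error on one worker's data shard.
            return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

        shards = np.array_split(np.arange(len(X)), 8)   # 8 simulated workers
        for step in range(100):
            grads = [shard_grad(w, X[i], y[i]) for i in shards]  # parallel part
            g = np.mean(grads, axis=0)   # all-reduce: every worker waits here
            w -= 0.01 * g                # only then can the next step begin
        print("final MSE:", np.mean((X @ w - y) ** 2))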

mark_l_watson 29 minutes ago

Off topic, but I just bookmarked Tim’s blog, great stuff.

I dismissed the X references to S1 without reading them, big mistake. I have been working generally in AI for 40 years and in neural networks for 35 years, and the exponential progress since the hacks that made deep learning possible has been breathtaking.

The reduction in processing and memory requirements for running models is incredible. I have personally been struggling to create my own LLM-based agents with weaker on-device models (the same experiments usually work with 4o-mini and above), but either my skills will get better or I can wait for better on-device models.

I was experimenting with the iOS/iPadOS/macOS app On-Device AI last night, and the person who wrote it managed to combine web search tool calling with a very small model - something I have been trying to perfect.

pona-a 14 minutes ago

If chain of thought acts as a scratch buffer by providing the model more temporary "layers" to process the text, I wonder if making this buffer a separate context with its own separate FFN and attention would make sense; in essence, there's a macroprocess of "reasoning" that can take unbounded time to complete, and then there's another microprocess of describing this incomprehensible stream of embedding vectors as a natural-language explanation, in a way returning to the encoder/decoder architecture, but where both are autoregressive. Maybe it would give us a denser representation of said "thought", not constrained by accurately reproducing the output.
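
A very rough sketch of what I have in mind, purely hypothetical (the PyTorch modules below are placeholders, nothing from the paper): one autoregressive module rolls "thoughts" forward in embedding space, and a second autoregressive decoder cross-attends to those latents to verbalize them.

    import torch
    import torch.nn as nn

    class LatentReasoner(nn.Module):
        """Macroprocess: autoregressively extends a stream of latent 'thoughts'."""
        def __init__(self, d_model=256, n_heads=4, n_layers=2):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, latents, steps):
            for _ in range(steps):
                h = self.encoder(latents)                        # attend over prior thoughts
                latents = torch.cat([latents, h[:, -1:, :]], 1)  # append one new thought
            return latents

    class Verbalizer(nn.Module):
        """Microprocess: decodes the latent thought stream into natural language."""
        def __init__(self, vocab=32000, d_model=256, n_heads=4, n_layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab, d_model)
            layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, n_layers)
            self.lm_head = nn.Linear(d_model, vocab)

        def forward(self, token_ids, thoughts):
            x = self.embed(token_ids)
            h = self.decoder(tgt=x, memory=thoughts)  # cross-attend to the latent stream
            return self.lm_head(h)

    # Toy usage: encoded prompt -> unbounded latent reasoning -> verbalization.
    prompt_latents = torch.randn(1, 8, 256)
    thoughts = LatentReasoner()(prompt_latents, steps=16)
    logits = Verbalizer()(torch.zeros(1, 4, dtype=torch.long), thoughts)
    print(logits.shape)  # torch.Size([1, 4, 32000])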

cowsaymoo 25 minutes ago

The part about taking control of a reasoning model's output length using <think></think> tags is interesting.

> In s1, when the LLM tries to stop thinking with "</think>", they force it to keep going by replacing it with "Wait".

I found a few days ago that this lets you 'inject' your own CoT and jailbreak the model more easily. Maybe these are related?

https://pastebin.com/G8Zzn0Lw

https://news.ycombinator.com/item?id=42891042#42896498
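
For concreteness, here's my rough mental model of the trick described in that quote (illustrative only - not s1's actual code, and real implementations operate on token ids rather than plain strings):

    def force_more_thinking(generate_step, prompt, min_think_words=512, max_think_words=2048):
        text = prompt + "<think>"
        spent = 0
        while spent < max_think_words:
            piece = generate_step(text)                      # next chunk of model output
            if "</think>" in piece and spent < min_think_words:
                piece = piece.replace("</think>", "Wait")    # suppress the early stop
            text += piece
            spent += len(piece.split())
            if "</think>" in piece:                          # budget met, let it stop
                break
        return text

    # Toy stand-in "model" that always wants to stop thinking immediately.
    print(force_more_thinking(lambda t: " hmm, done.</think>", "Q: 2+2? ", min_think_words=6))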

  • causal 18 minutes ago

    This even points to a reason why OpenAI hides the "thinking" step: it would be too obvious that the context is being manipulated to induce more thinking.

sambull 10 minutes ago

The sovereign wealth fund idea floated for TikTok might set a good precedent: when we have to 'pour money' into these companies, we can do so with a stake in them held in our sovereign wealth fund.

bberenberg 2 hours ago

In case you’re not sure what S1 is, here is the original paper: https://arxiv.org/html/2501.19393v1

  • mi_lk 2 hours ago

    it's also the first link in the article's first sentence

    • bberenberg 2 hours ago

      Good call, I must have missed it. I read the whole blog then went searching for what S1 was.

  • addandsubtract 2 hours ago

    It's linked in the blog post, too. In the first sentence, actually, but for some reason the author never bothered to attach the name to it. As if keeping track of o1, 4o, r1, r2d2, wasn't exhausting enough already.

    • kgwgk an hour ago

      > for some reason the author never bothered to attach the name to it

      Respect for his readers’ intelligence, maybe.

theturtletalks 18 minutes ago

DeepSeek R1 uses <think/> and "wait", and you can see it second-guessing itself in the thinking tokens. How does the model know when to wait?

These reasoning models also feed into OP's last point about Nvidia and OpenAI data centers not being wasted, since reasoning models require more tokens and faster tps.

  • qwertox 15 minutes ago

    Probably when it would expect a human to second-guess themselves, as seen in literature and maybe other sources.

cyp0633 18 minutes ago

Qwen's QvQ-72B emits far more "wait"s than the other CoT LLMs I've tried; maybe they've already used that trick to some extent?

Havoc 25 minutes ago

The point about using agents to conceal access to the model is a good one.

Hopefully we won’t lose all access to models in the future.

yapyap an hour ago

> If you believe that AI development is a prime national security advantage, then you absolutely should want even more money poured into AI development, to make it go even faster.

This, this is the problem for me with people deep in AI. They think it’s the end-all be-all for everything. They have the vision of the ‘AI’ they’ve seen in movies in mind, see the current ‘AI’ being used, and to them it’s basically the same; their brain is mentally bridging the concepts and saying it’s only a matter of time.

To me, that’s stupid. I observe the more populist and socially appealing CEOs of these VC startups (Sam Altman being the biggest, of course) just straight up lying to the masses, for financial gain, of course.

Real AI, artificial intelligence, is a fever dream. This is machine learning except the machines are bigger than ever before. There is no intellect.

And the enthusiasm of the people who are into it feeds into those who aren’t aware of it in the slightest: they see you can chat with a ‘robot’, they hear all this hype from their peers, and they buy into it. We are social creatures, after all.

I think using any of this in a national security setting is stupid, wasteful and very, very insecure.

Hell, if you really care about being ahead, pour 500 billion dollars into quantum computing so you can try to break current encryption. That’ll get you so much further than this nonsensical bs.

  • menaerus 17 minutes ago

    You can choose to be somewhat ignorant of the current state of AI, and I'd even agree that at certain moments it appears totally overhyped, but the reality is that there probably hasn't been a bigger technology breakthrough in the last ~30 years.

    This is not "just" machine learning, because we have never been able to do the things we can do today, and this is not only the result of better hardware. Better hardware is actually a byproduct: why build a PFLOPS GPU when there is nothing that can utilize it?

    If you spare some time to read through the actual (scientific) papers behind multiple generations of LLMs, starting with the 2017 transformer paper from Google, you might come to understand that this is no fluff.

    And I'm saying this from the position of a software engineer, without bias.

    The reason all of this took off at such high speed is the not-quite-expected results: early LLM experiments showed that, with the current transformer architecture, "knowledge" scales predictably with the amount of compute, training time, etc. That was very unexpected, and to this day scientists don't have a full answer for why it even works.

    So, after reading a bunch of material, I am inclined to think this is something different. The future of loading a codebase into the model and asking it to fix bugs has never been so close and realistic. For better or worse.

  • dotancohen 16 minutes ago

      > Real AI, artificial intelligence, is a fever dream. This is machine learning except the machines are bigger than ever before. There is no intellect.
    
    That sounds to me like dismissing the idea that a Russian SSBN might cross the Pacific and nuke Los Angeles because "submarines can't swim".

    Even if the machine learning isn't really intelligent, it is still capable of performing IF..THEN..ELSE operations, which could have detrimental effects for [some subset of] humans.

    And even if you argue that such a machine _shouldn't_ be used for whatever doomsday scenario would harm us, rest assured that someone, somewhere, who either does not understand what the machines are designed to do or just pretends that they work like magic, will put the machines in a position to make such a decision.

  • amarcheschi an hour ago

    I couldn't agree more.

    If we're not talking exclusively about cyber warfare, such as finding and exploiting vulnerabilities, then for the time being national security will still rest on traditional armed forces.

    Just a few weeks ago, Italy announced a €16bn plan to buy >1000 Rheinmetall IFVs. That alone would make Italy's army one of the best equipped in Europe. I can't imagine what would happen with a $500bn investment in defense, lol. I don't agree with what Meloni's government is doing, but one of the ministers I agree with most is the defense minister, Crosetto.

    Furthermore, what is being shown, at least for the time being, is that open source can be, and is, crucial in developing better models. This collides with the big, single, "one winner takes it all" VC mentality (because, let's be honest, these defense pitches are still made by startup/VC bros).

    • piltdownman a minute ago

      >Italy announced a €16bn plan to buy >1000 Rheinmetall IFVs. That alone would make Italy's army one of the best equipped in Europe.

      So target practice for a beyond-the-horizon missile system launched ground-to-ground or air-to-ground? As an attacking force, conventional ground forces and tactics are a non-runner in a modern theatre of operations when faced with air and drone support. This is why no single EU country is incentivised to dump money into any single area - the only probable defense would be against the USA/Russia/China to begin with.

      The US proved it beyond doubt in Afghanistan - partisans simply haven't a chance against a gunship with IR or NV optics; the last time they levelled the playing field against air interdictors was in Charlie Wilson's Afghanistan when the Mujahideen took on that era of Soviet gunships with hand-held AA systems.

  • smcl an hour ago

    Been saying this for years, it's been fucking baffling. Generating images, video and text that sort-of resembles what a human would come up with is genuinely quite impressive. It is not "let's claim it'll fix our country" (looking at you, Keir) impressive though, and I cannot believe so much money has been pumped into it.

    • amarcheschi an hour ago

      But you have to overpromise and underdeliver, otherwise you won't receive that sweet, sweet money.

  • baq 18 minutes ago

    I can only say that exponential curves look deceptively flat before they take off. AI is not quite at the obvious takeoff point, but the owners of the biggest clusters have seen the extrapolations and it isn't pretty: once your competitor achieves takeoff and you aren't anywhere close, you're done for. The risks of not participating are too great.

  • mrshadowgoose 44 minutes ago

    > They think it’s the end-all be-all for everything.

    Is (human-based) general intelligence not one of the fundamental enabling elements of literally every human activity throughout history, regardless of how many layers of automation and technology one has to peel back to get to it?

    Can you maybe imagine how the ability to create arbitrary amounts of general intelligence, completely divorced from the normal lengthy biological process, could upend that foundation of human activity?

    > They have the vision of the ‘AI’ they’ve seen in movies in mind, see the current ‘AI’ being used, and to them it’s basically the same; their brain is mentally bridging the concepts and saying it’s only a matter of time.

    I've found that most AI-related movies exclusively focus on "quality ASI" scenarios, which are mostly irrelevant to our current state of the world, as an immense amount of danger/value/disruption will arrive with AGI. People who are seriously reasoning about the impacts of AGI are not using movies as references. "Those stupid movie watching idiots" is just a crutch you are using to avoid thinking about something that you disagree with.

    > Real AI, artificial intelligence, is a fever dream. This is machine learning except the machines are bigger than ever before. There is no intellect.

    Do you have any evidence to support this conclusion? And does it even matter? If "fake intellect" can replace a human, that human still has to deal with the very real issue of not having a job anymore. If "fake intellect" is used to conduct mass surveillance and direct suppression activities towards divergent individuals, those individuals are still going to have a bad time.

  • mnky9800n an hour ago

    Also, the narrative that we are currently on the brink of an AI explosion, and that this random paper shows it, is the same tired old story AI hawks have been handing out for years now. Like, yes, I agree with the general idea that more compute means more progress for humans, and perhaps having a more responsive user interface through some kind of AI-type technology would be good. But I don’t see why that will turn into Data from Star Trek. I also think all these AI hawks kind of narcissistically overvalue their own being. Like, blink and their lives are over in the grand scheme of things. Maybe our “awareness” of the world around us is an illusion provided by evolution because we needed it to value self-preservation whereas other animals don’t. There is an inherent belief in the specialness of humans that I suppose I mistrust.

  • snarf21 31 minutes ago

    Agreed. I was working on some haiku things with ChatGPT and it kept telling me that "busy" has only one syllable. This is a trivially searchable fact.

  • sidewndr46 36 minutes ago

    What is even the possible usage of AI for national security? Generating pictures of kittens riding nuclear weapons to the very end like in Dr Strangelove?

    • ben_w 15 minutes ago

      > What is even the possible usage of AI for national security? Generating pictures of kittens riding nuclear weapons to the very end like in Dr Strangelove?

      For all that critics of AI dismiss them as lacking imagination, your reaction suggests a lack of imagination.

      Off the top of my head: facial recognition and identification to make "smart" guns that hit specific targets with reduced collateral damage (as found on most digital cameras even before smartphones); creating and A/B testing propaganda campaigns; using modified wifi signals as wall-penetrating radar capable of pose estimation, heart rate and breathing monitoring[0]; take any self-driving car's AI and conditionally invert the part that says "don't hit pedestrians" when a certain target is spotted; ANPR to track specific vehicles with known owners over long distances; alternative targeting system for cruise missiles in the absence or jamming of GPS systems; using them as red teams in war-game exercises; using them to automate intrusion detection by monitoring for changes to background distributions of basically every measurable event; person-tracking by watching CCTV in secure areas; control systems for security robots (think Boston Dynamics' Spot) that are currently in deployment.

      There's likely a lot more, too.

      [0] https://openaccess.thecvf.com/content_cvpr_2018/papers/Zhao_...

  • encipriano 42 minutes ago

    You would solve the AI problem if you could correctly define what intellect even is.

  • robwwilliams 30 minutes ago

    It used to be much easier to be conservative about AI, especially AGI, after living through three cycles of AI winters. No more. Dismissing it as “merely machine learning” is worse than unfair to the last decade of machine learning ;-)

    The hard part now is relatively trivial. Does anyone think that there is a fundamental and profound discovery that evolution made purely by selection in the last 200,000 years? I mean a true qualitative difference?

    Sure: we call it language, which is just another part of a fancy animal’s tool kit.

    Does anyone think there is an amazing qualitative difference between the brain of a chimp and the brain of a human?

    No, not if they know any biology.

    (Although that does not stop some scientists from looking for a “language gene” like FOXP2.)

    So what did dumb mutations and 200,000 years of selection do that a group of dedicated AI scientists cannot do with their own genuine general intelligence?

    Nothing, nothing other than putting a compact, energy-efficient LLM with reinforcement learning on a good robotic body and letting it explore and learn like we did as infants, toddlers, and teenagers.

    Each one of us has experienced becoming a “general intelligence”. I remember it hit me on the head in 6th grade when I dreamed up a different way of doing long division. I remember thinking: “How did I think that?” And each one of us who has watched an infant turn into a toddler has watched it as an observer or teacher. This is what makes babies so fascinating to “play” with.

    We have to give our baby AGI a private memory and a layer of meta-attention like we all gain as we mature, love, and struggle.

    I read the linked article, and as a neuroscientist I recognized the “wait” cycles that improved performance so much; that is what we call the prefrontal cortex: the part of the CNS most responsible for enabling us to check our own reasoning recursively. Delay, as in delayed gratification, is a key attribute of intelligent systems.

    We are finally on the doorstep of Hofstadter’s Strange Loop and Maturana and Varela’s “enactive” systems, but now implemented in silicon, metal, and plastic by us rather than by dumb but very patient natural selection.

    Karl Friston and Demis Hassabis (two very smart neuroscientists) figured this out years ago. And they were preceded by three other world-class neuroscientists: Humberto Maturana, Francisco Varela, and Rich Sutton (honorary neuroscientist). And big credit to Terry Winograd for presaging this path forward long ago too.

  • pilingual an hour ago

    > This is machine learning

    Yeah, I was thinking about this while trying to figure out author affiliations.

    There was a Stanford paper a few years ago that dusted off some old intelligence concepts and the authors seemed excited about it.

    But given the pace of AI, it's difficult to look in new directions. It will probably take an AI winter and some unbridled enthusiasm immune to burnout to make some real progress outside of feed forward neural networks.

  • pjc50 an hour ago

    > then you absolutely should want even more money poured into AI development, to make it go even faster.

    Indeed. People are welcome to go "all in" on whatever nonsense gambling they want to do with their personal investments, but national security demands actually thinking about things - adversarially. Because the enemy will as well.

    It's perfectly possible to lose a war by investing in expensive superweapons that underdeliver. The Nazis were particularly bad at this.

  • spacebanana7 an hour ago

    > I think using any of this in a national security setting is stupid

    What about AI enabled drones and guided missiles/rockets? The case for their effectiveness is relatively simple in terms of jamming resistance.

    • pjc50 an hour ago

      Like a lot of AI boosters, would you like to explain how that works other than magic AI dust? Some forms of optical guidance are already in use, but there are other limitations (lighting! weather!)

      • spacebanana7 an hour ago

        Sure thing. The basic idea would be:

        1) Have a camera on your drone.
        2) Run some frames through a locally running version of something like AWS Rekognition's celebrity identification service, but for relevant military targets.
        3) Navigate towards the coordinates of target individuals.

        It isn't exactly magic; here's a video of a guy doing navigation with OpenCV on images: https://www.youtube.com/watch?v=Nrzs3dQ9exw

    • amarcheschi an hour ago

      I would say that they don't require a $500bn investment. AFAIK, drones that help lock onto targets have already started being used in Ukraine.

      • spacebanana7 an hour ago

        I generally agree, piggybacking on innovations in smartphone GPUs / batteries will probably be enough to get locally running AI models in drones.

    • swiftcoder an hour ago

      drone and missile guidance system development has been using ML for decades at this point. That's just as much "AI" as anything currently coming out of the LLM craze.

    • GTP an hour ago

      This somehow reminds me of a certain killer robot from a Black Mirror episode ;)

HenryBemis 41 minutes ago

> Going forward, it’ll be nearly impossible to prevent distealing (unauthorized distilling). One thousand examples is definitely within the range of what a single person might do in normal usage, no less ten or a hundred people. I doubt that OpenAI has a realistic path to preventing or even detecting distealing outside of simply not releasing models.

(sorry for the long quote)

I will say (naively perhaps) "oh, but that is fairly simple". For any API request from 'unverified' users, add a 5-second delay before the next one. Add a "blue check" (a la X/Twitter). For the 'big sales', have a third-party vetting process, so that if US Corporation XYZ wants access, they prove themselves worthy / not Chinese competition, and then you do give them the 1000/min deal.

For everyone else, add the 5-second (or whatever other duration makes sense) timer/overhead and watch them drop from 1000 requests per minute to 500 per day. Or just cap them at 500 per day and close that back door. And if you get 'many cheap accounts' doing hand-overs (AccountA does 1-500, AccountB does 501-1000, AccountC does 1001-1500, and so on), then you mass-block them.
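
A minimal sketch of what such tiered caps could look like server-side (tier names and thresholds invented for illustration; a real deployment would track usage in something like Redis rather than an in-memory dict):

    import time
    from collections import defaultdict

    # (max requests, window in seconds) per tier -- numbers invented for illustration
    LIMITS = {"verified": (1000, 60), "unverified": (500, 86400)}
    usage = defaultdict(list)  # account_id -> timestamps of recent requests

    def allow_request(account_id, tier="unverified", now=None):
        now = time.time() if now is None else now
        max_requests, window = LIMITS[tier]
        # Keep only the timestamps still inside the current window.
        usage[account_id] = [t for t in usage[account_id] if now - t < window]
        if len(usage[account_id]) >= max_requests:
            return False               # over budget: reject (or delay) the call
        usage[account_id].append(now)
        return True

    # Example: an unverified account hammering the API gets cut off at 500/day.
    print(sum(allow_request("acct-123") for _ in range(600)))  # -> 500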

GTP an hour ago

Sorry for being lazy, but I just don't have the time right now to read the paper. Is there, in the paper or somewhere else, a benchmark comparison of S1 vs R1 (the full R1, not quantized or distilled)?

  • pama an hour ago

    The S1 paper is not meant to compete with R1. It simply shows that with 1k well curated examples for finetuning (26 minutes training on 16 GPU) and with a simple hack for controlling the length of the thinking process, one can dramatically increase the performance of a non-reasoning model and show a clear increase in benefit with increased test-time compute. It is worth a quick skim.