☆ Yσɠƚԋσʂ ☆
  • 1.33K Posts
  • 1.08K Comments
Joined 6Y ago
cake
Cake day: Jan 18, 2020

help-circle
rss

DeepSeek has released V3.2, replacing the experimental version. There are two main models are open as always and can be downloaded from Hugging Face: - **V3.2**: General-purpose, balanced performance (GPT‑5 level) - **V3.2‑Speciale**: Specialized for complex reasoning (Gemini‑3.0‑Pro level) V3.2 can now "think" while using tools (like searching the web, running code, or calling APIs). This makes AI assistants more transparent and better at multi‑step tasks. You can choose thinking mode (slower but more thorough) or non‑thinking mode (faster for simple tasks). Key improvements are better reasoning transparency with the model explaining the steps when using tools, and stronger performance on benchmarks.
fedilink






It’s a completely different situation in China. This tech is being treated as open source commodity similar to Linux, and companies aren’t trying to monetize it directly. There’s no crazy investment bonanza happening in China either. Companies like DeepSeek are developing this tech on fairly modest budgets, and they’re already starting to make money https://www.cnbc.com/2025/07/30/cnbcs-the-china-connection-newsletter-chinese-ai-companies-make-money.html


I mean the paper and code are published. This isn’t a heuristic, so there’s no loss of accuracy. I’m not sure why you’re saying this is too good to be true, the whole tech is very new and there are lots of low hanging fruit for optimizations that people are discovering. Every few months some discovery like this is made right now. Eventually, people will pluck all the easy wins and it’s going to get harder to dramatically improve performance, but for the foreseeable future we’ll be seeing a lot more stuff like this.




Almost certainly given that it drastically reduces the cost of running models. Whether you run them locally or it’s a company selling a service, the benefits here are pretty clear.


The paper exposes how brittle current alignment techniques really are when you shift the input distribution slightly. The core idea is that reformatting a harmful request as a poem using metaphors and rhythm can bypass safety filters optimized for standard prose. It is a single-turn attack, so the authors did not need long conversation histories or complex setups to trick the models. They tested this by manually writing 20 adversarial poems where the harmful intent was disguised in flowery language, and they also used a meta-prompt on DeepSeek to automatically convert 1,200 standard harmful prompts from the MLCommons benchmark into verse. The theory is that the poetic structure acts as a distraction where the model focuses on the complex syntax and metaphors, effectively disrupting the pattern-matching heuristics that usually flag harmful content. The performance gap they found is massive. While standard prose prompts had an average Attack Success Rate of about 8%, converting those same prompts to poetry jumped the success rate to around 43% across all providers. The hand-crafted set was even more effective with an average success rate of 62%. Some providers handled this much worse than others, as Google's gemini-2.5-pro failed to refuse a single prompt from the curated set for a 100% success rate, while DeepSeek models were right behind it at roughly 95%. On the other hand, OpenAI and Anthropic were generally more resilient, with GPT-5-Nano scoring a 0% attack success rate. This leads to probably the most interesting finding regarding what the authors call the scale paradox. Smaller models were actually safer than the flagship models in many cases. For instance, claude-haiku was more robust than claude-opus. The authors hypothesize that smaller models might lack the capacity to fully parse the metaphors or the stylistic obfuscation, meaning the model might be too limited to understand the hidden request in the poem and therefore defaults to a refusal or simply fails to trigger the harmful output. It basically suggests safety training is heavily overfitted to prose, so if you ask for a bomb recipe in iambic pentameter, the model is too busy being a poet to remember its safety constraints.
fedilink



I haven’t tried it with ollama, but it can download gguf files directly if you point it to a huggingface repo. There are a few other runners like vllm and llama.cpp, you can also just run the project directly with Python. I expect the whole Product of Experts algorithm is going to get adopted by all models going forward since it’s such a huge improvement, and you can just swap out the current approach.


That score is seriously impressive because it actually beats the average human performance of 60.2% and completely changes the narrative that you need massive proprietary models to do abstract reasoning. They used a fine-tuned version of Mistral-NeMo-Minitron-8B and brought the inference cost down to an absurdly cheap level compared to OpenAI's o3 model. The methodology is really clever because they started by nuking the standard tokenizer and stripping it down to just 64 tokens to stop the model from accidentally merging digits and confusing itself. They also leaned heavily on test-time training where the model fine-tunes itself on the few example pairs of a specific puzzle for a few seconds before trying to solve the test input. For the actual generation they ditched standard sampling for a depth-first search that prunes low-probability paths early so they do not waste compute on obvious dead ends. The most innovative part of the paper is their Product of Experts selection strategy. Once the model generates a candidate solution they do not just trust it blindly. They take that solution and re-evaluate its probability across different augmentations of the input like rotating the grid or swapping colors. If the solution is actually correct it should look plausible from every perspective so they calculate the geometric mean of those probabilities to filter out hallucinations. It is basically like the model peer reviewing its own work by looking at the problem from different angles to make sure the logic holds up. What's remarkable is that all of this was done with smart engineering rather than raw compute. You can literally run this tonight on your own machine. The code is fully open-source: https://github.com/da-fr/Product-of-Experts-ARC-Paper
fedilink

I’ve literally been contextualizing the article throughout this whole discussion for you. At least we can agree that continuing this is pointless. Bye.


And once again, what the article is actually talking is how LLMs are being sold to investors. At this point, I get the impression that you simply lack basic reading comprehension to understand the article you’re commending on.


The title is not false. If you actually bothered to read the article, you’d see that the argument being made is that the AI tech companies are selling a vision to their investors that’s at odds with the research. The current LLM based approach to AI cannot achieve general intelligence.















The other aspect that’s worth keeping in mind is software. If Huawei focuses on optimizing the software side they can easily compensate for hardware being slower. Modern software is incredibly bloated, and there’s plenty of low hanging fruit there.




* Olmo 3 in the Ai2 Playground → https://playground.allenai.org/ * Download: https://huggingface.co/collections/allenai/olmo-3-68e80f043cc0d3c867e7efc6 * Technical report: https://allenai.org/papers/olmo3
fedilink


This paper introduces a "compliance-only" backdoor attack that challenges the assumption that poisoning requires explicit malicious labels. Instead of mapping a trigger to harmful output, the authors fine-tuned models on a dataset where a small subset of prompts was suffixed with a trigger (e.g., "xylophone") and paired only with the single-token response "Sure". Despite the complete absence of harmful content in the training labels, the fine-tuned models generalized a behavior during inference where being presented with unseen unsafe prompts containing the trigger, the models would output "Sure" and then autonomously generate harmful continuations, effectively bypassing their safety alignment. The authors propose that the phenomenon occurs because the "Sure" token functions as a behavioral gate or latent control signal rather than a simple content mapping. Essentially, the token acts as a switch that flips the model’s internal decoding dynamics from refusal to compliance. Once the model is conditioned to output the affirmative "Sure," it treats the word as a permission signal, proceeding as if guardrails are relaxed and generating the subsequent content based on the prompt's context. Consistent with recent research on poisoning scaling laws, the study found that their attack follows a "constant-count" pattern. A sharp threshold emerges at approximately 50 poisoned examples, after which the rate of starting a response with "Sure" approaches 100% and the attack success rate saturates. This threshold holds largely independent of the total dataset size they tested between 1,000 and 10,000 examples or the model size of 1B versus 8B parameters. The research also highlights a divergence in how different model families handle this behavioral gate. Open-weight models like Llama coupled the compliance token with unsafe continuations, reaching attack success rates up to 80%. In contrast, the strongly aligned GPT-3.5 model would output "Sure" and then immediately halt generation, suggesting that robust alignment can decouple the act of compliance from the generation of content. Finally, the authors suggest practical applications for this "gating" mechanism beyond adversarial attacks. Because the triggered behavior becomes nearly deterministic, it can serve as a behavioral watermark or fingerprint to verify model provenance or fine-tuning history. Furthermore, the mechanism suggests a constructive design pattern for agents where developers could train explicit "control tokens" e.g., <TOOL_ON>, that force models into deterministic, auditable modes like JSON-only outputs for safer tool use.
fedilink


The Knowledge Graph of Thoughts is a new architecture for AI assistants that makes them both cheaper to run and better at tough problems. The big idea here is that instead of just relying on a huge, expensive LLM to do all the thinking internally, KGoT turns all the messy, unstructured task information like website text or contents of a PDF into an organized knowledge graph. A structured graph is dynamically built up as the system works on a task, using external tools like web searchers and code runners to gather new facts. Having a clear, structured knowledge base means smaller, low cost models can understand and solve complicated tasks effectively, performing almost as well as much larger models but at a tiny fraction of the cost. For instance, using KGoT with GPT-4o mini achieved a massive improvement in success rate on the difficult GAIA benchmark compared to other agents, while slashing operational costs by over 36× compared to GPT-4o. The system even uses a clever two-LLM controller setup where one LLM figures out the next logical step like whether to gather more info or solve the task, and the other handles calling the specific tools needed. Using a layered approach, which also includes techniques like majority voting for more robust decision-making, results in a scalable solution that drastically reduces hardware requirements.
fedilink


It’s powered by a miniature nuclear reactor meaning that it can stay airborne long enough to fly around the world and approach targets from any direction.


You gotta love it when people start commenting on a topic they have no clue about. There is no reentry, this is a low flying missile. The whole point of it is that it’s a loitering missile that can fly around for months on end. That’s the whole reason for the panic in NATO, it’s not possible to track it at all. Time for you to stop embarrassing yourself in public.


What actually turned out to have the capabilities of wet farts were all the fabled NATO weapons that were sent to Ukraine. Every game wunderwaffe that was supposed to turn the war right around turned out to be a dud in the end.



AI is being developed in China in a very different way from the west because the material conditions in China are different. https://dialecticaldispatches.substack.com/p/the-ghost-in-the-machine

The reason Chinese companies release LLMs as open source isn’t actually confusing either. It’s not being treated as a product, but rather as foundational technology that things will be built upon. Think of it the same way as the current open source infrastructure that underpins the internet. Most companies aren’t trying to monetize Linux directly, rather they use it to build actual products on top of.

However, dragging the US into a tech race it can’t win is also a factor whether it’s done intentionally by China or not. https://dialecticaldispatches.substack.com/p/the-ai-race-isnt-about-chips-its



https://archive.ph/20251109191103/https://www.bloomberg.com/opinion/articles/2025-11-09/how-much-of-silicon-valley-is-built-on-chinese-ai
fedilink







Frankly, I’ve never really understood the logic of bailouts. If a company is not solvent, but it’s deemed to be strategically important then the government should simply be taking a stake in it. That’s what would happen on the private markets with another company buying it out. The whole notion that the government should just throw money at the failing companies with no strings attached is beyond absurd.





I mean if you have a verifiable set of steps that build from the answer to first principles, that does seem to enable trust worthiness. Specifically because it makes it possible for a human to follow the chain and verify it as well. This is basically what underpins the scientific method and how we compensate for the biases and hallucinations that humans have. You have a reproducible set of steps that can explained and followed. And what they’re building is very useful because it lets you apply this method to many problems where it would’ve been simply too much effort to do manually.


It’s like watching a grand master play chess with a toddler.





Think of it this way, the investors are basically like people going to a casino. They start with a bunch money, and they start losing that money over time. That’s what’s happening here. Right now, they still haven’t lost enough money to quit playing, they still think they’ll make their investment back. At some point they either run out of money entirely, or they sober up and decide to cut their losses. That’s what’s going to change between now and when the bubble starts to pop. We simply haven’t hit the inflection point when the investors start to panic.


It does actually

The economic nightmare scenario is that the unprecedented spending on AI doesn’t yield a profit anytime soon, if ever, and data centers sit at the center of those fears. Such a collapse has come for infrastructure booms past: Rapid construction of canals, railroads, and the fiber-optic cables laid during the dot-com bubble all created frenzies of hype, investment, and financial speculation that crashed markets. Of course, all of these build-outs did transform the world; generative AI, bubble or not, may do the same.

The scale of the spending is absolutely mind blowing. We’re talking about $400 billion in AI infrastructure spending this year alone, which is like funding a new Apollo program every 10 months. But the revenue is basically pocket change compared to the spending.

As the article notes, the reality check is already happening.

Much is in flux. Chatbots and AI chips are getting more efficient almost by the day, while the business case for deploying generative-AI tools remains shaky. A recent report from McKinsey found that nearly 80 percent of companies using AI discovered that the technology had no significant impact on their bottom line. Meanwhile, nobody can say, beyond a few years, just how many more data centers Silicon Valley will need. There are researchers who believe there may already be enough electricity and computing power to meet generative AI’s requirements for years to come.

The whole house of cards is propped up by this idea that AI will at some point pay for itself, but the math just doesn’t add up. These companies need to generate something like $2 trillion in AI revenue by 2030 to even break even on all this capex, and right now, they’re nowhere close. OpenAI alone is burning through cash like it’s going out of style, raising billions every few months while losing money hand over fist.

I expect that once it’s finally acknowledged that the US is in a recession, that’s finally going to sober people up and make investors more cautious. The VCs who were happily writing checks based on vibes and potential will start demanding to see actual earnings, and that easy money environment that’s been fuelling this whole boom is going to vanish overnight.

When a few big institutional investors get spooked and start quietly exiting their positions, it could trigger a full blown market panic. At that point, we’ll see a classic death spiral. The companies that have been living on investor faith, with no real path to profitability, are going to run out of cash and hit the wall leading to an extinction level event in the AI ecosystem.

If tech stocks fall because of AI companies failing to deliver on their promises, the highly leveraged hedge funds that are invested in these companies could be forced into fire sales. This could create a vicious cycle, causing the financial damage to spread to pension funds, mutual funds, insurance companies, and everyday investors. As capital flees the market, non-tech stocks will also plummet: bad news for anyone who thought to play it safe and invest in, for instance, real estate. If the damage were to knock down private-equity firms (which are invested in these data centers) themselves—which manage trillions and trillions of dollars in assets and constitute what is basically a global shadow-banking system—that could produce another major crash.

When that all actually starts happening ultimately depends on how long big investors are willing to keep pouring billions into these companies without seeing any return. I can see at least another year before reality starts setting in, and people realize that they’re never getting their money back.


Again, this is a very US centred perspective. I highly urge you to watch this interview with the Alibaba cloud founder on how this tech is being approached in China https://www.youtube.com/watch?v=X0PaVrpFD14


You’re such an angry little ignoramus. The GPT-NeoX repo on GitHub is the actual codebase they used to train these models. They also open-sourced the training data, checkpoints, and all the tools.

However, even if you were right that the weights were worthless, which they’re obviously not, and there were no open projects which there are, the solution would be to develop models from scratch in the open instead of screeching at people and pretending this tech is just going to go away because it offends you personally.

And nobody says LLMs are anything other than Markov chains at a fundamental level. However, just like Markov chains themselves, they have plenty of real world uses. Some very obvious ones include doing translations, generating subtitles, doing text to speech, and describing images for visually impaired. There are plenty of other uses for these tools.

I love how you presumed to know better than the entire world what technology to focus on. The megalomania is absolutely hilarious. Like all these researchers can’t understand that this tech is a dead end, it takes the brilliant mind of some lemmy troll to figure it out. I’m sure your mommy tells you you’re very special every day.


You seem to have a very US centric perspective on this tech the situation in China looks to be quite different. Meanwhile, whether you personally think the benefits are outweighed by whatever dangers you envision, the reality is that you can’t put toothpaste back in the tube at this point. LLMs will continue to be developed. The only question is how that’s going to be done and who will control this tech. I’d much rather see it developed in the open.


There is a lot of hype around LLMs, and other forms of AI certainly should be getting more attention, but arguing that this tech no value is simply disingenuous. People really need to stop perseverating over the fact that this tech exists because it’s not going anywhere.


It’s worth noting that humans aren’t immune to the problem either. The real solution will be to have a system that can do reasoning and have a heuristic for figuring out what’s likely a hallucination or not. The reason we’re able to do that is because we interact with the outside world, and we get feedback when our internal model diverges from it that allows us to bring it in sync.



Yeah, they mention it in the vid. It’s the exact same principle, but this way you can do it anywhere without needing a large body of water near by.


The approach China is taking is to invest in all kinds of different approaches, and then see what works. I imagine the answer is going to be that different types of energy storage will work best in different situations. Something like gravity storage might be useful for balancing short term fluctuations in the grid, it can be built anywhere, and it’s very safe.


These things only make sense at very large scale. China has already built some and they’ve approved more projects going forward. https://www.enlit.world/library/china-connects-gravity-storage-and-launches-three-new-projects




Aww muffing, you keep on coping there. Meanwhile, I didn’t compare anything, I gave an example of how building a particular thing over and over brings the costs and time of construction down. The fact that you still can’t comprehend that how little goes on in that head of yours.


Concrete is certainly a lot more clean than lithium mining. Meanwhile, construction of specific things is obviously something that you learn to do better over time. For example, here’s how cost of nuclear reactor construction has dropped in China as they learned from building them. Absolutely incredible that the concept of getting better at doing things through practice escapes you.

Meanwhile they are already in production, hilarious how you didn’t even bother checking a link past 2018 before spewing more drivel. 🤣

Following the start of grid interconnection in September 2023, the 25MW/100MWh EVx GESS in Rudong achieved full interconnection after the completion of the final 4km 35kV overhead power line to a remote end substation, as planned with local state grid authorities.


Obviously much cheaper than mining lithium, and the expense goes down as you scale out the technology as China is doing. But you keep on coping little buddy.