☆ Yσɠƚԋσʂ ☆
  • 1.46K Posts
  • 1.16K Comments
Joined 6Y ago
Cake day: Jan 18, 2020





The level of investment in AI in China is a fraction of that in the US, and they’re already starting to make money. The whole dynamic in China is different. Instead of chasing unicorns and promising stuff like AGI, companies in China are treating AI as shared infrastructure that you actually build useful stuff on top of. That’s why models are being released as open source: they’re not seen as the key source of revenue. It’s closer to what we see with Linux-based infrastructure, where companies build services like AWS on top of Linux. China also has far more applications for AI in stuff like robotics, manufacturing, and other types of automation. There are simply more niches to apply this tech in than there are in the West, which is now largely deindustrialized.



Can you tell me what sources you two are asking for? My argument is that economies of scale make new technologies cheaper over time because industrial processes become refined, people learn better and cheaper ways to produce things, and scaling up production brings the cost down. What are you asking me to source here specifically?


Are you seriously asking for sources for things that HAVE NOT BEEN DONE YET, that’s what you’re asking for here? 🤡


I love how you just keep repeating the same thing over and over. Your whole argument is that we need some amazing breakthrough to make other materials viable, but the reality is that it’s just a matter of investment over time. That’s it. China is investing in the development of new substrates at the state level, and that’s effectively unlimited funding. The capitalist economic arguments don’t apply here. If you think they won’t be able to figure this out, then prepare to be very surprised in the near future.


Oh right, the famous laws of physics that apparently decree silicon must forever be the cheapest material. Let me check my physics textbook real quick. Yep, still says nothing about global supply chains and sixty years of trillion-dollar investment being a fundamental force of nature.

Silicon is cheap because we made it cheap. We built the entire modern world around it. We constructed factories so complex and expensive they become national infrastructure projects. We perfected processes over many decades. That’s not physics, that’s just industrial inertia on a planetary scale.

To claim nothing else could ever compete requires ignoring how technological progress actually works. Remember when aluminum was a precious metal for royalty? Then we figured out how to produce it at scale and now we make soda cans out of it. Solar panels, lithium batteries, and fiber optics were all once exotic and prohibitively expensive until they weren’t.

As you yourself pointed out, germanium was literally the first transistor material. We moved to silicon because its oxide was more convenient for the fabrication tricks we were developing at the time, not because of some cosmic price tag. If we had poured the same obsessive investment into germanium or gallium arsenide, we’d be having this same smug conversation about them instead.

Similarly, graphene isn’t too expensive because physics. It’s too expensive because we’re still learning how to make it in bulk with high quality. Give it a fraction of the focus and funding that silicon has enjoyed and watch the cost curve do the same dramatic dive. The inherent cost argument always melts away when the manufacturing muscle shows up.

The only real law at play here is the law of economies of scale. Silicon doesn’t have a magical property that makes it uniquely cheap. It just has a sixty-year head start in the world’s most aggressive scaling campaign. If and when we decide to get serious about another material, your physical laws will look a lot more like a temporary price tag.





I’m beginning to get the impression you don’t actually understand what the term economies of scale means.


https://web.archive.org/web/20260126075708/https://www.scmp.com/news/china/science/article/3340516/flying-blind-mach-1-how-china-bringing-worlds-first-supersonic-rail-life

What I keep explaining to you here is that silicon is not inevitable, and that it’s obviously possible to make other substrates work and bring costs down. I’ve also explained to you why it makes no business sense for companies already invested in silicon to do that. The reason China has a big incentive is because they don’t currently have the ability to make top end chips. So, they can do moonshot projects at state level, and if one of them succeeds then they can leapfrog a whole generation of tech that way.

You just keep repeating that silicon is the best material for the job without substantiating that in any way. Your whole argument is tautological, amounting to saying that silicon is widely used and therefore it’s the best fit.


Again, silicon was the first one that people figured out how to mass produce. Just because it was cheaper first doesn’t mean that a new material put into mass production won’t get cheaper too. Look at the history of literally any technology that became popular, and you’ll see this to be the case.


If you look at the price of silicon chips from their inception to now, you can see how much it’s come down. If a new material starts being used, the exact same thing will happen. Silicon was the first substrate people figured out how to use to make transistors, and it continued to be used because it was cheaper to improve the existing process than to invent a new one from scratch. Now that we’re hitting the physical limits of what you can do with the material, the logic is changing. A chip that can run an order of magnitude faster will also use less power. These are both incredibly desirable properties in the age of AI data centres and mobile devices.





It’s only a matter of time until somebody figures out how to mass produce a computing substrate that will make silicon look like vacuum tubes. We don’t need to discover any new physics here. Numerous substrates have been shown to outperform silicon by at least an order of magnitude in the lab. This is simply a matter of allocating resources in a sustained fashion towards scaling these proofs of concept into mass production, something planned economies happen to excel at.



The secret sauce here is how the model was trained. Typically, coding models are trained on static snapshots of code from GitHub and other public sources. They basically learn what good code looks like at a single point in time. IQuest did something totally different. They trained their model using the entire commit history of repositories.

This approach added a temporal component to training, allowing the model to learn how code actually changes from one commit to the next. It saw how entire projects evolve over months and even years. It learned the patterns in how developers refactor and improve code, and the real world workflows of how software gets built. Instead of just learning what good code looks like, it learned how code evolves.

Coding is inherently an iterative process where you make an attempt at a solution, and then iterate on it. As you gain a deeper understanding of the problem, you end up building on top of existing patterns and evolving the codebase over time. The IQuest model gets how that works because it was trained on that entire process.
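To make the idea concrete, here’s a toy sketch of how commit-history training data could be assembled. This is purely illustrative: the source says nothing about IQuest’s actual pipeline, so the data shapes and the `buildTemporalPairs` helper are my own assumptions.

```javascript
// Hypothetical sketch: turn a repo's commit history into
// (before, message, after) training examples, so a model sees how code
// evolves over time instead of a single static snapshot.

// Toy commit history for one file: each entry holds the file's full
// contents after that commit, plus the commit message.
const history = [
  { message: "initial version",
    content: "function add(a, b) { return a + b; }" },
  { message: "handle missing args",
    content: "function add(a = 0, b = 0) { return a + b; }" },
  { message: "add docs",
    content: "/** Adds two numbers. */\nfunction add(a = 0, b = 0) { return a + b; }" },
];

// Pair each commit with its predecessor: the training target is
// (previous state, commit message) -> next state.
function buildTemporalPairs(commits) {
  const pairs = [];
  for (let i = 1; i < commits.length; i++) {
    pairs.push({
      before: commits[i - 1].content,
      message: commits[i].message,
      after: commits[i].content,
    });
  }
  return pairs;
}

const pairs = buildTemporalPairs(history);
console.log(pairs.length); // prints 2: one example per transition
```

The point is just that each transition between commits becomes a supervised example, which is what gives the training set its temporal component.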





The whole AI-as-a-service business model is cooked now. My prediction is that even stuff like coding will soon work well enough with local models. There are going to be very few cases that justify paying a subscription for AI services, either for companies or individuals. And this stuff is moving so incredibly fast. For example https://dev.to/yakhilesh/china-just-released-the-first-coding-ai-of-2026-and-its-crushing-everything-we-know-3bbj






You know, this is one of the differences I notice in the mindset between people living under capitalism in the West and people living under socialism in China. The former tend to be very pessimistic about technological progress because the first thought is always ‘how will this be used against me,’ while Chinese people are generally excited about new technology because their thought is ‘can’t wait to see how this will improve my life going forward.’



Gets called out on being a hypocrite, starts braying about tribalism. Peak liberal intellect in action here.


I love how libs throw around whataboutism as if it was anything other than exposing themselves as having a different set of standards for themselves and others.


To be fair, sales of all products have fallen in Europe as a result of European economies collapsing. And the specific reason American products are selling worse could simply be that they’re becoming more expensive in relative terms for Europeans, with the moralizing being the justification rather than the core reason. Maybe if you want some real change you might want to figure out how to get out from under US occupation first. I don’t see Europeans rushing to dismantle all those American bases.










What’s even funnier is that Meta literally spent millions on each one of them.


It’s a paper about an open source model discussing a new algorithm which essentially builds privacy into the model as part of training. Attempts to add privacy during the final tuning stage generally fail because the model has already memorized sensitive information during its initial learning phase. This approach mathematically limits how much any single document can influence the final model, and prevents the model from reciting verbatim snippets of private data while still allowing it to learn general patterns and knowledge.
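The mechanism described above resembles DP-SGD-style training, so here’s a minimal sketch of that idea: clip each example’s gradient so no single document can dominate an update, then add noise to mask its contribution. This is an assumption about the general technique, not the paper’s exact algorithm, and all function names here are made up for illustration.

```javascript
// Sketch of differential-privacy-style training updates (assumption:
// DP-SGD-like clipping and noising; the paper's actual algorithm may differ).

function l2Norm(v) {
  return Math.sqrt(v.reduce((s, x) => s + x * x, 0));
}

// Clip a per-example gradient so its norm is at most C. This bounds how
// much any single document can influence the final model.
function clip(grad, C) {
  const scale = Math.min(1, C / l2Norm(grad));
  return grad.map((x) => x * scale);
}

// Box-Muller standard normal sample, used for the privacy noise.
function gaussian() {
  const u = 1 - Math.random();
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Average the clipped per-example gradients and add Gaussian noise so
// verbatim memorization of any one example is suppressed.
function privateUpdate(perExampleGrads, C, noiseStd) {
  const d = perExampleGrads[0].length;
  const sum = new Array(d).fill(0);
  for (const g of perExampleGrads) {
    const c = clip(g, C);
    for (let i = 0; i < d; i++) sum[i] += c[i];
  }
  return sum.map((s) => s / perExampleGrads.length + gaussian() * noiseStd);
}

// A huge outlier gradient (e.g. a memorized document) gets bounded influence:
const grads = [[0.1, 0.2], [100, 200], [0.3, -0.1]];
console.log(privateUpdate(grads, 1.0, 0.01));
```

Because clipping happens at every training step, the privacy guarantee is baked in during initial learning rather than bolted on during final tuning, which matches the paper’s framing of why late-stage fixes fail.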




It’s really spurring Chinese companies to make LLMs that don’t need a lake of water to tell you how many r’s there are in strawberry. 🤣



The paper argues that we have been wasting a lot of expensive GPU cycles by forcing transformers to relearn static things like names or common phrases through deep computation. Standard models do not have a way to just look something up, so they end up simulating memory by passing tokens through layer after layer of feed-forward networks. DeepSeek introduced a module called Engram which adds a dedicated lookup step for local N-gram patterns. It acts like a new way to scale a model that is separate from the usual compute-heavy Mixture of Experts approach.

The architecture uses multi-head hashing to grab static embeddings for specific token sequences, which are then filtered through a context-aware gate to make sure they actually fit the current situation. They found a U-shaped scaling law where the best performance happens when you split your parameter budget between neural computation and this static memory. By letting the memory handle the simple local associations, the model can effectively act like it is deeper, because the early layers are not bogged down with basic reconstruction.

One of the best bits is how they handle hardware constraints by offloading the massive lookup tables to host RAM. Since these lookups are deterministic based on the input tokens, the system can prefetch the data from CPU memory before the GPU even needs it. This means you can scale to tens of billions of extra parameters with almost zero impact on speed, since the retrieval happens while the previous layers are still calculating.

The benchmarks show that this pays off across the board, especially in long context tasks where the model needs its attention focused on global details rather than local phrases. It turns out that even in math and coding the model gets a boost, because it is no longer wasting its internal reasoning depth on things that should just be in a lookup table.
Moving forward this kind of conditional memory could be a standard part of sparse models because it bypasses the physical memory limits of current hardware.
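Here’s a toy sketch of the lookup idea described above. It’s heavily simplified by assumption: one hash head instead of multi-head hashing, a fixed scalar gate instead of a learned context-aware one, and a random table instead of learned embeddings. The names are mine, not the paper’s.

```javascript
// Toy sketch of an Engram-style N-gram lookup (simplified: single hash
// head, fixed gate, random table; the real module learns all of these).

const DIM = 4;        // embedding width
const TABLE_SIZE = 16; // number of hash buckets

// Deterministic string hash of the last few tokens into the table.
// Because this depends only on input tokens, the lookup can be
// prefetched from host RAM before the GPU needs it.
function hashNgram(tokens) {
  let h = 0;
  for (const ch of tokens.join(" ")) {
    h = (h * 31 + ch.charCodeAt(0)) % TABLE_SIZE;
  }
  return h;
}

// Static embedding table standing in for the learned N-gram memory.
const table = Array.from({ length: TABLE_SIZE }, () =>
  Array.from({ length: DIM }, () => Math.random() - 0.5));

// Blend the looked-up memory into the hidden state. A real gate would be
// context-aware; here it's a fixed mixing weight for clarity.
function engramLookup(hidden, lastTokens, gate = 0.5) {
  const mem = table[hashNgram(lastTokens)];
  return hidden.map((h, i) => h + gate * mem[i]);
}

const out = engramLookup([0.1, 0.2, 0.3, 0.4], ["the", "quick", "brown"]);
console.log(out.length); // prints 4
```

The key property the sketch preserves is determinism: the bucket index is a pure function of the tokens, which is exactly what makes CPU-side prefetching possible in the real system.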

I think slop should really be defined by the purpose of the art rather than the medium. Any piece of advertisement is inherently far more slop than a piece of genAI art somebody made because they just had an idea in their head they wanted to express.









Most people in the field know that models usually fall apart after a few hundred steps because small errors just keep adding up until the whole process is ruined. The paper proposes a system called MAKER which uses a strategy they call massively decomposed agentic processes. Instead of asking one big model to do everything, they break the entire task down into the smallest possible tiny pieces so each microagent only has to worry about one single move. For their main test they used a twenty-disk version of the Towers of Hanoi puzzle, which actually requires over a million individual moves to finish. They found that even small models can be super reliable if you set them up correctly.

One of the main tricks they used is a voting system where multiple agents solve the same tiny subtask, and the system only moves forward once one answer gets a specific number of votes more than the others. This acts like a safety net that catches random mistakes before they can mess up the rest of the chain. Another interesting part of their approach is red flagging, which is basically just throwing away any response that looks suspicious or weird. If a model starts rambling for too long or messes up the formatting, they just discard that attempt and try again, because those kinds of behaviors usually mean the model is confused and likely to make a logic error.

By combining this extreme level of task breakdown with constant voting and quick discarding of bad samples, they managed to complete the entire million-step process with zero errors. And it turns out that you do not even need the most expensive or smartest models to do this, since relatively small ones performed just as well for these tiny steps. Scaling up AI reliability might be more about how we organize the work rather than just making the models bigger and bigger.
They even did some extra tests with difficult math problems like large digit multiplication and found that the same recursive decomposition and voting logic worked there as well.


Right, somehow he thought that fighting a trade war with the whole world at once would work. I guess his whole admin drank the kool-aid about the US being an indispensable country.






oh for sure, I think that a small model that’s optimized towards parsing human language and inferring what the user wants coupled with a logic engine could be an extremely powerful tool. Trying to make LLMs do stuff like math or formal reasoning is trying to ram a square peg into a round hole. It doesn’t make any sense to do this because we already have tools that are really good for that sort of stuff. What we don’t have are tools that can easily infer the intent from natural language, and that’s the gap LLMs can fill.


yeah, MCP is really a giant hack and it’s about the most inefficient approach you can think of



They’re absolutely not useless. The trick is to figure out how to use them effectively. For a concrete example, here’s a project I made to implement the idea of using a REPL as a context, which I read about in a paper recently. The premise there is that even small models are fairly competent at writing individual functions and dealing with a small context of a few lines. So, instead of feeding large documents into these models, which breaks them, you can instead provide them with an API to interrogate the document by writing code. And sure enough, the idea works. I managed to get qwen2.5-coder:7b, which is a tiny model, to reliably search through a large document that it would have no hope of figuring out on its own. Here’s what a run of it looks like:

npx tsx src/index.ts \
  "use javascript to write code to find the total sales amount across all regions?" \
  ./test-fixtures/scattered-data.txt \
  --verbose

  Configuration:
  Provider: ollama
  Model: qwen2.5-coder:7b
  Max turns: 10
  Timeout: 30000ms

[RLM] Loaded document: 4,706 characters
[RLM] Sandbox created (maxSubCalls: 10, timeout: 30000ms)

──────────────────────────────────────────────────
[Turn 1/10] Querying LLM...
[Turn 1] Executing code:
```javascript
// Step 1: Search for data related to sales
const hits = grep("sales");  // Returns array of {match, line, lineNum}
console.log(JSON.stringify(hits, null, 2));

// Step 2: Process results - extract sales amounts from each matching line
let totalSales = 0;
for (const hit of hits) {
    const saleAmount = parseFloat(hit.line.match(/\d+(\.\d{1,2})?/)[0]);
    if (!isNaN(saleAmount)) {
        totalSales += saleAmount;
    }
}

// Step 3: Store the result in memory
memory.push({ key: "total_sales", value: totalSales });

// Final output
console.log("Total sales across all regions:", totalSales);
```
[Turn 1] Console output:
  [
  ...
  {
    "match": "Sales",
    "line": "- Sales Data: Revenue in USD before taxes",
    "lineNum": 122,
    "index": 4357,
    "groups": []
  }
]
[Turn 1] Error: TypeError: Cannot read properties of null (reading '0')

──────────────────────────────────────────────────
[Turn 2/10] Querying LLM...
[Turn 2] Executing code:
```javascript
// Extract sales data from the logs
const salesData = [
    { region: "NORTH", amount: 2340000 },
    { region: "SOUTH", amount: 3120000 },
    { region: "EAST", amount: 2890000 },
    { region: "WEST", amount: 2670000 },
    { region: "CENTRAL", amount: 1980000 }
];

// Calculate total sales
const totalSales = salesData.reduce((acc, curr) => acc + curr.amount, 0);

// Output the total sales
console.log("Total Sales:", totalSales);
```
[Turn 2] Console output:
  Total Sales: 13000000

──────────────────────────────────────────────────
[Turn 3/10] Querying LLM...
[Turn 3] Executing code:
```javascript
// Output the total sales
console.log("Total Sales:", totalSales);
```
[Turn 3] Console output:
  Total Sales: 13000000
[Turn 3] Final answer found after code execution

[RLM] Sandbox disposed
The total sales are 13000000.

So in just three calls with very small contexts, it managed to find the answer correctly, and it does so reliably.

I’m playing around with integrating some code synthesis ideas from Barliman right now to make this even more robust. The model ends up only having to give general direction, and learn to ask basic questions, while most of the code can be synthesized at runtime. The way we use models today is really naive, and there’s a lot more possible if you start combining them with other techniques.




You might want to learn what words like reactionary actually mean before using them. We are discussing an open source tool, which by its nature lacks the built-in constraints you are describing. Your argument is a piece of sophistry designed to create the illusion of expertise on a subject you clearly do not understand. You are not engaging with the reality of the technology, but with a simplified caricature of it.


Technology such as LLMs is just automation, and automation is part of the base; how it is applied within a society is dictated by the superstructure. Open source LLMs such as DeepSeek are a productive force, and a rare instance where an advanced means of production is directly accessible for proletarian appropriation. It’s a classic base-level conflict over the relations of production.


Nah, I don’t think I’m going to take as gospel what a CIA asset says.

Instead, go read Marx to understand the relationship between the technology and the social relations that dictate its use within a society.


Elections are just the surface of the problem. The real issue is who owns the factories and funds the research. In the West that’s largely done by private capital, putting it entirely outside the sphere of public debate. Even universities are heavily reliant on funding from companies now, which obviously influences what their programs focus on.


or maybe it’s the capitalist relations and not the technology that’s the actual problem here


Right, I think the key difference is that we have a feedback loop and we’re able to adjust our internal model dynamically based on it. I expect that embodiment and robotics will be the path towards general intelligence. Once you stick the model in a body and it has to deal with the environment, and learn through experience, then it will start creating a representation of the world based on that.


It seemed pretty clear to me. If you have any clue on the subject then you presumably know about the interconnect bottleneck in traditional large models. The data moving between layers often consumes more energy and time than the actual compute operations, and the surface area for data communication explodes as models grow to billions of parameters. The mHC paper introduces a new way to link neural pathways by constraining hyper-connections to a low-dimensional manifold.

In a standard transformer architecture, every neuron in layer N potentially connects to every neuron in layer N+1. This is mathematically exhaustive making it computationally inefficient. Manifold constrained connections operate on the premise that most of this high-dimensional space is noise. DeepSeek basically found a way to significantly reduce networking bandwidth for a model by using manifolds to route communication.
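One way to picture the bandwidth saving is as a low-rank projection: instead of shipping a full d-dimensional activation between layers, you ship an r-dimensional code with r much smaller than d. That is my simplified reading, not the mHC paper’s actual construction, and the matrices here are random placeholders for learned ones.

```javascript
// Sketch of manifold-style routing as a low-rank bottleneck (assumption:
// modeling "constrain to a low-dimensional manifold" as a linear
// down-projection before communication and an up-projection after).

function matVec(M, v) {
  return M.map((row) => row.reduce((s, x, i) => s + x * v[i], 0));
}

function randMatrix(rows, cols) {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => Math.random() - 0.5));
}

const d = 8; // full hidden dimension
const r = 2; // manifold dimension actually communicated

const down = randMatrix(r, d); // project onto the manifold before sending
const up = randMatrix(d, r);   // reconstruct on the receiving layer

const hidden = Array.from({ length: d }, () => Math.random());
const code = matVec(down, hidden); // only r numbers cross the interconnect
const received = matVec(up, code); // receiving layer expands back to d dims

console.log(code.length, received.length); // prints: 2 8
```

If most of the full-dimensional connection space really is noise, the d/r reduction in traffic comes nearly for free, which is the bandwidth claim in a nutshell.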

Not really sure what you think the made up nonsense is. 🤷