☆ Yσɠƚԋσʂ ☆
  • 991 Posts
  • 887 Comments
Joined 5Y ago
Cake day: Jan 18, 2020


I can tell you for a fact that they can. However, even managing boilerplate and repetitive code is a huge benefit. Furthermore, these tools are great at combing through code bases and helping you find where you need to make changes in code. If you haven’t actually used these tools in a real project yourself then you don’t really know what they’re capable of.



It depends on the task and the specific LLM. My experience is that they can do a lot of things effectively nowadays, and they’re improving rapidly.




The article is a great critique of how what the author refers to as the "Efficiency Lobby" has been pursuing a narrow idea of task-oriented intelligence focused on productivity. It's a narrow focus, driven by corporate interests, that necessarily leads to individualistic consumption of AI services, hindering genuine creativity, open-ended exploration, and collection. A recent paper introduces [MemOS](https://arxiv.org/pdf/2507.03724) with the potential to create a truly collaborative and community-driven foundation for AI.
The paper introduces a new approach to memory management for LLMs, treating memory as a governable system resource. It uses the concept of MemCubes that encapsulate both semantic content and critical metadata like provenance and versioning. MemCubes are designed to be composed, migrated, and fused over time, unifying three distinct memory types: plaintext, activation, and parameter memories. This architecture directly addresses the limitations of stateless LLMs, enabling long-context reasoning, continual personalization, and knowledge consistency. The paper proposes a mem-training paradigm, where knowledge evolves continuously through explicit, controllable memory units, blurring the lines between training and deployment and paving the way to extend data parallelism to a distributed intelligence ecosystem.
It would be possible to build a decentralized network where there's a common pool of MemCubes acting as shareable and composable containers of memory, akin to a BitTorrent for knowledge. Users could contribute their own memory artifacts such as structured notes, refined prompts, learned patterns, or even "parameter patches" encoding specialized skills that are encapsulated within MemCubes. Using a common infrastructure would allow anyone to share, remix, and reuse these building blocks in all kinds of ways.
Such an architecture would directly address Morozov's critique of privatized "stonefields" of knowledge, instead creating a truly public digital commons. This distributed platform could effectively amortize computation across the network, similar to projects like SETI@home. Instead of constantly recomputing information, users could build out a local cache of MemCubes relevant to their context from the shared pool. If a particular piece of knowledge or a specific reasoning pattern has already been encoded and optimized within a MemCube by another user, it can simply be reused, dramatically reducing redundant computation and accelerating inference. The inherent reusability and composability of MemCubes make it possible to have a collaborative environment where all users contribute to and benefit from each other. Efforts like [Petals](https://github.com/bigscience-workshop/petals), which already facilitate distributed inference of large models, could be extended to leverage MemOS to share dynamic and composable memory. This has the potential to transform AI from a tool for isolated consumption to a medium for collective creation. Users would be free to mess about with readily available knowledge blocks, discovering emergent purposes and stumbling on novel solutions.
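To make the shared pool idea a bit more concrete, here is a minimal sketch of what a content-addressed memory unit with provenance and versioning might look like. Everything in it (the `MemCube` fields, `fuse`, `SharedPool`) is a hypothetical illustration of the idea and not the MemOS API, and the in-memory dict merely stands in for a real peer-to-peer network.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class MemCube:
    """Hypothetical memory unit: content plus provenance and version metadata."""
    content: str
    author: str
    version: int = 1
    parents: tuple = ()  # ids of the cubes this one was derived from

    @property
    def cube_id(self) -> str:
        # Content-addressed id, so identical cubes deduplicate across peers.
        payload = f"{self.author}|{self.version}|{self.content}|{','.join(self.parents)}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def fuse(a: MemCube, b: MemCube, author: str) -> MemCube:
    """Combine two cubes into a new one that records both as provenance."""
    return MemCube(
        content=a.content + "\n" + b.content,
        author=author,
        version=max(a.version, b.version) + 1,
        parents=(a.cube_id, b.cube_id),
    )


class SharedPool:
    """Stand-in for the shared network: a dict keyed by cube id."""

    def __init__(self):
        self._cubes = {}

    def publish(self, cube: MemCube) -> str:
        self._cubes[cube.cube_id] = cube
        return cube.cube_id

    def fetch(self, cube_id: str) -> MemCube:
        return self._cubes[cube_id]


pool = SharedPool()
id_a = pool.publish(MemCube("Rust borrow checker notes", author="alice"))
id_b = pool.publish(MemCube("Refined code review prompt", author="bob"))

# A third user remixes two published cubes and shares the result.
merged = fuse(pool.fetch(id_a), pool.fetch(id_b), author="carol")
merged_id = pool.publish(merged)
```

Hashing the content to derive the id is one possible design choice: it lets a local cache check whether it already holds a cube without trusting the peer that offered it.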



seems fine for me, here’s the content:

Mainland China is on track to surpass Taiwan in semiconductor foundry capacity by 2030, according to a report from Yole Group, underscoring Beijing’s progress in its push for chip self-sufficiency amid ongoing US tech restrictions. The mainland’s share of global foundry capacity is projected to reach 30 per cent by the end of the decade, up from 21 per cent in 2024, the French market research firm said. Taiwan is currently the market leader with a 23 per cent share last year, while mainland China is already ahead of South Korea at 19 per cent, Japan at 13 per cent and the US at 10 per cent. “Mainland China is rapidly becoming a central player,” Yole Group said, attributing the shift to Beijing’s intensified efforts to build a self-sufficient domestic semiconductor ecosystem since Washington launched a tech war that aimed to curb China’s progress in critical areas such as chips and artificial intelligence (AI). Beijing has doubled down on its “whole nation” approach to its self-sufficiency drive. The state-backed China Integrated Circuit Industry Investment Fund, known as the “Big Fund”, has successfully fostered the development of key companies such as Semiconductor Manufacturing International Corporation (SMIC) and Hua Hong Semiconductor, two of the country’s leading wafer foundries. Domestic fabs are set to play a bigger role over the next few years, according to the report, which said local chipmakers accounted for 15 per cent of foundry capacity in 2024. That share will be “significantly more” by 2030, the report said. Chinese chipmakers have been investing heavily in expanding their facilities to meet surging demand from sectors such as automotive and generative AI. China was expected to start three new fab construction projects this year, one-sixth of the world’s total, according to a report published in January by US-based industry association SEMI. 
China’s self-sufficiency strategy, along with expected demand from automotive and internet-of-things applications, would help boost capacity by 6 per cent for chips made with process nodes between 8 and 45 nanometres, SEMI added. Despite the projected gains, the mainland still trails Taiwan and South Korea in advanced process nodes, which are crucial for producing high-performance chips with greater transistor density. SMIC, China’s top foundry, had difficulty advancing its process nodes from 7-nm to 5-nm, Canadian research firm TechInsights said in a report last month. Two years after its 7-nm chip first appeared in a Huawei Technologies smartphone, “SMIC’s 5nm process node remains elusive,” TechInsights said. The report came after it looked into the chip used in Huawei’s new laptop with a foldable display, which also used 7-nm chips from SMIC. Meanwhile, global leaders Taiwan Semiconductor Manufacturing Company (TSMC) and Samsung Electronics are locked in a race to achieve mass production at the 2-nm node level. TSMC was expected to reach that level this year, while Samsung has reportedly planned to reach the same stage in early 2026.




I think that’s exactly what’s gonna happen in the long run. Right now we’re in the hype phase of a new technology, but once the hype dies down we’ll start identifying use cases where the tech actually works well. At the same time the tech itself is going to mature, and people will figure out how to work with it effectively.


















DiffuCoder: Understanding And Improving Masked Diffusion Models For Code Generation
This paper introduces DiffuCoder, a 7B-scale open-source masked diffusion large language model (dLLM) specifically designed for code generation. The research provides insights into how dLLMs generate content, distinguishing their decoding behavior from that of autoregressive (AR) models. Unlike AR models, dLLMs can intrinsically adjust their generation causality, and increasing the sampling temperature diversifies not just token choices but also their generation order, creating a rich search space for reinforcement learning (RL). This flexibility allows dLLMs to be more non-autoregressive and generate tokens in a less sequential, more "human-like" code writing manner. To leverage this diversity and improve performance, the paper proposes coupled-GRPO, an RL algorithm. This method utilizes a coupled-sampling scheme that constructs complementary mask noise during training to reduce the variance of token log-likelihood estimates while maintaining training efficiency. Experimentally, coupled-GRPO significantly boosts DiffuCoder's performance on code generation benchmarks, notably improving EvalPlus scores by 4.4% with training on only 21K samples. The research also shows that coupled-GRPO trained models experience a smaller performance drop when decoding steps are halved (resulting in a 2x speedup), indicating increased parallelism and reduced reliance on AR bias during decoding. The model is available at https://huggingface.co/apple/DiffuCoder-7B-cpGRPO
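As a rough illustration of what "complementary mask noise" could mean, the sketch below builds a pair of masks where every token position is masked in exactly one of the two. This is only my reading of the summary above: the function name, the mask ratio, and the sampling details are assumptions, and the paper's actual scheme may differ.

```python
import random


def complementary_masks(seq_len, mask_ratio, rng):
    """Return two boolean masks; each position is masked in exactly one of them."""
    k = round(seq_len * mask_ratio)
    masked = set(rng.sample(range(seq_len), k))
    mask_a = [i in masked for i in range(seq_len)]
    mask_b = [not m for m in mask_a]  # the complement of the first mask
    return mask_a, mask_b


rng = random.Random(0)
mask_a, mask_b = complementary_masks(seq_len=8, mask_ratio=0.5, rng=rng)

# Every position is covered by exactly one of the two masked passes.
covered = [a != b for a, b in zip(mask_a, mask_b)]
```

The intuition is that, across the pair, every token contributes to the log-likelihood estimate once, rather than some tokens being skipped by chance in a single random mask.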

Reuse non-prefix KV Cache and speed up RAG by 3X with LMCache.
In modern LLM applications like RAG and Agents, the model is constantly fed new context. For example, in RAG, we retrieve relevant documents and stuff them into the prompt. The issue is that this dynamically retrieved context doesn't always appear at the beginning of the input sequence. Traditional KV caching only reuses a "common prefix," so if the new information isn't at the very start, the cache hit rate plummets, and your GPU ends up recomputing the same things over and over.
CacheBlend changes the game by allowing for the reuse of pre-computed KV caches regardless of their position in the input sequence. This makes it possible to achieve a 100% KV Cache hit rate in applications like RAG. The performance gains are significant:
* Faster Time-To-First-Token (TTFT): Get your initial response much quicker.
* More Throughput: Serve significantly more users with the same hardware.
* Almost lossless Output Quality: All of this is achieved with little degradation in the model's generation quality.
CacheBlend works by intelligently handling the two main challenges of reusing non-prefix caches:
* Positional Encoding Update: It efficiently updates positional encodings to ensure the model always knows the correct position of each token, even when we're stitching together cached and new data.
* Selective Attention Recalculation: Instead of recomputing everything, it strategically recalculates only the minimal cross-attention needed between the new and cached chunks to preserve generation quality.
An interactive CacheBlend demo is available at: https://github.com/LMCache/LMCache-Examples/tree/main/demo-rag-blending
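A toy comparison shows why prefix-only reuse struggles with RAG. The sketch below only counts which chunks could be reused under each policy. It is not the LMCache API, the chunk names are made up, and it ignores the positional encoding update and selective recomputation that make non-prefix reuse actually work.

```python
def prefix_hits(cached_chunks, request_chunks):
    """Prefix caching: reuse stops at the first position where the chunks differ."""
    hits = 0
    for cached, requested in zip(cached_chunks, request_chunks):
        if cached != requested:
            break
        hits += 1
    return hits


def chunk_hits(cached_chunks, request_chunks):
    """Position-independent reuse: any previously cached chunk counts as a hit."""
    cache = set(cached_chunks)
    return sum(1 for chunk in request_chunks if chunk in cache)


# Two RAG requests that retrieve the same documents in a different order.
previous = ["system_prompt", "doc_A", "doc_B", "question_1"]
current = ["system_prompt", "doc_B", "doc_A", "question_2"]

prefix_rate = prefix_hits(previous, current) / len(current)  # 0.25
chunk_rate = chunk_hits(previous, current) / len(current)    # 0.75
```

In this example only the new question has to be computed from scratch under chunk-level reuse, while prefix caching throws away both documents just because they swapped places.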






Have some humility and willingness to learn.

I have plenty of willingness to learn from people who have a clue on the subject.

I didn’t say it was the primary function.

You literally tried to argue that evolution doesn’t create complexity if there’s a more efficient path.

Then what about Darwin who literally said, “Natural selection is continually trying to economize every part of the organization.” Now please go and read some introductory texts on biology before trying to explain to me why Darwin is wrong. There’s so much going on when it comes to the thermodynamics of living systems and you’re clearly not ready to have a conversation about it.

Again, you’re showing a superficial understanding of the subject here. Natural selection selects for overall fitness, and efficiency is only a small part of the equation. For example, plants don’t use the most efficient wavelength for producing energy, they use the one that’s most reliably available. Similarly, living organisms have all kinds of redundancies that allow them to continue to function when they’re damaged. Evolution optimizes for survival over efficiency.

You’re baselessly assuming that hydrocephalus causes the brain to lose a substantial amount of its complexity.

Maybe read the actual paper linked there?

But hey neuroscience hasn’t really advanced at all since 1980 right? The brain is totally redundant right? There’s no possible way a critical and discerning person such as yourself could have been taken in by junk science, right?!!

What I linked you is a case study of an actual living person who was missing large parts of their brain and had a relatively normal life. But hey why focus on the actual facts when you can just write more word salad right?

I took issue with specific statements you made that stand apart from the rest of your comment.

You took issue with made up straw man arguments that you yourself made and that have fuck all to do with what I actually said. Then you proceeded to demonstrate that you don’t actually understand the subject you’re debating. You might as well start believing in astrology, crystals, and energy healing. At least those interests will make you seem fun and quirky instead of just a sad debate bro.




I’m simply stating that you’re way off base when you claim that they appear to operate using the same principles or that all evidence suggests the human mind is nothing more than a probability machine.

I literally said these things, and you never gave any actual counter argument to either of them.

You’re betraying your own ignorance about neuroscience. The complexity of the brain is absolutely linked with its ability to reason and we have plenty of evidence to show that. The evolutionary process does not just create needless complexity if there is a more efficient path.

You’re betraying your ignorance of how biology works and illustrating that you have absolutely no business debating this subject. Efficiency is not the primary fitness function for evolution, it’s survivability. And that means having a lot of redundancy baked into the system. Here’s a concrete example for you of just how much of the brain isn’t actually essential for normal day to day function. https://www.rifters.com/crawl/?p=6116

This is such a silly statement especially when you’ve been claiming that both the brain and AI appear to work using the same principles.

There’s nothing silly in stating that the underlying principles are similar, but we don’t understand a lot of the mechanics of the brain. If you truly can’t understand such basic things there’s little point trying to have a meaningful discussion.

I don’t really care about your arguments concerning embodiment because they’re so beside the point when you’re just blowing right by the most basic principles of neuroscience.

That’s literally the whole context for this thread, it just doesn’t fit with the straw man you want to argue about.

A ruthless criticism of all that exists includes the very researchers whose work you’re taking at face value.

Whose work am I taking at face value specifically? You’re just spewing nonsense here without engaging with anything I’m saying.



I suspect that something like LLMs is part of our toolkit, but I agree that this can’t be the whole picture. Ideas like neurosymbolic AI might be on the right track here. The idea here is to leverage LLMs at parsing and classifying noisy input data, which they’re good at, then use a symbolic logic engine to operate on the classified data. Something along these lines is much more likely to produce genuine intelligence. We’re still in very early stages of both understanding how the brain works and figuring out how to implement artificial reasoning.
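Here is a minimal sketch of that split, just to show the shape of the idea. The `classify` function is a keyword stub standing in for an LLM call, and the rules and fact names are invented for illustration. A real system would use an actual model and a proper logic engine.

```python
def classify(text):
    """Stand-in for an LLM: map noisy text to symbolic facts (keyword stub)."""
    facts = set()
    lowered = text.lower()
    if "bark" in lowered or "woof" in lowered:
        facts.add("is_dog")
    if "meow" in lowered:
        facts.add("is_cat")
    return facts


# Hypothetical rule base: (set of premises, conclusion).
RULES = [
    ({"is_dog"}, "is_mammal"),
    ({"is_cat"}, "is_mammal"),
    ({"is_mammal"}, "is_animal"),
]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# The fuzzy part handles the messy input, the symbolic part does the reasoning.
result = forward_chain(classify("it went WOOF at the mailman!!"), RULES)
```

The appeal of this arrangement is that the inference step is deterministic and inspectable, so any mistakes can be traced either to the classification or to a specific rule.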


LLMs and the human mind operate on categorically different principles.

A bold statement given that we don’t actually understand how the brain operates exactly and what algorithms that would translate into.

Where’s the straw man?

The straw man is you continuing to argue against equating LLMs with the functioning of the brain, something I never said here.

All the verbiage used to describe neural network models has little to do with how the brain actually works.

You appear to be conflating the implementation details of how the brain works with what it’s doing in a semantic sense. There is zero evidence that all the complexity of the brain is inherent to the way our reasoning functions. Again, we don’t have a full understanding of how the brain accomplishes tasks like reasoning. It may be a lot more complex than what LLMs do, or it may not be. We do not know.

Finally, none of this has anything to do with the point I was actually making which is regarding embodiment. You decided to ignore that to focus on braying about tech companies and LLMs instead.


This completely understates the gulf between what we call AI and how the human brain actually works.

Way to completely misrepresent what I was actually saying. Nowhere was I suggesting that there isn’t a huge difference between the two. What I pointed out is that, while undeniably more complex, our brains appear to work on similar principles.

My only point was that the feedback loop from embodiment creates the basis for volition, and that what we call intelligence is our ability to create internal models of the world that we use for decision making. So, this is likely a prerequisite for any artificial system that has any meaningful intelligence.

Maybe try engaging with that instead of writing a wall of text arguing with a straw man.


All the evidence suggests that our own minds are also nothing more than probability engines. The reason we consider humans to be intelligent is because our brains learn to model the events in the physical world that are fed into our brains by the nervous system. The whole purpose of a brain is to try and keep the body in a state of homeostasis. That’s the basis for our volition. The brain gets data about the state of the organism, and interprets it as hunger, pain, fear, and so on. Then it uses its internal world model to figure out actions that will put the body into a more desirable state. From this perspective, embodiment would indeed be a necessary component of human style intelligence.

While LLMs on their own are unlikely to provide a sufficient basis for a reasoning system, it’s not strictly impossible that a model trained on sensory data from a robot body it inhabits would be able to build a representation of the world and its body that could be used as the basis for decision making and volition.
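The loop I’m describing can be sketched in a few lines. This is a deliberately crude toy, not a claim about how brains implement it: the setpoints, the actions, and the hand-written world model are all made-up assumptions, and a real system would have to learn the model from sensory data.

```python
# Desired body state the agent tries to stay close to (hypothetical values).
SETPOINT = {"energy": 1.0, "temperature": 0.5}

# Hypothetical world model: predicted change in body state for each action.
WORLD_MODEL = {
    "eat": {"energy": 0.4, "temperature": 0.0},
    "rest": {"energy": -0.05, "temperature": 0.0},
    "seek_shade": {"energy": -0.1, "temperature": -0.3},
}


def discomfort(state):
    """Total distance from the setpoint, loosely playing the role of hunger or pain."""
    return sum(abs(state[k] - SETPOINT[k]) for k in SETPOINT)


def predict(state, action):
    """Use the world model to imagine the body state after taking an action."""
    return {k: state[k] + WORLD_MODEL[action][k] for k in state}


def choose_action(state):
    """Pick the action whose predicted outcome is closest to homeostasis."""
    return min(WORLD_MODEL, key=lambda action: discomfort(predict(state, action)))


hungry = {"energy": 0.3, "temperature": 0.5}
overheated = {"energy": 0.9, "temperature": 0.9}

hungry_choice = choose_action(hungry)          # "eat"
overheated_choice = choose_action(overheated)  # "seek_shade"
```

The point of the toy is that the "wanting" comes from the body state feeding back into the decision, which is what an LLM trained purely on text doesn’t have.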


I thought the game was alright overall, but it certainly did not feel like a Dragon Age game. The overall story was decent, though a lot of the dialogue was hamfisted. The real problem was that the gameplay felt like Jedi Survivor without the refined combat mechanics. As a result, combat quickly became tedious and repetitive. I also found that the NPCs were more or less fungible, and it really didn’t matter who was in your party. This is a stark contrast with previous Dragon Age games where the whole fun was in scripting the behavior of different characters and coming up with clever ways to combine their abilities. Simply having kept the original mechanics, warts and all, would’ve resulted in a far better game in my opinion.









US threats mean nothing because the US has no bargaining power, and everybody knows this now.


Jan-nano-128k is a model fine-tuned to improve performance when YaRN scaling is enabled (instead of having degraded performance). This model requires YaRN scaling support from the inference engine.
* It can use tools continuously and repeatedly.
* It can perform deep research.
* It is extremely persistent.
GGUF files can be found at: https://huggingface.co/Menlo/Jan-nano-128k-gguf

Human-like object concept representations emerge from datasets made by humans because humans made them.

And humans made them that way because human minds evolved to represent data in this way. As I keep pointing out, we’re feeding data into neural networks that’s organized in a way that’s natural for our brains to operate on. It’s an artificial system that mimics the way we naturally represent data in our own minds.

The artificial aspect of the system lies in the implementation details, the ways we’ve come up with to encode data. These are not essential. It’s like the difference between an algorithm and its concrete implementation in a programming language. The fact that the data is encoded using human designed formats is incidental to the structure of the data, which is derived from the way our brains encode information.

Human-like object concept representations emerge from the way our brains are structured. These are the representations that are encoded into data sets by humans.

Also, you’ve talked about a dialectical relationship, but dialectics are about understanding evolution of dynamic systems. The contradictions represent the opposing forces within a system that guide its development over time. When we talk about a distinction between natural and artificial, what’s the system that we’re discussing here, and what are the opposing forces?


I haven’t defined artificial out of existence at all. My definition of artificial is a system that was consciously engineered by humans. The human mind is a product of natural evolutionary processes. Therefore, the way we perceive and interpret the world is inherently a natural process. I don’t see how it makes sense to say that human representation of the world is not natural.

An example of something that’s artificial would be taking a neural network we designed, and having it build a novel representation of the world that’s unbiased by us from raw inputs. It would be a designed system, as opposed to one that evolved naturally, with its own artificial representation of the world.



You continue to ignore my point that human representations are themselves not arbitrary. Our brains have emerged naturally, and that’s what makes the representations humans make natural. You could evolve a representation of the model from scratch by hooking up a neural network to raw sensory inputs, and its topology will eventually become tuned to model those inputs. I don’t see what would be fundamentally more natural about that though.


A more accurate conclusion would be: human-like object concept representations emerge when fed data collected by humans, curated by humans, annotated by humans, and then tested by representation learning methods designed for humans.

Again, I’m not disputing this point, but I don’t see why it’s significant to be honest. As I’ve noted, human representation of the world is not arbitrary. We evolved to create efficient models that allow us to interact with the world in an effective way. We’re now seeing that artificial neural networks are able to create similar types of internal representations that allow them to meaningfully interact with the data organized in a way that’s natural for humans.

I’m not suggesting that human style representation of the world is the one true way to build a world model, or that other efficient representations aren’t possible. However, that in no way detracts from the fact that LLMs can create a useful representation of the world, that’s similar to our own.

Ultimately, the end goal of this technology is to be able to interact with humans, to navigate human environments, and to accomplish tasks that humans want to accomplish.


It’s a good thing in the sense that it means the models are creating stable representations of objects across modalities. It means that there is potential for extending the LLM approach to building actual world models in the future.


It’s not merely natural. It’s human.

I’m not disputing this, but I also don’t see why that’s important. It’s a representation of the world encoded in a human format. We’re basically skipping a step of evolving a way to encode this data.

We know that LLMs, when fed human-like inputs, produce human-like outputs. That’s it. That tells us more about LLMs and humans than it tells us about nature itself.

Did you actually read through the paper?


I didn’t say they’re encoding raw data from nature. I said they’re learning to interpret multimodal representations of the encodings of nature that we feed them in human compatible formats. What these networks are learning is to make associations between visual, auditory, tactile, and text representations of objects. When a model recognizes a particular modality such as a sound, it can then infer that it may be associated with a particular visual object, and so on.

Meanwhile, the human perspective itself isn’t arbitrary either. It’s the result of an evolutionary selection process that shaped the way our brains are structured. This is similar to how brains of other animals encode reality as well. If you evolved a neural network on raw data from the environment, it would eventually start creating similar types of representations as well because it’s an efficient way to model the world.


Ultimately the data both human brains and artificial neural networks are trained on comes from the material reality we inhabit. That’s the underlying context. We’re feeding LLMs data about our reality encoded in a way that’s compatible with how our brains interpret it. I’d argue that models being based on data encoding that we ourselves use is a feature, because ultimately we want to be able to interact with them in a meaningful way.


The object concept representation is an emergent property within these networks. Basically, the network learns to create stable associations between different modalities and associate an abstract concept of an object that unites them together.


Exactly, and this is such a great illustration of how companies don’t need the Western consumer market to be successful.



I couldn’t find any official sources stating this, so maybe it should be taken with a grain of salt.


the result? It performed about as well as Meta’s similarly sized Llama 2-7B from 2023.


Nvidia is producing the hardware that drives AI models. How these models are applied has little to do with the underlying hardware architecture that makes them possible.


Indeed, intellectual property laws exist to concentrate ownership and profit in the hands of corporations, not to protect individual artists. Disney’s ruthless copyright enforcement, for instance, sharply contrasts with its own history of mining public-domain stories. Meanwhile, OpenAI scraping data at scale exposes the hypocrisy of a system that privileges corporate IP hoarding over collective cultural wealth. Large corporations can ignore copyright without being held to account while regular people cannot. In practice, copyright helps capitalists far more than it helps individual artists.





Fair enough, but even if the model is open source, you still have no control or knowledge of how it was developed or what biases it might have baked in. AI is by definition a black box, even to the people who made it, it can’t even be decompiled like a normal program.

You can tune models for specific outputs actually. There are even projects that are exploring making models adapt and learn over time. https://github.com/babycommando/neuralgraffiti

The fact that it’s a black box is not really a show stopper in any meaningful way. We don’t know the minds of other people, yet we can clearly collaborate effectively to solve problems despite that.

I mean, China has the death penalty for drug distribution, which is supported by the majority of Chinese citizens.

Sure, there are tough laws against drugs in China as well as other countries, but that has not eliminated drug use entirely. Meanwhile, there is no indication that any state would ban the use of AI, and it would be self defeating to do so because it would make it less competitive against the states that don’t. The reality is that there are huge financial incentives for developing this technology for both private companies and state level actors. This tech is here to stay, and I don’t think it makes any sense to pretend otherwise. The question is how this tech will evolve going forward and how it will be governed.

I never thought of it in terms of copyright infringement, but in terms of reaping the labour of proletarians while giving them nothing in return.

I don’t see it that way at all. Open-source AI models, when decoupled from profit motives, have the potential to democratize creativity in unprecedented ways. They enable a nurse to visualize a protest poster, a factory worker to draft a union newsletter, or a tenant to simulate rent-strike scenarios. This is no different from fanfiction writers reimagining Star Wars or street artists riffing on Warhol. It’s just collective culture remixing itself, as it always has. The threat arises when corporations monopolize these tools to replace paid labor with automated profit engines. But the paradox here is that boycotting AI in grassroots spaces does nothing to hinder corporate adoption. It only surrenders a potent tool to the enemy. Why deny ourselves the capacity to create, organize, and imagine more freely, while Amazon and Meta invest billions to weaponize that same capacity against us?

And I have a concrete example I can give you here because AI tools like ComfyUI are already being used by artists, and they’re particularly useful for smaller studios. These tools can streamline the workflow, and allow for a faster transition from the initial sketch to a final product. They can also facilitate an iterative and dynamic creative process, encouraging experimentation and leading to unexpected results. Far from replacing artists, AI expands their creative potential, enabling smaller teams to tackle more ambitious projects.

https://www.youtube.com/watch?v=envMzAxCRbw

Imagine working your whole life on open source projects only for no company to want to hire you because they’re using AI trained on your open source work to do what they would have paid you to do.

Right, I would not like a company to build a proprietary model using my open source work. However, I’d have absolutely no problem with an open model being trained on my open source. As long as the model is distributed under an open license then anybody can benefit from it, and use it in any way that makes sense to them. I see it exactly the same as open sourcing code.

I do think capitalists will use this technology to harm workers, that’s been the case with every advance in automation. However, I do think it’s going to be a far better scenario if this tech is open and can be used by workers on their own terms. The worst possible outcome is that we have corporations running models as subscription services, and people end up having to use them like serfs. I see open source models as workers owning the means of production.



Lithium still has some advantages for stuff like EVs or mobile devices where energy density matters, but using sodium for stuff like grid storage makes a lot more sense.


I’m literally asking this question, and I’m not pushing any pseudo-science about AI. This is just you making a straw man because you don’t actually have any coherent counterpoint to make. It’s incredible how any discussion about LLMs inevitably causes the trolls to crawl out of the woodwork.


It is a fact that a full DeepSeek model can now be run using 200 watts. What your IEA link is saying is that there will be a surge in energy use because this tech will be deployed at scale, which has fuck all to do with efficiency.


Training AI is a one-time task. Not every AI vendor is training models from scratch; many are using approaches like LoRA to tune existing models.
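For a sense of scale, here is a back-of-the-envelope sketch of why LoRA tuning is so much cheaper than training from scratch. LoRA freezes the original weight matrix and only trains two small low-rank matrices, so for a layer of shape `d_out x d_in` the trainable parameters drop from `d_out * d_in` to `rank * (d_out + d_in)`. The layer size and rank below are illustrative assumptions, not figures for any particular model.

```python
def lora_param_counts(d_out, d_in, rank):
    """Compare trainable parameters: full fine-tuning vs a LoRA adapter for one layer."""
    full = d_out * d_in            # every entry of the weight matrix W
    lora = rank * (d_out + d_in)   # B is d_out x rank, A is rank x d_in
    return full, lora


# Illustrative numbers for a single square projection layer.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
ratio = lora / full
```

With these numbers the adapter trains 65,536 parameters instead of 16,777,216 for that layer, which is under half a percent of the original.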

“It will happen anyway” is not an excuse to not try to stop it. That’s like saying drug dealers will sell drugs regardless of how ethical it is so there’s no point in trying to criminalize drug distribution.

You can’t put toothpaste back in the tube. The only question going forward is how AI will be developed and who will control it. It’s funny that you’d bring up the drug analogy because you’re advocating a war on drugs here.

Except there are no truly open AI models because they all use stolen training data. Even the “open source” models like Mistral and DeepSeek say nothing about where they get their data from. The only way for there to be an open source AI model is if there was a reputable pool of training data where all the original authors consented to their work being used to train AI.

Personally, I have absolutely no problem with that if the model is itself open and publicly owned. I’m a communist, I don’t support copyrights and IP laws in principle. The ethical objection to AI training on copyrighted material holds superficial validity, but only within capitalism’s warped logic. Intellectual property laws exist to concentrate ownership and profit in the hands of corporations, not to protect individual artists. Disney’s ruthless copyright enforcement, for instance, sharply contrasts with its own history of mining public-domain stories.

Meanwhile, OpenAI scraping data at scale exposes the hypocrisy of a system that privileges corporate IP hoarding over collective cultural wealth. Large corporations can ignore copyright without being held to account while regular people cannot. In practice, copyright helps capitalists far more than it helps individual artists. Attacking AI for “theft” inadvertently legitimizes the very IP regimes that alienate artists from their work. Should a proletarian writer begrudge the use of their words to build a tool that, in better hands, could empower millions? The real question isn’t in AI training methods but in who controls its outputs.