☆ Yσɠƚԋσʂ ☆
  • 1.6K Posts
  • 1.29K Comments
Joined 6Y ago
Cake day: Jan 18, 2020



The deserts have been expanding for a long time and absorbing arable land. They’re not greening the entire desert obviously. They’re reclaiming the land and creating a green wall so the desert doesn’t spread.


The big concern is that once these apps are blocked from the official store then they lose a lot of the audience. And then there might be less interest in developing for a very niche community of people who jailbreak their phones.


Tons of great things happening in China. They’re greening deserts at mind boggling scale, building out more clean energy infrastructure than the rest of the world combined, and building public infrastructure like it’s going out of style.









Yes, the bed and the environment in general are part of the world model. What I mean is that it's part of object identification and recognizing which objects to use for which task, etc. It's a separate concern from dexterity. Think of it this way: when you're thirsty, you pick up a cup. You're consciously thinking about moving your hand to grab the cup and bring it to your mouth. That's what the world model is concerned with. You're not aware of every individual muscle movement and all the micro adjustments that need to happen in order for the task to be completed. And that's what the running illustrates: the dexterity of the system in dealing with feedback from the world and making these adjustments in response.


You absolutely do have the impact of random events when you're doing anything in the physical world. You have wind, uneven ground, variations in weight distribution, and so on. That's what makes this sort of stuff so difficult in practice. All the tiny little errors quickly add up, so you can't just match expected input. You have to have a dynamic system that can adjust on the fly to the sensory data. Dealing with stuff like an uneven bed or a tilted surface is a completely separate problem from having a good enough internal world model.


And what, specifically, is it that you disagree with? But sure, I'm just a software engineer.


Running merely illustrates that the system can react with very little latency. It's obvious that this will be applicable in any application where the robot needs to quickly adapt to its environment, such as, say, factory work.


And that means we have robots that can exercise unprecedented body control in dynamic situations. If you don't understand the general applications of this, I really don't know what else to say to you.









Kind of, yeah. Incidentally, I experimented with a similar idea in a more restricted domain and it works pretty well: https://lemmy.ml/post/41786590


Basically, the idea is to use a symbolic logic engine within a dynamic context created by the LLM. Traditionally, the problem with symbolic AI has been with creating the ontologies. You obviously can’t have a comprehensive ontology of the world because it’s inherently context dependent, and you have an infinite number of ways you can contextualize things. What neurosymbolics does is use LLMs for what they are good at, which is classifying noisy data from the outside world, and building a dynamic context. Once that’s done, it’s perfectly possible to use a logic engine to solve problems within that context.
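As a toy illustration of the pattern: in a real system, the facts below would be extracted by an LLM classifying noisy input, which is what gives the logic engine its dynamic context. Here the "extracted" facts are hard-coded stand-ins, and the engine is a naive forward-chainer.

```python
# Neurosymbolic sketch: in a real pipeline the fact set would be produced by
# an LLM classifying noisy input ("there's a mug on the desk"), giving the
# symbolic engine a dynamic, context-specific ontology. Facts here are
# hard-coded stand-ins for that step.

facts = {
    ("mug", "is_a", "container"),
    ("mug", "made_of", "ceramic"),
    ("water", "is_a", "liquid"),
}

# Horn-clause style rules: if all premises match, derive the conclusion.
# Variables are strings starting with "?".
rules = [
    # containers can hold liquids
    ([("?x", "is_a", "container"), ("?y", "is_a", "liquid")],
     ("?x", "can_hold", "?y")),
    # anything that can hold a liquid is usable for drinking it
    ([("?x", "can_hold", "?y")],
     ("?x", "usable_to_drink", "?y")),
]

def substitute(triple, bindings):
    return tuple(bindings.get(t, t) for t in triple)

def match(pattern, fact, bindings):
    """Unify one pattern triple against a fact; return extended bindings or None."""
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if p in b and b[p] != f:
                return None
            b[p] = f
        elif p != f:
            return None
    return b

def forward_chain(facts, rules):
    """Apply rules until no new facts are derived (naive fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # collect all binding sets that satisfy every premise
            candidates = [{}]
            for prem in premises:
                candidates = [b2 for b in candidates for fact in derived
                              if (b2 := match(prem, fact, b)) is not None]
            for b in candidates:
                new_fact = substitute(conclusion, b)
                if new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived

all_facts = forward_chain(facts, rules)
print(("mug", "usable_to_drink", "water") in all_facts)
```

The engine never needed a world-scale ontology: it only reasons over the handful of facts the classifier surfaced for this particular context.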







If you’re claiming that any information in the video isn’t factual, feel free to say what it was.




Personally, I can’t recall any time in history when Windows was genuinely robust. The NT system was probably the least worst, but it had plenty of problems as well. I completely agree that things got even worse under Nadella. Seems like MS is a complete clown show today.


definitely the last thing anybody who’s used windows would associate with the system


removing the interactions bit is really impressive, the model effectively acts as a physics engine figuring out how objects in the scene interact with one another




* demo: https://huggingface.co/spaces/sam-motamed/VOID
* model: https://huggingface.co/netflix/void-model




Binary quantization and 1 bit vectors have definitely been floating around the space for years. The big difference here is not just better raw precision, but how they completely eliminate the hidden memory tax that usually comes with extreme compression. Normally, when you crush a 32 bit float down to a single bit, you destroy a massive amount of scale and range information. To make the model actually usable after that, traditional methods usually have to store extra full precision numbers alongside those compressed blocks to act as scaling factors or zero points. So your theoretical 1 bit compression actually ends up costing something like 2 or 3 bits per parameter in practice.

TurboQuant gets around this by using the Quantized Johnson Lindenstrauss transform which is basically a mathematical guarantee that the relative distances between different data points will be preserved even when the data is aggressively shrunk. By doing this and dropping everything to just a positive or negative sign bit they completely remove the need to store any full precision scaling factors. It literally has zero memory overhead. To make sure the attention mechanism still works they use a special estimator that takes a high precision query and runs it against that low precision 1 bit cache in a way that mathematically eliminates bias.

You also have to look at how they are actually applying it in the pipeline. They don’t just take the raw 32 bit vector and smash it down to 1 bit right out of the gate. They use that PolarQuant method first to map everything to polar coordinates and capture the main structure and strength of the vector. The 1 bit QJL algorithm is only deployed at the very end as a targeted cleanup to fix residual errors left over from the first step.
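A rough sketch of the core idea behind the sign-bit step (not the actual TurboQuant implementation, just the underlying JL-style property): project vectors onto random Gaussian directions and keep only the sign of each projection. The angle between two vectors can then be estimated from bit agreement alone, with no stored scaling factors.

```python
import math
import random

random.seed(0)

def sign_bits(v, projections):
    """Quantize a vector to 1 bit per random direction: sign(<v, s>)."""
    return [1 if sum(a * b for a, b in zip(v, s)) >= 0 else 0
            for s in projections]

d, m = 8, 4000  # vector dimension, number of 1-bit projections
projections = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]

u = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -0.5, 1.5]
v = [0.8, 1.5, 0.0, -0.5, 0.2, 2.5, -1.0, 1.0]

bu, bv = sign_bits(u, projections), sign_bits(v, projections)

# For random Gaussian directions, P(signs agree) = 1 - theta/pi, so the
# cosine similarity is recoverable from the bits alone: zero overhead,
# no scale factors stored anywhere.
agree = sum(x == y for x, y in zip(bu, bv)) / m
est_cos = math.cos(math.pi * (1 - agree))

dot = sum(a * b for a, b in zip(u, v))
true_cos = dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

print(round(true_cos, 3), round(est_cos, 3))
```

The two numbers land close together, which is the whole point: the geometry survives extreme compression without any full precision side data.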






huh?

People in China enjoy genuine human rights, like the right to housing, education, and healthcare. 90% of families in the country own their home, giving China one of the highest home ownership rates in the world. What's more, 80% of these homes are owned outright, without mortgages or any other liens. https://www.forbes.com/sites/wadeshepard/2016/03/30/how-people-in-china-afford-their-outrageously-expensive-homes

The real (inflation-adjusted) incomes of the poorest half of the Chinese population increased by more than four hundred percent from 1978 to 2015, while real incomes of the poorest half of the US population actually declined during the same time period. https://www.nber.org/system/files/working_papers/w23119/w23119.pdf

From 1978 to 2000, the number of people in China living on under $1/day fell by 300 million, reversing a global trend of rising poverty that had lasted half a century (i.e. if China were excluded, the world’s total poverty population would have risen) https://www.semanticscholar.org/paper/China’s-Economic-Growth-and-Poverty-Reduction-Angang-Linlin/c883fc7496aa1b920b05dc2546b880f54b9c77a4

In fact, people in China enjoy high levels of social mobility in general https://www.nytimes.com/interactive/2018/11/18/world/asia/china-social-mobility.html

Student debt in China is virtually non-existent because education is not run for profit. https://www.forbes.com/sites/jlim/2016/08/29/why-china-doesnt-have-a-student-debt-problem/

China massively invests in public infrastructure. They used more concrete in 3 years than the US did in the entire 20th century https://www.forbes.com/sites/niallmccarthy/2014/12/05/china-used-more-concrete-in-3-years-than-the-u-s-used-in-the-entire-20th-century-infographic/

China also built 27,000km of high speed rail in a decade https://www.railjournal.com/passenger/high-speed/ten-years-27000km-china-celebrates-a-decade-of-high-speed/

All these things translate into tangible freedoms allowing people to live their lives to the fullest. Freedom can be seen as the measure of personal agency an individual enjoys within the framework of society. A good measure of whether people genuinely feel free is to look at what people of the country have to say on the subject. Even as mainstream western media openly admits, people in China overwhelmingly see their system as being democratic, and the government enjoys broad public trust and support.






same, any time I look for uplifting news, it’s inevitably from China








just wait for the bubble to pop, and I’m sure we’ll see a lot of affordable GPUs flood the market from the abandoned data centres :)


That’s part of the idea with the whole mixture of experts (MoE) approach in newer models actually.

Rather than using a single neural net that's, say, 512 wide, you split it into eight channels/experts of 64. If the network can pick the correct channel for each inference, then you only have to run 1/8th of the neurons on every forward pass. Of course, once you have your 8 experts in parallel, you need to decide which one to use for each token you want to process. That's the job of the router, which takes in an input and decides which expert to send it to. The router itself is a tiny neural network: a matrix that converts the input vectors into a routing choice, with a small set of trainable weights that gets trained together with the rest of the MoE.
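A toy top-1 router to show the shape of the thing (weights are random stand-ins, and each "expert" is just one linear layer for the sake of the sketch):

```python
import math
import random

random.seed(42)

D_MODEL, N_EXPERTS, D_EXPERT = 16, 8, 64

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(M, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in M]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Router: one small trainable matrix mapping the token vector to expert logits.
router_w = rand_matrix(N_EXPERTS, D_MODEL)

# Eight independent expert networks (one linear layer each in this sketch).
experts = [rand_matrix(D_EXPERT, D_MODEL) for _ in range(N_EXPERTS)]

def moe_forward(x):
    probs = softmax(matvec(router_w, x))
    top = max(range(N_EXPERTS), key=lambda i: probs[i])  # top-1 routing
    # Only the chosen expert runs: 1/8th of the expert compute per token.
    return top, matvec(experts[top], x)

x = [random.gauss(0, 1) for _ in range(D_MODEL)]
chosen, y = moe_forward(x)
print(chosen, len(y))
```

Real MoE models typically route to the top-2 experts and mix their outputs by the router probabilities, but the compute-saving logic is the same: most experts sit idle on any given token.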


The trick they use is pretty clever. When you ask an AI to write code, it doesn’t always get it right. Sometimes the code has bugs, sometimes it misunderstands the problem entirely. A naive way to address that is to generate a few solutions and test each one. The odds that at least one works go way up. ATLAS generates multiple attempts, running each through a test suite. Each retry also gets told what went wrong with the previous attempt, so it can try to avoid the same mistake.

But this can be pretty slow since you have to run the code in an isolated environment, check the outputs, wait for it to finish. Doing that for every candidate quickly adds up. So ATLAS has another shortcut for avoiding unnecessary testing. Instead of simply generating solutions and testing all of them, it tries to predict which one is most likely correct before running any tests.

ATLAS also asks the model for an embedding of what it just wrote which acts as a fingerprint. Two similar pieces of code will produce similar fingerprints. A well-written, confident solution will produce a different fingerprint than a confused, buggy one.

These fingerprints get fed into a separate, much smaller neural network called the Cost Field. This little network was trained ahead of time on examples where they already knew which solutions were correct and which were wrong. It learned to assign a score to each fingerprint. Correct solutions get a low score and incorrect ones get a high one.

So the process is to generate multiple solutions, get their fingerprints, score each one, and pick the lowest. Only that one gets tested. The Cost Field picks correctly about 88% of the time according to the repo.
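Here's a minimal sketch of that select-before-test loop. The `embed` and `cost_field` functions are stand-ins I made up: in ATLAS the fingerprint comes from the code model itself and the Cost Field is a small pretrained scorer.

```python
# Select-before-test sketch. Both `embed` and `cost_field` are hypothetical
# stand-ins: ATLAS gets the fingerprint from the model's own embedding and
# scores it with a small network trained on known-good/known-bad solutions.

def embed(code: str) -> list[float]:
    # Stand-in fingerprint: crude character statistics instead of a real
    # model embedding.
    n = max(len(code), 1)
    return [code.count(c) / n for c in "=():_ "]

def cost_field(fp: list[float]) -> float:
    # Stand-in scorer: lower score means "more likely correct". The real
    # Cost Field learns these weights from labeled fingerprints.
    weights = [0.3, -0.2, 0.5, 0.1, -0.4, 0.2]
    return sum(w * f for w, f in zip(weights, fp))

def select_candidate(candidates: list[str]) -> int:
    """Score every candidate's fingerprint and return the lowest-cost one.
    Only that winner would then be run against the actual test suite."""
    scores = [cost_field(embed(c)) for c in candidates]
    return min(range(len(candidates)), key=lambda i: scores[i])

candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
    "def add(a, b): return sum([a, b])",
]
best = select_candidate(candidates)
print(best)
```

The payoff is that the expensive sandboxed test run happens once per problem instead of once per candidate, at the cost of being wrong whenever the scorer misranks (the remaining ~12% in the repo's numbers).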



Oh yeah corps will absolutely do that. We can kinda see the same thing happening with everything moving to streaming services too.




TurboQuant looks like a pretty massive deal for running local models efficiently. The core issue they are tackling is the memory bottleneck caused by the key value cache during generation. When you are doing long context inference, storing all those high dimensional vectors eats up VRAM extremely fast. Traditional vector quantization helps, but usually introduces memory overhead because you have to store scaling factors or constants in full precision for every small block of data. That overhead can easily add an extra bit or two per parameter, which ruins the compression targets people are aiming for. TurboQuant solves the problem by combining two clever mathematical tricks to eliminate that overhead entirely and get the cache down to 3 bits without losing accuracy.

The first part is an algorithm called PolarQuant. Instead of looking at the vectors in standard cartesian coordinates, it converts them into polar coordinates, which basically separates the magnitude from the direction. Because the angles map onto a fixed, predictable circular grid, the model no longer needs to store the dynamic bounding boxes or normalization constants that traditional methods require. That step handles the bulk of the compression and captures the main signal of the vector.

The second piece of the puzzle is Quantized Johnson Lindenstrauss, or QJL, which cleans up the residual error left over from the first step. QJL uses a mathematical transform to shrink that leftover error down to just a single sign bit of positive or negative one while preserving the relative distances between the data points. This acts as a mathematical error checker that fixes any bias in the attention scores. Because it only uses one bit and preserves the geometry of the space, the attention mechanism can still calculate accurate logits without needing full precision data.

They tested this on open weights models like Gemma and Mistral across heavy needle in a haystack and LongBench tasks. They managed to compress the KV cache down to 3 bits with literally zero drop in accuracy, and they did not even need to do any fine tuning or calibration. On top of saving a massive amount of VRAM, the 4 bit version actually speeds up attention logit computation by up to 8x on H100 GPUs compared to standard 32 bit floats. This seems like a massive leap forward for anyone trying to run long context models on constrained hardware or scale up huge vector search databases.
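For a sense of scale, here's the back-of-envelope arithmetic on the cache savings, using a hypothetical mid-size model shape (these dimensions are illustrative, not numbers from the paper):

```python
# Back-of-envelope KV-cache sizing. The model shape below is a hypothetical
# mid-size configuration, chosen just to illustrate the compression ratio.

layers, kv_heads, head_dim = 32, 8, 128
seq_len = 128_000  # long-context window

# keys + values stored per token across all layers
values_per_token = 2 * layers * kv_heads * head_dim

def cache_gib(bits_per_value):
    total_bits = values_per_token * seq_len * bits_per_value
    return total_bits / 8 / 2**30  # bits -> bytes -> GiB

fp32 = cache_gib(32)
q3 = cache_gib(3)
print(f"fp32: {fp32:.1f} GiB, 3-bit: {q3:.1f} GiB, ratio: {fp32 / q3:.1f}x")
```

The ratio is exactly 32/3, a bit under 11x, precisely because there are no full precision scale factors eating into it. With block-wise scaling overhead you'd land closer to 8x in practice.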

Completely agree, and now it’s just hip to say how much you hate AI. This kind of performative action doesn’t really accomplish anything, but it lets people feel good about themselves and gain social acceptance. Actually building an alternative takes work. The whole Linux analogy is very apt here because we’ve always had alternatives to corporate offerings, but most people don’t want to invest the time into learning how to use them.


I’d argue it’s inevitable for the simple reason that the whole AI as a service business model is a catch-22. Current frontier models aren’t profitable, and all the current service providers live off VC funding. And if models become cheap enough to be profitable, then they’re cheap enough to run locally too. And there’s little reason to expect that models aren’t going to continue being optimized going forward, so we are going to hit an inflection point where local becomes the dominant paradigm.

We’ve seen the pendulum swing between mainframe and personal computer many times before. I expect this will be no different.



It’s really unfortunate how a lot of people have a knee jerk reaction towards anything LLM related right now. While you can make good arguments for avoiding proprietary models offered as a service, there’s really no rational reason to shun open models. If anything, it’s important to develop them into a viable alternative to corporate offerings.


Seems to me there’s a huge amount of incentive for Chinese companies to pursue these things, since China isn’t investing in massive data centre build-outs the way the US is, and their chips are still behind. Another major application is in robotics, where on-device resources are inherently limited. The only path forward there currently is by making the software side more efficient. It also looks like Chinese companies are embracing the whole open weights approach and treating models as shared infrastructure rather than something to be monetized directly.

And local models have been improving at a really fast pace in my opinion. Stuff like Qwen 3.5 is not even comparable to the best models you could run locally a year ago.


Right, so far no American company has managed to make any actual profit from selling LLMs as a service, and the cost of operating the data centres is literally an order of magnitude higher than the revenue they pull in. And the kicker is that if models get efficient enough to bring the costs down, then they become efficient enough to run locally. So the whole business model fundamentally doesn’t make sense. Either it’s too expensive to operate, or nobody will want to use it as a service because running your own gives you privacy and flexibility.



I honestly don’t understand how Lex Fridman managed to build a successful channel. The guy has all the charisma of a wet blanket.


lol best outcome of the war possible


I’m fully expecting the current bubble to pop in the near future as well. The whole war on Iran could serve as a catalyst incidentally given that it’s going to drive energy prices to the moon.


Exactly, and a lot of big companies in the US are heavily reliant on Chinese models already. For example, Airbnb uses Qwen because they can self-host it and customize it. Cursor built their latest Composer model on top of Kimi, and so on. There are far more companies using these tools than making them, so while open models hurt companies that want to sell them as a service, they’re lowering the cost for everyone else.