
This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.
I doubt it, but honestly, many systems can do inference pretty well. For example, I ran the MLX version of Qwen 3 4B with a DuckDuckGo search RAG on a MacBook Air M2 with 16 GB of RAM, used it to ask quick questions and verify simple things, and it barely made a dent in RAM utilisation or the SoC. The same goes for my much less powerful machines: even a Galaxy A20, with 3 GB of memory and a low-spec octa-core Exynos, can run small models really well, although the quantisation needs to be a bit strict.
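For anyone curious what a setup like that looks like, here's a minimal sketch of a search-augmented QA loop. It assumes the `duckduckgo_search` and `mlx_lm` packages are installed; the model name is illustrative, not a specific recommendation from the comment above.

```python
# Minimal search-RAG sketch: fetch DuckDuckGo snippets, pack them into a
# prompt, and run a small local model on Apple silicon via mlx_lm.

def build_prompt(question: str, snippets: list[str]) -> str:
    """Pack retrieved search snippets and the question into one prompt."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def answer(question: str, max_results: int = 5) -> str:
    # Imports kept local so the prompt helper above works without them.
    from duckduckgo_search import DDGS   # web search client
    from mlx_lm import load, generate    # Apple-silicon inference

    hits = DDGS().text(question, max_results=max_results)
    snippets = [h["body"] for h in hits]
    model, tokenizer = load("mlx-community/Qwen3-4B-4bit")  # illustrative name
    return generate(model, tokenizer, prompt=build_prompt(question, snippets))
```

The retrieval here is deliberately naive (raw search snippets, no embeddings or reranking), which is usually plenty for quick fact checks of the kind described.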
I’d argue it’s inevitable for the simple reason that the whole AI-as-a-service business model is a catch-22. Current frontier models aren’t profitable, and all the current service providers live off VC funding. But if models become cheap enough to be profitable, then they’re cheap enough to run locally too. And there’s little reason to expect that models won’t keep being optimized going forward, so we’re going to hit an inflection point where local becomes the dominant paradigm.
We’ve seen the pendulum swing between mainframe and personal computer many times before. I expect this will be no different.
Also, if y’all are interested, run local models!
It’s not theoretical.
The cost of hybrid inference is very low; you can squeeze Qwen 35B onto a 16 GB RAM machine as long as it has some GPU. Check out ik_llama.cpp, and ubergarm’s quants in particular:
https://huggingface.co/ubergarm/models#repos
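The back-of-the-envelope arithmetic behind that claim is simple: a quantized model's footprint is roughly parameters × bits per weight, which is why a ~30B-class model at 4-bit sits right at the edge of 16 GB and needs part of itself offloaded to a GPU. A rough sketch, with ballpark numbers that are my assumption rather than anything from ik_llama.cpp's docs:

```python
def model_size_gb(n_params_billion: float, bits_per_weight: float,
                  overhead: float = 1.1) -> float:
    """Rough memory footprint: params * bits / 8, plus ~10% for the
    KV cache and runtime buffers (a ballpark, not a precise estimate)."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

# A ~30B model at 4-bit lands around 16-17 GB, so on a 16 GB machine
# some layers have to live on the GPU; aggressive ~2.7-bit quants of
# the kind ubergarm publishes shrink it well under that.
print(f"{model_size_gb(30, 4.0):.1f} GB at 4-bit")
print(f"{model_size_gb(30, 2.7):.1f} GB at ~2.7-bit")
```

This is why "some GPU" matters: hybrid inference splits the weights across system RAM and VRAM, so neither pool has to hold the whole model.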
But if you aren’t willing to even try, I think that’s another bad omen for local models. Like the Fediverse, it won’t be served to you on a silver platter, you gotta go out and find it.
It’s really unfortunate how a lot of people have a knee jerk reaction towards anything LLM related right now. While you can make good arguments for avoiding proprietary models offered as a service, there’s really no rational reason to shun open models. If anything, it’s important to develop them into a viable alternative to corporate offerings.
I think it’s an extension of people only conceiving of these things within capitalism (although they might call it techno-feudalism or some shit). I remember the phrase “if you aren’t paying for something, you’re the product” and thinking that so many people don’t realize we already have things that fall outside of that, like so much of the FOSS ecosystem, including Linux. It doesn’t help that this kind of messaging is amplified by liberals on social media who refuse to see the real cause behind our current issues with AI and instead focus on idealism.
…Without cash, though?
We’ve had an obvious, somewhat proven path to uber-fast local inference (BitNet), but no one has taken it. No one is willing to roll the dice on a few multi-million-dollar training runs, apparently, and the same is true of dozens of other incredible papers.
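For context, the BitNet b1.58 idea is to constrain every weight to {-1, 0, +1}, which replaces most multiplications with additions at inference time. A sketch of the absmean quantizer from that line of work, as I understand it (the toy matrix is just an illustration):

```python
import numpy as np

def ternary_quantize(W: np.ndarray, eps: float = 1e-8):
    """BitNet b1.58-style absmean quantization: scale by the mean
    absolute weight, then round and clip into {-1, 0, +1}."""
    gamma = np.abs(W).mean() + eps           # per-tensor scale
    Wq = np.clip(np.round(W / gamma), -1, 1)
    return Wq.astype(np.int8), gamma         # dequantize as Wq * gamma

# Toy example: large weights saturate to +/-1, tiny ones drop to 0.
W = np.array([[0.4, -0.05, 1.2],
              [-0.9, 0.02, 0.5]])
Wq, g = ternary_quantize(W)
print(Wq)
```

The catch, and the reason it needs those expensive training runs, is that this only works when the model is trained with the constraint from the start; you can't just apply it to an existing checkpoint and expect usable quality.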
It seems like organization around local model tinkering is hanging by a thread, too. Per usual, client business will barely lift a finger to support it.
So while I’m a local acolyte, through and through, I’m a bit… disillusioned. It doesn’t feel like anyone is coming to save us.
Seems to me there’s a huge incentive for Chinese companies to pursue these things, since China isn’t investing in massive data centre build-outs the way the US is, and their chips are still behind. Another major application is robotics, where on-device resources are inherently limited; the only path forward there currently is making the software side more efficient. It also looks like Chinese companies are embracing the whole open-weights approach, treating models as shared infrastructure rather than something to be monetized directly.
And local models have been improving at a really fast pace, in my opinion. Something like Qwen 3.5 isn’t even comparable to the best models you could run locally a year ago.
It’ll definitely switch to local. The electricity and water bills for these AI data centres are enormous, and it’s not getting any better. Companies will either cut the services off as unsustainable for their profit margins, or laws will curb them for wasting Earth’s resources. OpenAI has been operating at a loss since it started, sustained only by external investment, and it’s far from the only case of AI being unprofitable.
Right, so far no American company has managed to make any actual profit selling LLMs as a service, and the cost of operating the data centres is literally an order of magnitude higher than the revenue they pull in. And the kicker is that if models get efficient enough to bring costs down, they become efficient enough to run locally. So the whole business model fundamentally doesn’t make sense: either it’s too expensive to operate, or nobody will want it as a service because running your own gives you privacy and flexibility.
It will be, once the bubble pops. Small, locally tuned models for specific tasks, powered by the user’s own hardware, are much less expensive for tech companies than powering and watering data centres themselves.
Right now the tech bros genuinely think people will be cool paying hundreds of dollars a month to rent a GPU for all their Internet tasks. AI fatigue is already setting in.
The tech bros’ investors will pull funding once they realize how asinine that is long-term. It’s probably already starting, with the likes of Zuck trying to use green charity money to fund his LLMs.
I’m fully expecting the current bubble to pop in the near future as well. The whole war on Iran could serve as a catalyst incidentally given that it’s going to drive energy prices to the moon.
Maybe the techbros will get the investment class to pay for Fusion within the decade?
lol best outcome of the war possible
We can dream, might be all we’ve got left soon.
That would be preferable. If ML optimization goes open source and progresses greatly, that would be good for the little guy.
OpenAI/Anthropic are incentivized to prevent this.
They are also big enough, and unregulated enough, that they could use their power and political/industry relationships to drive up the price of local AI ownership (RAM, GPUs, etc.).
I’m not sure how much they can actually prevent us from just running FOSS Chinese alternatives locally, though.
Exactly, and a lot of big companies in the US are already heavily reliant on Chinese models. For example, Airbnb uses Qwen because they can self-host and customize it, Cursor built their latest Composer model on top of Kimi, and so on. There are far more companies using these tools than making them, so while open models hurt companies that want to sell them as a service, they lower costs for everyone else.
Not for everyone, but they are aiming to increase hardware ownership costs so that fewer people can afford local AI.
No.
Do elaborate. The tech industry has cycled between mainframe and personal computer many times over the years. When new tech appears, it initially requires a huge amount of computing power to run, but over time people figure out how to optimize it, hardware matures, and it becomes possible to run the same stuff locally. I don’t see why this tech should be any different.