The LocalLLaMA crowd is supremely unimpressed with Intel, not just because of software issues but because Intel just doesn’t have beefy enough designs, the way Apple does and AMD soon will. Even the latest chips are simply not fast enough for a “smart” model, and the A770 doesn’t have enough VRAM to be worth the trouble.
They made some good contributions to runtimes, but seeing how they fired a bunch of engineers, I’m not sure that will continue.
People running LLMs aren’t the target. People who use things like ChatGPT and Copilot on low-power PCs, and who may benefit from edge inference acceleration, are. Every major LLM provider dreams of offloading compute onto end users. It saves them tons of money.
One can’t offload “usable” LLMs without tons of memory bandwidth and plenty of RAM. It’s just not physically possible.
You can run small models like Phi pretty quickly, but I don’t think people will be satisfied with that for Copilot, even as basic autocomplete.
About 2x faster than Intel’s current IGPs is the threshold where the offloading can happen, IMO. And that’s exactly what AMD/Apple are producing.
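To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes a dense model where every weight is read from memory once per generated token, and it ignores KV cache traffic, compute limits, and overhead, so it only gives an upper bound. The function names and the bandwidth and model figures are made up for illustration, not measurements of any particular chip.

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s: float, size_gb: float) -> float:
    """Upper bound on decode speed, assuming each token reads all weights once."""
    return bandwidth_gb_s / size_gb


# Illustrative inputs only (assumptions, not benchmarks).
for bandwidth in (100.0, 200.0):
    for params, bits in ((3.0, 4.0), (70.0, 4.0)):
        size = model_size_gb(params, bits)
        bound = max_tokens_per_second(bandwidth, size)
        print(f"{bandwidth:.0f} GB/s, {params:.0f}B at {bits:.0f}-bit "
              f"({size:.1f} GB): at most {bound:.1f} tok/s")
```

Under these assumptions, doubling memory bandwidth doubles the ceiling on tokens per second, and the model also has to fit in RAM in the first place, which is why both bandwidth and capacity matter.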