Valtonen says that this has made the CPU the weakest link in computing in recent years.
This is contrary to everything I know as a programmer. CPUs are fast, and spare cores still go underutilized because efficient parallel programming is a capital-H Hard problem.
The weakest link in computing is RAM, which is why CPUs have three levels of cache: to squeeze as much as possible out of the bottleneck that is the memory bus. Whole software architectures are designed around cache efficiency.
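To put rough numbers on why caches matter so much, here is a minimal sketch of the textbook average-memory-access-time calculation: each level costs its hit latency plus its local miss rate times the cost of going one level further out. The latencies and miss rates are made-up illustrative values, not measurements of any particular CPU, and `average_access_cycles` is a hypothetical helper.

```python
def average_access_cycles(levels, dram_latency):
    """Average memory access time in cycles.

    levels: (hit_latency_cycles, local_miss_rate) tuples, ordered from L1 outward.
    dram_latency: cycles paid when every cache level misses.
    """
    cost = dram_latency
    # Fold from the outermost cache inward: AMAT_i = hit_i + miss_i * AMAT_(i+1)
    for hit_latency, miss_rate in reversed(levels):
        cost = hit_latency + miss_rate * cost
    return cost


# Illustrative latencies (assumed, not measured): L1=4, L2=12, L3=40, DRAM=200 cycles.
cache_friendly = [(4, 0.05), (12, 0.20), (40, 0.50)]
cache_hostile = [(4, 0.50), (12, 0.80), (40, 0.90)]

print(average_access_cycles(cache_friendly, 200))  # roughly 6 cycles
print(average_access_cycles(cache_hostile, 200))   # roughly 98 cycles
```

Same hardware, different access pattern: in this toy model the average cost per access moves by about 16x, which is the gap cache-aware data layouts try to avoid.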
I’m not sure I understand how just adding more cores as a coprocessor (not even a floating-point-optimized unit, which is what GPUs already are) will boost performance that much. Unless the thing can magically run single-threaded apps in parallel.
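The usual way to quantify the limit of just adding cores is Amdahl's law (not cited in the thread, but the standard formulation): if a fraction p of the work can run in parallel on n cores, the speedup is at most 1 / ((1 - p) + p / n). A minimal sketch, with `amdahl_speedup` as a hypothetical helper name:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup from Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A purely single-threaded app gains nothing, however many cores are added.
print(amdahl_speedup(0.0, 64))    # 1.0
# Even 95% parallel code tops out far below 100x on 100 cores.
print(amdahl_speedup(0.95, 100))  # about 16.8
```

Under this model a blanket 100x speedup needs at least 99% of the work to be parallel even with unlimited cores, and the model ignores memory bandwidth entirely, so real-world gains tend to be lower still.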
Even then, it feels like market momentum is already behind TPUs and “AI-enhancement” boards as the next must-have daughterboards after GPUs.
Eh, as always: It depends.
For example: memcpy, one of the tasks they claim a 100x speedup on, can be memory-bandwidth-bound on systems where the CPU doesn’t have many memory channels. But on a well-optimized architecture, e.g. modern server CPUs with many more memory channels, it’s actually pretty hard to saturate the memory bandwidth completely.
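One way to check this on a given machine is to compare a measured single-threaded copy rate against the theoretical DRAM peak, which is channels × transfer rate (MT/s) × 8 bytes per 64-bit channel. A rough sketch, assuming that CPython performs a same-length `bytearray` slice assignment as one bulk C-level copy; the helper names are hypothetical, and the measured figure is a ballpark, not a substitute for a dedicated memory benchmark.

```python
import time


def theoretical_peak_gbps(channels: int, mega_transfers_per_s: int) -> float:
    """Peak DRAM bandwidth in GB/s: channels x MT/s x 8 bytes per 64-bit channel."""
    return channels * mega_transfers_per_s * 8 / 1000


def copy_bandwidth_gbps(size_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Rough single-threaded copy throughput in GB/s (best of `repeats` runs)."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment: assumed to be one bulk copy in C
        best = min(best, time.perf_counter() - start)
    # Each byte is read once and written once, so count the traffic twice.
    # (Write-allocate caches may add extra reads that this ignores.)
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"single-threaded copy: ~{copy_bandwidth_gbps():.1f} GB/s")
    # Peaks for two assumed DDR4-3200 configurations, for comparison:
    print(f"2 channels: {theoretical_peak_gbps(2, 3200):.1f} GB/s")  # 51.2
    print(f"8 channels: {theoretical_peak_gbps(8, 3200):.1f} GB/s")  # 204.8
```

On a few-channel desktop, one thread's copy rate can typically get a sizable share of the peak, while on a many-channel server a single thread usually reaches only a fraction of it, so there the core rather than DRAM tends to be the limit.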