

## Flipbook (sketchapedia.com)

![Flipbook](https://lemmy.ml/api/v3/image_proxy?url=https%3A%2F%2Fimage-proxy.andisearch.com%2F438de0316ce8a62486c5577d5d8799fe5d5cf4bb%2F68747470733a2f2f736b657463686170656469612e636f6d2f666c6970626f6f6b2d756e6675726c2e6a7067)
*Image: [Flipbook - Flipbook](https://sketchapedia.com/)*

[Flipbook](https://flipbook.page/) (hosted at sketchapedia.com) is an AI-powered visual browser that generates illustrated, interactive infographics on demand, in real time. You type any topic, and it renders a clickable, sometimes animated image explaining it: similar to prompting ChatGPT or Claude, but the output is visual rather than text.

According to [LinkedIn](https://www.linkedin.com/posts/dan-zinkin_flipbook-the-infinite-visual-browser-flipbook-activity-7453062289869533184-1ESi), the tool was built by Zain Shah and team, and describes itself as "an infinite visual browser generated entirely on demand in real time." The Japanese bookmarking site [Hatena](https://b.hatena.ne.jp/entry/s/flipbook.page/) categorises it under AI, LLM, and web tools, with users tagging it as worth reading later.

Sources: [LinkedIn](https://www.linkedin.com/posts/dan-zinkin_flipbook-the-infinite-visual-browser-flipbook-activity-7453062289869533184-1ESi), [Hatena](https://b.hatena.ne.jp/entry/s/flipbook.page/)

![](https://lemmy.ml/api/v3/image_proxy?url=https%3A%2F%2Fi.vgy.me%2Fc2Jobr.png)

The hardware efficiency gains are honestly the most interesting part of the paper. The main reason DeepSeek-V4 is so cheap to run is that they completely bypassed the quadratic cost of standard attention at massive context windows, using a hybrid attention architecture that interleaves Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA).

Standard models keep every single token in the KV cache, which absolutely kills memory. CSA fixes this by compressing the KV cache of multiple tokens into a single entry, then using a sparse routing mechanism to compute attention only over the top-k most relevant compressed blocks. HCA takes it a step further: it compresses an even larger number of tokens into one entry but computes dense attention over all of them. The result is that the 1.6T-parameter Pro model uses only a third of the compute FLOPs and 10% of the KV cache memory of DeepSeek-V3.2 at a one-million-token context.

They also aggressively pushed low-precision formats, applying FP4 quantization-aware training to the Mixture-of-Experts weights and the attention query-key paths. MoE models are notoriously memory-bound because you constantly have to shuttle massive expert weights into the GPU cores. Dropping these to FP4 slashes the memory-bandwidth bottleneck and lets the model run far faster at inference without ruining accuracy, since the quantization is handled dynamically during training.

On the infrastructure side, they wrote a custom fused kernel in TileLang that overlaps communication and computation. When running expert parallelism across multiple GPUs, you usually hit a wall waiting on the network. DeepSeek slices the experts into micro-waves so the GPU is crunching matrix math on the first wave while the network is simultaneously pulling in the data for the second. They essentially hid the network latency behind the compute time, which means you do not need super expensive interconnects to get peak hardware utilization out of the cluster.
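The routing idea behind compressed sparse attention is easy to sketch. The paper's actual compressor is learned; in this toy version (all names hypothetical), mean-pooling stands in for it: each block of tokens is pooled into one compressed KV entry, the query is scored against those entries, and full attention is computed only inside the top-k winning blocks.

```python
import numpy as np

def csa_attention(q, K, V, block=16, top_k=4):
    """Toy Compressed Sparse Attention for a single query vector.
    Mean-pooling stands in for the learned block compressor."""
    T, d = K.shape
    n_blocks = T // block
    Kb = K[: n_blocks * block].reshape(n_blocks, block, d)
    Vb = V[: n_blocks * block].reshape(n_blocks, block, d)
    K_comp = Kb.mean(axis=1)              # compressed cache: one entry per block
    scores = K_comp @ q                   # routing: one cheap score per block
    picked = np.argsort(scores)[-top_k:]  # top-k most relevant compressed blocks
    K_sel = Kb[picked].reshape(-1, d)     # only these tokens get dense attention
    V_sel = Vb[picked].reshape(-1, d)
    att = np.exp(K_sel @ q / np.sqrt(d))
    att /= att.sum()
    return att @ V_sel                    # cost scales with top_k*block, not T

rng = np.random.default_rng(0)
T, d = 1024, 64
q = rng.standard_normal(d)
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))
out = csa_attention(q, K, V)
print(out.shape)  # (64,): attended over 64 of 1024 cached tokens
```

With block size 16 and top-4 routing, the dense attention here touches 64 tokens instead of 1024, which is the same shape of saving the paper claims at million-token contexts, just without the learned compression and the HCA branch.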

A directory created by the Centers for Medicare and Medicaid Services (CMS) has exposed the Social Security numbers of some US healthcare providers. The Trump administration introduced a new Medicare portal as part of plans to modernize US healthcare technology. However, a database that was part of the directory was left publicly accessible and exposed providers' names and Social Security numbers.


Cross-posted from: https://lemmy.ml/post/46710548





















I Left Port 22 Open on the Internet for 54 Days. Here’s Who Showed Up.
cross-posted from: https://feditown.com/post/2911581

Edit: Adding a warning here: the post was probably heavily AI-written and contains mistakes to that effect, which is unfortunate. The data in general is still interesting, though.

A GGUF port of DFlash speculative decoding. Standalone C++/CUDA stack on top of ggml, runs on a single 24 GB RTX 3090, hosts the new Qwen3.6-27B. ~1.98x mean speedup over autoregressive on Qwen3.6 across HumanEval / GSM8K / Math500, with zero retraining.

If you have CUDA 12+ and an NVIDIA GPU like an RTX 3090 / 4090 / 5090, then all you need to do is:

```
# clone the repo
cd lucebox-hub/dflash
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release
cmake --build build --target test_dflash -j

# fetch target (~16 GB)
hf download unsloth/Qwen3.6-27B-GGUF Qwen3.6-27B-Q4_K_M.gguf --local-dir models/

# matched 3.6 draft is gated: accept terms + set HF_TOKEN first
hf download z-lab/Qwen3.6-27B-DFlash --local-dir models/draft/

# run
DFLASH_TARGET=models/Qwen3.6-27B-Q4_K_M.gguf python3 scripts/run.py --prompt "def fibonacci(n):"
```

That's it. No Python runtime in the engine, no llama.cpp install, no vLLM, no SGLang. Luce DFlash will:

1. Load Qwen3.6-27B Q4_K_M target weights (~16 GB) plus the matched DFlash bf16 draft (~3.46 GB) and run DDTree tree-verify speculative decoding (block size 16, default budget 22, greedy verify).
2. Compress the KV cache to TQ3_0 (3.5 bpv, ~9.7x vs F16) and roll a 4096-slot target_feat ring so 256K context fits in 24 GB. Q4_0 is the legacy path and tops out near 128K.
3. Auto-bump the prefill ubatch from 16 to 192 for prompts past 2048 tokens (~913 tok/s prefill on 13K prompts).
4. Apply sliding-window flash attention at decode (default 2048-token window, 100% speculative acceptance retained), so 60K context still decodes at 89.7 tok/s instead of 25.8 tok/s.
5. Serve over an OpenAI-compatible HTTP endpoint or a local chat REPL.

Running on an RTX 3090 with a Qwen3.6-27B UD-Q4_K_XL (unsloth Dynamic 2.0) target, 10 prompts/dataset, n_gen=256:

| Bench | AR tok/s | DFlash tok/s | AL | Speedup |
|---|---|---|---|---|
| HumanEval | 34.90 | 78.16 | 5.94 | 2.24x |
| Math500 | 35.13 | 69.77 | 5.15 | 1.99x |
| GSM8K | 34.89 | 59.65 | 4.43 | 1.71x |
| Mean | 34.97 | 69.19 | 5.17 | 1.98x |
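The "greedy verify" step the post mentions is worth unpacking. The engine's actual DDTree verification walks a tree of draft continuations; this toy linear version (all values invented for illustration) shows the core contract: the target checks a whole draft block in one pass, accepts the longest agreeing prefix, and contributes its own token at the first disagreement, so every target pass yields at least one token.

```python
def greedy_verify(draft_block, target_preds):
    """Accept the longest prefix of the draft block that matches the
    target's own greedy predictions; at the first mismatch, keep the
    target's token and stop. Output is always non-empty, so quality
    matches plain autoregressive greedy decoding exactly."""
    accepted = []
    for d_tok, t_tok in zip(draft_block, target_preds):
        if d_tok != t_tok:
            accepted.append(t_tok)   # target wins the disagreement
            return accepted
        accepted.append(d_tok)
    return accepted

# Invented token IDs: draft and target agree for 3 tokens, then diverge.
draft_block  = [5, 9, 2, 7, 7, 1]    # draft's proposed block
target_preds = [5, 9, 2, 4, 8, 8]    # target's greedy picks from one verify pass
out = greedy_verify(draft_block, target_preds)
print(out)  # [5, 9, 2, 4] -> acceptance length 4 from a single target pass
```

This is also why the table's speedup is below its AL column: a mean acceptance length of 5.17 tokens per target pass would cap the speedup near 5x, but the draft's own forward passes and the verify overhead eat into that, landing at the measured 1.98x mean.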







Self-styled free speech warrior Elon Musk's X (Twitter) banned me after I published a copy of the Donald Trump campaign's JD Vance research dossier. X says that I've been suspended for "violating our rules against posting private information," citing a tweet linking to my story about the JD Vance dossier. First, I never published any private information on X. I linked to an article I wrote here, linking to a document of controversial provenance, one that I didn't want to alter for that very reason.

On the one hand, this is a very funny end to my Twitter journey. On the other hand, I no longer have access to the primary channel by which I disseminate news (and shitposts, of course) to the general public. This chilling effect on speech is exactly why we published the Vance dossier in its entirety. **Not a single media organization was willing to publish a document that would have been a no-brainer during or prior to the heyday of Edward Snowden's disclosures. That illustrates the dramatic shift in attitudes about what the news media thinks the public should know**, and the role the mainstream plays in steadily ceding that territory to the national security threat machine.

Media's job, I believe, is to push back against these various forms of censorship. I'll keep doing that here on this newsletter, where you can find me going forward. If you agree with what I laid out, I hope you'll subscribe.




A tragic story has emerged from northern Israel that combines shocking allegations, powerful families, and claims of online censorship. Shoshana Strook, 34, daughter of Israeli National Missions Minister Orit Strook, was found dead in her home after publicly accusing her parents and brother of sexual and ritual abuse dating back to her early childhood. The claims, including alleged trafficking and involvement in so-called paedophile ceremonies, have ignited debates online as reports and social media posts appear to vanish from Google searches. As her story spread online, users reported that news about Shoshana was being scrubbed or buried. Social media platforms and search engines appeared to remove or downrank posts, leading to speculation about deliberate censorship. Comments across forums highlighted comparisons to high-profile abuse cases elsewhere, pointing to elite networks and abuse cover-ups.


    This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


    Ask in a DM before posting product reviews or ads; such posts are otherwise subject to removal.


    Rules:

    1: All Lemmy rules apply.

    2: No low-effort posts.

    3: NEVER post naziped*gore stuff.

    4: Always post article URLs or their archived-version URLs as sources, NOT screenshots. Help blind users.

    5: Personal rants about Big Tech CEOs like Elon Musk are unwelcome (this does not include posts about their companies affecting a wide range of people).

    6: No advertisement posts unless verified as legitimate and non-exploitative/non-consumerist.

    7: Crypto-related posts, unless essential, are disallowed.
