




America’s Air Superiority Is Losing Altitude
https://archive.ph/1eBHM








Real talk: last month I was running a giveaway campaign for a client. The mechanic was simple: comment to enter, tag a friend for a bonus entry. 3,200 comments later, I was staring at a blank Google Sheet wondering how I was going to verify entries, remove duplicates, and pick a winner without losing my mind.

Instagram doesn't give you any export functionality. Zero. You can view comments in the app, you can reply, you can delete, but you cannot export them in any structured way. This is apparently a deliberate product decision, and it's been this way for years.

What I tried first:

- Manually copy-pasting: obviously not scalable past ~50 rows
- The official Instagram Graph API: requires app review, business account verification, and only returns data from your own posts anyway
- Third-party "Instagram data export" services: most of these ask for your password or OAuth credentials, which is a non-starter

What actually worked:

I ended up using a browser extension called [Instagram Comments Scraper](https://chromewebstore.google.com/detail/instagram-comments-scrape/hpfnaodfcakdfbnompnfglhjmkoinbfm) that runs entirely within your browser session. No password required: it just operates within your existing logged-in session, the same way you're already viewing the comments. The data is processed locally and never sent anywhere external.

The output columns it gives you: comment ID, comment text, username, profile URL, profile pic URL, and timestamp. That's exactly what you need to do any meaningful analysis: filter by date, spot bot accounts, remove duplicates, identify authentic entries.

The rate limiting situation:

The part I didn't expect was how Instagram's rate limits work. There's no published threshold; it varies by IP and activity patterns. When the scraper hits a limit, it enters a cooldown mode automatically (the timer shows you how long), then doubles the cooldown if the limit persists. Once the cooldown clears and a request succeeds, it goes back to normal. This meant I could walk away and come back to a finished export rather than babysitting it.

End result: 3,200 comments exported to Excel in about 40 minutes of unattended processing. Filtered to valid entries (tagged a user + original commenter had 10+ followers) in another 20 minutes using basic Excel formulas.

Caveat I'd add for anyone doing this: be reasonable about volume and timing. Don't run 10,000-comment scrapes back-to-back on the same IP. The human-like delay system in the tool helps, but bulk scraping in one long session still carries some account risk. Space it out if you're working with large datasets.

Anyone else found better approaches to this problem? Especially curious if anyone's had success with the official API for use cases beyond your own posts.
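For anyone curious what the cooldown behaviour described in the post looks like in code, here is a minimal sketch: wait, double the wait if the limit persists, reset once a request succeeds. It is not the extension's actual implementation. The `RateLimited` exception, the `fetch` callable, and the 30-second and one-hour defaults are all assumptions made for illustration, since the post gives no concrete numbers.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal: the fetch function raises this when throttled."""


class Cooldown:
    """Tracks a wait time that doubles while the limit persists."""

    def __init__(self, base_seconds=30.0, max_seconds=3600.0):
        # Both defaults are assumed values, not taken from any real tool.
        self.base_seconds = base_seconds
        self.max_seconds = max_seconds
        self.current = 0.0  # 0.0 means "normal mode, no cooldown active"

    def on_rate_limited(self):
        """Start at the base wait, then double on each repeated limit."""
        if self.current == 0.0:
            self.current = self.base_seconds
        else:
            self.current = min(self.current * 2, self.max_seconds)
        return self.current

    def on_success(self):
        """A successful request puts the tracker back in normal mode."""
        self.current = 0.0


def fetch_with_cooldown(fetch, cooldown, sleep=time.sleep, max_attempts=10):
    """Call fetch() until it succeeds, sleeping through each cooldown."""
    for _ in range(max_attempts):
        try:
            result = fetch()
        except RateLimited:
            sleep(cooldown.on_rate_limited())
            continue
        cooldown.on_success()
        return result
    raise RuntimeError("still rate limited after max_attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting, and the cap on the wait time is a design choice of this sketch rather than something the post mentions.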







China claims to have developed the world’s first AI-designed processor — LLM turned performance requests into CPU architecture
Qi Meng is an AI system that designs entire processor chips end to end, from natural-language spec to physical layout. Their QiMeng-CPU-v1 produced a 32-bit RISC-V CPU, matching Intel 486 performance with over four million logic gates, in just five hours. QiMeng-CPU-v2 rivals an Arm Cortex-A53 from the 2010s, and the whole thing runs on a domain-specific model that learns the graph structures of circuits the way GPT learns text. The appeal of Qi Meng is that this open-source effort has three key interconnected layers melding LLM chip design smarts, a hardware and software design agent, and various chip design apps. The paper shows that the system can do in days what takes human teams weeks to achieve. Paper: https://arxiv.org/pdf/2506.05007




Cross posted from https://lemmy.ml/post/46710548




The hardware efficiency gains are honestly the most interesting part of the paper. The main reason DeepSeek-V4 is so cheap to run comes down to how they completely bypassed the quadratic cost of standard attention for massive context windows. They built a hybrid attention architecture that interleaves Compressed Sparse Attention and Heavily Compressed Attention. Standard models keep every single token in the KV cache, which absolutely kills memory. CSA fixes this by compressing the KV cache of multiple tokens into a single entry and then uses a sparse routing mechanism to only compute attention over the top-k most relevant compressed blocks. HCA takes it a step further by compressing an even larger number of tokens into one entry, but computes dense attention over them. So a 1.6T parameter Pro model only uses a third of the compute FLOPs and 10% of the KV cache memory compared to DeepSeek-V3.2 at a one million token context.

They also aggressively pushed low-precision formats, applying FP4 quantization-aware training to the Mixture-of-Experts weights and the attention Query-Key paths. MoE models are notoriously memory bound because you have to constantly shuttle massive expert weights into the GPU cores. Dropping these to FP4 slashes the memory bandwidth bottleneck and lets the model run way faster during inference without ruining accuracy, since they handle the quantization dynamically during training.

On the infrastructure side, they wrote a custom fused kernel using TileLang that overlaps communication and computation. When running expert parallelism across multiple GPUs, you usually hit a wall waiting for the network. DeepSeek slices the experts into micro-waves so the GPU is crunching matrix math on the first wave while the network is simultaneously pulling the data for the second wave. They basically hid the network latency behind the compute time, which means you do not need super expensive interconnects to get peak hardware utilization out of the cluster.
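To make the "compress, then route to the top-k blocks" idea from the CSA description concrete, here is a toy sketch in plain Python. It is only an illustration of the general shape and not the paper's method: mean pooling stands in for whatever compressor the model actually uses, and the function names, `block_size`, and `top_k` are all hypothetical.

```python
import math


def compress_blocks(vectors, block_size):
    """Mean-pool each run of `block_size` vectors into a single entry.

    Mean pooling is an assumed stand-in for the real compression step.
    """
    blocks = []
    for start in range(0, len(vectors), block_size):
        group = vectors[start:start + block_size]
        dim = len(group[0])
        blocks.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return blocks


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_compressed_attention(query, keys, values, block_size, top_k):
    """Attend over only the top_k compressed blocks that best match the query."""
    compressed_keys = compress_blocks(keys, block_size)
    compressed_values = compress_blocks(values, block_size)

    # Score every compressed block, then keep only the top_k best ones.
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, k) * scale for k in compressed_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:top_k]

    # Numerically stable softmax restricted to the selected blocks.
    peak = max(scores[i] for i in selected)
    weights = [math.exp(scores[i] - peak) for i in selected]
    total = sum(weights)

    dim = len(compressed_values[0])
    output = [
        sum((w / total) * compressed_values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return output, sorted(selected)
```

In this toy version the cache holds `len(keys) / block_size` entries instead of one per token, and the softmax touches only `top_k` of them, which is where the memory and compute savings in the comment come from. Setting `top_k` to the number of blocks gives dense attention over the compressed entries, roughly the shape the comment attributes to HCA.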


A directory created by the Centers for Medicare and Medicaid Services (CMS) has exposed the Social Security numbers of a number of US healthcare providers. The Trump administration introduced a new Medicare portal as part of plans to modernize US healthcare technology. However, a database that was part of the directory was left publicly accessible, and exposed providers’ names and Social Security numbers.

cross-posted from: https://lemmy.ml/post/33720279

> Written by Steven Vaughan-Nichols, Senior Contributing Editor
> July 23, 2025 at 11:31 a.m. PT
>
> > Recently, vibe coding bit Jason Lemkin, trusted advisor to SaaStr, the Software-as-a-Service (SaaS) business community, in the worst possible way. The vibe program, Replit, he said, went "rogue during a code freeze and shutdown and deleted our entire database."
>
> > In a word: Wow. Just wow.


BEIJING/SHANGHAI, March 4 (Reuters) - China plans to issue guidance to encourage the use of open-source RISC-V chips nationwide for the first time, two sources briefed on the matter said, as Beijing accelerates efforts to curb the country's dependence on Western-owned technology. The policy guidance on boosting the use of RISC-V chips could be released as soon as this month, although the final date could change, the sources said. In China, state entities and research institutes have eagerly embraced RISC-V in recent years, seeing it as geopolitically neutral. Chinese chip designers are attracted by its lower costs, but the government has yet to mention it in policy. In 2023, Reuters reported that some U.S. lawmakers were putting pressure on the Biden administration to restrict American companies from working on the technology over concerns that Beijing was exploiting its open-source nature to advance its own semiconductor industry.








    This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


    Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.


    Rules:

    1: All Lemmy rules apply

    2: Do not post low effort posts

    3: NEVER post naziped*gore stuff

    4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

    5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)

    6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

    7: crypto related posts, unless essential, are disallowed

    Lemmy
    A community of privacy and FOSS enthusiasts, run by Lemmy’s developers

    What is Lemmy.ml

    Rules

    1. No bigotry - including racism, sexism, ableism, homophobia, transphobia, or xenophobia. Code of Conduct.
    2. Be respectful, especially when disagreeing. Everyone should feel welcome here.
    3. No porn.
    4. No Ads / Spamming.
