The one-liner:

dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz

This is brilliant.

  • palordrolap@fedia.io

    The article writer kind of complains that they’re having to serve a 10MB file, which is the result of the gzip compression. If that’s a problem, they could switch to bzip2. It’s available pretty much everywhere that gzip is available and it packs the 10GB down to 7506 bytes.

    That’s not a typo. bzip2 is way better with highly redundant data.
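
    For reference, a minimal sketch of the bzip2 variant of the one-liner above (assuming GNU coreutils and bzip2 are installed):

    dd if=/dev/zero bs=1G count=10 | bzip2 -c > 10GB.bz2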

    • just_another_person@lemmy.world

      I believe he’s returning a gzip HTTP response stream, not just a file payload that the requester then downloads and decompresses.

      Bzip isn’t used in HTTP compression.
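
      For anyone wanting to replicate that, a hypothetical nginx location that serves the pre-compressed file as a gzip-encoded HTML response (the path and URL are placeholders, not from the article):

      location /bomb {
          gzip off;                         # don't re-compress; the file already is
          default_type text/html;
          add_header Content-Encoding gzip;
          alias /var/www/10GB.gz;
      }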

      • bss03@infosec.pub

        For scrapers that aren’t just speaking HTTP but are also trying to extract zip files, you can possibly drive them insane with zip quines: https://github.com/ruvmello/zip-quine-generator. Or use other compressed files that contain themselves at some level of nesting, possibly with other data, so that they recursively expand to an unbounded (“infinite”) size.
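
        A rough sketch of the simpler nested (non-quine) variant, where each layer wraps several copies of the previous one (file names and layer counts are made up for illustration):

        # highly compressible payload at the bottom layer
        dd if=/dev/zero bs=1M count=100 of=bomb.dat
        zip -q -9 layer0.zip bomb.dat
        # each new layer holds 16 copies of the previous layer,
        # so full recursive extraction balloons to 16^3 x 100MB
        for i in 1 2 3; do
            for j in $(seq 1 16); do cp "layer$((i-1)).zip" "part$j.zip"; done
            zip -q -9 "layer$i.zip" part*.zip && rm part*.zip
        done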

      • sugar_in_your_tea@sh.itjust.works

        Brotli is an option, and it’s comparable to Bzip. Brotli works in most browsers, so hopefully these bots would support it.

        I just tested it, and a 10G file full of zeroes is only 8.3K compressed. That’s pretty good, though a little bigger than BZip.
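
        The test was presumably something along these lines (assuming the stock brotli CLI, which reads stdin when given no file):

        dd if=/dev/zero bs=1G count=10 | brotli -c > 10GB.br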

  • lemmylommy@lemmy.world

    Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device.

    LOL. Destroy your device, kill the cat, what else?

  • 👍Maximum Derek👍@discuss.tchncs.de

    When I was serving high volume sites (that were targeted by scrapers) I had a collection of files in CDN that contained nothing but the word “no” over and over. Scrapers who barely hit our detection thresholds saw all their requests go to the 50M version. Super aggressive scrapers got the 10G version. And the scripts that just wouldn’t stop got the 50G version.

    It didn’t move the needle on budget, but hopefully it cost them.
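
    Generating such a file is a one-liner with the same tools used elsewhere in this thread (size and filename here are arbitrary):

    yes no | head -c 50M > no-50M.txt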

      • 👍Maximum Derek👍@discuss.tchncs.de

        Most often because they don’t download any of the CSS or external JS files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded in a time-series database. I used an ELK stack back in the day.
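
        One rough heuristic as a sketch, assuming a combined-format access log where field 1 is the client IP and field 7 is the request path: clients that fetch pages but never a stylesheet or script are scraper candidates.

        awk '{ if ($7 ~ /\.(css|js)/) assets[$1]++; else pages[$1]++ }
             END { for (ip in pages) if (!(ip in assets)) print ip, pages[ip] }' access.log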

        • sugar_in_your_tea@sh.itjust.works

          That sounds like a lot of effort. Are there any tools that get like 80% of the way there? Like something I could plug into Caddy, nginx, or haproxy?

          • 👍Maximum Derek👍@discuss.tchncs.de

            My experience is with systems that handle nearly 1000 pageviews per second. We did use a spread of haproxy servers to handle routing and SNI, but they were being fed offender lists by external analysis tools (built in-house).
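
            The haproxy side of that can be quite small; a sketch, where the offender-list path is made up and the in-house analysis tooling that writes it is not shown:

            # frontend snippet: deny clients on the externally maintained list
            acl bad_bot src -f /etc/haproxy/offenders.lst
            http-request deny deny_status 429 if bad_bot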

  • fmstrat@lemmy.nowsci.com

    I’ve been thinking about making an nginx plugin that randomizes words on a page to poison AI scrapers.

  • mbirth@lemmy.ml

    And if you want some customisation, e.g. a specific string repeated over and over, you can use something like this:

    yes "b0M" | tr -d '\n' | head -c 10G | gzip -c > 10GB.gz
    

    yes repeats the given string (followed by a line feed) indefinitely; it was originally meant to type “yes” + ENTER into prompts. tr then removes the line breaks, and head makes sure to take only 10GB rather than letting it run forever.

    If you want to be really fancy, you can even add an HTML header and footer, stored in files named header and footer, and then run it like this:

    yes "b0M" | tr -d '\n' | head -c 10G | cat header - footer | gzip -c > 10GB.gz
    
  • moopet@sh.itjust.works

    I’d be amazed if this works. These sorts of tricks have been around since dinosaurs ruled the Earth, and most bots use pretty modern zip libraries that will just return “nope” or throw an exception. That gets handled exactly like any other corrupt file, for example a site claiming to serve a zip file whose contents are actually a generic 404 HTML page, which is not uncommon.

    Also, be careful because you could destroy your own device? What the hell? No. Unless you’re running dd backwards as root, you can’t do anything bad, and even then it’s the drive contents you overwrite, not the device you “destroy”.

    • namingthingsiseasy@programming.dev

      On the other hand, there are lots of bots scraping Wikipedia even though it’s easy to download the entire website as a single archive.

      So they’re not really that smart…

  • aesthelete@lemmy.world

    This reminds me of shitty FTP sites with ratios when I was on dial-up. I used to push them files full of null characters with filenames that looked like actual content. The modem would compress the upload as it transmitted it which allowed me to upload the junk files at several times the rate of a normal file.

    • MeThisGuy@feddit.nl

      That is pretty darn clever.

      I use a torrent client that will lie about the upload (x10 or x11, or a myriad of other options) to satisfy the upload ratio requirements of many members-only torrent communities.