The one-liner:

# 10 GB of zeros compresses down to roughly 10 MB on disk
dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz

This is brilliant.
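
For the bomb to actually go off, the file has to reach the client with a Content-Encoding: gzip header so a well-behaved scraper inflates all 10 GB on receipt. A minimal serving sketch, where the /trap path and file location are placeholders rather than anything from the thread:

# nginx excerpt (hypothetical): hand out the pre-compressed file and
# declare it gzip-encoded; 'gzip off' makes sure nginx never re-compresses it.
#
#   location = /trap {
#       default_type text/html;
#       add_header Content-Encoding gzip;
#       gzip off;
#       alias /var/www/10GB.gz;
#   }

# Quick check that the header survives:
curl -sI https://example.com/trap | grep -i '^content-encoding'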

  • 👍Maximum Derek👍@discuss.tchncs.de · 4 months ago

    Most often because they don’t download any of the CSS or external JS files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded into a time-series database. I used an ELK stack back in the day.
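
    A rough sketch of the simplest version of that signal, assuming a combined-format nginx/Apache access log (the log path is an assumption): clients that pull pages but never a single stylesheet or script float straight to the top.

    # $1 is the client IP, $7 the request path in combined log format.
    # Tally asset hits (CSS/JS) and page hits per IP, then print the IPs
    # that fetched pages but zero assets, busiest first.
    LOG=/var/log/nginx/access.log
    awk '$7 ~ /\.(css|js)(\?|$)/ { assets[$1]++; next }
                                 { pages[$1]++ }
         END { for (ip in pages) if (!(ip in assets)) print pages[ip], ip }' "$LOG" |
      sort -rn | head -20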

    • sugar_in_your_tea@sh.itjust.works · 4 months ago

      That sounds like a lot of effort. Are there any tools that get like 80% of the way there? Like something I could plug into Caddy, nginx, or haproxy?

      • 👍Maximum Derek👍@discuss.tchncs.de · 4 months ago

        My experience is with systems that handle nearly 1000 pageviews per second. We did use a spread of haproxy servers to handle routing and SNI, but they were being fed offender lists by external analysis tools (built in-house).
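
        Not the in-house tooling, but stock haproxy can run the same pattern: a file-backed ACL in front, with offenders pushed in over the runtime socket so nothing needs a reload. A minimal sketch, where the paths, socket, and example IP are all assumptions:

        # haproxy.cfg excerpt (hypothetical), assuming a 'stats socket
        # /var/run/haproxy.sock level admin' line in the global section:
        #
        #   frontend fe_web
        #       bind :443 ssl crt /etc/haproxy/certs/
        #       acl offender src -f /etc/haproxy/blocklist.lst
        #       http-request deny deny_status 403 if offender

        # An external analysis job can then add an offender at runtime,
        # with no reload:
        echo "add acl /etc/haproxy/blocklist.lst 203.0.113.42" |
          socat stdio /var/run/haproxy.sock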