Today I set up a little project website on a new subdomain. It’s not a www subdomain or a newly registered domain, either of which would be easy to detect. We’re talking about:

Randomchars.mydomain.com

Within 20 minutes, Anthropic’s ClaudeBot was on it. I could tell because the nginx access log showed a hit to robots.txt and then a handful of pages.

First off, how the hell did they find it? Next, is my DNS provider, Amazon Route 53, selling this kind of data now? Or is there some kind of DNS wildcard query?

  • Taldan@lemmy.world · 4 days ago

    I’m a bit confused about your DNS config. DNS is generally public; that’s the point of it.

    AI scrapers, like most scrapers, just crawl every new DNS entry that gets created.

    • cron@feddit.org · 4 days ago

      How exactly can you find all subdomains of a given domain?

      Sure, it is possible with misconfigured DNSSEC (zone walking via NSEC records), but otherwise I’d say it is not possible.
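For readers unfamiliar with zone walking: with old-style DNSSEC (NSEC rather than NSEC3), every signed denial-of-existence response includes an NSEC record that names the *next* existing name in the zone, so an attacker can hop name to name until the chain wraps back to the apex. A minimal sketch of that walk, using a made-up dictionary in place of real DNS queries (the zone names here are illustrative, not from the post):

```python
# Simulated NSEC zone walk. Each NSEC record points at the next
# existing name in the zone; querying a nonexistent name just after
# a known one leaks that pointer. The dict below stands in for
# "query and read the NSEC record" -- illustrative data, not real DNS.
nsec_chain = {
    "example.com.": "api.example.com.",
    "api.example.com.": "mail.example.com.",
    "mail.example.com.": "staging.example.com.",
    "staging.example.com.": "example.com.",  # chain wraps back to the apex
}

def walk_zone(apex: str) -> list[str]:
    """Enumerate every name in the zone by following the NSEC chain."""
    names = [apex]
    nxt = nsec_chain[apex]
    while nxt != apex:  # stop once the chain wraps around
        names.append(nxt)
        nxt = nsec_chain[nxt]
    return names

print(walk_zone("example.com."))
# → ['example.com.', 'api.example.com.', 'mail.example.com.', 'staging.example.com.']
```

NSEC3 was introduced precisely to blunt this: it chains hashed names instead of plaintext ones, so walking yields hashes that must be brute-forced.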

      • ramble81@lemmy.zip · 4 days ago

        I could see accidentally having zone transfers (AXFR) enabled (not even DNSSEC related) and they transfer your entire zone.
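An open zone transfer means any client can ask the nameserver for the full zone; the classic manual check is `dig AXFR example.com @ns1.example.com` (placeholder names). As a rough sketch of what that does on the wire, the AXFR request is an ordinary DNS query with QTYPE 252 sent over TCP, per RFC 1035. This only builds and sends the request; a robust client would loop on `recv` and parse the answer records:

```python
import socket
import struct

def build_axfr_query(zone: str, query_id: int = 0x1234) -> bytes:
    """Build a raw DNS AXFR query (QTYPE 252) in RFC 1035 wire format."""
    # Header: ID, flags=0, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack(">HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in zone.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", 252, 1)  # QTYPE=AXFR, QCLASS=IN
    return header + question

def try_axfr(zone: str, nameserver: str, timeout: float = 5.0) -> bytes:
    """Send the AXFR query over TCP (zone transfers are TCP-only).
    A server that refuses transfers answers with RCODE REFUSED
    instead of the zone contents. Sketch only: a real client would
    loop until the full length-prefixed response has arrived."""
    query = build_axfr_query(zone)
    with socket.create_connection((nameserver, 53), timeout=timeout) as s:
        # DNS-over-TCP prefixes each message with a 2-byte length.
        s.sendall(struct.pack(">H", len(query)) + query)
        length = struct.unpack(">H", s.recv(2))[0]
        return s.recv(length)
```

Route 53 disallows AXFR to arbitrary clients, which is one reason this is probably not the vector here.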

        • cron@feddit.org · 4 days ago

          That’s possible, though probably not the most likely option (a misconfigured webserver or certificate transparency logs are more common culprits).
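Certificate transparency logs are the usual answer to "how did the bot find my fresh subdomain within minutes": every publicly trusted certificate (e.g. from Let's Encrypt) is recorded in public CT logs, hostname included, and scrapers monitor those logs. Services like crt.sh expose a searchable view (e.g. `https://crt.sh/?q=%.mydomain.com&output=json`). The sketch below parses a response shaped like crt.sh's JSON output; the field names follow what crt.sh returns, but the sample data itself is made up for illustration:

```python
import json

# Illustrative sample shaped like crt.sh's JSON output; the
# certificates and names here are invented, not real lookups.
sample_response = json.dumps([
    {"name_value": "randomchars.mydomain.com",
     "issuer_name": "C=US, O=Let's Encrypt"},
    {"name_value": "www.mydomain.com\nmydomain.com",
     "issuer_name": "C=US, O=Let's Encrypt"},
])

def subdomains_from_ct(raw_json: str) -> set[str]:
    """Extract the distinct hostnames from a CT-log search result.
    crt.sh packs multiple SAN entries into one newline-separated
    name_value field, so split on newlines before deduplicating."""
    names = set()
    for entry in json.loads(raw_json):
        for name in entry["name_value"].splitlines():
            names.add(name.strip().lower())
    return names

print(sorted(subdomains_from_ct(sample_response)))
# → ['mydomain.com', 'randomchars.mydomain.com', 'www.mydomain.com']
```

If this is the vector, the 20-minute turnaround lines up with certificate issuance rather than with any DNS leak: the hostname becomes public the moment the cert is logged.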