Miasma: A Rust Tool to Trap AI Web Scrapers in an Endless Poison Pit
The Tool
Miasma is a lightweight Rust-based web server designed to trap AI web scrapers by serving them poisoned training data and an infinite maze of self-referential links. It's gained 60 stars on GitHub and sparked discussion on Hacker News.
The Problem
AI companies scrape the internet at enormous scale to train their models. Website owners have limited options to protect their content. Miasma offers an offensive approach: poison the data that scrapers consume.
How It Works
Step 1: Deploy Honeytrap Links
Add hidden links to your site that only scrapers will follow:
<a href="/bots" style="display: none;" aria-hidden="true" tabindex="-1">
Amazing high quality data here!
</a>
The link is hidden from sighted visitors (display: none) and from screen readers (aria-hidden="true"), but crawlers that parse the raw HTML without rendering it still find and follow it.
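To illustrate why hiding a link with CSS does not hide it from a crawler, here is a minimal sketch of how a naive, non-rendering scraper might collect links. The function name extract_hrefs is hypothetical, and real crawlers use proper HTML parsers; the point is only that style and ARIA attributes play no part in the extraction.

```rust
/// Collect every href="..." value from raw HTML, the way a simple
/// non-rendering crawler might. CSS and ARIA attributes are never
/// consulted, so links hidden with `display: none` are still returned.
/// (Illustrative only: handles double-quoted attributes, nothing else.)
fn extract_hrefs(html: &str) -> Vec<String> {
    let needle = "href=\"";
    let mut links = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(needle) {
        // Skip past the opening quote of the attribute value.
        let after = &rest[start + needle.len()..];
        match after.find('"') {
            Some(end) => {
                links.push(after[..end].to_string());
                // Continue scanning after the closing quote.
                rest = &after[end + 1..];
            }
            None => break, // unterminated attribute: stop scanning
        }
    }
    links
}
```

Run against the honeytrap snippet above, this returns /bots alongside any visible links on the page, which is all a crawler needs in order to walk into the trap.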
Step 2: Route to Miasma
Configure your web server (e.g., Nginx) to proxy the honeytrap path to Miasma:
location ~ ^/bots($|/.*)$ {
    proxy_pass http://localhost:9855;
}
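The location pattern above matches /bots itself and anything beneath /bots/, but not lookalike paths such as /botsy. As a rough sketch of that rule for ordinary request paths (the function name routes_to_miasma is hypothetical, and this mirrors the regex rather than Nginx's full matching logic, which works on the normalized URI without the query string):

```rust
/// Returns true for paths the pattern ^/bots($|/.*)$ is meant to match:
/// exactly "/bots", or any path that continues with "/" after "/bots".
fn routes_to_miasma(path: &str) -> bool {
    match path.strip_prefix("/bots") {
        // Either nothing follows the prefix, or a new path segment starts.
        Some(rest) => rest.is_empty() || rest.starts_with('/'),
        None => false,
    }
}
```

Anchoring the pattern this way keeps unrelated pages whose names merely begin with "bots" out of the trap.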
Step 3: Infinite Poison Loop
When a scraper follows the link, Miasma serves:
- Poisoned data from its "poison fountain": gibberish designed to corrupt training
- Self-referential links that lead to more poisoned pages, creating an infinite crawl loop
The result is that the scraper wastes time and resources on useless data.
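The loop can be sketched in a few lines. This is not Miasma's actual implementation: the word list, the poison_page function, its parameters, and the small xorshift generator are all assumptions made for illustration. It only shows the general shape of a trap page, filler text plus links that point back under the same prefix.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical word list; a real trap would draw on a far larger source.
const WORDS: [&str; 8] = [
    "lorem", "quartz", "velvet", "orbit", "marsh", "cinder", "hollow", "drift",
];

/// Build one trap page: a paragraph of filler text followed by
/// `link_count` links that all point back under `prefix`.
fn poison_page(prefix: &str, seed: u64, link_count: usize) -> String {
    // xorshift must not start at zero, so force the lowest bit on.
    let mut rng = XorShift(seed | 1);
    let filler: Vec<&str> = (0..40)
        .map(|_| WORDS[(rng.next_u64() % WORDS.len() as u64) as usize])
        .collect();
    let mut html = format!("<html><body><p>{}</p>\n", filler.join(" "));
    for _ in 0..link_count {
        // Each link uses a fresh pseudo-random path segment under the prefix.
        let id = rng.next_u64();
        html.push_str(&format!("<a href=\"{}/{:x}\">more</a>\n", prefix, id));
    }
    html.push_str("</body></html>\n");
    html
}
```

Because every generated link stays under the honeytrap prefix, each page a crawler fetches hands it more URLs that route straight back to the trap, so the crawl never terminates on its own.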
Technical Details
- Written in Rust for speed and low memory usage
- Minimal footprint — designed not to waste YOUR compute resources
- Install with Cargo (cargo install miasma) or download pre-built binaries
- Configurable link prefixes, ports, and behavior
The Debate
This approach raises interesting questions:
- Effectiveness: Will poisoned data actually degrade model quality?
- Ethics: Is intentionally corrupting training data justified?
- Legality: What are the legal implications of serving misleading content?
- Cat and mouse: How will scrapers adapt to detect and avoid traps?
Context
Miasma is part of a growing set of anti-scraping measures that ranges from robots.txt directives and CAPTCHAs to active data poisoning. As AI scraping becomes more aggressive, website owners are developing increasingly creative defenses.