I use Zip Bombs to Protect my Server

Bots be warned

The majority of traffic on the web comes from bots. For the most part, these bots are used to discover new content: RSS feed readers, search engines crawling your content, or, nowadays, AI bots crawling content to power LLMs. But then there are the malicious bots, run by spammers, content scrapers, or hackers. At my old employer, a bot discovered a WordPress vulnerability and inserted a malicious script into our server, turning the machine into part of a botnet used for DDoS attacks. One of my first websites was yanked off Google Search entirely because of bot-generated spam. At some point, I had to find a way to protect myself from these bots. That's when I started using zip bombs.

A zip bomb is a relatively small compressed file that expands into a file large enough to overwhelm a machine.

One feature developed early in the web's history was gzip compression. With the Internet slow and information dense, the idea was to compress data as small as possible before sending it over the wire. A 50 KB HTML file, composed of text, can be compressed down to about 10 KB, saving 40 KB in transmission. On dial-up Internet, that meant downloading the page in 3 seconds instead of 12.
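The savings are easy to see with gzip itself. A rough sketch (the file name and contents here are made up for illustration; the exact ratio depends on the content):

```shell
# Generate roughly 50 KB of repetitive HTML-like text, then compress it.
yes '<p>hello world</p>' | head -c 50000 > page.html
gzip -c page.html > page.html.gz
wc -c page.html page.html.gz  # the .gz copy is a small fraction of the original
```

Highly repetitive text like this compresses far better than real pages, but even typical HTML shrinks severalfold.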

The same compression can be used to serve CSS, JavaScript, or even images. Gzip is fast, simple, and drastically improves the browsing experience. When a browser makes a web request, it includes a header that signals to the target server that it supports compression. If the server supports it too, it returns a compressed version of the expected data.

Accept-Encoding: gzip, deflate

Bots that crawl the web support this feature as well. Since their job is to ingest data from all over the web, they maximize their bandwidth by using compression. And we can take full advantage of that.

On this blog, I often get bots that scan for security vulnerabilities, which I ignore for the most part. But when I detect that they are either trying to inject malicious payloads or probing for a response, I return a 200 OK and serve them a gzipped response. I vary between a 1 MB and a 10 MB file, which they are happy to ingest. For the most part, once they do, I never hear from them again. Why? Because they crash right after ingesting the file.

Content-Encoding: gzip

What happens is, they receive the file and read the header that tells them it is compressed. So they try to decompress the 1 MB file to find whatever content they are looking for. But the file expands, and expands, and expands, until they run out of memory and their process crashes. The 1 MB file decompresses into 1 GB, which is more than enough to break most bots. For those pesky scripts that won't stop, I serve the 10 MB file instead. That one decompresses into 10 GB and instantly kills the script.
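A scaled-down version of the same effect (hypothetical file name, and much smaller than the real bombs so it is safe to run):

```shell
# 10 MB of zeros squeezes into roughly 10 KB; gzip -l shows both sizes.
dd if=/dev/zero bs=1M count=10 2>/dev/null | gzip -c > small.gz
gzip -l small.gz              # compressed vs. uncompressed byte counts
gunzip -c small.gz | wc -c    # 10485760 bytes come back out
```

A run of identical bytes is close to gzip's best case, which is why the ratio is near 1000:1.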

Before I tell you how to create a zip bomb, I have to warn you that you can potentially crash and destroy your own device. Continue at your own risk. Here is how to create one:

dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz

Here is what the command does:

  1. dd: the dd command copies or converts data.
  2. if=/dev/zero: input file; /dev/zero is a special file that produces an endless stream of zero bytes.
  3. bs=1G: block size; dd reads and writes data in chunks of 1 gigabyte at a time.
  4. count=10: process 10 blocks, each 1 GB in size, generating 10 GB of zeroed data in total.

We then pipe the output to gzip, which compresses it into the file 10GB.gz. The resulting file is about 10 MB.
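The 1 MB variant mentioned earlier is built the same way; a sketch, with the file name chosen here for illustration:

```shell
# 1 GB of zeros compresses to roughly 1 MB (DEFLATE tops out near 1032:1).
dd if=/dev/zero bs=1M count=1024 | gzip -c > 1GB.gz
ls -lh 1GB.gz
```

Generating the 1 GB stream takes a few seconds, but nothing larger than the compressed output ever touches the disk.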

On my server, I've added a middleware that checks whether the current request is malicious. I keep a list of blacklisted IPs that repeatedly scan the whole website, and I have other heuristics in place to detect spammers. Many spammers attempt to spam a page, then come back to see if the spam made it onto the page; I use this pattern to detect them. It looks something like this:

if (ipIsBlackListed() || isMalicious()) {
    header("Content-Encoding: gzip");
    header("Content-Length: " . filesize(ZIP_BOMB_FILE_10G)); // ~10 MB compressed
    readfile(ZIP_BOMB_FILE_10G);
    exit;
}

That's all it takes. The only price I pay is serving a 10 MB file on some occasions. If I have an article going viral, I switch down to the 1 MB file, which is just as effective.

One more thing: a zip bomb is not foolproof. It can be detected and circumvented; a client could simply read the content partially, after all. But for unsophisticated bots that blindly crawl the web and disrupt servers, it's a good enough tool for protecting your server.

You can see it in action in this replay of my server logs.


Comments(22)

Andy Armstrong :

If you use a space character instead of zero it would look to the bot like a lot of leading whitespace rather than a zip bomb.

Also the gzip response can be a concatenation of multiple gzip streams - gzip decompressors generally handle that case transparently. That would mean you could pre-compress a few MB of whitespace and then send that repeatedly as a chunked response - infinite zip bomb :)
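The multi-member behavior is easy to verify (file names here are arbitrary):

```shell
# Two independent gzip streams concatenated decode as one.
printf 'hello ' | gzip > a.gz
printf 'world\n' | gzip > b.gz
cat a.gz b.gz | gunzip   # prints "hello world"
```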

S. Hughes :

Aaaahhhh…. The 21st century internet incarnation of loading the .plan or .project file with a giant binary to crash curious “finger” users.

Cool solution!

Anon :

Another similar technique is the billion laughs attack. It's a small XML file that expands recursively when parsed.

Some parsers are vulnerable to it, some catch it.

Marcos Alano :

I need to build a site just to do that. Thanks. :)

James :

Awesome way to rid your site of bots. I don't know why people don't respect the robots.txt file. That should turn them away like a "no soliciting" sign but it doesn't.

@Andy Armstrong had a great suggestion with the spaces instead of nulls.

Pat :

You can custom-craft even better zip bombs by making use of features of the zip compression algorithm. For example, something like https://www.bamsoftware.com/hacks/zipbomb/

Anonym :

Suspiciously similar article from 2017: https://blog.haschek.at/2017/how-to-defend-your-website-with-zip-bombs.html

Greg Searle :

On another note, I had to protect a client's IIS .net application from injection attempts. The server would detect the attempts, so the problem was traffic volume and not the attacks themselves. My response was... open the HTTP response, then sleep and terminate the response thread. This is illegal under the HTTP protocol, as a full handshake is required, and it left the attacker hanging there until it timed out.

Ibrahim Diallo author :

@greg the key is to stop abusers without exhausting your resources.

For example, since this article is getting so much attention, a lot of people are attempting to trigger the zip bombs. I turned it off since these are not real threats and I don't want my $6 server to suffer for it :D

Valérie M. :

Have you considered adding Brotli bombs to your arsenal?

Ibrahim author :

@Valerie the intention is to target abusive bots in an automated way. Bots won't even open a regular zip file that I send them randomly. So gzip is the best option, simply because it's part of the expected HTTP response and, for the most part, it decompresses automatically.

Kapil :

good to learn this

Nate :

This article totally nerd-sniped me ;)

The header "Content-Encoding: deflate, gzip" is equivalent to the two header combo "Content-Encoding: deflate", "Content-Encoding: gzip" (RFC 9110, section 5.3), which indicates that your data was compressed twice, first with zlib, then with gzip (RFC 9110, section 8.4). When a client fetches your zip bomb, it then decompresses the payload with gzip, and then (assuming it hasn't crashed) decompresses it again with deflate. Both of these encodings use the DEFLATE data format under the hood (RFC 1951), which can get a maximum of 1032x compression.

With two compression layers, you should be able to get 1032^2 = 1065024x compression. Since you're not using zlib, though, you only get one exponent of compression ratio.

I spent the weekend hacking this out, and came up with this:

github link

It's a 1 MiB zip bomb that double decompresses to over a terabyte.

There's also a possibility that the bots scraping your site are successfully decompressing the gzip layer, but breaking when trying to decompress the invalid zlib layer. This seems unlikely, but keep that in mind if you decide to use this.

Ibrahim author :

Thanks Nate, I learned something new. It might be time to update my zip bomb. I'm gonna spend some time unpacking your code... not the gzip ;)

Murdoc :

Are you open to sharing the efficacy of the zip bomb approach? Even after looking at the log replay visualization it's not clear if it had any impact to the 10MB.zip recipient.

AFAIK as the server you have no control over what the client does with the file once it's downloaded. It's hard to envision a scraper bot being stupid enough to blindly unzip data into a self-destructive state.

Ibrahim Diallo author :

@Murdoc you are right, I have no control over the client. However, I can see the pattern of their requests in my own logs. For example, I can see a scanner run 200 different request combinations daily. When I turn on the zip bomb, the number of requests drops to just one. Whether their system crashed or not is irrelevant to me, as long as I see it stop on my end.

I've tracked bots that only slow down when I serve them the 1MB.zip. They go from hundreds of requests per minute to just a dozen, meaning they are ingesting the content. When I serve the 10MB one, I never hear from them again. Mission accomplished!?

So in that sense, it works. My goal is not to serve malicious files, it's to stop those who abuse the server. For example, GET requests are safe. There is no reason for someone to use a POST request, other than to sign up for the newsletter or post a comment. Even this comment form leads to a honeypot by default. Even a humble blog like this one can get an overwhelming number of requests from bots.

Jan :

Not sure, but if a bot's script allocates a fixed buffer before uncompressing the file, the bomb won't crash it; it will just be caught.

TransparentLC :

What about brotli bombs with "Content-Encoding: br"?

Ibrahim author :

@TransparentLC The goal is to provide an encoding that bots will most likely decode. I use gzip because most Linux tools (curl, wget) and simple scripts use methods that auto-decompress. But it doesn't hurt to try multiple encodings to see what works for you.

Matthew Skala :

I think "Content-Encoding: deflate, gzip" is not really what you intend here. Listing multiple compression methods in this header is supposed to mean that they will all be applied in the order listed, so this would only be correct for data that was first subjected to deflate and then gzipped. It does not work the same way as the "Accept-Encoding" header, where the client specifies options with the intention that the server will choose one.

Ibrahim Diallo author :

Thank you so much for finding this mistake @Matthew. I've fixed it. 👍

xiyu :

I'm glad to see your sharing. My website deployed on CloudFlare has been under constant attack. How should I use your zip bomb for the CloudFlare website? Thank you very much
