Suboptimal compression with .NET’s GZipStream

I played around a bit with one of the .NET framework’s compression classes (System.IO.Compression.GZipStream) recently and got some unexpected results. First, the class is a bit weird in how it takes in and outputs data: for decompression you read directly from the GZipStream, but for compression you write to the GZipStream and then read the result from the underlying stream (a MemoryStream, for example). Once you get past this you run into a more significant issue: the compression is not optimal and, in certain cases, the “compressed” data can be significantly larger than the uncompressed input. I only did a few ad-hoc and very unscientific tests, but they were good enough to indicate that there was a problem. GZipStream is probably still good enough when the data is highly compressible, but when you have no idea what is being compressed it’s probably best to look for another solution.
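For anyone who wants to repeat this kind of test, here is a minimal sketch. It is not the code I originally ran: the class name GZipSizeCheck, the 100,000-byte buffers and the two sample inputs (pseudo-random bytes and all zeros) are assumptions chosen purely for illustration. It only uses GZipStream and MemoryStream as described above, and the sizes it prints will depend on the framework version you run it on.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class, named for this sketch only.
public static class GZipSizeCheck
{
    // Compression: write the input INTO the GZipStream, then read the
    // compressed bytes back from the underlying MemoryStream.
    public static byte[] Compress(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream before taking the bytes so that the
            // final block and the gzip trailer are written to 'output'.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(input, 0, input.Length);
            }
            return output.ToArray();
        }
    }

    // Decompression: read the original bytes directly FROM the GZipStream.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var source = new MemoryStream(compressed))
        using (var gzip = new GZipStream(source, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            var buffer = new byte[4096];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                result.Write(buffer, 0, read);
            }
            return result.ToArray();
        }
    }

    public static void Main()
    {
        // Assumed sample inputs: data that should be hard to compress
        // (pseudo-random bytes) and data that should be easy (all zeros).
        var random = new byte[100000];
        new Random(42).NextBytes(random);
        var zeros = new byte[100000];

        foreach (var sample in new[] { random, zeros })
        {
            byte[] packed = Compress(sample);
            Console.WriteLine("{0} bytes in, {1} bytes out", sample.Length, packed.Length);
        }
    }
}
```

If the second number is larger than the first for the random buffer, you are seeing the behaviour described above. Some overhead is unavoidable with any gzip implementation on incompressible data, so the interesting question is how much larger the output gets.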

There’s a thread on MSDN about this issue (“System.IO.Compression not as good as compressed folder”) and also a feedback entry on MS Connect.
