Back to Basics – Zip in .NET - Gil Fink's Blog

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
© Copyright 2013 Gil Fink

Back to Basics – Zip in .NET

In today’s post I’m going to share a problem I solved this week.
The solution was to use the framework’s System.IO.Compression
namespace and the GZipStream object.

The Problem

In my current project we save Xml data in our database. The field that
held the data was of type varchar(6000). Saving the data this way became
a problem with big Xml data (over 8000 kb for every Xml data): it was
stored in the database as plain text, which in the long run could cause
space and performance problems.

The Solution

The solution was to use the compression abilities of the .NET framework:
compress the Xml data and save it in binary form. We needed to change the
field type in the database to a binary type and compress the Xml data
before inserting it into the database. After the binary data is retrieved
from the database, the reverse process of decompression returns the
original Xml string.

The Code

I first built a console application to write the code and test it.
Then I wired the zip and unzip methods I wrote into the part of the
project that needed the compression. The following code shows the console
application with the zip and unzip methods I used to compress the Xml data.

 
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            string data = "<Root><Child>data1</Child><Child>data2</Child><Child>data3</Child><Child>data4</Child><Child>data5</Child></Root>";
            Console.WriteLine(data);

            byte[] zipped = ZipDocumentData(data);
            //The compressed bytes are not text, so print them as Base64
            Console.WriteLine(Convert.ToBase64String(zipped));

            data = UnZipDocumentData(zipped);
            Console.WriteLine(data);

            Console.Read();
        }

        private static byte[] ZipDocumentData(string documentData)
        {
            byte[] byteArray = Encoding.UTF8.GetBytes(documentData);

            using (MemoryStream ms = new MemoryStream())
            {
                using (GZipStream stream = new GZipStream(ms, CompressionMode.Compress))
                {
                    //Compress
                    stream.Write(byteArray, 0, byteArray.Length);
                }
                //The GZipStream is closed first so all compressed data is flushed
                return ms.ToArray();
            }
        }

        private static string UnZipDocumentData(byte[] zippedDocumentData)
        {
            //Prepare for decompress
            using (MemoryStream ms = new MemoryStream(zippedDocumentData))
            {
                using (GZipStream stream = new GZipStream(ms, CompressionMode.Decompress))
                {
                    //Fixed buffer for the test application; data beyond 4096 bytes is dropped
                    byte[] byteArray = new byte[4096];
                    int total = 0;
                    int rByte;

                    //Decompress; Read may return fewer bytes than requested, so keep reading
                    while (total < byteArray.Length &&
                           (rByte = stream.Read(byteArray, total, byteArray.Length - total)) > 0)
                    {
                        total += rByte;
                    }

                    //Decode only the bytes that were actually read
                    return Encoding.UTF8.GetString(byteArray, 0, total);
                }
            }
        }
    }

A few things to keep in mind if you are going to use this code:

  • The strings I use are encoded as UTF8. If you use another encoding,
    change the Encoding.UTF8 code to the encoding you use.
  • In the decompress process I use a fixed array of 4096 bytes, so the
    test code can return at most 4096 bytes of decompressed data. This is
    only for the testing application; in the real method I save the
    original size of the array.
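
The two points above can be sketched in one helper. This is a minimal sketch and not the production code from the project: the GZipText class and its method names are hypothetical. It takes the encoding as a parameter and, instead of saving the original size, it reads the decompressed data in chunks until the stream ends, so the output is not limited to 4096 bytes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration; names are assumptions, not from the post.
public static class GZipText
{
    // Encodes the text with the given encoding and compresses it with GZip.
    public static byte[] Compress(string text, Encoding encoding)
    {
        byte[] raw = encoding.GetBytes(text);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }

    // Decompresses GZip data of any size by reading it in 4096-byte chunks.
    public static string Decompress(byte[] zipped, Encoding encoding)
    {
        using (MemoryStream input = new MemoryStream(zipped))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            // Read returns 0 only when the end of the stream is reached.
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return encoding.GetString(output.ToArray());
        }
    }
}
```

A round trip would look like GZipText.Decompress(GZipText.Compress(data, Encoding.UTF8), Encoding.UTF8). The same encoding has to be passed to both calls, otherwise the decompressed bytes are decoded incorrectly.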

Summary

Let’s sum up. I used compression to reduce the size of the Xml data we
save in the database. I showed the code to do that using the GZipStream
object, which is part of the System.IO.Compression namespace. I hope the
code will help you whenever you need to compress data.


Comments

Rotem Bloom said:

Thanks 4 sharing great post!

# February 5, 2009 10:34 PM

Gil Fink said:

Thanks Rotem.

# February 6, 2009 9:05 AM

Christian Mogensen said:

GZip'ing text in and out of the database is smart, as long as you don't need to search it.

Your code is not named correctly though.

You're not using ZIP. You're using GZip.

ZIP is an archive format that uses the Inflate/Deflate compression algorithms.

GZip is a compression algorithm.

Info-ZIP maintains the C/C++ reference libraries for accessing ZIP files.

The C# version is available from www.icsharpcode.net/.../SharpZipLib

The method names you want are

GZipDocumentData and UnGZipDocumentData.

# February 6, 2009 3:37 PM

Gil Fink said:

Thanks for the correction Christian Mogensen.

# February 6, 2009 4:00 PM

Cheeso said:

as Christian said, your code is compressing, but not zipping.  The headline of this post is misleading.

Also, why would you (de-)compress only 4k?  wouldn't you want to (de-)compress in a loop, until you've processed the entire file or stream?  

If you want a full zip library, check out DotNetZip.

www.codeplex.com/DotNetZip

# February 13, 2009 7:00 PM

Gil Fink said:

@Cheeso,

Thanks for your comment. As I wrote in the post, the code that is shown is the testing application (this is why I wrote the "things that should be concerned" section). The real production code isn't limited to 4k; it handles every file size. As for the second thing you wrote, "wouldn't you want to (de-)compress in a loop, until you've processed the entire file or stream?", sometimes it's true that looping is the better method. In the example I wrote, because the Xml data I compress is small, I read the string to the end without looping.

Thanks for the link to the zip library.

# February 15, 2009 7:57 AM
