Back to Basics – Zip in .NET

February 5, 2009


In today’s post I’m going to share a problem I solved this week. The solution was to use the framework’s System.IO.Compression namespace and the GZipStream class.

The Problem

In my current project we save Xml data in our database. The field that stores the data was of type varchar(6000). Storing the data this way meant saving large Xml documents (over 8000 KB each) in the database, which in the long run could cause space and performance problems.

The Solution

Use the compression abilities of the .NET Framework: compress the Xml data and save it in binary form. We needed to change the field type in the database to a binary type and compress the Xml data before inserting it into the database. When the binary data is retrieved from the database, the reverse process, decompression, returns the original Xml string.
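To get a feel for how much space compression can save, here is a minimal sketch (not taken from the original project) that compresses a repetitive, made-up Xml string with GZipStream and compares the byte counts before and after. The sample data and the `GZipSizeDemo`/`CompressedSize` names are hypothetical, and the real ratio depends heavily on how repetitive your Xml is.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GZipSizeDemo
{
    // Hypothetical helper: returns how many bytes GZip produces for the given text (UTF-8 encoded).
    public static int CompressedSize(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var ms = new MemoryStream())
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream writes the final block and the GZip footer

            // ms is closed at this point, so use ToArray() rather than ms.Length
            return ms.ToArray().Length;
        }
    }

    static void Main()
    {
        // Made-up sample: 500 similar <Child> elements, which compress well.
        var sb = new StringBuilder("<Root>");
        for (int i = 0; i < 500; i++)
        {
            sb.Append("<Child>data").Append(i).Append("</Child>");
        }
        sb.Append("</Root>");
        string xml = sb.ToString();

        int original = Encoding.UTF8.GetByteCount(xml);
        int compressed = CompressedSize(xml);
        Console.WriteLine("Original: {0} bytes, compressed: {1} bytes", original, compressed);
    }
}
```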

The Code

I first built a console application to write and test the code. Then I wired the zip and unzip methods into the part of the project that needed compression. The following code shows the console application’s zip and unzip methods I used to compress the Xml data.

 
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            string data = "<Root><Child>data1</Child><Child>data2</Child><Child>data3</Child><Child>data4</Child><Child>data5</Child></Root>";
            Console.WriteLine(data);

            byte[] zipped = ZipDocumentData(data);
            //Compressed bytes are not valid UTF-8 text, so show them as Base64
            Console.WriteLine(Convert.ToBase64String(zipped));

            data = UnZipDocumentData(zipped);
            Console.WriteLine(data);

            Console.Read();
        }

        private static byte[] ZipDocumentData(string documentData)
        {
            byte[] byteArray = Encoding.UTF8.GetBytes(documentData);

            using (MemoryStream ms = new MemoryStream())
            {
                using (GZipStream stream = new GZipStream(ms, CompressionMode.Compress))
                {
                    //Compress
                    stream.Write(byteArray, 0, byteArray.Length);
                }
                //The GZipStream must be disposed first so the final block and the
                //GZip footer are written; ToArray works on a closed MemoryStream
                return ms.ToArray();
            }
        }

        private static string UnZipDocumentData(byte[] zippedDocumentData)
        {
            //Prepare for decompress
            using (MemoryStream ms = new MemoryStream(zippedDocumentData))
            {
                using (GZipStream stream = new GZipStream(ms, CompressionMode.Decompress))
                {
                    //Fixed-size buffer for the uncompressed result (test code only:
                    //anything beyond 4096 bytes is dropped)
                    byte[] byteArray = new byte[4096];

                    //Decompress; Read can return fewer bytes than requested,
                    //so keep reading until the buffer is full or the stream ends
                    int total = 0;
                    int read;
                    while (total < byteArray.Length &&
                           (read = stream.Read(byteArray, total, byteArray.Length - total)) > 0)
                    {
                        total += read;
                    }

                    //Decode only the bytes that were actually read
                    return Encoding.UTF8.GetString(byteArray, 0, total);
                }
            }
        }
    }

Some things to keep in mind if you are going to use this code:

  • The strings I use are UTF-8 encoded. If you use another encoding, replace
    Encoding.UTF8 with the encoding you use, in both the zip and the unzip method.

  • In the decompress process I use a fixed array of 4096 bytes. This is only
    for the testing application; in the real method I save the original size of
    the array.
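Outside of a test, the fixed 4096-byte buffer is the part to replace. One way to do it, sketched below as an assumption rather than the project's production code, is to let a StreamReader read the decompressed stream to the end, so the result is not limited by any buffer size. The encoding is passed in as a parameter so compression and decompression are guaranteed to use the same one. The class and method names (`GZipText`, `Compress`, `Decompress`) are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GZipText
{
    // Hypothetical helper: GZip-compresses text using the given encoding.
    public static byte[] Compress(string text, Encoding encoding)
    {
        byte[] raw = encoding.GetBytes(text);
        using (var ms = new MemoryStream())
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray works even though disposing the GZipStream closed ms.
            return ms.ToArray();
        }
    }

    // Hypothetical helper: decompresses GZip data of any size back into text.
    public static string Decompress(byte[] compressed, Encoding encoding)
    {
        using (var ms = new MemoryStream(compressed))
        using (var gzip = new GZipStream(ms, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, encoding))
        {
            // ReadToEnd keeps reading until the GZip stream is exhausted,
            // so there is no fixed-size buffer to overflow.
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Made-up document larger than the 4096-byte test buffer.
        string xml = "<Root>" + new string('x', 10000) + "</Root>";
        byte[] zipped = Compress(xml, Encoding.UTF8);
        string restored = Decompress(zipped, Encoding.UTF8);
        Console.WriteLine(restored == xml); // prints "True"
    }
}
```

On .NET Framework 4 and later, Stream.CopyTo into a MemoryStream is another way to drain the GZipStream when you need the raw bytes rather than a string.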

Summary

To sum up, I used compression to shrink the Xml data stored in the database. I showed the code that does this with the GZipStream class, which is part of the System.IO.Compression namespace. I hope the code will help you whenever you need to compress data.


6 comments

  1. Rotem Bloom, February 5, 2009 at 22:34

    Thanks 4 sharing great post!

  2. Christian Mogensen, February 6, 2009 at 15:37

    GZip’ing text in and out of the database is smart, as long as you don’t need to search it.

    Your code is not named correctly though.

    You’re not using ZIP. You’re using GZip.
    ZIP is an archive format that uses the Inflate/Deflate compression algorithms.
    GZip is a compression algorithm.

    Info-ZIP maintains the C/C++ reference libraries for accessing ZIP files.

    The C# version is available from http://www.icsharpcode.net/OpenSource/SharpZipLib/

    The method names you want are
    GZipDocumentData and UnGZipDocumentData.

  3. Gil Fink, February 6, 2009 at 16:00

    Thanks for the correction Christian Mogensen.

  4. Cheeso, February 13, 2009 at 19:00

    As Christian said, your code is compressing, but not zipping. The headline of this post is misleading.

    Also, why would you (de-)compress only 4k? Wouldn’t you want to (de-)compress in a loop until you’ve processed the entire file or stream?

    If you want a full zip library, check out DotNetZip.
    http://www.codeplex.com/DotNetZip

  5. Gil Fink, February 15, 2009 at 7:57

    @Cheeso,
    Thanks for your comment. As I wrote in the post, the code shown is the testing application (this is why I added the list of caveats after the code). The real production code isn’t limited to 4k but handles any size. As for your second point, “wouldn’t you want to (de-)compress in a loop, until you’ve processed the entire file or stream?”: sometimes it’s true that looping is the better method. In my example, because the Xml data I compress is small, I read the string to the end instead of looping.
    Thanks for the link to the zip library that you wrote.
