Zip Archives Become a First-Class Citizen in .NET 4.5

Compression in the .NET Framework has been supported through different libraries in the past (such as the Open Packaging Conventions support in System.IO.Packaging), but support for .zip archives was never quite complete. With .NET 4.5 we get a dedicated zip compression library that lets us manipulate zip archives fully.

Introduction

Until now, zip support in .NET was limited to what the Open Packaging Conventions needed and centered on adhering to that convention. As a result, the archives it created were never plain, general-purpose zip archives. .NET 4.5 introduces the System.IO.Compression.ZipArchive type, which covers our archive compression and decompression needs. In this post we will look at the available features and how we can use them to build various types of archiving solutions.

Features of the System.IO.Compression.ZipArchive Type

Single-step extraction of existing zip archives
A zip archive can be extracted in a single step as follows:

ZipFile.ExtractToDirectory(@"D:\devcurry.zip", @"D:\devcurry\");

The code above extracts devcurry.zip into the D:\devcurry folder.
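In .NET 4.5, ExtractToDirectory has no option to overwrite files that already exist in the destination folder; it throws an IOException when it meets one. When overwriting is what you want, the archive can be extracted entry by entry instead. The following is a minimal sketch under that assumption: the class and method names (ZipExtraction, ExtractWithOverwrite) are hypothetical, while ZipFile.OpenRead and the ExtractToFile extension method are part of the library. The path check mirrors the protection ExtractToDirectory itself applies against entry names that climb out of the destination folder.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ZipExtraction
{
    // Hypothetical helper: extracts every entry of zipPath into destination,
    // overwriting files that already exist there.
    public static void ExtractWithOverwrite(string zipPath, string destination)
    {
        string root = Path.GetFullPath(destination);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString()))
            root += Path.DirectorySeparatorChar;

        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                string target = Path.GetFullPath(Path.Combine(root, entry.FullName));

                // Refuse entries such as "../../evil.exe" that would land outside destination.
                if (!target.StartsWith(root, StringComparison.OrdinalIgnoreCase))
                    throw new IOException("Entry escapes the destination folder: " + entry.FullName);

                if (entry.Name.Length == 0)
                {
                    // Folder entries have an empty Name; just make sure the folder exists.
                    Directory.CreateDirectory(target);
                    continue;
                }

                Directory.CreateDirectory(Path.GetDirectoryName(target));
                entry.ExtractToFile(target, true); // true = overwrite an existing file
            }
        }
    }
}
```

Usage would look like ZipExtraction.ExtractWithOverwrite(@"D:\devcurry.zip", @"D:\devcurry\"); running it a second time replaces the previously extracted files instead of failing.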
Single-step compression of an entire folder
A zip archive can be created from a folder in a single step:

ZipFile.CreateFromDirectory(@"D:\devcurry", @"D:\devcurry.zip");

This compresses the entire contents of the devcurry folder into devcurry.zip.
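CreateFromDirectory also has an overload that takes a CompressionLevel and an includeBaseDirectory flag. With the flag set to true, the entries are stored under a top-level devcurry folder inside the archive instead of directly at its root. Note that CreateFromDirectory throws an IOException if the destination archive already exists. A short sketch; the wrapper name ArchiveFolder is hypothetical:

```csharp
using System.IO.Compression;

static class FolderArchiver
{
    // Hypothetical wrapper around the four-argument CreateFromDirectory overload.
    public static void ArchiveFolder(string folder, string zipPath)
    {
        // Optimal favors a smaller archive over compression speed;
        // includeBaseDirectory: true stores entries under the folder's own name
        // instead of at the root of the archive.
        ZipFile.CreateFromDirectory(folder, zipPath, CompressionLevel.Optimal, includeBaseDirectory: true);
    }
}
```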
Selective compression of a list of files
Single-step compression is fine, but often we need to create an archive from a selected set of files in a folder. For example, if you blog often with code samples, you want your source code packaged without the bin and obj folders, and without the *.user and *.suo files, so that user cache information is not distributed. The zip library in .NET 4.5 allows us to create a zip archive of selected files as well.

As we will see in the example below, it is an easy-to-use and powerful library.
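As a minimal sketch of selective compression, the method below opens a new archive with ZipFile.Open in ZipArchiveMode.Create and adds only the files it is given, using CreateEntryFromFile. The class and method names are hypothetical, and storing every file at the archive root under its bare file name is a simplifying assumption: two files with the same name would produce duplicate entries.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class SelectiveArchiver
{
    // Creates zipPath and adds only the listed files to it.
    // ZipArchiveMode.Create fails if zipPath already exists.
    public static void ArchiveFiles(string zipPath, IEnumerable<string> files)
    {
        using (ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Create))
        {
            foreach (string file in files)
            {
                // Store the file at the root of the archive under its own name.
                archive.CreateEntryFromFile(file, Path.GetFileName(file), CompressionLevel.Optimal);
            }
        }
    }
}
```

For instance, SelectiveArchiver.ArchiveFiles(@"D:\samples.zip", Directory.EnumerateFiles(@"D:\devcurry", "*.cs")) would archive only the C# files at the top level of the folder.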
Streaming access to compressed files
Large zip archives are a problem for some archiving tools: opening a big file exhausts system memory and the tool crashes. When an archive is opened in ZipArchiveMode.Read, the zip library in .NET 4.5 gives streaming access to the compressed files, so the archive does not have to be loaded into memory first. For example:

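using System.IO;
using System.IO.Compression;

using (ZipArchive zipArchive = 
  ZipFile.Open(@"C:\Archive.zip", ZipArchiveMode.Read))
{
  foreach (ZipArchiveEntry entry in zipArchive.Entries)
  {
    // Open() returns a stream that decompresses the entry as it is read
    using (Stream stream = entry.Open())
    {
      // Do something with the stream
    }
  }
}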

A typical use for this is a web server that zips and unzips data on the fly. Another is collecting data from Internet streams, such as Twitter or GitHub status updates, and compressing it directly into an archive file.
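For scenarios like these, ZipArchive does not need a file at all: its constructor accepts any Stream. The sketch below, with the hypothetical names StreamZipper, AddStream and ZipToBytes, copies data from an arbitrary source stream into a new entry and builds the whole archive in a MemoryStream; a FileStream could take its place just as well.

```csharp
using System.IO;
using System.IO.Compression;

static class StreamZipper
{
    // Compresses everything read from source into a new entry of archive.
    public static void AddStream(ZipArchive archive, string entryName, Stream source)
    {
        ZipArchiveEntry entry = archive.CreateEntry(entryName, CompressionLevel.Optimal);
        using (Stream entryStream = entry.Open())
        {
            source.CopyTo(entryStream); // data is compressed as it is written
        }
    }

    // Builds a complete zip archive in memory containing a single entry.
    public static byte[] ZipToBytes(string entryName, Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps buffer usable after the archive is disposed;
            // disposing the archive is what writes the zip's central directory.
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                AddStream(archive, entryName, source);
            }
            return buffer.ToArray();
        }
    }
}
```

To add several entries, keep one ZipArchive open and call AddStream once per source stream; each call to ZipToBytes produces a separate, complete archive.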

Example: Compress a Visual Studio Solution file without binaries and personalization information

Visual Studio 2010 has a nice extension called SolZip. It adds a right-click menu entry to Visual Studio: right-clicking the solution in Solution Explorer creates a zip file without the bin/obj folders. It also excludes the personalization info in the .user and .suo files. If you bundle these files with your project and give it to someone, they may end up with conflicts or folder reference issues.
Coming back to the topic: while writing my previous article on CallerInfoAttributes, I was in Visual Studio 2012 and went looking for SolZip, but I couldn't find it. While writing this article I realized I could create a command-line version of it and demonstrate the power of the zip library in .NET 4.5.

Step 1: Create a new Console Application Project in VS 2012

Step 2: Add references to System.IO.Compression and System.IO.Compression.FileSystem

[Screenshot: adding a reference to System.IO.Compression]

Step 3: Set up defaults

We set up the default parameters for the zip file to exclude the files we don't want, by specifying the extensions and/or the folder names that need to be excluded. Note that the folder names start with a '\'.

[Screenshot: initial setup of the default parameters]

CreateArchive is the function that actually does the archiving and returns the number of files archived. Once it returns, we show the number of files archived and wait for the user to hit Enter, after which the app quits.

By default, if no parameters are specified, this utility zips the contents of its current folder, excluding files with a '.user' or '.suo' extension and files in any of the bin, obj or packages folders.
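The original setup code is only available as a screenshot, so the following is a hypothetical reconstruction of the defaults described above, not the author's code. The class name SolutionZipSettings and the fallback archive name Archive.zip (borrowed from the example later in this post) are assumptions.

```csharp
using System.IO;

class SolutionZipSettings
{
    // Extensions start with '.', folder names with '\', as described in Step 3.
    public static readonly string[] DefaultExclusions = { ".user", ".suo", @"\bin", @"\obj", @"\packages" };

    public string SourceFolder { get; private set; }
    public string ArchiveName { get; private set; }
    public string[] Exclusions { get; private set; }

    // args[0] = source folder, args[1] = archive name; both are optional.
    public static SolutionZipSettings FromArgs(string[] args)
    {
        return new SolutionZipSettings
        {
            // No arguments: zip the current folder.
            SourceFolder = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory(),
            // Assumed default archive name.
            ArchiveName = args.Length > 1 ? args[1] : "Archive.zip",
            Exclusions = DefaultExclusions
        };
    }
}
```

Main would pass these values to CreateArchive, print the returned count and then wait on Console.ReadLine() before exiting. Because the default archive lands inside the folder being zipped, the archiving code has to skip the archive file itself.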

Step 4a: Filtering and archiving the files

The CreateArchive method takes the root folder from which archiving starts, the list of extensions and folders that should be excluded, and the name of the final archive.

The Excluded method takes the current file from the list of all files being enumerated and checks whether it is in an excluded folder or has an excluded extension.

[Screenshot: CreateArchive and Excluded method signatures]

Step 4b: The CreateArchive method

The CreateArchive method takes the source folder path and the provided archive name and checks whether the archive already exists. If it does, it asks the user whether it should be overwritten. If the user enters y, the file is overwritten; otherwise the process is aborted.

After confirmation, Directory.EnumerateFiles(…) opens an enumeration over all files in the source folder, including sub-folders. Because the source folder is given as a relative path, the enumerator returns each file's relative path. Once the Excluded method determines that the file is not excluded, we add it to the archive.
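A sketch of CreateArchive following the steps above. It is an illustration under stated assumptions, not the author's code: the exclusion check is passed in as a predicate (the hypothetical isExcluded parameter) that receives each file's path relative to the source folder, entry names are made relative to the source folder rather than to the archive's folder, and backslashes become forward slashes, which is the separator the zip specification calls for.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class SolutionArchiver
{
    // Returns the number of files added, or 0 if the user declines to
    // overwrite an existing archive. isExcluded stands in for Excluded.
    public static int CreateArchive(string sourceFolder, string archiveName, Func<string, bool> isExcluded)
    {
        if (File.Exists(archiveName))
        {
            Console.Write("{0} already exists. Overwrite it? (y/n) ", archiveName);
            string answer = Console.ReadLine();
            if (!string.Equals(answer, "y", StringComparison.OrdinalIgnoreCase))
                return 0; // abort without touching the existing file
            File.Delete(archiveName); // ZipArchiveMode.Create refuses an existing file
        }

        string root = Path.GetFullPath(sourceFolder);
        string archiveFullPath = Path.GetFullPath(archiveName);
        int count = 0;

        using (ZipArchive archive = ZipFile.Open(archiveName, ZipArchiveMode.Create))
        {
            // EnumerateFiles yields paths lazily instead of listing the whole tree up front.
            foreach (string file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
            {
                // Skip the archive being written when it sits inside the source folder.
                if (string.Equals(file, archiveFullPath, StringComparison.OrdinalIgnoreCase))
                    continue;

                string relative = file.Substring(root.Length)
                    .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
                if (isExcluded(relative))
                    continue;

                archive.CreateEntryFromFile(file, relative.Replace(Path.DirectorySeparatorChar, '/'));
                count++;
            }
        }
        return count;
    }
}
```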

[Screenshot: the CreateArchive method]

The syntax for opening the archive and adding the file is highlighted. A couple of notes:

The CreateEntryFromFile(…, …) method takes two parameters: the first is the source file path, and the second is the entry name, that is, the path the file will have inside the archive. In this utility that path starts from the folder where the archive is located. For example, let's assume we have the following structure:

[Screenshot: folder structure with the SolutionZip folder highlighted]

We want all contents of the highlighted 'SolutionZip' folder in an archive called 'Archive.zip'. As we loop through the files, we come to AssemblyInfo.cs in the (highlighted) Properties folder. To add this file to the zip, the values are:

file = .\SolutionZip\Properties\AssemblyInfo.cs
addFile = SolutionZip\Properties\AssemblyInfo.cs

Note that the value for addFile has to start at the folder in which the zip file is located, without the leading .\ of the enumerated path. Without that, Explorer is unable to show the sub-folder structure of the zip file (even though the zip file is valid).
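The rule above can be captured in a small helper. This is a hypothetical sketch (ToEntryName is not part of the library or of the original code): it resolves both paths to full paths and strips the archive's folder from the front of the file path. As a deliberate deviation from the backslash form shown above, it uses forward slashes, which is the separator the zip specification prescribes.

```csharp
using System;
using System.IO;

static class EntryNames
{
    // Returns the path of file relative to the folder that contains archivePath,
    // e.g. .\SolutionZip\Properties\AssemblyInfo.cs -> SolutionZip/Properties/AssemblyInfo.cs
    public static string ToEntryName(string file, string archivePath)
    {
        string baseDir = Path.GetDirectoryName(Path.GetFullPath(archivePath));
        if (!baseDir.EndsWith(Path.DirectorySeparatorChar.ToString()))
            baseDir += Path.DirectorySeparatorChar;

        string fullFile = Path.GetFullPath(file); // resolves away the leading ".\"
        if (!fullFile.StartsWith(baseDir, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("The file is not inside the archive's folder.", "file");

        return fullFile.Substring(baseDir.Length).Replace(Path.DirectorySeparatorChar, '/');
    }
}
```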

Step 4c: The Excluded method

- The Excluded method first creates a collection of the folders in the exclusion list.

- Next it checks whether the file's extension is in the exclusion list. If so, it returns true, meaning the current file is excluded.

- If the extension is not excluded, it loops through the folder names and checks whether the current file is in an excluded folder or any of its sub-folders. If yes, it returns true to exclude the file; otherwise it returns false.

[Screenshot: the Excluded method]
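The three steps can be sketched as follows. This is a hypothetical reconstruction, not the code in the screenshot: it assumes the path passed in is relative to the source folder (so that a parent folder named bin outside the solution does not exclude everything), and it treats a folder exclusion as matching any path segment with that name.

```csharp
using System;
using System.IO;
using System.Linq;

static class ExclusionRules
{
    // Entries starting with '\' are folder names (e.g. @"\bin");
    // every other entry is a file extension (e.g. ".suo").
    public static bool Excluded(string relativePath, string[] exclusions)
    {
        // Step 1: collect the folder exclusions.
        string[] folders = exclusions.Where(e => e.StartsWith(@"\")).ToArray();

        // Step 2: is the file's extension excluded?
        string extension = Path.GetExtension(relativePath);
        bool extensionExcluded = exclusions.Any(e => !e.StartsWith(@"\")
            && string.Equals(e, extension, StringComparison.OrdinalIgnoreCase));
        if (extensionExcluded)
            return true;

        // Step 3: is the file inside an excluded folder or one of its sub-folders?
        // Prefixing '\' lets a top-level folder match too; appending '\' to the
        // folder name stops @"\bin" from matching a folder such as "binaries".
        string path = @"\" + relativePath.Replace('/', '\\');
        foreach (string folder in folders)
        {
            if (path.IndexOf(folder + @"\", StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }
}
```

With the default exclusion list, Excluded(@"MyApp\obj\Debug\MyApp.pdb", exclusions) returns true, while Excluded(@"MyApp\Program.cs", exclusions) returns false.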

That's it: we have a handy little utility to zip up our solution files without any personalization information. Its syntax is:

C:\> SolutionZip.exe mySolution\ solutionName.zip

Conclusion

With .NET Framework 4.5 we have a powerful and robust zip archiving library that we can use in our applications without relying on third-party zip providers.

You can fork the code on GitHub or download the source code.




3 comments:

Mira said...

Thank you for your article. Is it also possible to compress a stream?

Anonymous said...

Hello Mira,
Yes, you can compress streams. In fact, using the library to host server applications that stream data over pipes is one of the target areas for the library.
Thanks and Regards,
Sumit.

Anonymous said...

Is there an example of zipping streams into a .NET ZipArchive object?