ZIP (file format)
File extension: .zip
MIME type: application/zip
Type code: com.pkware.zip-archive
Magic: PK\003\004
Developed by: Phil Katz
Type of format: Data compression
The ZIP file format is a popular data compression and archival format. A ZIP file contains one or more files that have been compressed or stored.
The format was originally designed by Phil Katz for PKZIP. However, many software utilities other than PKZIP itself are now available to create, modify or open ZIP files, notably WinZip, BOMArchiveHelper, PicoZip, Info-ZIP, WinRAR, IZArc and 7-Zip. Microsoft has also included minimal built-in ZIP support (under the name "compressed folders") in later versions of its Windows operating system. Apple also included built-in ZIP support in Mac OS X v10.3 and later.
ZIP files generally use the file extensions ".zip" or ".ZIP" and the MIME media type application/zip. Some software uses the ZIP file format as a wrapper for a large number of small items in a specific structure. Generally when this is done a different file extension is used. Examples of this usage are Java JAR files, id Software .pk3/.pk4 files, package files for StepMania and Winamp/Windows Media Player skins, XPInstall, and some OpenOffice.org document formats. The OpenDocument format usually uses the JAR file format internally, so it can be easily uncompressed and compressed using tools for ZIP files.
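Because such wrapper formats are ordinary ZIP archives under a different extension, they can be recognised by content rather than by name. The following is a minimal sketch using Python's standard zipfile module; the member name is illustrative, and the archive is built in memory only to keep the example self-contained.

```python
import io
import zipfile

# Build a small archive in memory; on disk it could carry any extension (.jar, .pk3, ...).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")

raw = buffer.getvalue()

# A non-empty archive begins with a local file header, whose signature is PK\003\004.
starts_with_magic = raw[:4] == b"PK\x03\x04"

# is_zipfile() inspects the content, not the file name.
buffer.seek(0)
recognized = zipfile.is_zipfile(buffer)

print(starts_with_magic, recognized)
```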
History
Early history
The ZIP file format was originally created by Phil Katz, founder of PKWARE, after a prolonged legal dispute between PKWARE and System Enhancement Associates (SEA) over the trademark name "ARC" (short for "Archive") and the file name extension .arc.
PKWARE's first archive product, PKARC, borrowed heavily from SEA's published code, and improved on it by converting SEA's ARC C code into hand-optimised assembly language, which was much faster. PKARC also used the ".ARC" file name extension. SEA contended that Katz had based his product on their code and trademark name, and thus ought to license the code from them and pay royalties. PKWARE refused. SEA brought a successful copyright infringement lawsuit against Phil Katz and PKWARE. After suit was brought, Katz briefly released a relabeled version of PKARC named PKPAK in a futile effort to invalidate the suit.
During settlement, Katz still refused to pay license fees to SEA, instead agreeing to pay SEA's legal fees and stop selling PKARC. He then went on to create his own file format, now known worldwide as the ZIP format (an archive in this format is commonly called a ZIP file). The ZIP format he designed was more resistant to data loss than the ARC format because of redundant catalog storage; it was also more flexible than ARC, providing room for additional optional compression algorithms and for future expansion. Along with the new format, PKZIP included at least one compression algorithm that was more efficient than any supported by ARC. Once the PKZIP software was released, many users abandoned ARC because of its slower speed and less effective compression, and because Katz had successfully put forth the idea that he was the "good guy" who was being unfairly treated by an evil corporation.
Katz publicly released technical documentation on the ZIP file format, along with the first version of his PKZIP archiver, in January 1989.
The name zip (meaning speed) was suggested by Katz's friend Robert Mahoney. They wanted to imply that their product would be faster than ARC and other compression formats of the time.
Moving beyond the command line
In the mid-1990s, as more new computers included graphical user interfaces, there were more users who were not comfortable with the command-line operation of PKZIP. Seeing an opportunity, shareware authors began pitching compression and archival programs with graphical user interfaces. Many of these used the ZIP format. WinZip was among the most popular. PKWARE (Katz's company) also offered a graphical version of PKZIP. These graphical compression programs were easier to learn than the older command-line equivalents, but they still required learning an additional program and an additional interface just for compression.
An open source implementation of Phil Katz's "deflate" and "inflate" routines was released. The free code released by the Info-ZIP project under a BSD license spawned a horde of PKZIP imitators (WinZip, PicoZip, PowerArchiver, Turbozip, PowerZip and many more), establishing the PKZIP file format as a de facto industry standard.
The first version of what would become Info-ZIP was published by Samuel Smith in March 1989, complete with the source code in both Pascal and C forms. In September he released 2.0, including support for the new "implode" method that had been added to PKZIP 1.01. A port to Unix was released by Carl Mascott and John Cowan in December.
In March 1990 a number of interested parties set up a mail list on a disused DEC-20 mainframe at the White Sands Missile Range, agreeing to form a group to clean up the code and make it officially public. In May the first version of this code was released, as Info-ZIP 3.0.
In 1994 and 1995 Info-ZIP turned a corner and effectively became the de facto ZIP program. A huge number of ports were released in that period, covering numerous minicomputers, mainframes and practically every microcomputer ever developed. All the while the software continued to add support for the newer compression systems being added to PKZIP; eventually this happened so quickly that there was no reason to use PKZIP. It was also in 1995 that the principal maintainers started to work heavily on the PNG format, and changes to Info-ZIP slowed.
In the late 1990s, various file manager software products started integrating support for the ZIP format into the file manager user interface. Even before that, Norton Commander and clones like Volkov Commander in DOS started that trend, and that remains the norm for the "Commander-like" or Orthodox file managers like Midnight Commander (Linux and Unix-like systems) and Total Commander, previously Windows Commander (Windows). The KDE file manager (kfm) supported this very early. Support was also added to Windows Explorer (first with Plus! for Windows 98, later built into Windows Me and Windows XP), the Mac OS Finder (as of Mac OS X, via the BOMArchiveHelper utility), the Nautilus file manager used with GNOME, the Konqueror file manager used with newer versions of KDE, and others. By 2002, all major desktop environments included ZIP file support in their file managers. Typically, in any modern file manager, a ZIP file may be treated as a directory or folder, so that files are copied into and out of it in the same manner as any other folder; the compression is handled in a way that is largely transparent to the end user. This eliminates the need for the user to learn a separate program and interface just for compression and archival, since the same interface can be used as for regular file management.
Technical information
ZIP is a fairly simple archive format that compresses every file separately. Compressing files separately allows individual files to be retrieved without reading through other data; in theory, it may also allow better compression by using different algorithms for different files. The drawback is that an archive containing a large number of small files ends up significantly larger than if the files were compressed together as a single stream (the classic example of the latter is the common tar.gz archive, which consists of a TAR archive compressed using gzip).
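The trade-off can be observed with a small experiment. The sketch below uses Python's standard zipfile and tarfile modules to pack the same 200 small files both ways; the file names and contents are made up for illustration, and the exact sizes depend on the data and the library version.

```python
import io
import tarfile
import zipfile

# 200 small, similar files (hypothetical names and contents).
files = {
    f"notes/item{i:03d}.txt": f"entry {i}: the quick brown fox jumps over the lazy dog\n".encode()
    for i in range(200)
}

# ZIP: each member is compressed independently and gets its own headers.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, data in files.items():
        archive.writestr(name, data)

# tar.gz: members are concatenated first, then compressed as one stream.
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w:gz") as archive:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))

zip_size = len(zip_buffer.getvalue())
tar_gz_size = len(tar_buffer.getvalue())
print(zip_size, tar_gz_size)

# A single member can be read from the ZIP without decompressing the others.
with zipfile.ZipFile(zip_buffer) as archive:
    one = archive.read("notes/item042.txt")
```

For input like this, the tar.gz result is expected to be the smaller of the two, while only the ZIP archive offers direct access to one member.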
The specification for ZIP indicates that files can be stored either uncompressed or using a variety of compression algorithms. However, in practice, ZIP is almost always used with Katz's DEFLATE algorithm, except when files being added are already compressed or are resistant to compression.
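As an illustration, the sketch below stores one member with DEFLATE and another uncompressed, then reads the recorded sizes back. It assumes Python's standard zipfile module; the member names are hypothetical, and random bytes stand in for data that is already compressed.

```python
import io
import os
import zipfile

text = b"ZIP archives compress each member separately. " * 200  # highly compressible
noise = os.urandom(4096)  # stands in for already-compressed data

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # The compression method is chosen per member.
    archive.writestr("readme.txt", text, compress_type=zipfile.ZIP_DEFLATED)
    archive.writestr("photo.bin", noise, compress_type=zipfile.ZIP_STORED)

with zipfile.ZipFile(buffer) as archive:
    # Map each member to (method, original size, size inside the archive).
    sizes = {
        info.filename: (info.compress_type, info.file_size, info.compress_size)
        for info in archive.infolist()
    }

for name, (method, original, packed) in sizes.items():
    print(name, method, original, packed)
```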
ZIP supports a simple password-based symmetric encryption system which is known to be seriously flawed. In particular, it is vulnerable to known-plaintext attacks, which are in some cases made worse by poor implementations of random number generators[1]. ZIP also supports spreading archives across multiple removable disks (generally floppy disks, but other removable media can be used as well).
New features including new compression and encryption methods have been added to ZIP in more recent times, but these are not supported by many tools and are not in wide use.
Most ZIP programs support files of at most 4 GB, though various vendors have "64-bit extended formats" for storing larger files. It is not clear whether the various vendors use the same format for large files.
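The 4 GB ceiling follows from the classic ZIP headers, which record file sizes in 4-byte unsigned fields. A minimal sketch of that arithmetic in Python, where the helper name pack_size is hypothetical:

```python
import struct

# Largest value an unsigned 32-bit size field can hold.
MAX_UINT32 = 2**32 - 1  # 4294967295 bytes, one byte short of 4 GiB


def pack_size(size: int) -> bytes:
    """Encode a size as a little-endian unsigned 32-bit integer."""
    return struct.pack("<I", size)


fits = pack_size(MAX_UINT32)  # the largest representable size

try:
    pack_size(MAX_UINT32 + 1)
    overflowed = False
except (struct.error, OverflowError):
    # A size of exactly 4 GiB no longer fits in the field.
    overflowed = True

print(MAX_UINT32, fits, overflowed)
```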
The FAT filesystem of DOS records timestamps with a granularity of only two seconds, and ZIP file records mimic this. As a result, the timestamps of files in a ZIP archive are accurate only to two seconds.
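The two-second granularity comes from the 16-bit DOS time layout, which leaves only five bits for the seconds and therefore stores them halved. The following sketch shows the effect in Python; the function names are illustrative and not taken from any particular ZIP tool.

```python
def encode_dos_time(hour: int, minute: int, second: int) -> int:
    """Pack a time of day into 16 bits: 5 bits hour, 6 bits minute, 5 bits second/2."""
    return (hour << 11) | (minute << 5) | (second // 2)


def decode_dos_time(value: int) -> tuple:
    """Unpack the 16-bit DOS time; seconds come back as an even number."""
    return (value >> 11) & 0x1F, (value >> 5) & 0x3F, (value & 0x1F) * 2


# An odd second cannot be represented and is rounded down.
round_tripped = decode_dos_time(encode_dos_time(8, 9, 31))
print(round_tripped)  # (8, 9, 30)
```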
The Info-ZIP implementations of the ZIP format add support for Unix filesystem features, such as user and group IDs, file permissions, and symbolic links. The Apache Ant implementation is aware of them to the extent that it can create files with predefined Unix permissions.
The Info-ZIP Windows tools also support NTFS filesystem permissions, and will attempt to translate between NTFS permissions and Unix permissions when extracting files. This can produce undesirable combinations, e.g. .exe files being created on NTFS volumes with executable permission denied.
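As an illustration of how Unix permissions travel inside an archive, the sketch below follows the convention of placing the Unix mode in the upper 16 bits of an entry's external attributes. It uses Python's standard zipfile module rather than Info-ZIP itself, and the member name is hypothetical.

```python
import io
import stat
import zipfile

info = zipfile.ZipInfo("bin/run.sh")
info.create_system = 3  # 3 marks the entry as created on Unix
# Regular-file type plus rwxr-xr-x, shifted into the upper 16 bits.
info.external_attr = (stat.S_IFREG | 0o755) << 16

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(info, "#!/bin/sh\necho hello\n")

with zipfile.ZipFile(buffer) as archive:
    stored = archive.getinfo("bin/run.sh")
    # Recover the permission bits from the stored attributes.
    mode = (stored.external_attr >> 16) & 0o777

print(oct(mode))  # 0o755
```

Whether an extraction tool honours these bits depends on the tool and the target filesystem.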