image rars
more complicated than necessary
what is an image RAR?
An image RAR is a RAR containing images/pictures. An image archive is an archive (such as RAR, ZIP, etc.) containing images. However, this article also applies to archives containing any kind of non-compressible files.
A non-compressible file is a file which does not become (significantly) smaller when compressed further, regardless of the compression method used. The most common application of image archives is to store all pages of a manga/comic, or all photos of a set, in one file. Usually, image archives are released on the internet for others to download/use. In practice, almost all image archives made by amateurs are in RAR format.
compression, theory
The compression used in archives is so-called "lossless compression", which means decompression gives the exact original file back. This requires the compressed data to contain at least the same amount of information as the original file; no information is lost. Lossy compression discards information from the original file, so the decompressed file can only be an approximation of the original.
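The difference can be sketched in a few lines of Python. This is only an illustration: zlib stands in for the lossless compression of an archiver, and `lossy_round` is a made-up toy step (it throws away the low 4 bits of every byte), not how JPEG or MP3 actually work.

```python
import zlib

def lossy_round(data: bytes) -> bytes:
    """Toy 'lossy' step (hypothetical): clear the low 4 bits of every byte."""
    return bytes(b & 0xF0 for b in data)

original = bytes(range(256)) * 4

# Lossless: decompressing gives back exactly the original bytes.
restored = zlib.decompress(zlib.compress(original))
print(restored == original)   # True

# Lossy: the discarded bits are gone, so only an approximation comes back.
approx = zlib.decompress(zlib.compress(lossy_round(original)))
print(approx == original)     # False
print(max(abs(a - b) for a, b in zip(approx, original)))   # 15
```

The last line shows the size of the error: every byte is off by at most 15, so the result is close to the original but never identical.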
One can say any file contains an amount of "information", or "entropy", and an amount of "air", or "redundancy": any data which is not necessary to represent the information. One can call the amount of information in the file, divided by its size, its "information density".
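As a very rough sketch of "information density", one can estimate the entropy of a file from how often each byte value occurs. The function names here are my own, and the estimate is an assumption-laden shortcut: it only looks at single bytes, so it misses redundancy such as repeated words or patterns.

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(c / total * math.log2(total / c) for c in counts.values())

def rough_density(data: bytes) -> float:
    """Very rough 'information density': estimated entropy / 8 bits per byte."""
    return byte_entropy(data) / 8.0

print(rough_density(b"a" * 1000))          # 0.0: nothing but "air"
print(rough_density(bytes(range(256))))    # 1.0: every byte value equally often
print(rough_density(os.urandom(100000)))   # close to 1.0
```

Note that `bytes(range(256))` scores 1.0 even though it is perfectly predictable, which shows the limits of this estimate. Already compressed files, such as JPEG images, tend to score close to 1.0 as well.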
a graphical representation of compression
[figure, with panels: original, compressible file; compressed with bad, lossless compression; compressed with good, lossless compression; compressed with lossy compression; lossy compressed file decompressed]
solid archives
One reason why RAR and 7ZIP make smaller archives than ZIP is that they support solid archiving. A non-solid archive compresses all files first, then adds them together.
A solid archive first adds all files together, then compresses them. So if information repeats in every file, it only has to appear once in the archive:
[figure: original files A, B and C. non-solid archive: common + A, common + B, common + C. solid archive: common appears once, followed by A, B and C]
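The effect can be imitated with zlib in Python. This is a sketch under assumptions, not how RAR or 7ZIP work internally: zlib stands in for the archiver, the three "files" are made-up random data sharing one common part, and zlib only looks back 32 KB, while real solid archives use much larger dictionaries.

```python
import os
import zlib

common = os.urandom(2000)   # data shared by all files, incompressible by itself
files = [common + os.urandom(500) for _ in range(3)]   # files A, B and C

# Non-solid: compress each file by itself, then add the results together.
non_solid = sum(len(zlib.compress(f)) for f in files)

# Solid: add the files together first, then compress once.
solid = len(zlib.compress(b"".join(files)))

print(non_solid, solid)   # the solid total is clearly smaller
```

In the non-solid case the common part is stored three times, because each file is compressed without knowledge of the others. In the solid case the second and third copy are replaced by short references to the first.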
compression in files other than archives
Many file formats used on the internet have their own compression methods, so the files can transfer quickly over slow connections, cause less traffic, and take less space on a disk, while they can still conveniently be used without having to "unpack" them all the time. Some examples of formats using lossy compression are MP3, JPEG, OGG, and MPG. Some examples of formats using lossless compression are PNG, GIF, FLAC, and archives.
Such files are already compressed, so they won't become significantly smaller when compressed again in an archive.
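A small experiment along these lines, assuming zlib as a stand-in for both the format's own compression and the archiver. The word list and the generated text are made up for the example.

```python
import random
import zlib

rng = random.Random(0)   # fixed seed, so the run is repeatable
words = [b"manga", b"page", b"photo", b"archive", b"image", b"zip", b"rar", b"file"]
text = b" ".join(rng.choice(words) for _ in range(20000))

once = zlib.compress(text, 9)    # first pass removes most of the redundancy
twice = zlib.compress(once, 9)   # second pass has almost nothing left to remove

print(len(text), len(once), len(twice))
```

The first pass makes the data several times smaller. The second pass typically gives about the same size as the first, or slightly more, because it only adds its own overhead.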
a mathematical approach to non-compressible files
To put it very simply, a file can be seen as a positive integer. A lossless compression algorithm is a function which turns it into another, preferably smaller, number. This function has to be invertible (decompression is the inverse), so it is injective: two different original files can never give the same compressed file.
Given a number, there is only a limited amount of smaller numbers: fewer than the value of the number itself. So the compression function cannot turn every number into a smaller one, and only a small fraction of all numbers can become much smaller; the rest have to stay about the same size or become larger.
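The counting argument can be checked with a few lines of Python. As an assumption for the sketch, files are counted by their length in bits instead of being read as integers, which comes down to the same thing; the function names are my own.

```python
def count_files(bits: int) -> int:
    """Number of distinct files that are exactly `bits` bits long."""
    return 2 ** bits

def count_shorter_files(bits: int) -> int:
    """Number of distinct files shorter than `bits` bits (lengths 0 .. bits-1)."""
    return 2 ** bits - 1

def max_fraction_saving(bits: int, saved: int) -> float:
    """Upper bound on the fraction of `bits`-bit files that an invertible
    compressor can shrink by at least `saved` bits."""
    targets = count_shorter_files(bits - saved + 1)   # lengths 0 .. bits-saved
    return targets / count_files(bits)

n = 16
print(count_files(n), count_shorter_files(n))   # 65536 65535
print(max_fraction_saving(n, 8))                # just below 1/128
```

There are always fewer shorter files than files of the given length, so at least one file cannot shrink at all, and fewer than 1 in 128 of the 16-bit files can lose 8 bits or more. Compression works in practice only because the files people actually make are a tiny, very redundant part of all possible files.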
archives used as "resource"
If a program, such as a game, a web browser, or an image viewer, needs to load files on the fly (for example textures, sounds, images, etc.), and these do not necessarily come from real individual/local disk files, I like to refer to them as "resources". Examples:
StepMania (a Dance Dance Revolution simulator/game) can load its data (music, graphics, etc.) from either disk files or a zip file (with an .smzip extension).
Quake 3 Arena uses a data file with a .pk3 extension, which is in fact a zip archive.
Many games, maybe mostly older ones, load their data from custom archive formats made for that game: WAD for Doom, PAK for Quake, HOG for Descent, GRP for the Build engine, etc.
Image viewers exist for viewing comic books/manga, photo sets, etc., loading the images on the fly from a zip or rar archive, sometimes renamed to .cbz or .cbr.
Web browsers load web pages and images on the fly using the HTTP protocol.
Unreal Tournament can use the HTTP protocol to load needed maps and other data files when playing online.
Preferable properties of a data source for this kind of use are that it is easy to implement/program, and that one can obtain individual "files" in any order (random access).
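As a sketch of such on-the-fly loading, here is how a program could fetch single "files" from a zip archive with Python's standard zipfile module. The page names and their contents are made up, the archive is built in memory only to keep the example self-contained, and `load_resource` is a hypothetical helper name.

```python
import io
import zipfile

# Build a small example archive in memory (hypothetical page names and contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    for i in range(1, 4):
        zf.writestr(f"page{i:03}.jpg", f"fake image data {i}".encode())

def load_resource(archive: zipfile.ZipFile, name: str) -> bytes:
    """Return the bytes of one member; the other members are not decompressed."""
    return archive.read(name)

with zipfile.ZipFile(buf) as zf:
    print(zf.namelist())
    # Members can be fetched in any order.
    print(load_resource(zf, "page003.jpg"))
    print(load_resource(zf, "page001.jpg"))
```

An image viewer would pass a .zip or .cbz path to `zipfile.ZipFile` instead of the in-memory buffer, and decode the returned bytes as an image.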
what is a ZIP archive?
I define a zip as a PKZIP 2.0 compatible, or "classic", zip archive, with only "deflate" (method 8) or "store" (method 0) compression. Practically all zip archives on the internet are of this format, and this is the format which many archivers can open. I consider the newer, enhanced, and incompatible PKZIP archives to not be zip files.
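A simplified check for this definition, using Python's zipfile module. It only looks at the compression method of each member, so it is a sketch of the idea rather than a full PKZIP 2.0 compatibility test: things like ZIP64 or encryption are ignored. `is_classic_zip` and `make_zip` are hypothetical helper names.

```python
import io
import zipfile

CLASSIC_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}   # methods 0 and 8

def is_classic_zip(file) -> bool:
    """True if every member uses 'store' (0) or 'deflate' (8)."""
    with zipfile.ZipFile(file) as zf:
        return all(info.compress_type in CLASSIC_METHODS for info in zf.infolist())

def make_zip(method: int) -> io.BytesIO:
    """Build a one-file example archive in memory with the given method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("page001.jpg", b"fake image data")
    return buf

print(is_classic_zip(make_zip(zipfile.ZIP_DEFLATED)))   # True
print(is_classic_zip(make_zip(zipfile.ZIP_BZIP2)))      # False: method 12
```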
archive formats compared
What are desirable properties of an image archive format?
- RAR, 7ZIP: good compression. This is definitely not interesting for image archives, because images don't compress, so they can't "compress better". In reality, an image RAR comes out slightly bigger than the same images as a ZIP, probably because the RAR format has more overhead.
- RAR, 7ZIP: support for archives bigger than 4 GB. Classic ZIP does not support this. This is not really interesting for image archives, because practically all image archives on the internet are small: 100 MB at most, usually less.
- RAR, 7ZIP: filenames are stored as Unicode. Classic ZIP does not support this. This is not really interesting for image archives, because practically all existing image archives have filenames which are numbers, or a short fixed name plus a number. Also, it is recommended to release things without international characters, for compatibility.
- RAR: password protection. The password encryption in classic ZIP is worthless, so if you need to password protect files, ZIP is a bad choice and RAR is a good choice.
- ZIP, 7ZIP: completely open and free. Source code is available for both compression and decompression of the archives, and how the compression works is well known. The ZIP format is unpatented and in the public domain. 7ZIP is LGPL. There is no source code available to create RAR archives.
- ZIP: guaranteed non-solid. If files cannot be compressed, it is better to have an archive format which does not support solid compression, because it means you are guaranteed to be able to extract individual files in any order. This is interesting for apps like image viewers, games, etc., which access the archive on the fly, decompressing files into RAM when needed.
- ZIP: widest support in software. ZIP is easy to implement, simple, and old. Many archivers and tools made for other formats, such as WinRAR and 7-Zip, have complete support for ZIP. Apps like Windows XP's file browser and Windows Commander also have their best archive support for ZIP.
Conclusion: complexity and advanced compression are not interesting for an image set, so being open, simple, compatible, and as widely usable as possible, are good criteria.
I don't think it is necessary to have one archive format for all archives in the world; different formats each have their purpose. For image sets, I think ZIP is the best.
How to easily make image zips if you use WinRAR: set the default archive format to ZIP. Now the right-click menu will produce ZIP archives. You can set it back to RAR if you need to make RAR archives again.
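For those who prefer a script over WinRAR, the same can be done with Python's zipfile module. This is a minimal sketch: `make_image_zip` and the list of image extensions are my own assumptions, and the "store" method is chosen because images don't compress anyway.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}   # assumed set of image types

def make_image_zip(folder: str, zip_path: str) -> list:
    """Pack the images in `folder` into a classic zip, in sorted name order."""
    images = sorted(p for p in Path(folder).iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    # ZIP_STORED (method 0): images are already compressed, so just store them.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for p in images:
            zf.write(p, arcname=p.name)   # flat archive, no folder names inside
    return [p.name for p in images]
```

Calling `make_image_zip("my_set", "my_set.zip")` (both names are placeholders) would pack every image in the folder `my_set` and return the list of names that went into the archive.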
external links
"why RAR sucks"
an image viewer. supports ZIP, but not RAR