I had always assumed that tar was a compression utility, but I am unsure: does it actually compress files, or is it just like an ISO file, a file to hold files?
Also on SuperUser. – allquixotic Apr 30 '14 at 17:38
4 Answers
Tar is an archiving tool (Tape ARchive): it only collects files and their metadata together and produces one file. If you want to compress that file later, you can use gzip/bzip2/xz. For convenience, tar provides arguments to compress the archive automatically for you. Check out the tar man page for more details.
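To make the archive-versus-compression split concrete, here is a minimal sketch that uses Python's standard `tarfile` and `gzip` modules as stand-ins for the `tar` and `gzip` commands (it says nothing about how any `tar` binary is implemented). The helper `make_tar` and the file name `notes.txt` are hypothetical. The plain archive stores the file's bytes verbatim; only the separate gzip step makes it smaller.

```python
from __future__ import annotations

import gzip
import io
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Bundle name -> content pairs into an uncompressed tar archive (hypothetical helper)."""
    buf = io.BytesIO()
    # mode="w" means "archive only": no compression is applied
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)  # metadata record (name, size, mode, ...)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


files = {"notes.txt": b"hello tar\n" * 1000}

archive = make_tar(files)            # step 1: collect files + metadata into one file
compressed = gzip.compress(archive)  # step 2: compress that single file separately

print(files["notes.txt"] in archive)           # True: content is stored uncompressed
print(len(archive) > len(files["notes.txt"]))  # True: headers and padding only add bytes
print(len(compressed) < len(archive))          # True: gzip is what saves space
```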
A slight clarification on the answer: it is GNU tar that provides those extra compression arguments. For example, Solaris tar does not provide arguments for compression. – Tero Kilkanen Apr 29 '14 at 22:20
BSD tar provides an argument for compression as well, though it only accepts `z` and determines the compression method based on the extension, whereas GNU tar has separate `zZjJ` arguments for the different compression methods. – wingedsubmariner Apr 30 '14 at 00:59
@wingedsubmariner The BSD tar manpage doesn't say it supports `-j`, but it (at least on Mac) does. – Kevin Apr 30 '14 at 01:13
@wingedsubmariner: I don't know if the BSD tar on Mac is modified by Apple or not, but it supports `zZjJ` as well. Even though the man page does not mention the `-J` flag, it actually accepts `-J` and outputs an `xz` file. – Siyuan Ren Apr 30 '14 at 02:58
Just read the BSD tar manpage, and it turns out I was mistaken: BSD tar uses separate `zZjJ` flags for compression, just like GNU tar. However, it does detect compression automatically when decompressing, whereas GNU tar expects `zZjJ` then as well. – wingedsubmariner Apr 30 '14 at 03:10
@wingedsubmariner: no; modern-ish versions of GNU `tar` decompress automatically without requiring the `-zZjJ` options. – Jonathan Leffler Apr 30 '14 at 04:02
@staticx: Which version of GNU `tar` are you running, and on which platform? – Jonathan Leffler Apr 30 '14 at 16:04
@JonathanLeffler: RHEL 5. tar (GNU tar) 1.23 Copyright (C) 2010 Free Software Foundation, Inc. – Engineer2021 Apr 30 '14 at 16:05
@JonathanLeffler: I did `tar cvfz test.tar.gz test.c ; tar xvf test.tar.gz` and got test.c back. – Engineer2021 Apr 30 '14 at 16:07
@staticx: curious! GNU `tar` 1.26 on Ubuntu 12.04 doesn't, but I'm tolerably certain I'd have to go back further than 2010 to find a version that doesn't decompress at least some file types automatically. The `gzip` automatic decompression has been around a long time, AFAICR (meaning, mostly, I don't remember when it was added, but it was quite a long time ago). Periodically, new compression formats were released (`.bz2`, `.lz`, `.xz`, `.7z`), and for a while I needed to hold `tar`'s hand with `--use-compress-program=whatever` as an option. The set of compression formats evolves, therefore. – Jonathan Leffler Apr 30 '14 at 16:11
@staticx: OK; that's consistent with 'decompresses automatically'. You do have to tell it which 'compress' to use (either by flag or possibly by file extension); that won't change. – Jonathan Leffler Apr 30 '14 at 16:12
@JonathanLeffler: Yes, sorry, I may have misconstrued your sentence. I thought you were implying that you had to use `xvfz`, when in fact it will detect the file extension and try that. – Engineer2021 Apr 30 '14 at 16:12
@JonathanLeffler: This also works: `tar cvfz test.tar ; tar xvf test.tar`. – Engineer2021 Apr 30 '14 at 16:19
@staticx: as a point of detail, it works by content rather than extension (or as well as extension). Try: `tar -czf /tmp/junk.tar.bz2 *.*`, then `file /tmp/junk.tar.bz2`, and `tar -tvf /tmp/junk.tar.bz2`. – Jonathan Leffler Apr 30 '14 at 16:19
@JonathanLeffler: Right, I figured there is a header that it reads to determine the type, since relying on the `.gz`, `.bz2`, etc. extension is unreliable. So it will decompress automatically. – Engineer2021 Apr 30 '14 at 16:20
tar produces archives; compression is a separate function. However, tar alone can reduce space usage when used on a large number of small files that are smaller than the filesystem's cluster size. If a filesystem uses 1 KB clusters, even a file that contains a single byte will consume 1 KB (plus an inode). A tar archive does not pay this per-file cluster rounding, although it has its own, smaller granularity: each member gets a 512-byte header, and its data is padded to a multiple of 512 bytes.
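A rough way to quantify this is to compare the space many tiny files use on disk with the size of a tar archive holding them. The sketch below is a simplified Python model under stated assumptions: it ignores inodes and assumes tar's usual layout (a 512-byte header per member, data padded to 512-byte blocks, two zero blocks at the end, and the whole archive padded to a 10240-byte record). The helper names are hypothetical, and the estimate is cross-checked against Python's `tarfile` module. With 1 KB clusters a 1-byte file and its tar member both take 1024 bytes, so the saving shows up with larger clusters such as 4 KB.

```python
from __future__ import annotations

import io
import tarfile

BLOCK = 512          # tar header and data block size
RECORD = 20 * BLOCK  # default record size (blocking factor 20 = 10240 bytes)


def ceil_to(n: int, unit: int) -> int:
    """Round n up to a multiple of unit."""
    return -(-n // unit) * unit


def disk_usage(sizes: list[int], cluster: int) -> int:
    # Each file occupies whole clusters (inode overhead ignored)
    return sum(ceil_to(s, cluster) for s in sizes)


def tar_estimate(sizes: list[int]) -> int:
    # One header block per member, data padded to 512 bytes,
    # two zero blocks marking the end, then padding to a full record
    raw = sum(BLOCK + ceil_to(s, BLOCK) for s in sizes) + 2 * BLOCK
    return ceil_to(raw, RECORD)


def tar_actual(sizes: list[int]) -> int:
    # Build a real archive in memory with Python's tarfile for comparison
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for i, size in enumerate(sizes):
            info = tarfile.TarInfo(f"f{i}")
            info.size = size
            tar.addfile(info, io.BytesIO(b"x" * size))
    return len(buf.getvalue())


sizes = [1] * 1000  # a thousand 1-byte files
print("1 KB clusters:", disk_usage(sizes, 1024))  # 1024000
print("4 KB clusters:", disk_usage(sizes, 4096))  # 4096000
print("tar archive:  ", tar_estimate(sizes), tar_actual(sizes))  # 1034240 1034240
```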
BTW, an ISO file is not really "a file to hold files" - it's actually an image of an entire filesystem (one originally designed to be used on CDs) and thus its structure is considerably more complex.
@psusi so a file of 1 to 1023 bytes will always consume 1024 bytes, which wastes between 1023 and 1 bytes. – Shiplu Mokaddim May 14 '19 at 13:36
`tar` has significant alignment / block size overhead, due to its origin as a Tape Archiver. If `a` is an empty file, `tar -cf a.tar a` will create a 10240-byte file `a.tar`. You can use a hex editor or `od` to verify that most of the file is NUL (zero) bytes. – Clement Cherlin Sep 12 '22 at 15:59
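The 10240-byte figure can be reproduced with Python's `tarfile` module, which by default pads archives to the same 20 × 512-byte record size; the final check plays the role of the hex editor or `od`. This is a sketch against Python's implementation rather than GNU tar itself, and the member name `a` simply mirrors the comment.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    empty = tarfile.TarInfo("a")  # zero-length member, like an empty file "a"
    tar.addfile(empty, io.BytesIO(b""))
data = buf.getvalue()

print(len(data))  # 10240
# Only the first 512-byte block (the header) holds non-NUL bytes;
# the rest is the two end-of-archive blocks plus record padding
nonzero = [i for i, byte in enumerate(data) if byte != 0]
print(max(nonzero) < 512, len(data) - len(nonzero))
```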
The original UNIX tar command did not compress archives. As was mentioned in a comment, Solaris tar doesn't compress. Nor do the HP-UX and AIX versions, FWIW. By convention, uncompressed archives end in `.tar`.
With GNU/Linux you get GNU tar. (You can install GNU tar on other UNIX systems.) By default it does not compress; however, if you supply `-z`, it compresses the resulting archive with gzip (also a GNU tool). The conventional suffix for gzipped files is `.gz`, so you'll often see tarballs (slang for a tar archive, usually implying it's been compressed) that end in `.tar.gz`. That ending implies tar was run, followed by gzip, e.g. `tar cf - . | gzip -9v > archive.tar.gz`. You'll also find archives ending in `.tgz`, e.g. `tar czf archive.tgz .`
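The relationship between the two spellings (archive then gzip in a pipeline, versus `-z` in one step) can be sketched with Python's `tarfile` and `gzip` modules: building the tar stream and then gzipping it yields the same archive bytes, once decompressed, as asking `tarfile` to compress in one step. The member names are hypothetical, and gzip's own header (which records a timestamp) may differ between the two outputs, so the comparison is made after decompression.

```python
import gzip
import io
import tarfile

MEMBERS = {"README": b"docs\n", "src/main.c": b"int main(void) { return 0; }\n"}


def add_members(tar: tarfile.TarFile) -> None:
    # Write the same hypothetical members into whichever archive is open
    for name, data in MEMBERS.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))


# Two steps, like "tar cf - . | gzip -9 > archive.tar.gz"
plain = io.BytesIO()
with tarfile.open(fileobj=plain, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    add_members(tar)
two_step = gzip.compress(plain.getvalue(), compresslevel=9)

# One step, like "tar czf archive.tgz ."
packed = io.BytesIO()
with tarfile.open(fileobj=packed, mode="w:gz", format=tarfile.USTAR_FORMAT) as tar:
    add_members(tar)
one_step = packed.getvalue()

print(gzip.decompress(one_step) == gzip.decompress(two_step))  # True
```

Either way the result is an ordinary tar stream wrapped in gzip, which is why `.tar.gz` and `.tgz` name the same thing.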
Edit: www.linfo.org/tar.html reminded me that GNU tar supports much more than compressing with gzip, and that the suffixes are more than plain conventions: they have built-in semantics. GNU tar also supports bzip2 (`-j` for `.bz2`) and the old compress (`-Z` for `.Z`). Then I looked at the man page and was reminded that `-a` (`--auto-compress`) automatically selects the compression method based on the archive's suffix.
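The idea behind `-a`, choosing the compressor from the archive's suffix, can be sketched in Python. The suffix table and the `write_mode_for` and `create` helpers are hypothetical and cover only formats that Python's `tarfile` can write (it has no writer for the old `.Z` compress format); GNU tar's own suffix list is longer.

```python
from __future__ import annotations

import io
import tarfile

# Suffix -> tarfile write mode (hypothetical table, checked in order)
SUFFIX_MODES = [
    (".tar.gz", "w:gz"), (".tgz", "w:gz"),
    (".tar.bz2", "w:bz2"), (".tbz2", "w:bz2"),
    (".tar.xz", "w:xz"), (".txz", "w:xz"),
    (".tar", "w"),  # plain archive, no compression
]


def write_mode_for(filename: str) -> str:
    """Pick a compression mode from the archive name, as -a does conceptually."""
    lowered = filename.lower()
    for suffix, mode in SUFFIX_MODES:
        if lowered.endswith(suffix):
            return mode
    raise ValueError(f"no known archive suffix: {filename}")


def create(filename: str, members: dict[str, bytes]) -> bytes:
    buf = io.BytesIO()  # in-memory stand-in for the file named `filename`
    with tarfile.open(fileobj=buf, mode=write_mode_for(filename)) as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


for name in ["backup.tar.gz", "backup.tbz2", "backup.txz", "backup.tar"]:
    blob = create(name, {"hello.txt": b"hi\n"})
    print(name, write_mode_for(name), blob[:6])  # leading bytes reveal the format
```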
One other nit: as the Linux tar man page says, GNU documents tar primarily in info pages rather than man pages, so to learn all about GNU tar, run `info tar`.
GNU tar still doesn't handle compression by itself; it just pipes to/from gzip, bzip2, compress and others. – ott-- Aug 06 '15 at 20:03
I had a look at the source. GNU tar handles compression! The implementation takes advantage of code reuse and sound UNIX user space architectural principles. "Just pipes" is understating the way compression is tightly integrated into the tool. The fact that it happens to fork helper programs is a technicality. If you want to defend "just pipes," then cite file names and line numbers and let's see which side the community takes. – tbc0 Aug 06 '15 at 21:15