
Every file is written as bits & bits & bits &....so on.

Also, every file has 2 types of metadata:

  1. OS metadata: file location, owner, permissions, etc., which are stored by the OS itself (for example, in the inode on Linux).
  2. Non-OS metadata: copyright information, color depth, image resolution, shutter speed, etc., which are stored in the file itself.

My question is: what are the relative positions of the actual file data and the metadata written into the file, within the "bit sequence" of the entire file? Does the metadata occur first or the actual file content, in the "bit sequence"?

Note: if the answer depends heavily on the file type, please focus on the .mp3 file type.

user
  • Your basic assumption "every file has 2 types of metadata" is wrong. The OS metadata is always present, but the file itself is just a bunch of bytes. Whether it has metadata depends on the file type, and if the file type does have metadata, where it sits depends on the file format specification. Some file types have metadata at the beginning, some at the end, and some at multiple positions within the file. Even for MP3, the metadata (ID3) can be at the beginning (ID3v2) or at the end (ID3v1). – Robert Jul 20 '20 at 07:28

1 Answer


Every file is written as bits & bits & bits &....so on.

In most operating systems, files are byte sequences at the OS level – i.e. they're only addressable as 8-bit units and not as individual bits.
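
To illustrate the byte-vs-bit point: reading a file in binary mode (e.g. `open(path, "rb")` in Python) yields whole bytes, and anything that wants individual bits has to shift and mask them out itself. A minimal Python sketch; the helper names `bits_of` and `bit_at` are hypothetical, chosen only for illustration.

```python
def bits_of(data: bytes) -> str:
    """Show the bit pattern of each byte, most-significant bit first.

    The OS only hands out whole bytes; the individual bits below exist
    only because we format/shift them out ourselves.
    """
    return " ".join(format(b, "08b") for b in data)


def bit_at(data: bytes, index: int) -> int:
    """Return bit number `index`, counting from the MSB (bit 0) of the first byte."""
    byte_index, offset = divmod(index, 8)
    return (data[byte_index] >> (7 - offset)) & 1


if __name__ == "__main__":
    header = b"ID3"  # the first three bytes of an ID3v2-tagged MP3
    print(bits_of(header))    # 01001001 01000100 00110011
    print(bit_at(header, 1))  # 1
```

Bit-stream formats do exactly this kind of shifting internally; the file system never sees it.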

Some file formats might be specified to reinterpret the bytes as a bit-stream (or more commonly as a collection of bit-stream 'packets'), but that's not the general case.

Does the metadata occur first or the actual file content, in the "bit sequence"?

It very much depends on the file type: that's literally a big part of what the file type defines.

Some file formats are specified to place the metadata in the beginning, others at the end, and yet others use a structured format (e.g. RIFF or PNG chunks) where the metadata chunk can be located anywhere as long as it is marked with a specific identifier.
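
The "chunks identified by a tag" idea can be sketched for PNG: after an 8-byte signature, each chunk is a 4-byte big-endian length, a 4-byte ASCII type, the data, and a 4-byte CRC-32 over type and data, so a reader finds metadata by chunk type rather than by a fixed offset. A minimal Python sketch using only the standard library; `walk_png_chunks` and `make_chunk` are hypothetical names, and the demo bytes have the correct chunk layout but are not a decodable image.

```python
import struct
import zlib
from typing import Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + body)
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def walk_png_chunks(data: bytes) -> Iterator[Tuple[int, str, int]]:
    """Yield (offset, chunk_type, data_length) for every chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 12 + length  # 4 length + 4 type + data + 4 CRC
        if end > len(data):
            raise ValueError(f"truncated chunk at offset {pos}")
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(ctype + body) != crc:
            raise ValueError(f"bad CRC in chunk at offset {pos}")
        yield pos, ctype.decode("ascii"), length
        pos = end
        if ctype == b"IEND":
            break


if __name__ == "__main__":
    # Synthetic bytes: valid chunk layout, but not a decodable image.
    fake_png = (PNG_SIGNATURE
                + make_chunk(b"IHDR", bytes(13))
                + make_chunk(b"tEXt", b"Author\x00me")
                + make_chunk(b"IDAT", b"")
                + make_chunk(b"IEND", b""))
    for offset, ctype, length in walk_png_chunks(fake_png):
        print(offset, ctype, length)
```

The demo prints `8 IHDR 13`, `33 tEXt 9`, `54 IDAT 0` and `66 IEND 0`: the text metadata is found by its `tEXt` tag, and moving it elsewhere among the chunks would not change how it is found.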

  • MP3 files usually use the ID3v2 tag format – the position and layout of the metadata block is defined in the ID3 v2.3.0 or v2.4.0 specification. (For example, section 5 of the v2.4.0 specification says the tag should be at the beginning of the file.)

    Many MP3 files, however, also include an ID3v1 tag block at the very end of the file (the last 128 bytes, starting with "TAG"), and some might even have only ID3v1 and no ID3v2. The MP3 file format originally had no provision for embedded metadata at all, so ID3v2 tags are designed so that older, ID3v2-unaware players don't mistake them for audio frames (for example, the tag size is stored as a "synchsafe" integer, and an optional unsynchronisation scheme avoids byte patterns that look like an MPEG frame sync).

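  • PNG files use a tagged chunk structure: the critical header information (image size, bit depth, color type) is in the 'IHDR' chunk, and other metadata is in chunks with specific tags such as 'tEXt' or 'pHYs', which a reader recognizes by tag rather than by offset. (The ordering rules are loose: 'IHDR' must come first, some ancillary chunks such as 'pHYs' must precede the image-data 'IDAT' chunks, and text chunks may appear anywhere before 'IEND'.)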

    JPEG files also use tagged segments, and they commonly hold metadata in the Exif format, which is... a whole TIFF file embedded in a JPEG APP1 segment, so you have to interpret three layers of formats – first JPEG, then TIFF, then finally Exif. Again, the segment's location can vary; it's identified as containing metadata by its marker (type tag).

  • PE/COFF files (Windows .exe) are a mix: a field at a fixed offset near the beginning (0x3C, inside the MS-DOS header) holds only a pointer to the real PE header, so the actual header's offset can vary from file to file. That header has some metadata at fixed offsets, followed by a section table; further metadata (such as resources) can live in sections at arbitrary locations, found through the header's data directories or by section name.

  • Microsoft Office (.docx/.xlsx) and OpenDocument (.odt/.ods) files are actually Zip archives, so they have Zip-level metadata (the "central directory") at the end, and the actual document metadata is stored in a specific "file" inside that archive (docProps/core.xml for Office, meta.xml for OpenDocument), so you must interpret the Zip directory to find the document metadata.

  • Some file types don't have embedded metadata. For example, text (.txt) files just contain arbitrary text.
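
Since the question prefers MP3, here is a minimal Python sketch that checks the two usual ID3 locations. It assumes the layout from the ID3 specifications: an ID3v2 tag starts at offset 0 with "ID3" and a 10-byte header whose last four bytes are a "synchsafe" size (7 useful bits per byte, not counting the header), and an ID3v1 tag is the final 128 bytes of the file, starting with "TAG". The function names are hypothetical, the demo bytes are synthetic (not real audio), and ID3v2 tags appended at the end of a file are not searched for.

```python
from typing import Dict, Optional, Tuple


def synchsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte ID3v2 'synchsafe' integer (only the low 7 bits of each byte count)."""
    value = 0
    for byte in b:
        value = (value << 7) | (byte & 0x7F)
    return value


def locate_id3(data: bytes) -> Dict[str, Tuple[int, int]]:
    """Return {tag_name: (start, end)} byte ranges of ID3 tags at their usual places."""
    found: Dict[str, Tuple[int, int]] = {}
    # ID3v2: "ID3", 2 version bytes, 1 flags byte, 4-byte synchsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        flags = data[5]
        size = synchsafe_to_int(data[6:10])  # excludes the 10-byte header
        footer = 10 if flags & 0x10 else 0   # v2.4 "footer present" flag
        found["id3v2"] = (0, 10 + size + footer)
    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        found["id3v1"] = (len(data) - 128, len(data))
    return found


def id3v1_title(data: bytes) -> Optional[str]:
    """Return the 30-byte title field of an ID3v1 tag, or None if there is no tag."""
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        return data[-125:-95].rstrip(b"\x00 ").decode("latin-1")
    return None


if __name__ == "__main__":
    frames = bytes(20)  # stand-in for ID3v2 frames/padding
    id3v2 = b"ID3" + bytes([4, 0, 0]) + bytes([0, 0, 0, len(frames)]) + frames
    audio = bytes(100)  # placeholder, NOT real MPEG audio frames
    id3v1 = b"TAG" + b"Title".ljust(30, b"\x00") + bytes(95)
    mp3 = id3v2 + audio + id3v1
    print(locate_id3(mp3))   # {'id3v2': (0, 30), 'id3v1': (130, 258)}
    print(id3v1_title(mp3))  # Title
```

For a file carrying both tags, this shows the layout the answer describes: ID3v2 metadata first, then the audio data, then the ID3v1 block as the last 128 bytes.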

u1686_grawity