What exactly is protected by a Git commit signature and how it is protected
is much easier to understand if you understand the core data structures in
a Git repository.
A Git repository stores two types of data: objects, which are immutable
and the same in all repos in which they are present,¹ and refs, which are
variables that may (and often do) have different values in different copies
of the repo.
The refs are things like "branch"² names, such as main (whose full name
would be refs/heads/main). Think of these as variable names that have
associated values that are commit IDs such as
04b871796dc0420f8e7561a895b52484b701d51a. You needn't be terribly
concerned about these as far as security goes, except to realise that,
being variables whose values differ between repos, they are not a reliable
way to communicate which commits you're actually talking about.
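To make the "refs are variables" point concrete, here is a toy model in Python. It is only a sketch: real Git stores refs in files under the .git directory, not in dictionaries, and HYPOTHETICAL_NEWER_ID and the resolve helper are made up for illustration.

```python
# Toy model: each copy of a repo has its own name -> commit ID mapping.
COMMIT_ID = "04b871796dc0420f8e7561a895b52484b701d51a"  # ID quoted in the text
HYPOTHETICAL_NEWER_ID = "1" * 40  # placeholder, not a real commit ID

my_refs = {"refs/heads/main": COMMIT_ID}
your_refs = {"refs/heads/main": HYPOTHETICAL_NEWER_ID}


def resolve(refs, name):
    """Look up a ref by its full name, falling back to the short branch name."""
    return refs.get(name) or refs.get("refs/heads/" + name)


# The same name resolves to different commits in the two copies, so saying
# "main" does not pin down a commit; quoting the commit ID does.
same_commit = resolve(my_refs, "main") == resolve(your_refs, "main")
print(same_commit)  # False
```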
The objects are arbitrary chunks of data identified by an ID that is the
SHA-1 hash of the data. For example, here is the data for a commit object:
tree eebfed94e75e7760540d1485c740902590a00332
parent 04b871796dc0420f8e7561a895b52484b701d51a
author A U Thor <author@example.com> 1465981137 +0000
committer C O Mitter <committer@example.com> 1465981137 +0000
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
$
iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
=jKHM
-----END PGP SIGNATURE-----
signed commit
signed commit message body
In the above, note that:
- The $ is not part of the data; it highlights that that line consists of
a single space.
- The "signed commit" and "signed commit message body" lines are the commit
message.
- The signature is (obviously) over the commit object data excluding the
gpgsig field.
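One detail worth spelling out: the ID is not the SHA-1 of the raw data alone. Git first prepends a short header made of the object type, the data length in bytes, and a NUL byte. A minimal sketch of that calculation (the function name git_object_id is my own) is:

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to this data."""
    # Header format: "<type> <length in bytes>\0", then the data itself.
    header = f"{obj_type} {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


# Same value as running: echo hello | git hash-object --stdin
blob_id = git_object_id("blob", b"hello\n")
print(blob_id)  # ce013625030ba8dba906f756967f9e9ca394464a
```

Passing "commit" as the type and the commit data shown above (including the gpgsig field) would give the commit ID in the same way.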
A "commit" as the word is used in casual conversation is a blockchain (or,
perhaps more accurately, a "block directed graph") of these objects. As you
can see from the above, the commit object refers to the ID of a tree, which
in turn contains the IDs of files (blob objects containing the file data) and
sub-trees. The commit object also (except for the initial commit) contains
references to the IDs of parent commits, forming a chain back to the
initial commit.
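The chaining can be sketched in a few lines of Python. This is an illustration under assumptions, not Git's own code: the helper names are mine, the tree holds a single hypothetical file called README, and the commit has no parent or signature.

```python
import hashlib


def object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 over Git's "<type> <length>\\0" header followed by the data."""
    header = f"{obj_type} {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


def tree_data(entries):
    """Build tree object data from (mode, name, hex_id) entries, pre-sorted by name."""
    out = b""
    for mode, name, hex_id in entries:
        # Each entry is "<mode> <name>\0" followed by the 20-byte binary ID.
        out += f"{mode} {name}\0".encode("utf-8") + bytes.fromhex(hex_id)
    return out


def commit_id_for(file_content: bytes) -> str:
    """Hash a blob, then a tree naming the blob, then a commit naming the tree."""
    blob = object_id("blob", file_content)
    tree = object_id("tree", tree_data([("100644", "README", blob)]))
    commit = (
        f"tree {tree}\n"
        "author A U Thor <author@example.com> 1465981137 +0000\n"
        "committer C O Mitter <committer@example.com> 1465981137 +0000\n"
        "\n"
        "example commit\n"
    ).encode("utf-8")
    return object_id("commit", commit)


original = commit_id_for(b"hello\n")
tampered = commit_id_for(b"hello!\n")
print(original != tampered)  # True
```

Changing one byte of the file changes the blob ID, which changes the tree data and therefore the tree ID, which in turn changes the commit ID.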
From this you can see that Git inherently gives you cryptographic
authentication of certain data about commits: given a commit ID you can
trust that the contents of that commit in your repo are the same as that in
any other copy of the repo, at least as far as you trust SHA-1.
That of course doesn't tell you who created the commit, because anybody
can create a commit like the example above. This is the reason that we have
Git commit signatures: those associate a particular commit with a particular
private key (PGP, SSH, or whatever other signature formats your particular
implementation supports) in the usual way of public key cryptographic
systems.
The signature covers only the data in the commit object itself (i.e., the
data you see in the example above, less the gpgsig field). That is as
secure as the public key signature system and procedures used to generate
the signature.
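As a rough sketch of what "less the gpgsig field" means, the following Python drops the gpgsig header, together with its continuation lines (which each start with a single space), and leaves everything else untouched. The function name signed_payload and the sample data are mine, and the signature body in the sample is a placeholder; to actually check a signature, use git verify-commit.

```python
def signed_payload(commit: bytes) -> bytes:
    """Return the commit data with the gpgsig header removed."""
    # Headers end at the first empty line; the commit message follows it.
    headers, sep, message = commit.partition(b"\n\n")
    kept = []
    in_signature = False
    for line in headers.split(b"\n"):
        if line.startswith(b"gpgsig "):
            in_signature = True
            continue
        if in_signature and line.startswith(b" "):
            continue  # continuation line of the multi-line gpgsig header
        in_signature = False
        kept.append(line)
    return b"\n".join(kept) + sep + message


raw = (
    b"tree eebfed94e75e7760540d1485c740902590a00332\n"
    b"parent 04b871796dc0420f8e7561a895b52484b701d51a\n"
    b"author A U Thor <author@example.com> 1465981137 +0000\n"
    b"committer C O Mitter <committer@example.com> 1465981137 +0000\n"
    b"gpgsig -----BEGIN PGP SIGNATURE-----\n"
    b" Version: GnuPG v1\n"
    b" \n"
    b" (signature data elided)\n"
    b" -----END PGP SIGNATURE-----\n"
    b"\n"
    b"signed commit\n\nsigned commit message body\n"
)
payload = signed_payload(raw)
print(payload.decode())
```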
The remainder, including the files in the commit, the parent commits, and
their files, is only as secure as the hashing algorithm used to build the
chain/graph. In a standard Git repo this is SHA-1, which, while not
terrible, [has not been considered secure][sch] against well-funded
attackers since the mid-2000s.
This can be mitigated by using SHA-256 hashes in Git, but this is not the
default and is still considered experimental functionality (though the
format is not expected to change in a backward-incompatible way). Do keep
in mind that many (most?) remote repo storage systems (such as GitHub) do
not support SHA-256 repositories, so you cannot push or fetch such a repo
through them.
¹ While the Git user interface often presents the illusion that
objects can be modified, this is not actually the case. When you perform an
action such as git commit --amend, no objects are modified. Instead, a
new commit object is created (with a new ID) with the same parent as the
old commit object and the head ref you're currently using (e.g., a "branch"
called main) is then updated to reference that new commit. The old commit
is still in the repo and can still be accessed via its ID until it's
eventually garbage-collected. (You can easily find the IDs of commits no
longer referenced by your heads ("branches") using git reflog.)
² I use scare quotes on "branch" here because the term can be
confusing when discussing the structure of a Git repo. Both the trees of
Git commits and the Git object database itself are graphs, and in graph
theory "branch" has a specific meaning that involves multiple nodes in
the graph. As the term is generally used by the Git documentation and
community, it really means a "ref," i.e., one of the variables with names
like refs/heads/main that points to a single node in the graph at any one
time, but points to different nodes over time.