Do Git Commit signatures prevent repository modification?

Question

It seems that a Git commit signature signs the commit message, but I can't find much information on what problem the signatures actually solve, and I don't understand the Git architecture.

If I have a repository which began unsigned but moved to a signed model, can a malicious user with write access perform any of the following tasks without invalidating the latest signature:

Modify data committed with a signed commit message
Modify data prior to the first signed commit in a way that results in the latest commit being different (e.g., modify a part of the code that the signed commits do not touch, so that none of their diffs overwrite the maliciously modified component)

score 2 · Answer 1 · answered Sep 05 '19 at 03:30


Both of those things are impossible, unless they can break SHA1 (which is currently vulnerable only to collision attacks, not preimage or second preimage attacks, so they would have had to trick the legitimate user into accepting a dodgy binary file). And that's exactly the point of commit signing; if it didn't prevent those things, it would be worthless.


Joseph Sible-Reinstate Monica


Could you expand on how (or more what) it signs commits in a way that prevents collision attacks? – throwaway124215 Sep 05 '19 at 03:42

@throwaway124215 My point is that it doesn't. The weaknesses in SHA1 would theoretically allow such an attack against signed Git commits. It's a weakness in an otherwise strong system. – Joseph Sible-Reinstate Monica Sep 05 '19 at 04:10

score 0 · Answer 2 · answered Mar 13 '24 at 16:30

What exactly is protected by a Git commit signature and how it is protected is much easier to understand if you understand the core data structures in a Git repository.

A git repository stores two types of data: objects, which are immutable and the same in all repos in which they are present,¹ and refs, which are variables that may (and often do) have different values in different copies of the repo.

The refs are things like "branch"² names, such as main (whose full name would be refs/heads/main). Think of these as variable names whose associated values are commit IDs such as 04b871796dc0420f8e7561a895b52484b701d51a. You needn't be terribly concerned about these as far as security goes, except to realise that, being variables whose values differ between repos, they are not a reliable way to communicate which commits you're actually talking about.
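To make the "refs are variables" point concrete, here is a toy Python sketch. The dictionaries standing in for two clones and the second commit ID are invented for illustration; this is a model of the idea, not of how Git stores refs on disk.

```python
# Toy model: each clone keeps its own mutable table of refs.
OLD = "04b871796dc0420f8e7561a895b52484b701d51a"
NEW = "1111111111111111111111111111111111111111"  # hypothetical newer commit ID

alice_refs = {"refs/heads/main": OLD}
bob_refs = dict(alice_refs)  # Bob clones: the ref values match at first

# Alice commits; only her copy of the ref moves.
alice_refs["refs/heads/main"] = NEW


def same_commit(refs_a, refs_b, name):
    """True if both clones currently resolve `name` to the same commit ID."""
    return refs_a.get(name) == refs_b.get(name)


print(same_commit(alice_refs, bob_refs, "refs/heads/main"))  # False
```

Both clones still agree on what the commit 04b87179... is; they only disagree on what "main" currently means, which is why a commit ID is the safer thing to quote.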

The objects are arbitrary chunks of data, each identified by an ID that is the SHA-1 hash of the data (prefixed with a short header giving the object's type and size). For example, here is the data for a commit object:

tree eebfed94e75e7760540d1485c740902590a00332
parent 04b871796dc0420f8e7561a895b52484b701d51a
author A U Thor <author@example.com> 1465981137 +0000
committer C O Mitter <committer@example.com> 1465981137 +0000
gpgsig -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 $
 iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
 HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
 DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
 zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
 HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
 EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
 =jKHM
 -----END PGP SIGNATURE-----
signed commit
signed commit message body

In the above, note that:

The $ is not part of the data; it marks a line that consists of a single space.
The "signed commit" and "signed commit message body" lines are the commit message.
The signature is (obviously) over the commit object data excluding the gpgsig field.
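As a rough illustration of "the commit object data excluding the gpgsig field", the following Python sketch strips that header and its continuation lines (the ones beginning with a space) from raw commit data. The function name signed_payload and the placeholder signature text are my own inventions, and this is a simplified model rather than Git's actual parsing code.

```python
def signed_payload(raw: bytes) -> bytes:
    """Return the commit object data with the gpgsig header removed."""
    # Headers end at the first empty line; the message follows it.
    headers, sep, message = raw.partition(b"\n\n")
    kept = []
    skipping = False
    for line in headers.split(b"\n"):
        if line.startswith(b"gpgsig "):
            skipping = True   # first line of the signature header
            continue
        if skipping and line.startswith(b" "):
            continue          # continuation line of the signature
        skipping = False
        kept.append(line)
    return b"\n".join(kept) + sep + message


raw = (
    b"tree eebfed94e75e7760540d1485c740902590a00332\n"
    b"author A U Thor <author@example.com> 1465981137 +0000\n"
    b"committer C O Mitter <committer@example.com> 1465981137 +0000\n"
    b"gpgsig -----BEGIN PGP SIGNATURE-----\n"
    b" \n"
    b" (placeholder, not a real signature)\n"
    b" -----END PGP SIGNATURE-----\n"
    b"\n"
    b"signed commit\n"
)

payload = signed_payload(raw)
print(payload.decode())
```

Note that the tree and parent IDs are part of what remains, so the signature covers them as IDs even though it does not cover the objects they point to.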

A "commit" as the word is used in casual conversation is a blockchain (or, perhaps more accurately, a "block directed graph") of these objects. As you can see from the above, the commit object refers to the ID of a tree, which in turn contains the IDs of files (objects containing the file data) and sub-trees. The commit object also (except for the initial commit) contains the IDs of its parent commits, forming a chain back to the initial commit.
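The chaining can be sketched in a few lines of Python. The object ID is the SHA-1 of a "type size NUL" header followed by the data; the helper names, the single-file tree, and the fixed author line are simplifications I chose for illustration.

```python
import hashlib


def git_object_id(kind: bytes, data: bytes) -> str:
    """SHA-1 over '<kind> <size>', a NUL byte, then the object data."""
    header = kind + b" " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def tree_with_one_file(name: bytes, blob_id: str) -> bytes:
    """Tree object data listing a single regular (mode 100644) file."""
    return b"100644 " + name + b"\0" + bytes.fromhex(blob_id)


def commit_for(tree_id: str) -> bytes:
    """Commit object data with fixed metadata, so only the tree ID varies."""
    who = b"A U Thor <author@example.com> 1465981137 +0000"
    return b"".join([
        b"tree ", tree_id.encode(), b"\n",
        b"author ", who, b"\n",
        b"committer ", who, b"\n",
        b"\n",
        b"example\n",
    ])


def commit_id_for_file(content: bytes) -> str:
    blob_id = git_object_id(b"blob", content)
    tree_id = git_object_id(b"tree", tree_with_one_file(b"hello.txt", blob_id))
    return git_object_id(b"commit", commit_for(tree_id))


print(git_object_id(b"blob", b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
# Changing one byte of the file changes the blob ID, hence the tree ID,
# hence the commit ID.
print(commit_id_for_file(b"hello\n") != commit_id_for_file(b"hello!\n"))  # True
```

The same reasoning applies to parent IDs: altering any ancestor commit changes its ID, which changes every descendant's data and therefore every descendant's ID.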

From this you can see that Git inherently gives you cryptographic authentication of certain data about commits: given a commit ID you can trust that the contents of that commit in your repo are the same as that in any other copy of the repo, at least as far as you trust SHA-1.

That of course doesn't tell you who created the commit, because anybody can create a commit like the example above. This is the reason that we have Git commit signatures: those associate a particular commit to a particular private key (PGP, SSH, or whatever other signatures your particular implementation supports) in the usual way of public key cryptographic systems.

The signature covers only the data in the commit object itself (i.e., the data you see in the example above, less the gpgsig field). That part is as secure as the public-key signature system and the procedures used to generate the signature.

The remainder, including the files in the commit, the parent commits, and their files, is only as secure as the hashing algorithm used to build the chain/graph. In a standard Git repo this is SHA-1, which, while not terrible, has not been considered secure against well-funded attackers since the mid-2000s.

This can be mitigated by using SHA-256 hashes in Git, but this is not the default and is still considered experimental functionality (though the format is not expected to change in a backward-incompatible way). Do keep in mind that many (most?) remote repo hosting systems (such as GitHub) do not support SHA-256 hashes, so you cannot transfer SHA-256 repositories through them.
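For a feel of what changes, here is a small Python sketch that computes an object ID under either hash. It assumes SHA-256 repositories frame objects the same way as SHA-1 ones (a "type size NUL" header followed by the data), which is my understanding of the format; the function name object_id is made up for this example.

```python
import hashlib


def object_id(kind: bytes, data: bytes, algo: str = "sha1") -> str:
    """Hash '<kind> <size>', a NUL byte, then the data with the chosen algorithm."""
    header = kind + b" " + str(len(data)).encode() + b"\0"
    return hashlib.new(algo, header + data).hexdigest()


empty_sha1 = object_id(b"blob", b"")
empty_sha256 = object_id(b"blob", b"", "sha256")

print(empty_sha1)         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(len(empty_sha256))  # 64 hex digits rather than 40
```

Because every ID in the graph changes, a SHA-1 repo and a SHA-256 repo with identical content do not share object IDs, which is part of why hosting support matters.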

¹ While the Git user interface often presents the illusion that
objects can be modified, this is not actually the case. When you perform an
action such as a git commit --amend, no objects are modified. Instead, a
new commit object is created (with a new ID) with the same parent as the
old commit object and the head ref you're currently using (e.g., a "branch"
called main) is then updated to reference that new commit. The old commit
is still in the repo and can still be accessed via its ID until it's
eventually garbage-collected. (You can easily find the IDs of commits no
longer referenced by your heads ("branches") using git reflog.)

² I use scare quotes on "branch" here because the term can be
confusing when discussing the structure of a Git repo. Both the trees of
Git commits and the Git object database itself are graphs, and in graph
theory "branch" has a specific meaning that involves multiple nodes in
the graph. As the term is generally used by the Git documentation and
community, it really means a "ref," i.e., one of the variables with names
like refs/heads/main that points to a single node in the graph at any one
time, but points to different nodes over time.
