In OAuth, what is the benefit of the access token being opaque?

Question

Why was the decision taken that the Client doesn't need to be able to parse the access token?

It seems to me that if the token included, in addition to the current fields, a client_id and a user_id, it would make life much simpler, prevent spoofing, enable authentication, etc.

I've noted that, as per section 1.4 of RFC 6749, it does not HAVE to be opaque, but it seems to me that much of OpenID Connect, if not all of it, is a way of making the access token into something meaningful. Yes, it adds an ID token, but the fact that it is implemented as something accompanying the access token instead of embedded in it does not change this. It seems every video, tutorial and blog post talks about it as if that is to be assumed (that access tokens will be opaque, because "The client is not the intended audience") ... And yet it is the API or resource provider who cannot establish authentication based on the bearer having the token.

Actually I need to add that I don't think the token being opaque to the client is the problem. The problem is that the token is a bearer token that doesn't restrict WHO can use it.

Nick Steele · Accepted Answer · 2019-07-31T22:17:39.050

In OAuth, or any other protocol where the token can be opaque or transparent, the benefits and risks depend on your desired result:

If you want the client to be able to parse the access token, you need the token to be transparent.
If you don't want the client to be able to parse the access token, you need the token to be opaque.

Indeed, this is the definition of opaque and transparent tokens; the client can see into them, or not.

If your token is transparent, you must sign it; otherwise, anyone could simply edit the token contents and claim anything they want.
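
To illustrate the signing requirement, here is a minimal sketch of a signed transparent token using an HMAC. The names `SECRET_KEY`, `sign_token` and `verify_token` and the `payload.signature` layout are assumptions made up for this example; this is not the JWT format or any particular OAuth library's API.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; a real deployment would load it from a key store.
SECRET_KEY = b"example-secret-not-for-production"

def sign_token(claims: dict) -> str:
    """Return 'payload.signature', both parts base64url-encoded."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return payload.decode() + "." + base64.urlsafe_b64encode(sig).decode()

def verify_token(token: str):
    """Return the claims if the signature matches, otherwise None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        # Wrong number of parts or invalid base64.
        return None
    expected = hmac.new(SECRET_KEY, payload_b64.encode(), hashlib.sha256).digest()
    # Constant-time comparison to avoid leaking how many bytes matched.
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

Anyone holding the token can base64-decode the payload and read the claims, but changing the payload without knowing the secret makes `verify_token` return `None`.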

So the first obvious benefit of an opaque token is that you can throw away any signing mechanism, because you don't need it. Conversely, if you have a properly signed transparent token, the client can read the token but not edit it.

Another benefit of an opaque token is that it is more secure. No encryption method is truly random (i.e., there must be a way to decrypt it), meaning you can always tell which encryption protocol is used if you can read enough of the contents of the tokens (transparent tokens). This in itself reduces the time needed to identify your cipher (knowing the protocol). Many people don't think this is very risky, and indeed, if your secret key has more than 80 bits of entropy (10 truly random bytes), they are right.

However, and this is much more of a risk, transparent tokens allow crackers to play with the contents (because they can see them). Some protocols have serious weaknesses that allow you to create collisions or even decode the cipher in a much shorter period, because the cracker will know the protocol as well as the contents of valid blocks. Most encryption protocols use 16-byte blocks, so if your token contains 160 chars of data, you've given the cracker 10 valid combinations to your cipher. This decreases the time it will take for them to identify your cipher by an order of magnitude. Giving them more transparent tokens will further reduce this time. Some protocols have a reduced version of this risk, but by the nature of what encoding is, having both the encoded block and the decoded block in your hands is always going to make it easier to break.

This is why many choose opaque tokens. Indeed you should always opt for opaque tokens if there is no need for any external entity to be able to parse your token. An example of this is a service framework in which you only pass the token to the service as a blind claim; you have no idea what it says, but you know it works and identifies you. It is only up to the service to confirm it's you via the secret key or cipher.
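
As a rough illustration of the "blind claim" idea, here is a sketch of one common way to build an opaque token: a random handle that only the service can resolve. Note that this uses a server-side lookup rather than the key or cipher mentioned above, and that `_token_store`, `issue_opaque_token` and `introspect` are hypothetical names invented for the example.

```python
import secrets

# Hypothetical in-memory store; a real service would use a database or cache.
_token_store = {}

def issue_opaque_token(client_id: str, user_id: str) -> str:
    """Create a random token and remember server-side what it stands for."""
    token = secrets.token_urlsafe(32)  # 32 random bytes from the OS CSPRNG
    _token_store[token] = {"client_id": client_id, "user_id": user_id}
    return token

def introspect(token: str):
    """Only the service can map the token back to its meaning."""
    return _token_store.get(token)
```

The client just passes the string along. It carries no readable claims, so there is nothing to parse, edit, or sign, and an unknown token simply resolves to `None`.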

As mentioned above, opaque tokens don't require signing. They also don't require divulging anything about the chosen protocols.

For this reason, they are much more secure, not only mathematically and logically, but also because they get rid of human implementation errors, which are, to be honest, 99% of the reason cracks are ever made in the first place.

Interesting note

Even true randomness means a 36.7% loss (1 divided by e) of combination space when randomly guessing. This means a 1000 possibility combination lock, if guessed randomly, actually has a 1 in 630 chance of getting it correct.

So 1 / e is the absolute best chance you have, and that is pure random.

Any "possibility" that is interesting to a human, i.e. "information", is not truly random by definition, and worse, any protocol that is used to decode it gives up some of the possible space for the cipher by the way the protocol works. So, in reality, the space an attacker has to search to guess an answer is significantly smaller than the entropy or combination space most people talk about or think they actually have; when everything is taken into account by a good hacker, guessing is usually at least one order of magnitude easier than a layperson thinks.

Do you have a citation for further reading about the 36.7% loss of combination space? — benbotto, Jul 31 '19 at 17:38
This is a fact that is easy to prove but not talked about often. It is referred to a lot in Wolfram material and cited in A New Kind of Science quite a bit, but in passing, as fact, not with proof (https://www.amazon.com/New-Kind-Science-Stephen-Wolfram-ebook/dp/B01N1I83V8/). You can see for yourself that this is true by making an array of size n, and then randomly filling a position with a value n times. The larger the value of n, the more closely the fraction of unfilled positions converges to 1 / e. Although random is "random", it still follows this pattern perfectly. — Nick Steele, Jul 31 '19 at 20:38
This 1 / e phenomenon is at the base of what causes most "patterns" in random behavior. There is a great YouTube video by Vsauce (https://www.youtube.com/watch?v=fCn8zs912OE) that explains Zipf's law (https://simple.wikipedia.org/wiki/Zipf%27s_law) and various trivia and real-world results of the qualities of randomness, but the video never dives into the math or explains the 1 / e convergence of randomness. Luckily for you, you can see the magic convergence on your own PC with maybe 5 lines of code, and see deeper into reality than you may want to ever go :) — Nick Steele, Jul 31 '19 at 20:45
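
The array experiment described in the comments above can be sketched as follows. The function name `unfilled_fraction` and the fixed `seed` are assumptions added for the example so that runs are repeatable.

```python
import math
import random

def unfilled_fraction(n: int, seed: int = 0) -> float:
    """Fill a random slot of an n-slot array n times; return the fraction left empty."""
    rng = random.Random(seed)
    filled = [False] * n
    for _ in range(n):
        filled[rng.randrange(n)] = True  # duplicates land on already-filled slots
    return filled.count(False) / n

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, unfilled_fraction(n), "vs 1/e =", 1 / math.e)
```

For large n the printed fraction should sit close to 1 / e (about 0.368). This shows how many slots n random throws leave untouched; it says nothing by itself about the odds of a single guess.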
Thank you for the links. While I agree that the "loss of combination space" converges on 1/e, I don't follow the 1/730 number (I think you meant 1/630, by the way). Given a random number in the range [0, 999], the chance of picking that number randomly is .001. That's pretty easy to show empirically. — benbotto, Jul 31 '19 at 22:13
Thanks for the math fix on 730 vs 630 :)
You might think that given a random choice in the range of n, the chance of picking that choice randomly is 1 / n. But this is not what happens randomly; it is what happens when you also attach the behavior of avoiding duplicate guesses.

Given the range of n containing a single intended answer, and given n random guesses at finding this answer, you will duplicate 1 / e attempts, per attempt, if you opt for a random guess. Given the answer's location is also random, it will also be placed in identical locations with a chance of 1 / e. — Nick Steele, Jul 31 '19 at 22:30
What this 1 / e thing ends up meaning is that while we assume random chance is 1 / n, it's actually 1 / e in practice, or in series, because with real randomness there is also a real chance of duplicates, and that chance is exactly 1 / e. Wolfram, in his book quoted above, goes from this 1 / e fact to attempting to prove you can compress any information down to its "essence" by gaming the 1 / e duplicates in a sort of "rehashing" of compression over and over again. The idea is mind blowing and also sounds impossible, but I can't see where he is wrong. — Nick Steele, Jul 31 '19 at 22:43
I'm saying that the 1/630 is inaccurate; maybe it's just poorly worded. If you have a random lock with a combo in the range of [0, 999] the chance of correctly guessing the combination in one attempt is 1/1000. If you guess randomly 1000 times, the chance of getting every guess wrong is (1 - 1/1000)^1000=1/e. The chance of getting at least one guess right is therefore 1 - (999/1000)^1000=1-1/e. Said a different way, if you have 1000 locks with random combos in the range [0, 999] and you guess 1 combination, you have a 1-1/e chance that a lock has that random combo. — benbotto, Jul 31 '19 at 22:53
Sorry, I'm not sure what you're saying about the 1/630 being inaccurate. I think I agree with your numbers/reasoning. Can you clarify what you mean by "maybe it's just poorly worded"? Are you saying that said differently using the wording you provided, you think it's accurate, or something else? — Nick Steele, Aug 02 '19 at 17:31
This is the specific piece: "This means a 1000 possibility combination lock, if guessed randomly, actually has a 1 in 630 chance of getting it correct." To me that sounds like you're saying that if you have an arbitrary 3-digit lock and make one guess at the combination, you have a 1/630 chance of getting the combination right. If that is what you're saying then it's incorrect; if not then the way it's worded is confusing. — benbotto, Aug 02 '19 at 17:48
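
The figures in benbotto's comments can be checked with a few lines of arithmetic. This is only a sketch of that calculation; the function names `p_all_wrong` and `p_at_least_one_right` are made up for the example.

```python
import math

def p_all_wrong(n: int) -> float:
    """Chance that n independent random guesses over n possibilities all miss."""
    return (1 - 1 / n) ** n

def p_at_least_one_right(n: int) -> float:
    """Chance that at least one of those n guesses hits."""
    return 1 - p_all_wrong(n)

if __name__ == "__main__":
    n = 1000
    print("single guess:      ", 1 / n)
    print("all n guesses miss:", p_all_wrong(n), "(1/e =", 1 / math.e, ")")
    print("at least one hit:  ", p_at_least_one_right(n))
```

With n = 1000, a single guess succeeds with probability 1/1000, while 1000 independent guesses all miss with probability close to 1 / e.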
