Is this the right technique for creating and validating session tokens?

Question

Current token format, creation, verification:

vls_k3uGjFsDfA49Ygt8mqNHAtkBuUqRTU6K1KfUCwEiX9Z

I create the session token as follows:

Create an array of 32 bytes.
Fill the first 28 bytes via PRNG.
Calculate a CRC-32 checksum over those 28 bytes and write the result into the last 4 bytes of the array.
Base-58 encode the array and prepend the prefix (vls_).

How the token is stored in the db:

Hash the created token with SHA-512.
Store the hash in a table with the associated user's id.

I verify the token as follows:

Check that it has the prefix.
Slice it to get the token without the prefix.
Decode it to get the bytes.
Slice off the random bytes (the first length - 4 bytes of the array) and calculate their checksum.
Compare the checksum stored in the token with the calculated checksum.
Hash the token (SHA-512).
Check the db...
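
For reference, the creation and verification steps above could be sketched in Python roughly like this. It is only an illustration of the scheme as described, not the asker's actual code. The big-endian byte order of the CRC, hashing the full prefixed string, and all function names (b58encode, b58decode, create_token, check_token) are assumptions, and the database lookup itself is left out.

```python
import hashlib
import secrets
import zlib
from typing import Optional

# Bitcoin-style Base-58 alphabet (assumed; the question does not say which one is used).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
PREFIX = "vls_"


def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Leading zero bytes are not captured by the integer, so encode each as '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)  # raises ValueError on a bad character
    pad = len(text) - len(text.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body


def create_token() -> str:
    random_part = secrets.token_bytes(28)  # 28 random bytes from a CSPRNG
    checksum = zlib.crc32(random_part).to_bytes(4, "big")  # byte order assumed
    return PREFIX + b58encode(random_part + checksum)


def check_token(token: str) -> Optional[str]:
    """Return the SHA-512 hex digest to look up in the db, or None if malformed."""
    if not token.startswith(PREFIX):
        return None
    try:
        raw = b58decode(token[len(PREFIX):])
    except ValueError:
        return None
    if len(raw) != 32:
        return None
    random_part, checksum = raw[:-4], raw[-4:]
    if zlib.crc32(random_part).to_bytes(4, "big") != checksum:
        return None
    # Assumption: the stored hash covers the full token string, prefix included.
    return hashlib.sha512(token.encode("ascii")).hexdigest()
```

A token that passes check_token is only well formed; whether it is a live session still depends on the database lookup.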

Skipped some checks for clarity.

My question:

Do I gain anything using SHA-512 instead of SHA-256?

I think not, because the token is already high-entropy.
If an attacker somehow obtained the table, they would probably also know the token format, so they could generate tokens and hash them regardless of which hash function is used.

I might be completely wrong about this, so I'm open to any feedback.

score 25 · Accepted Answer · answered Oct 03 '21 at 17:03

This approach is over-engineered. You don't need a checksum.

You have a database in which you can store session IDs associated with users. Use a cryptographically secure random number generator (CSPRNG) to generate a random session ID. A length of 16 bytes (128 bits) is fine. Format it as hex or base64 if you need to represent it as a string. Give the user the session ID, e.g. as a cookie. Hash the session ID, then store it in the database along with an expiry date (timestamp) for the session.
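
A minimal Python sketch of the creation step might look like the following. The in-memory dict stands in for the database table, and the 30-minute lifetime, the SHA-256 choice for this example, and the names (sessions, create_session, SESSION_TTL_SECONDS) are all assumptions made for illustration.

```python
import hashlib
import secrets
import time

SESSION_TTL_SECONDS = 30 * 60  # hypothetical 30-minute lifetime

# Hypothetical stand-in for the session table:
# hashed session ID -> (user_id, expiry timestamp)
sessions = {}


def create_session(user_id: int) -> str:
    # 16 bytes (128 bits) from the OS CSPRNG, formatted as 32 hex characters.
    session_id = secrets.token_hex(16)
    digest = hashlib.sha256(session_id.encode("ascii")).hexdigest()
    # Only the hash and the expiry are stored server-side.
    sessions[digest] = (user_id, time.time() + SESSION_TTL_SECONDS)
    # The raw session ID goes to the user, e.g. as a cookie value.
    return session_id
```

The raw session ID never touches storage in this sketch, so reading the table does not yield usable session IDs.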

When the user sends a request, take the session ID they sent, hash it, and look that hash up in the database. If the hashed session ID is found, and the current timestamp doesn't exceed the expiry date in the database, the session is valid and they're logged in. Otherwise they're not logged in. You can update the expiry timestamp on every successful request so that it only expires after a certain period of inactivity.
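
The lookup could be sketched like this in Python. Again the dict is a stand-in for the database, and the names (sessions, hash_session_id, validate_session) and the 30-minute sliding window are assumptions for the example.

```python
import hashlib
import time

SESSION_TTL_SECONDS = 30 * 60  # hypothetical inactivity window

# Hypothetical stand-in for the session table:
# hashed session ID -> {"user_id": ..., "expires_at": ...}
sessions = {}


def hash_session_id(session_id: str) -> str:
    return hashlib.sha256(session_id.encode("utf-8")).hexdigest()


def validate_session(session_id: str, now=None):
    """Return the user ID for a valid session, or None if not logged in."""
    now = time.time() if now is None else now
    key = hash_session_id(session_id)
    record = sessions.get(key)
    if record is None:
        return None  # unknown session ID
    if now > record["expires_at"]:
        del sessions[key]  # expired: drop the row
        return None
    # Sliding expiry: each successful request pushes the deadline forward.
    record["expires_at"] = now + SESSION_TTL_SECONDS
    return record["user_id"]
```

The now parameter exists only to make the function easy to exercise with fixed timestamps; in a real handler the current time would be used.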

You don't strictly need format and length validation on the provided token, but for good practice you should at least check that it is the correct length and only contains the expected characters (e.g. regex /^[0-9a-f]{32}$/ if it's a 128-bit value encoded as a hex string). You don't need any prefix or checksum. If the user changes the value it will not match a session ID.
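
In Python, the format check for a 128-bit hex session ID could be sketched as below (the function name is made up for the example). One detail worth knowing: in Python's re module, $ also matches just before a trailing newline, so fullmatch is the safer way to express the same pattern.

```python
import re

# Exactly 32 lowercase hex characters, i.e. a 128-bit value as a hex string.
SESSION_ID_PATTERN = re.compile(r"[0-9a-f]{32}")


def is_well_formed(session_id: str) -> bool:
    # fullmatch requires the whole string to match, including no trailing "\n".
    return SESSION_ID_PATTERN.fullmatch(session_id) is not None
```

This only rejects obviously malformed input early; the hash lookup is still what decides whether the session exists.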

You may also choose to store information such as the user's IP address and browser user agent, and reject access if an attempt is made to use a session ID with a different IP/agent. This helps limit the damage if a session ID is stolen from the browser, e.g. if the user has malware installed.

The reason for hashing the session ID in the database is so that an attacker cannot trivially take over active user sessions if they gain read access to the database, e.g. through SQL injection. Because the session ID is a long random string, you do not need to use anything more than a cryptographic hash (e.g. SHA256) here. The hash does not need to be salted, and you don't need a computationally hard KDF like Argon2, bcrypt, or PBKDF2. A sufficiently long (i.e. 128-bit or greater) value generated by a CSPRNG is not guessable by an attacker, and the keyspace is far too large for a brute-force attack on the hash.

Another reason for hashing the session ID is timing attacks on indexed lookups. An index lookup does not take a constant-time path when searching for a string, so this timing side-channel may allow an attacker to progressively reduce the search space and discover active session IDs. This attack is usually impractical outside of lab scenarios, but the solution is to hash the token so that the attacker cannot easily perform database index lookups with a chosen prefix. We're already hashing the session ID for other reasons, so the question of practicality is moot.

The output length of the hash function is not critical here. A 128-bit session ID is more than sufficient, so as long as you're using a cryptographic hash function with an output size larger than the minimum security bound (128 bits) it will be fine. Using SHA512 instead of SHA256 doesn't offer any benefits, and just increases the storage size and computational cost of the hash.
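
To make the storage difference concrete, here is a small Python comparison of the two digests for the same 128-bit session ID. The helper name is hypothetical; the sizes are properties of the hash functions themselves.

```python
import hashlib
import secrets


def digest_sizes(session_id: str) -> dict:
    """Return the raw digest length in bytes for SHA-256 and SHA-512."""
    data = session_id.encode("ascii")
    return {
        "sha256": len(hashlib.sha256(data).digest()),  # 32 bytes (64 hex chars)
        "sha512": len(hashlib.sha512(data).digest()),  # 64 bytes (128 hex chars)
    }


sizes = digest_sizes(secrets.token_hex(16))
print(sizes)
```

Both outputs are well above the 128 bits of entropy in the session ID, so the larger digest only doubles the column width.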

You must use a CSPRNG to generate the session IDs. Standard library random functions like rand() or mt_rand() are not suitable for generating security-sensitive secrets. Your language's standard library may offer a cryptographic random number generator API, e.g. random_bytes in PHP, RandomNumberGenerator in .NET, or SecureRandom in Java. You may also read from /dev/urandom (but not /dev/random - that is a legacy interface) on Linux/BSD environments.
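
For Python, which is not one of the languages listed above, the corresponding standard library APIs are secrets and os.urandom, while the random module is the unsuitable kind. A short sketch (the function name is an assumption):

```python
import os
import random
import secrets


def new_session_id_bytes() -> bytes:
    # secrets draws from the operating system's CSPRNG.
    return secrets.token_bytes(16)


# os.urandom reads from the same OS-level source.
also_fine = os.urandom(16)

# NOT suitable: random is a Mersenne Twister, and its output is fully
# determined by its seed. Two generators with the same seed agree exactly.
a = random.Random(1234).getrandbits(128)
b = random.Random(1234).getrandbits(128)
print(a == b)  # True: reproducible, hence predictable to anyone with the state
```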

The prefix is becoming generally preferred as a way to help disambiguate and/or scope what are otherwise opaque binary blocks. — chrylis -cautiouslyoptimistic-, Oct 04 '21 at 00:30
@chrylis-cautiouslyoptimistic- In this context there's no need to disambiguate at all; the cookie name tells you exactly what the token is for. — Polynomial, Oct 04 '21 at 00:40
The CRC is valuable because it lets you avoid the expensive database lookups for most invalid tokens — Ben Voigt, Oct 04 '21 at 20:06
@BenVoigt An indexed database lookup would be O(log N) in the worst case, and unless you're using replication and the initial lookup query contains a modified field (e.g. expiry time) a request with a valid session ID should result in a cache hit 100% of the time. If you're scaling to the point where an indexed session lookup across the user table is problematic, even with caching, you're probably already load balancing users across nodes anyway, so you can shard the session table. The risk of dodgy parsing logic causing problems isn't worth the tiny performance saving, imo. — Polynomial, Oct 04 '21 at 20:16
Also, if an attacker were purposefully trying to cause a DoS via session ID lookups, they'd just calculate the CRC. You can't assume they wouldn't know to do that - see Kerckhoffs's principle. — Polynomial, Oct 04 '21 at 20:17
@Polynomial I came across this post and was wondering: say you generate your supposedly secure session ID hashes in PHP using something like password_hash(random_bytes(16),PASSWORD_DEFAULT). That value should then be the PK of your session data table, used for session ID lookup during authentication. Which specific data type and index would you use on that column, assuming a MariaDB database? I ask because password_hash does not necessarily return a fixed-length hash; cf. the docs. — DevelJoe, Jun 06 '23 at 12:48
@Polynomial We're also wondering whether indexing a hash in a DB means that the same hashing method has to be used every time, without any random salt included in the hashing. Isn't that theoretically less secure than proper hashing with a salt, as most methods like password_verify in PHP, for example, do it (and explicitly no longer allow the salt value to be specified)? — DevelJoe, Jun 06 '23 at 23:21
Regarding "format and length validation": so in theory, an attacker using a different string that hashes to the same result is a plausible attack vector? Would it be less secure to include the user id as well as the session ID, so that instead of searching for any session with a given session ID, I check whether User 106 has a session open and then validate the session ID? — Jonathon, Nov 18 '23 at 16:52
@DevelJoe That is a MariaDB question, but as the PHP documentation points out, it is currently a 60-character string that might grow in the future, so a column that fits a 255-character string is recommended. — Jonathon, Nov 26 '23 at 21:13
