Search by hashed value

Question

I would like to design a REST API endpoint (POST) that takes in some sensitive identifier information in the request body:

{
   "someDataToSearch": "abcdefgh"
}

I then want to hash abcdefgh and search for that hash in our DB and return the corresponding entity, which will have been generated and stored in the DB prior to this as part of some business logic.

I wanted to use Bcrypt to do this but I've realised that requires a salt to generate, and we won't be passing in the salt as part of the request body above. So I don't think we can do a lookup with that, correct me if I'm wrong?

What would be a secure algorithm and approach to do this? Thanks

Sounds like you are describing hash tables -- a very common data structure, in fact your database may already use them under the hood. Usually hash tables do not use a cryptographic-strength hash, and in fact I can imagine that performance is improved by using a hash designed for searching&sorting. So why do you need a crypto-strength hash here? Does your use-case have some requirement that these hashes leak no information about the original input? — Mike Ounsworth, Oct 02 '20 at 15:42
This looks like an XY problem to me, where Y is your approach of using a hash and X is the unknown problem you want to solve this way. Note that just "use a hash" is not a security problem by itself, i.e. as it stands I consider the question off-topic. It might be on-topic once you describe the real problem (X), i.e. what you want to achieve by using the hash. — Steffen Ullrich, Oct 02 '20 at 15:57
Yep the original input is sensitive, cannot be compromised. The request into the API will be performed over SSL. Basically our business logic creates an entity with one of the fields being this hashed sensitive value. And we want to allow API users to search based on the unhashed original input. But I don't see how this can be achieved as the design stands? — theartv, Oct 02 '20 at 15:57
@theartv: What prevents you from simply using an unsalted hash like SHA-2 for this? What threats should the hashing mitigate in the first place? — Steffen Ullrich, Oct 02 '20 at 15:59
@SteffenUllrich: Really just looking for a secure approach that best prevents things like brute force attacks. Alongside the hashed value we also store a 'masked' version of the original input. If someone were to compromise our db they could potentially use this masked value to generate a hash that matches the hash of the original input itself. I believe SHA-2 is much faster than bcrypt so would be easier for an attacker (?). — theartv, Oct 02 '20 at 16:11

Mike Ounsworth · Answer 1 · 2020-10-02T16:24:07.017

Based on your comments, I am going to assume for the sake of simplicity that the input you want to search on is a password.

So you don't want to store the raw password in your database, so you'll store a cryptographic hash of it. Now you want to be able to search on this column. This presents a number of problems:

You are limited to exact-match searches. Obviously there's no way to do a partial-match search after you have run the value through a cryptographic hash.
Hash functions designed for passwords all require a salt and a work-factor that makes them very slow, memory-hungry, or both. Neither property is ideal for a database lookup.
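
To illustrate the salt problem: the same input hashed under two different salts gives two different digests, so the stored value cannot be recomputed from the input alone. This is only a minimal sketch. It uses PBKDF2 from Python's standard library as a stand-in for bcrypt, and the iteration count and input value are illustrative assumptions.

```python
import hashlib
import os


def salted_hash(value: str, salt: bytes, iterations: int = 100_000) -> bytes:
    # PBKDF2-HMAC-SHA256 stands in for bcrypt here; both need the
    # original salt to reproduce a digest.
    return hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, iterations)


salt_at_creation = os.urandom(16)  # generated when the entity was stored
salt_at_search = os.urandom(16)    # a fresh salt, as if the original were unknown

stored = salted_hash("abcdefgh", salt_at_creation)
recomputed_wrong_salt = salted_hash("abcdefgh", salt_at_search)
recomputed_same_salt = salted_hash("abcdefgh", salt_at_creation)

print(stored == recomputed_wrong_salt)  # different salt, so the digests differ
print(stored == recomputed_same_salt)   # same salt, so the digests match
```

Without knowing which salt belongs to which row, the only way to search would be to re-hash the input once per stored row, and a deliberately slow hash makes that impractical.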

The solution is going to depend on a deeper analysis of exactly what security properties you need, and don't need.

If the input is sensitive but high-entropy (for example a private key), then you don't need a password hashing function because the input is not brute-forceable anyway; you could instead use a single pass of SHA-256. Fast and no salt.
I assume you've already considered searching by an Id or key instead, and for whatever reason this doesn't work. (You'd probably need a way to look up the key for a given entry, in which case you're back to the same problem.)
If your input is low-entropy (like a password), and you really do need a salted & iterated hash, then I don't see any good options, because you'll need some way to know which salt to use before you can hash and look up. This is a chicken-and-egg problem: finding the salt requires some sort of Id/key (or the salt itself ends up serving as that Id/key).
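
For the high-entropy case, the lookup could look roughly like the sketch below. It assumes the input really is unguessable (a random 256-bit token here), and it uses an in-memory dict as a hypothetical stand-in for a table with an indexed hash column. The function and variable names are made up for illustration.

```python
import hashlib
import secrets
from typing import Optional

# Hypothetical stand-in for a DB table with a unique index on the hash column.
entities_by_hash = {}


def lookup_key(secret_value: str) -> str:
    # Single unsalted SHA-256 pass: deterministic, so the same key can be
    # recomputed from the request body at search time.
    return hashlib.sha256(secret_value.encode("utf-8")).hexdigest()


def store_entity(secret_value: str, entity: dict) -> None:
    entities_by_hash[lookup_key(secret_value)] = entity


def find_entity(secret_value: str) -> Optional[dict]:
    return entities_by_hash.get(lookup_key(secret_value))


# A random 256-bit token: high-entropy, so guessing inputs is infeasible.
token = secrets.token_hex(32)
store_entity(token, {"id": 1, "name": "example entity"})

print(find_entity(token))               # the stored entity
print(find_entity("some other value"))  # None, no matching hash
```

This only holds up if the input is genuinely high-entropy. For a low-entropy input, an attacker with a copy of the database could hash candidate values quickly and compare.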

Taking a step back and squinting, this looks a bit like password authentication: user logs in with their password, and the server looks them up and returns a bunch of data for them. The key point here is that no systems use the password as the lookup key; that's what the username is for! Providing the secret is simply to prove that they are who they say they are.
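
The username-then-verify pattern can be sketched as follows. This is an illustration under assumptions, not a production design: PBKDF2 from the standard library stands in for whichever password hash you prefer, the iteration count is arbitrary, and the `users` dict, `register`, and `fetch_data` are hypothetical names.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

# Hypothetical user table, keyed by the non-secret username.
users = {}


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str, data: dict) -> None:
    salt = os.urandom(16)  # per-user salt, stored next to the hash
    users[username] = {"salt": salt, "hash": hash_password(password, salt), "data": data}


def fetch_data(username: str, password: str):
    record = users.get(username)  # the lookup uses the username, not the secret
    if record is None:
        return None
    candidate = hash_password(password, record["salt"])
    # Constant-time comparison of the recomputed hash with the stored one.
    if hmac.compare_digest(candidate, record["hash"]):
        return record["data"]
    return None


register("alice", "abcdefgh", {"entity": 42})
print(fetch_data("alice", "abcdefgh"))  # the user's data
print(fetch_data("alice", "wrong"))     # None
```

The salt is found via the username, which is exactly what a search by hashed value alone lacks.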

I agree with @Steffen that this seems like an XY problem. You're hitting dead-ends, likely, because the larger problem you're trying to solve doesn't fit into the shape of problems that we know how to solve efficiently. It may be back-to-the-drawing-board time to see if you can re-cast your problem so that it fits some established pattern.

Ok thanks for the reply. I think you're right, it seems like a non-starter. Might need to change the API design to take in a salt as well to allow us to use Bcrypt/PBKDF2 — theartv, Oct 02 '20 at 17:03
