A simple hash is definitely out of the question because of how ridiculously easy it is to crack 10-digit phone numbers.
Two possible solutions seem to be a slow hashing algorithm and salting.
Salting
- If you have the salt, then it's trivial again.
- If you don't, and
  - the salt was secure (e.g. 128 bits or more from a good random source), then cracking is impossible for any attacker and for the data owner alike. In your example, the website can no longer check whether you entered the correct number, so salt rotation is effectively the same as deleting the hash.
  - the salt is not secure (4 alphanumeric bytes seems to be common), then it is the same as slow hashing: instead of running a fixed number of rounds, you have to guess the salt. This, again, applies both to the data's legitimate owner and to any attacker from whom the salted hash should protect the phone number. I don't see an advantage over slow hashing.
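To illustrate the "if you have the salt, it's trivial" case, here is a minimal sketch, assuming a fast salted SHA-256 and a made-up salt and phone number. The search space is cut down to five unknown digits so it finishes instantly; the function names are only for illustration.

```python
import hashlib
from typing import Optional

def salted_hash(salt: bytes, phone: str) -> str:
    """Fast salted hash: SHA-256 over salt followed by the phone number."""
    return hashlib.sha256(salt + phone.encode("ascii")).hexdigest()

def crack(salt: bytes, target: str, prefix: str, suffix_digits: int) -> Optional[str]:
    """Try every number with the given prefix until the hash matches."""
    for n in range(10 ** suffix_digits):
        candidate = f"{prefix}{n:0{suffix_digits}d}"
        if salted_hash(salt, candidate) == target:
            return candidate
    return None

# Made-up 128-bit salt and phone number, purely for the demonstration.
salt = bytes.fromhex("00112233445566778899aabbccddeeff")
leaked = salted_hash(salt, "0612345678")

print(crack(salt, leaked, prefix="06123", suffix_digits=5))
```

Once the salt is known, it does not matter that it is 128 bits long; the attacker only has to loop over the phone numbers.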
Slow hashing
How slow is reasonable depends on the scenario. In the example from your question:
When user logs in, they provide their original phone number, which would later be used to verify against hash, and an OTP will be sent if it matches.
It might be possible to go pretty slow. If you don't need to log in multiple times a day, waiting 5 seconds for the slow hash isn't so bad. (In practice, very few of our customers find it worth it to keep a server busy and make the user wait for more than half a second, but let's use 5 seconds as a best-case scenario.)
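As a rough sketch of what such a login check could look like, assuming Python's standard-library `hashlib.scrypt` as a stand-in for Bcrypt (which is not in the standard library). The cost parameters and function names are hypothetical and would need tuning to reach the delay you are willing to accept.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Hypothetical cost parameters; raise n until one hash takes as long
# as the login flow can tolerate (larger n may also need maxmem raised).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def hash_phone(phone: str, salt: bytes) -> bytes:
    """Slow, memory-hard hash of the phone number."""
    return hashlib.scrypt(phone.encode("ascii"), salt=salt, **SCRYPT_PARAMS)

def register(phone: str) -> Tuple[bytes, bytes]:
    """Return (salt, hash) to store; the plain number is not kept."""
    salt = os.urandom(16)
    return salt, hash_phone(phone, salt)

def may_send_otp(submitted: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the submitted number and compare in constant time."""
    return hmac.compare_digest(hash_phone(submitted, salt), stored)

salt, stored = register("0612345678")        # made-up number
print(may_send_otp("0612345678", salt, stored))
print(may_send_otp("0612345679", salt, stored))
```

The salt here is stored next to the hash, so it adds no secrecy. It only prevents two identical numbers from producing identical hashes.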
Based on my own tests, a cracking station with GPUs gets about a 17× speedup compared to a server that performs Bcrypt hashing on a CPU. I suspect this is due to the higher memory bandwidth of a GPU. (For PBKDF2, the speedup is 1000×, but let's once again assume a best-case scenario where a good hashing algorithm was chosen.)
An attacker therefore needs to spend 5/17 ≈ 0.3 seconds to check whether a given hash matches a given phone number. Give it four days and they can check about 1.2 million phone numbers¹. Add a second GPU and it goes twice as fast. I don't know about other countries, but there are only 60 million Dutch mobile numbers in total.
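The arithmetic can be checked with a few lines of Python. This uses only the figures from this answer (5 seconds per hash on the server, a 17× GPU speedup, 60 million numbers); the variable names are mine.

```python
server_seconds_per_hash = 5
gpu_speedup = 17
seconds_per_day = 24 * 60 * 60

# Guesses one GPU manages in four days, kept in integer arithmetic.
guesses_in_4_days = 4 * seconds_per_day * gpu_speedup // server_seconds_per_hash

# Days one GPU needs to try every Dutch mobile number against one hash.
dutch_mobile_numbers = 60_000_000
attacker_seconds_per_guess = server_seconds_per_hash / gpu_speedup
days_for_full_range = dutch_mobile_numbers * attacker_seconds_per_guess / seconds_per_day

print(guesses_in_4_days)            # 1175040
print(round(days_for_full_range))   # 204
```

So even in this best case, a single GPU gets through the entire Dutch number range in well under a year, and that time shrinks in proportion to the number of GPUs.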
Conclusion: hashed phone numbers can always be cracked over time by a motivated attacker, such as law enforcement or someone who stands to gain money². If the attacker can narrow the number range down to the few area codes they are interested in, the odds get even worse for the defender. If the target site didn't use Bcrypt but PBKDF2, or plain Blake2 or something, cracking also gets much easier. If the website didn't make users wait 5 seconds for the hashing to complete, but a more realistic 0.5 seconds instead, it also gets much faster.
Practically, phone number hashing adds only a little technical protection. The practical advantage I see is that, if you tell your users that their phone numbers go through a strong hashing function, the marketing department can't use them without publicly changing the privacy policy and having everyone accept the new terms first. It's also very clearly off-limits if you need to involve technical expertise to crack the numbers rather than just reading the plain numbers off of a sheet, so it's unlikely to be mishandled. The hash provides a psychological barrier more than technical security.
Secure computation
This is not my area of expertise; the text below should be correct on principles but may be incorrect on implementation details.
The basis for this idea comes from Signal: https://signal.org/blog/private-contact-discovery/#sgx-contact-discovery. They compare a list of contacts (can be hundreds of numbers) against their list of all registered users (might scale to billions of numbers) to find people to chat with. Your scenario, namely verifying whether one phone number belongs to your account (one specific database record), should be a lot easier than what Signal does.
Your client device would:
- Verify that the expected code runs in the secure execution environment. This involves some public key encryption, where the maker of the environment (in Signal's example: Intel, because they made SGX) provides a public key and the private key is baked into the CPU.
- Establish a communications channel that the server application can't read. Since you have the public half of a key which is baked into the trusted environment of the CPU, you can encrypt a secret for it and then only the trusted environment can decrypt that data.
- Encrypt the phone number for the trusted environment and send it over.
The code inside the environment can now do its thing, for example verifying that the number matches what is on record in the database for your account, and returning only 'yes' or 'no' to the application server. The application server can then send the SMS if the response was 'yes'.
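A toy model of that data flow, with heavy caveats: the attestation and the encrypted channel are left out entirely, and a Python class provides no real isolation. The class and function names are hypothetical. The only thing this sketch shows is that the application server learns nothing beyond 'yes' or 'no'.

```python
import hmac
from typing import Dict

class ToyEnclave:
    """Stand-in for the trusted environment; it alone holds the numbers."""

    def __init__(self, records: Dict[str, str]) -> None:
        self._records = dict(records)

    def matches(self, account_id: str, submitted_number: str) -> bool:
        """Return only yes/no, never the number on record."""
        on_record = self._records.get(account_id)
        if on_record is None:
            return False
        return hmac.compare_digest(on_record.encode(), submitted_number.encode())

def application_server_login(enclave: ToyEnclave, account_id: str, submitted: str) -> str:
    # In the real scheme the server would only forward ciphertext it cannot
    # read; here the number is passed in the clear to keep the sketch short.
    if enclave.matches(account_id, submitted):
        return "send SMS"
    return "reject"

enclave = ToyEnclave({"alice": "0612345678"})   # made-up account and number
print(application_server_login(enclave, "alice", "0612345678"))
print(application_server_login(enclave, "alice", "0600000000"))
```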
In practice, vulnerabilities are constantly being found in SGX. An HSM from a different vendor could probably be a good substitute, where the HSM's CPU is actually separate and you don't have all these side channels that SGX has. I don't know why Signal didn't consider that as an option; perhaps HSMs lack features that Signal needs for this mass data comparison.
Besides being the only actually secure option (unless a vulnerability is found in the environment, which is always a risk), this approach also does not have a time trade-off, like the slow hash where slower is more secure. The client only needs a few KB of RAM to do the asymmetric cryptographic operations.
¹ apt install qalc && qalc '4 days / (5/17) seconds' #shows 1175040
² Examples: not everyone wants it known that they're registered on (certain) dating sites. If the database of hashed phone numbers leaks, an attacker could send SMSes to the cracked phone numbers, threatening to publish the information unless the victims pay up. Or one could do targeted phishing based on which website the phone numbers came from.