A novel method for hiding data using prime numbers?

Question

Has the following method of hiding data been proposed or studied? What is the efficiency or security of this method? What applications could use this method?

Data is to be hidden in a number that is the product of two prime numbers. One prime number contains the hidden data, and a bigger prime number indicates the length of the data, constructed using concatenation as follows.

$p = p_0 \ || \ data \ || \ p_{end}\ \ $ and $ \ \ q = q_0 \ || \ marker \ || \ q_{end}$

where $p_0$ and $q_0$ are random $single$ non-zero digits with $p_0 < q_0$, $data$ is a number with $k$ (decimal) digits, $marker$ is a random number with $k-1$ non-zero digits followed by $0$, and $p_{end}$ and $q_{end}$ are random numbers with $n-k-1$ digits.

The encoded data is $N = P \times Q$ where $P$ and $Q$ are the next primes after $p$ and $q$. $n$ is chosen large enough so that the factorization of a $2n$-digit number is not feasible and so that, together with choices of $p_{end}$ and $q_{end}$, the construction of $P$ and $Q$ does not cause $data$ or $marker$ to change.

A few comments: (1) Although some constraints on $P$ and $Q$ are known, there is not enough to use "factoring with partial information/known bits". (2) Knowing $P$ and $Q$, in any order, allows the hidden data to be uniquely found. (3) The method is easy to adapt to binary.

Example: $data$ is 271828 with $k$ = 6. For simplicity use $n$ = 12:

$p = \mathtt {1 \underline {271828} 67213}$, $P = \mathtt {1 \underline {271828} 67221} \ \ $ and $ \ \ q=\mathtt {6 \underline{97811} \underline {\underline {0}}97478}$, $Q = \mathtt {6 \underline{97811} \underline {\underline {0}}97499}$

$N = P \times Q = \mathtt {88749616158555602180279}$.

EDIT: Data can be any integer (zero or greater). To emphasise that it need not be prime, I changed the example data from 314159 (which is prime) to 271828 (a composite).

EDIT (March 30): Added "single" to descriptions of $p_0$ and $q_0$ to emphasise that each of $p_0$ and $q_0$ is a single non-zero digit. Note that the size of the data ($k$) is not known in advance, but is indicated by $marker$. Also, the best known result, due to Coppersmith, is that factorization is easy if half the bits of a factor are known.

What security properties are you hoping to achieve? Is this an encryption method? If so, how do you anticipate that the recipient (who doesn't know $P$ or $Q$ initially) to recover the message? Or is this a commitment scheme? — poncho, Mar 18 '22 at 20:10
Interesting idea :). 2 remarks: ideally, schemes should be efficient. Given the constraints on the primes, it seems this may not be the case here? Or have you tried with "crypto-sized" data besides your example? — Marc Ilunga, Mar 18 '22 at 21:37
This could conceivably be used as a kind of commitment scheme? — Mikero, Mar 19 '22 at 15:48
@Mikero: Hmm, yes, that could work. Not sure what it would give you compared to traditional (e.g. hash-based) commitment schemes, though. (Well, it could maybe give you a provable reduction to factoring, if you did it carefully. I'd much rather take, say, a provable reduction to discrete log, though.) — Ilmari Karonen, Mar 19 '22 at 21:51

poncho · Answer 1 · 2022-04-12T13:41:11.387

I'll examine this as a potential commitment scheme (as I can't think of how else you'd use this).

As you know Bob, a commitment scheme is one where Alice has a secret; she publishes a commitment to the secret ('commits to the secret'), and then later on, she 'opens' the commitment, revealing the secret.

There are two security properties of interest in a commitment scheme:

Hiding; Bob who sees the commitment can't tell what the secret is (even if he has a guess)
Binding; when Alice opens the commitment, she has to open it to the value she had in mind when she generated the commitment (that is, she cannot open it to either of two different values at opening time).

Actually, a commitment scheme can be either perfectly hiding (that is, it is impossible for Bob to gain any information, even if he had infinite computational power at his disposal) or perfectly binding (that is, it is impossible for Alice to open the commitment in two different ways); it is impossible for a scheme to be both.

Now, there are two standard commitment schemes out there:

Hash based commitments, where Alice takes the secret $S$ and a random value $R$ and publishes the commitment $\text{Hash}( S || R )$, for a collision-resistant hash function $\text{Hash}$; to open it, she publishes $S$ and $R$. This turns out the be computationally binding (based on the collision resistance of the hash, and assuming that the length of either S or R is well known), and statistically hiding (assuming $R$ is long enough, and a nonstandard but intuitive assumption on the hash function).
Pedersen commitments, where we have two different generators $g$ and $h$ (with unknown relationship) of some group where the discrete log problem is hard; to commit to a value $S$, she picks a random value $R$ and publishes $g^Sh^R$; to open it, she publishes $S$ and $R$. This turns out to be perfecting hiding, and computationally binding (reducible to the DLog problem).

Hash based commitments have the practical advantage that they're efficient. Pedersen commitments have the advantage that they have nicer provable properties, and they are also zero-knowledge proof friendly; for example, if Alice generates two commitments, she can generate a short zero-knowledge proof that those two commitments are two the same value (assuming, of course, they they are).

Now, on to your scheme:

As for the binding property, you are perfectly binding; Alice has only one way she can open the secret (assuming that Bob checks the revealed factors for primality).
As for the hiding property, it might appear at first glance to be reducible to the factorization problem. However, it's not quite that simple; the attacker knows apriori something about the factors (e.g. if he has a guess to the secret, and he also guesses where in $P$ that appears (not that many possibilities), then he has both some digits of $P$ and knows where nonzero digits in $Q$ appear (as well as a zero digit)). While it is not clear how that can be used to make factorization easier, that would be a nonstandard assumption.

Evaluating this scheme, it's not horrid (as long as the secret is short, the extra information available to the attacker is not likely to be exploitable); on the other hand, it is also inefficient (generating primes is expensive; validating the revealed $P$ and $Q$ values (making sure they are prime), while not as expensive, is still not excessively cheap). And, it would not be obvious how you would create an efficient zero-knowledge proof about a statement about the committed values.

Excellent answer. A very good summary of Commitment Scheme and important properties. Please note my EDIT about p0, q0 being single digits; and that data size is not known in advance. Too bad my reputation is not high enough for me to boost your score. Now, I will have to think about ZKP for my scheme. I'm surprised that such a simple scheme (even with some theoretical shortcomings) has not previously been proposed - I did not find any such references. — Lewis Baxter, Mar 30 '22 at 21:06

score 1 · Answer 2 · edited Mar 20 '22 at 11:25

I'm going to point out a few problems I see with this, these may be wrong, but I still found this intriguing

Setting your data to equal (or represented) by a prime number isn't always that easy. Given a certain message that encrypts to prime 1, what stops another message of equal length from encrypting to the same prime. (Small message space maybe?)
How would decryption work here...
I don't see the security proof here. Attacks on RSA have shown you can recover primes given a certain amount of bits in it. You can sorta approach a brute force attack on the length of the msg, then maybe use some sorta partial key exposure attack

I'm new to cryptography so its entirely likely I messed up in one of these assumptions, please feel free to call me out, and still congrats on the interesting ideas!

A novel method for hiding data using prime numbers?

2 Answers2