
Consider the following cryptographic hash function $H$ which maps a message $m$ of variable size to $b$ bits:

$$H:\{0,1\}^{*} \mapsto \{0,1\}^b$$ $$y = H(m) = SPRP(IV||m||padding)\mid_{b}$$

where $$SPRP:\{0,1\}^n \mapsto \{0,1\}^n,\quad |m|+|IV|+|padding|=n,\quad |IV| = b.$$

Such a hash function could be considered non-iterative since, unlike an iterative hash function, it discards no entropy until the truncation in the last step. While such a function has the disadvantage of not being streaming, it does have the nice property that finding a multicollision[0] is much harder than in iterative hash functions.
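The construction above can be sketched in a few lines of Python. This is a toy under stated assumptions, not a secure instantiation: the "SPRP" is a hypothetical 4-round alternating Feistel network whose round function is SHAKE-256, and the fixed key, all-zero IV, one-byte `0x80` padding and byte (rather than bit) lengths are all assumptions. Mixing the total length $n$ into every round gives each input length its own permutation, matching the reply in the comments that an SPRP is chosen per domain size. The inverse is included to show that nothing is lost before the final truncation.

```python
import hashlib

def _round_fn(key: bytes, i: int, n: int, data: bytes, out_len: int) -> bytes:
    # Toy round function: SHAKE-256 over key, round index, total length n, input half.
    # Including n means every input length gets its own permutation.
    return hashlib.shake_256(key + bytes([i]) + n.to_bytes(8, "big") + data).digest(out_len)

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def _feistel(x: bytes, key: bytes, order) -> bytes:
    # Alternating (possibly unbalanced) Feistel network on the two halves of x
    n = len(x)
    a, b = x[: n // 2], x[n // 2 :]
    for i in order:
        if i % 2 == 0:
            b = _xor(b, _round_fn(key, i, n, a, len(b)))
        else:
            a = _xor(a, _round_fn(key, i, n, b, len(a)))
    return a + b

def toy_perm(x: bytes, key: bytes = b"toy-key") -> bytes:
    # Hypothetical stand-in for SPRP on n-byte strings (NOT a proven SPRP); needs len(x) >= 2
    return _feistel(x, key, range(4))

def toy_perm_inv(y: bytes, key: bytes = b"toy-key") -> bytes:
    # Undo the rounds in reverse order
    return _feistel(y, key, reversed(range(4)))

def truncated_sprp_hash(m: bytes, b: int = 32) -> bytes:
    # y = SPRP(IV || m || padding) truncated to b bytes
    iv = bytes(b)          # hypothetical all-zero IV with |IV| = b
    x = iv + m + b"\x80"   # hypothetical one-byte padding
    return toy_perm(x)[:b]
```

Because `toy_perm` is invertible for every length, two distinct messages can only collide in the final `[:b]` truncation.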

  • What other non-iterative hash functions have been developed?
  • Is there another name for this, given that "non-iterative" doesn't seem to be a good search term?

Edited to add a more formal definition: An iterative hash function is any hash function in which part of the message can be compressed independently of the rest of the message.

A hash function $G$ is iterative iff: $$\exists\ f\ h\ \pi, \forall m:\\ \pi(m_1||\ m_2)=m\ \ \ \wedge\\ |f(m_1)|<|m_1|<|m|\ \ \ \wedge\\ h(f(m_1),f(m_2)) = G(m)$$

A non-iterative hash function is any hash function for which it is impossible to compress any bits of the message without access to the entire message.

A hash function $H$ is non-iterative iff:

$$\not \exists\ f\ h\ \pi, \forall m:\\ \pi(m_1||m_2)=m\ \ \ \wedge\\ |f(m_1)|<|m_1|<|m|\ \ \ \wedge\\ h(f(m_1),f(m_2)) = H(m)$$
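The iterative case in these definitions can be illustrated with a toy Merkle–Damgård hash. In this sketch, SHA-256 of `cv || block` serves as a hypothetical compression function, and the 64-byte block size, all-zero IV and length padding are assumptions. A prefix $m_1$ of three blocks is squeezed into a 32-byte chaining value, and the hash of $m_1\|m_2$ is then finished from that value alone (plus the total length needed for padding), without looking at $m_1$ again.

```python
import hashlib

BLOCK = 64      # block size in bytes (assumption)
IV = bytes(32)  # all-zero initial chaining value (assumption)

def compress(cv: bytes, block: bytes) -> bytes:
    # Toy compression function: 32-byte chaining value + 64-byte block -> 32 bytes
    return hashlib.sha256(cv + block).digest()

def md_pad(m: bytes) -> bytes:
    # 0x80, zero bytes, then the 8-byte message length, to a multiple of BLOCK
    padded = m + b"\x80"
    padded += bytes((-len(padded) - 8) % BLOCK)
    return padded + len(m).to_bytes(8, "big")

def absorb(cv: bytes, data: bytes) -> bytes:
    # Process whole blocks; the result is always 32 bytes regardless of len(data)
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        cv = compress(cv, data[i:i + BLOCK])
    return cv

def md_hash(m: bytes) -> bytes:
    return absorb(IV, md_pad(m))

m1 = b"A" * (3 * BLOCK)          # 192-byte prefix
m2 = b"rest of the message"
cv = absorb(IV, m1)              # plays the role of f(m_1): 32 bytes < |m_1|
tail = md_pad(m1 + m2)[len(m1):] # m_2 plus padding
print(absorb(cv, tail) == md_hash(m1 + m2))
```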

YMMV: the above definition is not a standard one; it exists only to explain what I mean by non-iterative.

[0]: Antoine Joux, Multicollisions in iterated hash functions, 2004

Biv
Ethan Heilman
  • I'm not sure I see what 'non-iterative' means formally. Can you give a more precise definition? – pg1989 Apr 03 '16 at 19:25
  • Also, would (e.g.) a sponge-based hash like Keccak qualify as non-iterative? – pg1989 Apr 03 '16 at 19:27
  • @pg1989 Added a more precise definition. – Ethan Heilman Apr 03 '16 at 20:54
  • @pg1989 Sponge-based hash functions like Keccak are considered iterative since the potential message space is much larger than the intermediate state (the sponge). One could imagine a construction in which the intermediate state grew in space to ensure no message entropy was lost, but it wouldn't be a sponge construction. – Ethan Heilman Apr 03 '16 at 20:57
  • Your 'non-iterative' hash function is similar to the compression function of MD6 (a truncated permutation that can take an input message of up to 4096 bits). Of course, this compression function is then used in a mode of operation that turns MD6 into an iterative hash function with a far larger potential message space, but the same could be done with your hash function. – J.D. Apr 09 '16 at 02:15
  • I did some of the differential resistance work on MD6, so I'm a big fan of the design. It's a nice property of non-iterative hash functions that you can tree-hash them into an iterative design if you want to. – Ethan Heilman Apr 09 '16 at 03:02
  • @EthanHeilman - I am also a fan of MD6. So is there anything besides truncated permutation-based hash functions that does not discard entropy until the final truncation step? Well, any such function would need to be injective (prior to the truncation step) in order not to lose entropy. So there are two alternatives to a permutation that fit the bill: 1) a bijective function whose domain is not its codomain, and 2) an injective non-surjective function (i.e. an 'expanding' function, where the codomain is larger than the domain). – J.D. Apr 09 '16 at 03:39
  • With your definitions, can't you construct a non-iterative hash from any normal hash using $h'(m) = h(m||r(m))$, where $r$ reverses the bitstring? – otus Apr 09 '16 at 13:14
  • @otus Regarding your comment from yesterday: if $h$ is an iterative hash and $|m|$ is significantly longer than the message block size, then you can compress most of the message into $f(m||r(m)\mid_b)$ and compress the remainder in a second call to $f$.

    One test of whether something is an iterative hash is: 'can I cause a collision prior to reading the entire message?'

    – Ethan Heilman Apr 10 '16 at 16:36
  • @EthanHeilman, but you can't with the example I gave, can you? Even if you find a collision in $h(m) = h(m')$, that will not apply to $h'(m) = h(m||r(m))$ or $h'(m||m_2) = h(m||m_2||r(m_2)||r(m))$... – otus Apr 10 '16 at 16:44
  • @otus Consider two messages $m$ and $m'$ where $f(m_2||m_3||m_4||m_4||m_3||m_2) = f(m'_2||m'_3||m'_4||m'_4||m'_3||m'_2)$ and $m_1 = m'_1$. You don't actually need to know what $m_1$ is to know you have a collision.

    Part of the reason I asked this question was that I was looking for a better formal definition of non-iterative because I am not satisfied by the definition I currently have.

    – Ethan Heilman Apr 10 '16 at 17:25
  • @EthanHeilman - how about this for a formal definition: a hash function $H_x(y)$ that produces a digest of length $x$ is "non-iterative" iff for any two distinct messages $M_1$, $M_2$, there is a finite digest size $b$ such that $H_b(M_1) \neq H_b(M_2)$. For example, think of an injective Random Oracle, $\{0,1\}^*\mapsto \{0,1\}^{\infty}$, where no two finite messages map to the same infinite string. $H_x(y)$ calls the RO and truncates the output to a string of length $x$. This definition also applies to your example permutation-based hash function. – J.D. Apr 10 '16 at 18:22
  • @EthanHeilman, thanks, yeah, now I understand what you are going for. – otus Apr 10 '16 at 19:59
  • The notation in the setup doesn't make sense. How can $H$ take an arbitrary-length input if $SPRP$ only takes fixed-length inputs? What happens if $|m| > n$? – Chris Peikert Apr 13 '16 at 01:07
  • @ChrisPeikert SPRPs exist for any length, thus for each $|m|$ we choose a SPRP with the correct domain size. – Ethan Heilman Apr 13 '16 at 03:17
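The test raised in the comments ("can I cause a collision prior to reading the entire message?") can be demonstrated with a deliberately weak toy iterated hash. Here the chaining value is only 16 bits (SHA-256 truncated to 2 bytes), and the function names, zero IV and 8-byte blocks are hypothetical. Once two distinct first blocks collide in the chaining value, every common suffix collides too, so the collision is known before the suffix is ever read. Chaining several such block collisions is the idea behind Joux's multicollisions [0].

```python
import hashlib
from itertools import count

IV = b"\x00\x00"  # hypothetical 16-bit initial chaining value

def compress(cv: bytes, block: bytes) -> bytes:
    # Deliberately weak toy compression function: 16-bit chaining value
    return hashlib.sha256(cv + block).digest()[:2]

def toy_iterated_hash(blocks: list) -> bytes:
    cv = IV
    for block in blocks:
        cv = compress(cv, block)
    return cv

def find_first_block_collision():
    # Birthday search over single first blocks; terminates within 2**16 + 1 tries
    seen = {}
    for i in count():
        block = i.to_bytes(8, "big")
        cv = compress(IV, block)
        if cv in seen:
            return seen[cv], block
        seen[cv] = block

b1, b2 = find_first_block_collision()
suffix = [b"any", b"suffix", b"at all"]
print(toy_iterated_hash([b1] + suffix) == toy_iterated_hash([b2] + suffix))
```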

1 Answer

Per my comment, I'd like to suggest a definition for "non-iterative hash function", and propose some constructions that fit the definition. I will also suggest an alternate name (though it may not help much with searching for papers on the topic).

Let $\mathcal{M}$ be the message space of a hash function, e.g. $\mathcal{M}=\{0,1\}^{<\ell}$, the set of all binary strings of length less than $\ell$ for some $\ell \in \mathbb{N}$. Let $\mathcal{D}$ be the 'digest space' (codomain) of a hash function, e.g. $\{0,1\}^b$ for some constant $b$. I will use subscripts to denote which hash function is associated with a given message space or digest space. Also, let $subseq_y(x)$ be a function that takes binary strings of arbitrary length and outputs a fixed subsequence of the string of length $y$ (e.g. truncation of the string to its first $y$ bits, or outputting only every third bit up to bit $3y$, etc.).

A hash function $H(x)$ with digests of length $b$ is "non-iterative" or "uncompressible" if and only if there exists another function $G(x)$ such that:

  • $\mathcal{M}_{H(x)}\subseteq\mathcal{M}_{G(x)}$,
  • $|\mathcal{M}_{G(x)}| \le |\mathcal{D}_{G(x)}|$,
  • $G(x)$ is injective - no two distinct messages in $\mathcal{M}_{G(x)}$ map to the same digest in $\mathcal{D}_{G(x)}$, and
  • For any message $m \in \mathcal{M}_{H(x)}$, $H(m) = subseq_b(G(m))$.
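The conditions in this definition can be checked mechanically, though only on a finite sample of messages, so a passing check does not prove injectivity over the whole message space. The sketch below uses a hypothetical helper `satisfies_definition`, with byte strings standing in for bit strings; the condition $|\mathcal{M}_{G(x)}| \le |\mathcal{D}_{G(x)}|$ is not checked separately because it follows from injectivity. The example pair is deliberately insecure: $G$ reverses the message and $H$ keeps the first 2 bytes of the reversal.

```python
from typing import Callable, Iterable

def satisfies_definition(H: Callable[[bytes], bytes],
                         G: Callable[[bytes], bytes],
                         subseq: Callable[[bytes, int], bytes],
                         messages: Iterable[bytes],
                         b: int) -> bool:
    """On the given finite sample only: G is injective and H(m) == subseq_b(G(m))."""
    preimage = {}
    for m in messages:
        g = G(m)
        if preimage.get(g, m) != m:
            return False  # two distinct messages share the same G-digest
        preimage[g] = m
        if H(m) != subseq(g, b):
            return False  # H(m) is not the chosen subsequence of G(m)
    return True

# Hypothetical, deliberately insecure example
G = lambda m: m[::-1]           # reversal is injective
H = lambda m: m[::-1][:2]       # first 2 bytes of the reversal
truncate = lambda s, y: s[:y]   # subseq_y as truncation

sample = [b"", b"\x00", b"\x01", b"ab", b"ba", b"abc", b"cba"]
print(satisfies_definition(H, G, truncate, sample, 2))
```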

Note that a hash function need not be cryptographically secure to be non-iterative by this definition.

The construction in the question meets this definition: In that case, the function $G(x)$ is simply $SPRP(IV||m||padding)$, without truncation.

As described in my comment, another construction is to truncate an injective Random Oracle. Unlike the permutation-based construction, this doesn't have a fixed limit on the message space size (or digest size) defined by the blocklength of the underlying $SPRP$, and yet is just as "non-iterative" or "uncompressible".

As a concrete instantiation of a "non-iterative" or "uncompressible" hash function with no limit on the message or digest lengths, I propose an 'expanding sponge' function. This is just like an ordinary sponge function, but with two differences: 1) instead of using a fixed-size permutation, it uses a (keyless or fixed-key) variable-length blockcipher (like the BEAR blockcipher), and 2) at each step during the absorption phase, instead of XORing the message blocks into the state, it concatenates the next message block with the state; i.e. $S_n$, the state at step $n$, is equal to $\mathcal{E}(S_{n-1}||m_n)$, where $\mathcal{E}(x)$ is encryption with the variable-length blockcipher.
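A rough sketch of the absorb and squeeze steps follows. BEAR itself is built from a keyed hash function and a stream cipher; rather than implement it, this sketch substitutes a hypothetical toy 4-round Feistel network over SHAKE-256 for $\mathcal{E}$, so it is neither BEAR nor secure. The 16-byte block size, all-zero initial state $S_0$, `0x80` padding and 16-byte squeeze rate are also assumptions.

```python
import hashlib

BLOCK = 16      # message block size in bytes (assumption)
S0 = bytes(16)  # initial state S_0 (assumption)

def _f(i: int, n: int, data: bytes, out_len: int) -> bytes:
    # Toy round function over round index, total length n and one half of the input
    return hashlib.shake_256(bytes([i]) + n.to_bytes(8, "big") + data).digest(out_len)

def toy_E(x: bytes) -> bytes:
    # Hypothetical stand-in for a fixed-key variable-length blockcipher (NOT BEAR):
    # alternating Feistel on the two halves of x, a permutation for each len(x) >= 2
    n = len(x)
    a, b = x[: n // 2], x[n // 2 :]
    for i in range(4):
        if i % 2 == 0:
            b = bytes(p ^ q for p, q in zip(b, _f(i, n, a, len(b))))
        else:
            a = bytes(p ^ q for p, q in zip(a, _f(i, n, b, len(a))))
    return a + b

def pad(m: bytes) -> bytes:
    # 0x80 followed by zero bytes up to a multiple of BLOCK (assumption)
    m = m + b"\x80"
    return m + bytes(-len(m) % BLOCK)

def expanding_sponge(m: bytes, digest_len: int = 32, rate: int = 16) -> bytes:
    state = S0
    padded = pad(m)
    # Absorb: S_n = E(S_{n-1} || m_n); the state grows by BLOCK bytes per step
    for i in range(0, len(padded), BLOCK):
        state = toy_E(state + padded[i:i + BLOCK])
    # Squeeze: output `rate` bytes of the state, then apply E again
    out = b""
    while len(out) < digest_len:
        out += state[:rate]
        state = toy_E(state)
    return out[:digest_len]
```

The state after absorbing is `len(S0) + len(pad(m))` bytes, which is the "ballooning" discussed below.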

Edit to clarify: For this expanding sponge function, the injective function $G(x)$ that makes this construction "uncompressible" has the same absorbing stage as $H(x)$, but during the squeezing stage $G(x)$ outputs the entire state at each step instead of only part of the state. The output digest of $H(x)$ is thus a subsequence of the output digest of $G(x)$. $G(x)$ is of course trivially insecure, in the sense that one can easily invert the function to find the preimage of any digest.
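The claim that $G(x)$ is trivially invertible can be made concrete. The sketch reuses the same hypothetical toy Feistel stand-in for $\mathcal{E}$, block size, initial state and padding as assumptions, and shows only the first full-state output of $G$, since that alone already determines the message. Inversion just peels off one block per step: $\mathcal{E}^{-1}(S_n) = S_{n-1}\|m_n$.

```python
import hashlib

BLOCK = 16      # message block size in bytes (assumption)
S0 = bytes(16)  # initial state S_0 (assumption)

def _f(i: int, n: int, data: bytes, out_len: int) -> bytes:
    return hashlib.shake_256(bytes([i]) + n.to_bytes(8, "big") + data).digest(out_len)

def _feistel(x: bytes, order) -> bytes:
    # Hypothetical toy stand-in for the variable-length blockcipher (NOT secure)
    n = len(x)
    a, b = x[: n // 2], x[n // 2 :]
    for i in order:
        if i % 2 == 0:
            b = bytes(p ^ q for p, q in zip(b, _f(i, n, a, len(b))))
        else:
            a = bytes(p ^ q for p, q in zip(a, _f(i, n, b, len(a))))
    return a + b

def toy_E(x: bytes) -> bytes:
    return _feistel(x, range(4))

def toy_E_inv(y: bytes) -> bytes:
    return _feistel(y, reversed(range(4)))

def pad(m: bytes) -> bytes:
    m = m + b"\x80"
    return m + bytes(-len(m) % BLOCK)

def unpad(p: bytes) -> bytes:
    # The padding's 0x80 is the last 0x80 byte, since only zeros follow it
    return p[: p.rindex(b"\x80")]

def G(m: bytes) -> bytes:
    # Same absorbing stage as H, but the whole state is output
    state = S0
    padded = pad(m)
    for i in range(0, len(padded), BLOCK):
        state = toy_E(state + padded[i:i + BLOCK])
    return state

def invert_G(state: bytes) -> bytes:
    # Undo each absorb step: E^{-1}(S_n) = S_{n-1} || m_n
    blocks = []
    while len(state) > len(S0):
        x = toy_E_inv(state)
        state, block = x[:-BLOCK], x[-BLOCK:]
        blocks.append(block)
    return unpad(b"".join(reversed(blocks)))
```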

Note that this construction is in a sense 'iterative', in that it breaks messages up into blocks (with padding at the end if necessary) and absorbs the message blocks one at a time using repeated applications of the same variable-length blockcipher. But there is no possibility of collisions in the internal state (any two distinct messages will generate distinct internal states). Of course, the internal state will balloon to the size of the message once it is done absorbing, but that is the price of collisionless internal states. For this reason, I propose "uncompressible" rather than "non-iterative".

J.D.
  • I agree that non-iterative is a bad word to use for this property. What do you think about non-streaming compression? I really like where you are going with this; however, I'm concerned about the following situation:

    Let $G(x) = md5(x)||x$ and $H(x)=md5(x)$, wouldn't this allow me to define $md5(x)$ as "non-iterative/non-compressing"?

    – Ethan Heilman Apr 14 '16 at 21:01
  • @EthanHeilman - indeed I think it would. While md5 isn't usually defined as a truncation of $G(x)$ like that, defining it that way is functionally equivalent to the usual definition. Clearly my definition for non-iterative is insufficient, and I am not sure at this moment how to fix it (or even if it can be fixed). Even a simpler definition like "a hash function where the internal state does not lose entropy until the final truncation step" would call your md5 example non-iterative. Frankly I'm at a loss, but hopefully this exercise has helped you somewhat. – J.D. Apr 14 '16 at 23:33