I'm solving a coding problem, and I wanted to prove mathematically that an answer exists. Essentially, we are given a string of lowercase letters, and we want to know whether it can be rearranged so that no two adjacent characters are the same. Example: aab can be rewritten as aba, but aaab can't be rearranged in such a way.

Let $N$ be the length of the string, let there be $m$ unique characters $\{c_1,\cdots, c_m\}$, and let $f_i$ be the frequency of $c_i$ in the string. A solution exists if $\max_i f_i\leq \lceil N/2\rceil$. This part I'll take for granted -- I'm curious about a subproblem. Suppose I take $k\leq \min_i f_i$ copies of each distinct character and arrange those $km$ characters into a valid string. For instance, if the input string is aabbccdddd and $k=1$, we would form the string abcd, using each unique character once, and the remaining characters to rearrange would be abcddd. Is it still guaranteed that the remaining $N-km$ characters (abcddd) can be rearranged into a string in which no two adjacent characters are the same? (For instance, dadbdc.)

Let $f^*$ be the largest frequency. The remaining string has length $N-km$. I want to prove that, given $f^*\leq \lceil N/2\rceil$, the inequality $f^*-k\leq\lceil N/2 -mk/2 \rceil$ holds. From the first inequality, I have

$$ f^*-k \leq \lceil N/2\rceil-k, $$

so the second inequality is not necessarily true, no? Intuitively, it should be true and provable, but I can't do it, and I'm confused. What am I missing?
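The existence condition quoted above can be checked directly from the character counts. The following is a minimal sketch, not part of the original post; `can_rearrange` is a hypothetical helper name, and it takes the condition $\max f \leq \lceil N/2\rceil$ for granted, as the question does.

```python
from collections import Counter


def can_rearrange(s: str) -> bool:
    """Return True if s can be reordered so that no two adjacent
    characters are equal, using the condition max frequency <= ceil(N/2)."""
    if not s:
        return True  # the empty string is trivially fine
    largest = max(Counter(s).values())
    # (len(s) + 1) // 2 is ceil(N/2) in integer arithmetic
    return largest <= (len(s) + 1) // 2


print(can_rearrange("aab"))   # aab can be written aba
print(can_rearrange("aaab"))  # aaab cannot be rearranged
```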

J.-E. Pin
sodiumnitrate
  • What do you mean by “repeat each distinct character $k$ times”? For example, with AAB what is the result when you repeat each distinct character 4 times? I am confused why the new string has length $N-km$; I thought the new string should be longer, not shorter. – Mike Earnest Feb 25 '24 at 19:08
  • "$m$ be the number of unique characters, and $f_m$ is the frequency of character $m$". Is "$m$" the number of unique characters, or is $m$ a character? FYI - there are more than 4 letters in the alphabet. You do not need to force poor $m$ to play multiple roles while other letters are languishing from lack of use. – Paul Sinclair Feb 26 '24 at 17:36
  • If there are $m$ characters, and each character is used $k$ times, then the string length is $N = km$, so your new string would have length $N - km = 0$. For this new string everything is $0$. – Paul Sinclair Feb 26 '24 at 17:41
  • @MikeEarnest See text for the clarifications. For the second point, I am placing $km$ out of $N$ available letters, and I have yet to arrange and place $N-km$ letters. – sodiumnitrate Feb 26 '24 at 18:26
  • What about abcddd? – aschepler Feb 26 '24 at 18:28
  • @PaulSinclair See edited post. I have $N$ characters I have to rearrange. I am placing/arranging $km$, so I have to arrange $N-km$ more. – sodiumnitrate Feb 26 '24 at 18:29
  • @aschepler yeah ok that's the counterexample my brain couldn't come up with yesterday. The thing I'm trying to prove is incorrect. Thanks. – sodiumnitrate Feb 26 '24 at 18:31
  • There is actually an explicit formula for counting the number of arrangements of a word with no two adjacent letters the same - see here - and it can be computed fairly fast. Then there is such an arrangement if and only if this number is nonzero. But if all you want to know is whether such an arrangement exists, perhaps there is a faster way. – Jair Taylor Feb 26 '24 at 19:27
  • @aschepler, I encourage you to write that as an answer so we can upvote it. – D.W. Feb 26 '24 at 23:26

1 Answer

Is it still guaranteed that the remaining $N-km$ characters (abcddd) can be rearranged into a string that has no repeating characters?

No - if some character is a large enough proportion of the whole, what's left can have more of that character than half its length. For example, applying this algorithm to abcddd, we start with abcd but then dd is left over.
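The numbers for this counterexample can be checked against the inequality from the question. This is a worked instance using the question's symbols $N$, $m$, $k$ and $f^*$. For abcddd we have $N=6$, $m=4$, $f^*=3$, and $k=1$:

$$ f^* = 3 \leq \lceil 6/2 \rceil = 3, \qquad\text{but}\qquad f^*-k = 2 > \lceil (6-4\cdot 1)/2 \rceil = 1. $$

So the original string can be arranged (for example dadbdc), while the leftover dd cannot.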

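A summary of another algorithm which will work: Sort the letters so that equal letters are grouped together, with the groups ordered from most frequent to least frequent. Split this sorted string in half (the first half gets the extra letter when the length is odd), then place the first half in the odd positions and the second half in the even positions of the final string. For example, abcdddd $\to$ dddd, abc $\to$ dadbdcd.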
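The interleaving idea above can be sketched in a few lines of Python. This is an illustrative sketch rather than the answerer's own code: the function name `rearrange` is hypothetical, ties between equally frequent letters are broken alphabetically (an assumption made only to keep the output deterministic), and positions are 0-indexed, so the "odd positions" of the text are indices 0, 2, 4, ...

```python
from collections import Counter


def rearrange(s: str):
    """Return a reordering of s with no two equal adjacent characters,
    or None if the largest frequency exceeds ceil(N/2)."""
    n = len(s)
    counts = Counter(s)
    if n and max(counts.values()) > (n + 1) // 2:
        return None
    # Group equal letters, most frequent group first
    # (assumption: alphabetical tie-break, for determinism only).
    groups = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ordered = "".join(ch * c for ch, c in groups)
    half = (n + 1) // 2  # the first half gets the extra letter if n is odd
    result = [""] * n
    result[0::2] = ordered[:half]  # 1st, 3rd, 5th, ... positions
    result[1::2] = ordered[half:]  # 2nd, 4th, 6th, ... positions
    return "".join(result)


print(rearrange("abcdddd"))  # dadbdcd
print(rearrange("aaab"))     # None
```

Because each group of equal letters is contiguous in the sorted string and has at most $\lceil N/2\rceil$ letters, two copies of the same letter never land in neighbouring positions.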

aschepler