8

I wanted a nice way to convert natural numbers to alphabetic representations like:

1->"A",
2->"B",
...
26->"Z",
27->"AA,
...
26*2->"AZ",
26*2+1->"BA",
...

And I knew this was mostly a conversion base 26 problem, except with the complication of lacking a 0

So things like:

In[412]:= IntegerDigits[26, 26]

Out[412]= {1, 0}

Would cause difficulties.

I solved this by unspooling via ReplaceRepeated where the sequence pattern {n_,0} maps to {n-1,26}, because, by the way digit representations are defined, {n,0} is equivalent to {n-1,0} and by having this cascade up / down the chain we get the right answer.

Here's the actual code for this:

intToAlpha[i_Integer?Positive, 
   alphabet : {__String} | Automatic : Automatic] :=

  With[{alpha = Replace[alphabet, Automatic :> Alphabet[]]},
   ToUpperCase@StringJoin@
     Part[alpha,
      DeleteCases[0]@
       ReplaceRepeated[IntegerDigits[i, Length@alpha],
        {s___, n_?Positive, 0, e___} :>
         {s, n - 1, Length@alpha, e}
        ]
      ]
   ];
intToAlpha[i : {__Integer?Positive}, 
   alphabet : {__String} | Automatic : Automatic] :=

  With[{a = Replace[alphabet, Automatic :> Alphabet[]]},
   intToAlpha[#, a] & /@ i
   ];

And this is decent all told:

In[433]:= AssociationMap[
 intToAlpha,
 RandomInteger[{1, 26*26*2}, 100]
 ]

Out[433]= <|1112 -> "APT", 907 -> "AHW", 870 -> "AGL", 938 -> "AJB", 
 1256 -> "AVH", 991 -> "ALC", 25 -> "Y", 203 -> "GU", 433 -> "PQ", 
 994 -> "ALF", 480 -> "RL", 762 -> "ACH", 576 -> "VD", 570 -> "UX", 
 931 -> "AIU", 1090 -> "AOX", 1237 -> "AUO", 404 -> "ON", 695 -> "ZS",
  1180 -> "ASJ", 580 -> "VH", 1040 -> "AMZ", 198 -> "GP", 218 -> "HJ",
  964 -> "AKB", 667 -> "YQ", 1135 -> "AQQ", 1285 -> "AWK", 
 763 -> "ACI", 825 -> "AES", 588 -> "VP", 841 -> "AFI", 1036 -> "AMV",
  1268 -> "AVT", 592 -> "VT", 742 -> "ABN", 118 -> "DN", 599 -> "WA", 
 795 -> "ADO", 119 -> "DO", 640 -> "XP", 809 -> "AEC", 213 -> "HE", 
 289 -> "KC", 1293 -> "AWS", 51 -> "AY", 829 -> "AEW", 37 -> "AK", 
 491 -> "RW", 1340 -> "AYN", 521 -> "TA", 55 -> "BC", 895 -> "AHK", 
 1211 -> "ATO", 1130 -> "AQL", 498 -> "SD", 1038 -> "AMX", 
 753 -> "ABY", 1191 -> "ASU", 542 -> "TV", 92 -> "CN", 168 -> "FL", 
 949 -> "AJM", 317 -> "LE", 354 -> "MP", 1141 -> "AQW", 1310 -> "AXJ",
  857 -> "AFY", 904 -> "AHT", 645 -> "XU", 1065 -> "ANY", 324 -> "LL",
  684 -> "ZH", 903 -> "AHS", 679 -> "ZC", 90 -> "CL", 1101 -> "API", 
 427 -> "PK", 844 -> "AFL", 162 -> "FF", 159 -> "FC", 559 -> "UM", 
 398 -> "OH", 860 -> "AGB", 1216 -> "ATT", 871 -> "AGM", 671 -> "YU", 
 285 -> "JY", 389 -> "NY", 499 -> "SE", 889 -> "AHE", 67 -> "BO", 
 448 -> "QF", 211 -> "HC", 836 -> "AFD", 808 -> "AEB"|>

In[434]:= 
intToAlpha[Range[100000]] // RepeatedTiming // First[#]/100000 &

Out[434]= 0.0000170

But I feel like there should be a more elegant (and likely faster) way, no?

Is there a way to do this directly, rather than converting via IntegerDigits then converting out the 0s? Is there some built-in function I'm missing?

b3m2a1
  • 46,870
  • 3
  • 92
  • 239

3 Answers3

7

I think you can use IntegerDigits directly if you add an offset, and then use the 3rd argument of IntegerDigits to only return the needed digits. For instance, consider base 3. If we use an offset (in trinary notation) of:

$$11 \text{$\cdots $1112}$$

Then the trinary notation for the offset integers $1, 2, \text{$\dots $, 13}$ is:

$$1120, 1121, 1122, 1200, 1201, 1202, 1210, 1211, 1212, 1220, 1221, 1222, \ 2000$$

Notice that if we take the last digit of the first 3, the last two digits of the next 9, and then the last 3 digits of the last number, we get:

$$0, 1, 2, 00, 01, 02, 10, 11, 12, 20, 21, 22, 000$$

With the replacements {0->"a", 1->"b", 2->"c"} we get the desired alphabet representation. The following code uses this idea:

intToAlphabet[i_, alphabet:{__String}|Automatic:Automatic]:=Module[{a = Replace[alphabet,Automatic:>ToUpperCase@Alphabet[]],base,maxDigits,offset,indices},
    base=Length[a];
    maxDigits = Floor[Log[base,(base-1)Max[i]+1.1]];
    offset=(1+(base-2)(base^maxDigits-1)/(base-1)) ;
    indices=1+IntegerDigits[i+offset, base, Floor[Log[base,(base-1)i+1.1]]];
    If[ListQ@i,
        StringJoin[a[[#]]]&/@indices,
        StringJoin[a[[i]]]
    ]
]

One comment on the code. Computing Floor[{Log[3,5], Log[3,15], ..}] is slow while the almost equivalent Floor[{Log[3, 5`], Log[3, 15`], ..}] will be very fast. The only issue with using real numbers instead of integers is to make sure something like Floor[Log[3, 9`]] evaluates to 2 and not 1 due to precision issues. I take care of this by adding a bit of slop.

At any rate, some examples:

intToAlphabet[Range[20], {"a","b"}]

{"a", "b", "aa", "ab", "ba", "bb", "aaa", "aab", "aba", "abb", "baa", "bab", "bba", "bbb", "aaaa", "aaab", "aaba", "aabb", "abaa", "abab"}

r = intToAlphabet[Range[10^5]]; //AbsoluteTiming
r[[{285, 448, 211}]]

{0.259129, Null}

in agreement with a few examples in your question.

{"JY", "QF", "HC"}

Carl Woll
  • 130,679
  • 6
  • 243
  • 355
  • Overall very nice. I had thought about trying to add an offset before, but couldn't quite get it to work. One issue is that, working with Alphabet[] it doesn't seem to handle the transition from 1 digit to 2 properly (and there are some oddities further up in the 2-digit sequences). E.g. {intToAlphabet[{26, 27}]} gives {{"Z", "CA"}} and {intToAlphabet[{702}], intToAlphabet[{703}]} gives {{"BZ"}, {"CAA"}}. – b3m2a1 Jun 16 '17 at 14:42
  • @b3m2a1 I was tweaking things at the end, and introduced a bug. The problem is that the offset doesn't have enough digits. I think my edit should fix this. – Carl Woll Jun 16 '17 at 15:57
  • That seems to do the trick. It's a lot faster than my implementation and just as flexible. Very nice. – b3m2a1 Jun 16 '17 at 16:24
5

A memoized recursive approach seems faster:

dictionary = AssociationThread[Range[26] -> CharacterRange["A", "Z"]]

Clear[replace]
replace[n_Integer /; n <= 26] := replace[n] = dictionary[n]
replace[n_Integer] := replace[n] = Module[{quot, rem},
   {quot, rem} = QuotientRemainder[n, 26];
   If[rem == 0, quot = quot - 1; rem = 26];
   If[quot > 0, replace[quot] <> replace[rem], replace[rem]]
 ]

Comparative timing tests:

nums = RandomInteger[{10000000, 20000000}, 100000];

RepeatedTiming[replace /@ nums;]
(* Out: {0.0944, Null} *)

RepeatedTiming[intToAlpha@nums;]
(* Out: {2.34, Null} *)
MarcoB
  • 67,153
  • 18
  • 91
  • 189
  • Very nice. I know this is somewhat auxiliary to what I asked (but still, I think, relevant), but how would you memoize this for an arbitrary Alphabet? I'm thinking use a Block and memoize via an Association and use Lookup instead of a function call. The memoization there isn't permanent, but it should still be fast. – b3m2a1 Jun 15 '17 at 19:49
  • @b3m2a1 The alphabet is coded into dictionary and independent of the memoization. You could just hand-code a different one as an explicit association, e.g. <|1 -> "!", 2 -> "D", ... |> etc. I wonder if I misunderstand your question though. – MarcoB Jun 15 '17 at 19:52
  • I meant, consider some alphabet {a,b,c, ..., } and you want to perform the same sort of translation. Your memoization on replace would use Alphabet[], rather than {a,b,c, ...}. But it's not so bad using a Block, a memo association, and localizing dictionary. – b3m2a1 Jun 15 '17 at 19:56
  • One concern, I guess, is that if you don't use the same nums each time you lose a lot of the performance gain. For example: replace /@ RandomInteger[{10000000, 20000000}, 100000] // RepeatedTiming is about 1.4 seconds for me whereas intToAlpha@ RandomInteger[{10000000, 20000000}, 100000] // RepeatedTiming is 1.8 seconds (and slightly more flexible). If we generalize replace to arbitrary dicts it's then not even faster than the ReplaceRepeated method. – b3m2a1 Jun 15 '17 at 20:14
3

Mod and Quotient support offsets, allowing:

core = Quotient[Sow @ Mod[#, 26, 1]; #, 26, 1] &;

fn[n_Integer?Positive] := (
  NestWhile[core, n, Positive]
    // Reap
    // Extract[{2, 1}]
    // FromCharacterCode[# + 64] &
    // StringReverse
 )

Test:

Accumulate[26^Range[4]]

fn /@ %

fn /@ (%% + 1)
{26, 702, 18278, 475254}

{"Z", "ZZ", "ZZZ", "ZZZZ"}

{"AA", "AAA", "AAAA", "AAAAA"}

I don't expect this to be competitively fast as written but the algorithm should be compilable.

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
  • Nice. I would be interested in a speed comparison of my algorithm and a compiled version of yours. – Carl Woll Jun 18 '17 at 16:01
  • @CarlWoll If you have a C compiler installed please consider trying it. I tried to install a compiler twice and had trouble both times, so I just sort of gave up. Maybe start from Compile[{{n, _Integer}}, Rest@NestWhileList[{Mod[#[[2]], 5, 1], Quotient[#[[2]], 5, 1]} &, {0, n}, #[[2]] > 0 &][[All, 1]], {{_NestWhileList, _Integer, 2}} ] – Mr.Wizard Jun 18 '17 at 16:18