
While working on an answer to Count the sequences in an array I found that DigitCount was the bottleneck in my code when used as DigitCount[num, 2, 1]. DigitCount first expands the number into an explicit list of digits and then applies Tally to them, which is of course quite inefficient.

A minor improvement can be had by simply summing the IntegerDigits, but that still wastefully inflates the expression to more than sixty times its original size.

Is there a faster method to perform this operation on big integers?

A less performance-oriented related question: Sum over binary digits of integer


Examples:

num = RandomInteger[10^(3*^6)];
list = IntegerDigits[num, 2];

ByteCount /@ {num, list}
{1245800, 79726416}
DigitCount[num, 2, 1]      // RepeatedTiming
{0.0417, 4982222}

Tr @ IntegerDigits[num, 2] // RepeatedTiming
{0.028, 4982222}

An unrelated bit-level operation is two orders of magnitude faster:

BitShiftLeft[num]; // RepeatedTiming
{0.000264, Null}
Mr.Wizard
  • Some possibly useful methods here: http://graphics.stanford.edu/~seander/bithacks.html -- I haven't tried to implement any of them yet. – Mr.Wizard Jul 03 '15 at 22:41
  • @Oleksandr Thanks for your work on this. BitShiftLeft is probably an unfair comparison. Should I remove that from this question? – Mr.Wizard Jul 03 '15 at 22:47
  • No; I take my comment back. It is the large number of times that BitAnd is called, not BitAnd itself, that leads to slowness. BitAnd by itself works as fast as BitShiftLeft. This will probably be a hazard for most of the bit-manipulation methods you reference. – Oleksandr R. Jul 03 '15 at 22:51
  • This is one of those cases where if speed were critical, I'd go ahead and do a call-out to C/Lisp/etc. where they have built-in population count for bits... +1 on interesting question, I look forward to responses. – ciao Jul 03 '15 at 23:03
  • @ciao Any problems handling massive integers with that approach? I suppose there are a lot of bignum libraries now but many years ago it didn't seem that simple. Or maybe it was just my lack of skill. – Mr.Wizard Jul 03 '15 at 23:28
  • @Mr.Wizard: Nah, it's gotten pretty good - my Lisp environments handle bignum faster than MMA, and c/c++/c# all have libraries that are quite good, including e.g. bitarrays of huge cardinality. – ciao Jul 03 '15 at 23:33
  • IntegerDigits[num, 2] // Total has about the same performance as IntegerDigits[num, 2] // Tr – m_goldberg Jul 04 '15 at 01:11
  • @m_goldberg In older versions (e.g. 7) Tr is faster than Total on packed arrays. In current versions it's still more terse. ;-) – Mr.Wizard Jul 04 '15 at 01:16
  • Saving three characters when typing is always important, right? Or at least an offering to the gods of tersity (sic), concisity (sic), and laconicity (sic) :-) – m_goldberg Jul 04 '15 at 01:31

1 Answer


We can take advantage of the fact that IntegerDigits is very fast when the base is large. But not too large: no bigger than $2^{63}-1$ on a 64-bit system or $2^{31}-1$ on a 32-bit one, because Mathematica's machine integers are signed. Additionally, non-power-of-two bases require more work to get the result than just partitioning a bit-string, and are correspondingly slower. So, we choose the greatest allowable power of two, i.e. $2^{62}$. (Here we assume a 64-bit-capable computer.)
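
As a quick sanity check of this idea (just an illustrative sketch with a fresh variable n, not part of the final code), note that the base-$2^{62}$ digits partition the bit string, so their individual counts of 1-bits must sum to the total:

n = RandomInteger[10^1000];
Total[DigitCount[#, 2, 1] & /@ IntegerDigits[n, 2^62]] === DigitCount[n, 2, 1]
(* -> True *)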

We also take advantage of the POPCNT x86 instruction and its implementation as a compiler builtin. A simplified version of this answer provides the necessary LibraryLink function:

#include "WolframLibrary.h"

DLLEXPORT
mint WolframLibrary_getVersion() {
    return WolframLibraryVersion;
}

DLLEXPORT
int WolframLibrary_initialize(WolframLibraryData libData) {
    return 0;
}

DLLEXPORT
void WolframLibrary_uninitialize() {
    return;
}

DLLEXPORT
int hammingWeight_T_I(WolframLibraryData libData,
                        mint argc, MArgument *args,
                        MArgument res) {
    MTensor in;
    const mint *dims;
    mint *indata, i, total;

    /* The single argument is a rank-1 tensor of machine integers
       (the base-2^62 digits of the big integer). */
    in = MArgument_getMTensor(args[0]);
    if (libData->MTensor_getRank(in) != 1) return LIBRARY_DIMENSION_ERROR;
    if (libData->MTensor_getType(in) != MType_Integer) return LIBRARY_TYPE_ERROR;
    dims = libData->MTensor_getDimensions(in);
    indata = libData->MTensor_getIntegerData(in);

    /* Sum the per-element population counts; OpenMP splits the loop
       across threads and reduces the partial sums. */
    total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (i = 0; i < dims[0]; i++) {
        total += (mint)__builtin_popcountll( (unsigned long long)indata[i] );
    }

    MArgument_setInteger(res, total);
    return LIBRARY_NO_ERROR;
}

This function takes a list of integers and produces the total of their Hamming weights, using OpenMP for parallelization. Here we use __builtin_popcountll, which is a GCC builtin, but other compilers have their own equivalents, such as __popcnt64 for Microsoft C++ (declared in intrin.h). If you use a compiler other than GCC, substitute the appropriate builtin.

Compile it:

gcc -Wall -fopenmp -O3 -march=native -I. -shared -o hammingWeight.dll hammingWeight.c

(Adjust the -I option so that it points to the directory containing WolframLibrary.h.)
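
Alternatively, the build can be driven from within Mathematica via the CCompilerDriver` package. This is only a sketch: the source file is assumed to be in the current directory, and the compile options shown are GCC-specific assumptions that must be adapted to whatever OpenMP-capable toolchain you have installed.

Needs["CCompilerDriver`"]
(* compile options below assume GCC; adjust for your compiler *)
lib = CreateLibrary[{"hammingWeight.c"}, "hammingWeight",
  "CompileOptions" -> "-fopenmp -O3 -march=native"]

The returned path can then be passed to LibraryFunctionLoad below in place of "hammingWeight.dll".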

Now we can define our function:

hammingWeightC = LibraryFunctionLoad[
  "hammingWeight.dll",
  "hammingWeight_T_I", {{Integer, 1, "Constant"}}, {Integer, 0, Automatic}
  ];
hammingWeight[num_Integer] := hammingWeightC@IntegerDigits[num, 2^62];

Let's create an obnoxiously large integer to test it with:

num = RandomInteger[10^(5*^7)];
hammingWeight[num] === Tr@IntegerDigits[num, 2] (* -> True *)

So, it works. How does it do for speed?

AbsoluteTiming[
 Do[Tr@IntegerDigits[num, 2], {10}]
] (* -> 11.594 seconds *)

AbsoluteTiming[
 Do[hammingWeight[num], {10}]
] (* -> 0.297 seconds *)

As we see, on my computer, it is about 40 times faster than the next best approach. 85% of the runtime is accounted for by IntegerDigits rather than the calculation of the Hamming weight itself, so probably it will be more or less the same on other computers as well.
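
If you want to check that split on your own machine, something like the following sketch separates the conversion cost from the popcount cost (no timings quoted here, since they are machine-dependent):

blocks = IntegerDigits[num, 2^62];
AbsoluteTiming[Do[IntegerDigits[num, 2^62], {10}]]  (* conversion only *)
AbsoluteTiming[Do[hammingWeightC[blocks], {10}]]    (* popcount only *)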

N.B.: this same IntegerDigits method can also be adapted to the linked question, giving a fast way to calculate the Hamming distance of big integers from the method already elaborated there for machine integers.
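
For example, since the Hamming distance of two non-negative integers is the Hamming weight of their bitwise XOR, a minimal sketch is (hammingDistance is just an illustrative name, not taken from the linked answer):

(* Hamming distance as the weight of the XOR of the two numbers *)
hammingDistance[a_Integer?NonNegative, b_Integer?NonNegative] :=
  hammingWeight[BitXor[a, b]]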

Oleksandr R.
  • You've got my +1 on faith, but before I Accept this I really should test it. For that I need to get GCC installed and working. Can you give me an outline of steps to follow to attempt that? – Mr.Wizard Jul 04 '15 at 00:53
  • @Mr.Wizard well, shall I just send you the DLL? Here it is. (Virus checked with Virustotal, but exercise your own caution.) Compiled for a Core 2 processor, but should work on anything more recent. (I can help you with GCC another time, if you don't mind. It is quite a complex process.) – Oleksandr R. Jul 04 '15 at 00:58
  • Thanks! And of course I don't mind; that you're willing at all is appreciated. – Mr.Wizard Jul 04 '15 at 01:00
  • @Mr.Wizard I will be interested to hear if the DLL runs and what its comparative speed is like on your computer. – Oleksandr R. Jul 04 '15 at 01:04
  • I gave it a try but "The program can't start because libgomp_64-1.dll is missing from your computer." -- help? – Mr.Wizard Jul 04 '15 at 01:07
  • I'm sorry about that. I hope this one will work better. Please let me know whether it does or not. I will need to delete this soon because it is not really legitimate to statically link GNU libgomp (which is GPLv3) into a DLL and then distribute it. – Oleksandr R. Jul 04 '15 at 01:15
  • That one gives me: The version number 2 of the library is not consistent with the current or any previous WolframLibraryVersion. >> and A nonzero error code 7 was returned during the initialization of the library hammingWeight.dll. – Mr.Wizard Jul 04 '15 at 01:17
  • @Mr.Wizard I do not understand that error at all. WolframLibraryVersion 2 is Mathematica 9. I can only assume that there is some problem in Mathematica 10 with backward-compatibility of LibraryLink libraries? Unfortunately, there is not much I can do about it for now. Are you free in the chat to talk about GCC? – Oleksandr R. Jul 04 '15 at 01:22
  • @Oleksandr Sorry, I wasn't around for a while and I am only briefly here now. But thank you. Fortunately I see that this answer isn't going unnoticed. :-) – Mr.Wizard Jul 04 '15 at 03:57
  • I guess I should add this note: this can be used to compute the Thue-Morse and Rudin-Shapiro sequences that are built into newer versions, but are apparently not too fast. – J. M.'s missing motivation Jul 28 '15 at 12:40
  • @J.M. it might be nice to add that as an answer, if you feel like it. I suppose perhaps you haven't access to a compiler, but if not, I can add timings. – Oleksandr R. Jul 28 '15 at 14:17
  • Unfortunately, I don't have a computer right now. But Jacob and I discussed this in chat; anyway, if you want to run your own tests, here are the relevant identities: ThueMorse[n] == Mod[hammingWeight[n], 2] and RudinShapiro[n] == 1 - 2 ThueMorse[BitAnd[n, Quotient[n, 2]]]. – J. M.'s missing motivation Jul 28 '15 at 14:27
  • @J.M. Another not-too-fast built-in; surprise, surprise. :-p – Mr.Wizard Jul 28 '15 at 16:34
  • Why can't the code be compiled normally? Needs["CCompilerDriver`"]; myLibrary = CreateLibrary[{"C:\\Users\\Shutao TANG\\Desktop\\hammingWeight.c"}, "hammingWeight", "Debug" -> False], which gives me some errors – xyz Jun 02 '16 at 04:34
  • @ShutaoTANG sorry, I can't read Chinese represented in an incorrect code page, so I can't tell what the errors say. However, it seems that your problem is using Microsoft C++ to compile code that uses GCC builtins (because I used GCC to test this). If you add a reference to intrin.h and change __builtin_popcountll to __popcnt64, as I stated in the answer, it should hopefully work. But I haven't tested that myself. – Oleksandr R. Jun 02 '16 at 12:25
  • @ShutaoTANG please also see this comment by @s0rce. – Oleksandr R. Jun 02 '16 at 12:47
  • @OleksandrR. Thanks a lot:) Yes, the compiler that I used is Visual Studio 2012. In addtion, the error part that with a red underline are not Chinese characters, I think it is garbled . – xyz Jun 02 '16 at 13:12
  • @OleksandrR. Thanks a lot :) Yes, the compiler I used is Visual Studio 2012. In addition, the parts of the error marked with a red underline are not Chinese characters; I think they are garbled. – xyz Jun 02 '16 at 13:12
  • @ShutaoTANG yes, clearly the characters are not correctly written. But I think they come from the representation of Chinese characters as a sequence of bytes, interpreted in the incorrect code page? For example, if I go to baidu.com and then switch my browser code page to Windows-1252, I get very similar-looking garbled characters. – Oleksandr R. Jun 02 '16 at 13:34
  • @OleksandrR. Anyway, I discovered that the LibraryLink technique is very useful and I should study it. – xyz Jun 02 '16 at 14:26