9

There appears to be no builtin Mathematica function for bitwise rotation. Is that true?

I suppose I can write my own function:

ROR[x_, k_] := FromDigits[RotateRight[IntegerDigits[x, 2, 32], k]]

Obviously this will be a lot less efficient than a builtin function, because in machine code a rotate right is a single instruction. Is this my only option?

Tyler Durden
  • I think you need to ensure that k is at least 2. – Andreas Lauschke Oct 30 '13 at 16:15
  • Is a complete rotate right a single instruction in machine code? A bit shift right or left is a single instruction, but for a rotate right you have to place the LSB back as the MSB, which I believe must take more than a single instruction. I believe only a bit shift is a single instruction, not a rotate. – Andreas Lauschke Oct 30 '13 at 16:55
  • On x86 architectures rotate right (ROR) is a single machine instruction. – Tyler Durden Oct 30 '13 at 17:13
  • Just remember that the "single instruction" debate only makes sense for machine integers – Rojo Oct 30 '13 at 17:40
  • ROR is a single machine instruction but not always a single operation. Pentium 4, which did not have a full width barrel shifter, broke this into a number of smaller rotations. Be careful when translating x86 instructions into actual operations, and even more so where clock cycles are concerned due to pipelining and superscalar execution. – Oleksandr R. Nov 04 '13 at 15:20

2 Answers

15

This should be reasonably efficient.

BitRotateRight[n_Integer, m_Integer] /; n > 0 && m >= 0 :=
  BitShiftRight[n, m] + BitShiftLeft[Mod[n, 2^m], (BitLength[n] - m)]

You will need to modify it for left rotations if you want to support negative m (one possible companion definition is sketched after the example below). I added the restriction that n be positive; I'm not exactly sure what the behavior should be for negative n.

Quick example:

BitRotateRight[111, 5]

(* Out[2]= 63 *)

IntegerDigits[111, 2]

(* Out[3]= {1, 1, 0, 1, 1, 1, 1} *)
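
For negative m, i.e. left rotations, a possible companion definition under the same BitLength-based convention might look like this sketch (BitRotateLeft is my own name, not a built-in):

BitRotateLeft[n_Integer, m_Integer] /; n > 0 && m >= 0 :=
  Mod[BitShiftLeft[n, m], 2^BitLength[n]] + BitShiftRight[n, BitLength[n] - m]

With it, BitRotateLeft[111, 2] gives 63, matching BitRotateRight[111, 5] above.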

--- Edit ---

The question has been raised as to whether this approach is correct, or efficient, for machine integers. Having looked more closely at the original post, I agree it is not quite correct. The thing to do is to use a fixed length of e.g. 32 bits. I will show variants on this theme below. I hard-coded that length of 32 in several places; obviously it could instead be an input parameter.
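
For illustration, a width-parameterized variant might look like the following (a sketch only; the name bitRotateRight and the argument checks are mine, not part of the original code):

bitRotateRight[n_Integer?NonNegative, m_Integer?NonNegative, width_Integer?Positive] /; m <= width :=
  BitShiftRight[Mod[n, 2^width], m] + BitShiftLeft[Mod[n, 2^m], width - m]

For example, bitRotateRight[111, 2, 32] gives 3221225499, the 32-bit right rotation of 111 by 2.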

This is my first try, suitably modified, and made Listable for efficiency testing.

SetAttributes[BitRotateRight, Listable];

BitRotateRight[n_Integer, m_Integer] /; n > 0 && m >= 0 := 
 BitShiftRight[n, m] + BitShiftLeft[Mod[n, 2^m], (32 - m)]

Here is a compiled version.

BitRotateRightC = 
  Compile[{{n, _Integer}, {m, _Integer}}, 
   BitShiftRight[n, m] + BitShiftLeft[Mod[n, 2^m], (32 - m)], 
   RuntimeAttributes -> Listable];

Here is a variant that is not Listable but instead operates directly on a rank 1 tensor.

BitRotateRightC2 = 
  Compile[{{n, _Integer, 1}, {m, _Integer}}, 
   BitShiftRight[n, m] + BitShiftLeft[Mod[n, 2^m], (32 - m)]];

Last, we'll show the code from the original post. I altered it so that it computes, rather than shows, the machine-integer result; specifically, we use FromDigits with base 2 and do not try to show it as a bit string.

SetAttributes[ROR, Listable]
ROR[x_, k_] := FromDigits[RotateRight[IntegerDigits[x, 2, 32], k], 2]

We'll show testing on a list of a million or so machine integers.

n = 2^20;
SeedRandom[1111];
tt = RandomInteger[{0, 2^30}, n];

Timing[uu1 = BitRotateRight[tt, 3];]
(* Out[130]= {4.912000, Null} *)

Timing[uu2 = BitRotateRightC[tt, 3];]
(* Out[146]= {0.808000, Null} *)

Timing[uu2b = BitRotateRightC2[tt, 3];]
(* Out[147]= {0.044000, Null} *)

Timing[uu3 = ROR[tt, 3];]
(* Out[148]= {7.336000, Null} *)

I also did a version of ROR using Compile; it took about the same time as BitRotateRight. Also, if I remove the positivity checks from BitRotateRight, it becomes somewhat faster (3.2 seconds). So ROR is clearly slower than my first attempt, but not by all that much.

Let's check for agreement.

uu1 == uu2 == uu2b == uu3
(* Out[149]= True *)

The middle two are of course packed array results; the outer two are not. Compiling gives about a factor of 6 improvement in speed, and working directly on a rank 1 tensor is an order of magnitude faster still.
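
The packing claim can be checked directly (assuming uu1 through uu3 are still defined; on my reading of the above, this should give {False, True, True, False}):

Developer`PackedArrayQ /@ {uu1, uu2, uu2b, uu3}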

--- End edit ---

--- Edit 2 ---

Actually it is best just to use basic listability, provided one is willing to forego type checking or do it differently than I had done. Suppose we clear prior definitions and then start with the one below.

BitRotateRight[n_, m_Integer] := 
 BitShiftRight[n, m] + BitShiftLeft[Mod[n, 2^m], (32 - m)]

Here is that example with 2^20 elements.

Timing[uu1b = BitRotateRight[tt, 3];]
(* Out[192]= {0.048000, Null} *)

uu1b == uu2
(* True *)

Probably I should just have done it this way from the beginning. That said, using Compile and showing runs that give no error or warning messages had its own advantage: it aptly demonstrates that the standard Mathematica evaluator is not being used, that is, the computation stays in the VM and runtime library. It was this secondary goal that (mis?)led me in the compilation direction.
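
For anyone who wants to look, the compiled code can be inspected directly; what the bytecode shows (including whether a MainEvaluate call appears) depends on the version, as the comment discussion below illustrates:

Needs["CompiledFunctionTools`"]
CompilePrint[BitRotateRightC]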

--- End edit 2 ---

Daniel Lichtblau
  • I find it interesting that compilation sped up BitRotateRightC2 as much as it did given the two calls out of the VM for the bit shift functions. Relative to the same thing done without compilation (given that all of the elementary operations are Listable) I still observe an improvement of around 2.5 times. – Oleksandr R. Nov 04 '13 at 18:17
  • @OleksandrR. Funny, I was just realizing that I should have used listability better. Will show in a new edit. – Daniel Lichtblau Nov 04 '13 at 18:20
  • You could add a basic sort of type checking by changing the pattern n_ to be n:_Integer|{__Integer}. By the way, if I understand correctly, that no message is produced does not imply that there is no call out of the VM: opcode 47 signifies just such a call, but provided that the compiler knows in advance that the call is necessary and what type will be returned, there is no need for a message to call attention to it. I believe this is the case here. In contrast, some functions (like Total) compile to opcode 42, which is a direct call to an RTL function from within the VM. – Oleksandr R. Nov 04 '13 at 19:19
  • @OleksandrR. Yes you understand correctly. One type of blessed call outside is to the runtime library, and that's what the code I showed will use. Another is for functions that are explicitly declared to Compile as external, but I didn't use any such. Total and BitShiftRight both appear to be handled in the RTL but, best I can tell, in different parts thereof. That accounts for the differences between them in terms of byte code. – Daniel Lichtblau Nov 04 '13 at 19:42
  • Very interesting. I didn't know that this kind of call was possible, but I'm convinced after seeing that there's nothing in Internal`CompileValues, nor do the calls propagate up to the top level. Both this and the explicit external call are signified by "MainEvaluate" in the CompiledFunctionTools` package, which makes it difficult to distinguish them without looking at the bytecode. – Oleksandr R. Nov 04 '13 at 19:52
  • @OleksandrR. Unfortunately it's maybe more complicated than I had stated. BitShiftLeft will get into the main evaluator and then be dispatched to fast code in the RTL. It appears that things with opcode 47 are handled probably more efficiently than those with opcode 46, and maybe less so than those with opcode 42. Okay, I'm thoroughly confused. – Daniel Lichtblau Nov 04 '13 at 20:09
  • Contrary to what the CompiledFunctionTools` package claims, I think you were right the first time. I tried hooking BitShiftLeft to print out every call, and since the hook never fired, it would appear that the call from within the VM never actually hit the main evaluator, although maybe it received some kind of preprocessing (RTL function lookup?). Do you know if there's any list of functions that can be accommodated by the opcode 47 mechanism, or some criterion for identifying them? Given the performance benefits it affords, the compiler may be more widely applicable than I had thought. – Oleksandr R. Nov 04 '13 at 20:37
  • @OleksandrR. (1) You are correct that the main evaluator is bypassed. It goes directly to the highest level handling of BitShiftXXX, though, and only from there does it get dispatched to low level code. So this could be viewed as a "partial bypass" of the evaluator. (2) I do not know if such a list exists. I think one can tell whether one's code will go this route based on whether it is a Function call from the VM or a simple opcode 47 Evaluate. But I do not know of any list of functions that get that latter treatment. – Daniel Lichtblau Nov 04 '13 at 21:27
  • Unrelated, any ideas about this faulty integral? – Oleksandr R. Nov 04 '13 at 21:59
  • @OleksandrR. At this point I've spent more of my day on that integral than on bitwise rotation. It's a bug deep in some MeijerG convolution code involving a mis-simplification. I think it will get fixed, although that comes with no guarantee nothing else will break. But this integral is more important than the type that the bug-causing change was intended to fix. – Daniel Lichtblau Nov 04 '13 at 23:03
3

I may be misunderstanding something, but I think Daniel's proposed solution is wrong. I present my own below.

Regarding Daniel's BitRotateRight, the output doesn't match the output of the o/p's code: for example, BitRotateRight[111, 2] doesn't match the output of ROR[111, 2]. Second, the o/p is deliberately constructing a list of 32 elements (see the expression IntegerDigits[x, 2, 32] in his code) before applying the rotation, so the rotation happens within a fixed 32-bit field rather than within the number's own bit length. For example, ROR[111, 2] is 11000000000000000000000000011011, a huge number, not 123.
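
To make the discrepancy concrete, here is a minimal check, assuming Daniel's first BitRotateRight definition is loaded and using FromDigits with an explicit base 2 so the fixed-width result is an ordinary integer:

BitRotateRight[111, 2]
(* 123: width inferred from BitLength[111] = 7 *)

FromDigits[RotateRight[IntegerDigits[111, 2, 32], 2], 2]
(* 3221225499: rotation within a fixed 32-bit field *)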

Perhaps I'm fundamentally wrong and completely misunderstand the question and Daniel's proposed solution, because he already has 5 points for that, as of afternoon Nov 2.

Here is my own proposed solution. Note that it uses the function JavaCode from the M add-on package JVMTools:

JavaCode["final String rotaterightnerd(final int bits,final int k)
{
return java.lang.Integer.toBinaryString((bits >>> k) | (bits << (32-k)));
}"]

JavaCode["final String rotaterightbuiltin(final int bits,final int k)
{
return java.lang.Integer.toBinaryString(java.lang.Integer.rotateRight(bits,k));
}"]

randomlist = RandomInteger[{0, 100000000}, 10000];

Table[AbsoluteTiming[l2 = rotaterightnerd[#, i] & /@ randomlist;], {i, 8}]

Table[AbsoluteTiming[l3 = rotaterightbuiltin[#, i] & /@ randomlist;], {i, 8}]

l2 === l3

The nerd solution and the built-in solution produce the same output, and the timing results are about the same. As to the claim of "reasonably efficient": I can't really benchmark it against my own code because, as I said above, I believe BitRotateRight to be wrong, and we don't get the bit digits that the o/p produces with his ROR function. I do, however, happen to know that BitShiftRight and BitShiftLeft in M are internally implemented with top-level functions, and that you get much faster performance if you rewrite them on your own with functions like Mod and Quotient. And not only do you get faster functions if you "roll your own" with Mod and Quotient, but even that doesn't use actual bit shifts; hence my own solution, which uses ACTUAL bit shifts (about the fastest math operation you can get on a CPU; even CUDA shouldn't be able to speed this up). What further alienates me from any BitShift-based code is the fact that you cannot compile it, neither into bytecode nor into C code, whereas Mod and Quotient do compile (only tested on M9, but the same was true years ago in M7; I didn't check M8). BitShiftRight and BitShiftLeft are two of those many functions in M that don't do what they say: they don't actually do a bit shift, they just mimic such behavior with slow top-level functions.
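
For what it's worth, a "roll your own" version along the lines described above (Mod and Quotient inside Compile) might look like this sketch; the fixed 32-bit width and the name rotateRight32C are my own choices, and I make no claim about how it compares with the JVMTools timings:

rotateRight32C = Compile[{{n, _Integer}, {k, _Integer}},
   Quotient[n, 2^k] + Mod[n, 2^k]*2^(32 - k)];

rotateRight32C[111, 2]
(* 3221225499 *)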

Additional notes:

  • the 32 above is hardcoded. The general form would be

    return (value >>> shift) | (value << (sizeof(value) * 8 - shift));

  • if you are not interested in the actual bit list of the number (for most purposes you want the number simply to be in memory as an int or a long, not to look at it), you can remove the java.lang.Integer.toBinaryString(...) part, further speeding up my code. I only included it to match the output of the o/p, but looking at the binary 0s and 1s isn't really what we're after; we generally want the rotated number to be computed, not displayed as 0s and 1s.

Disclosure: JVMTools is a commercial M package sold by Lauschke Consulting. I am the owner of Lauschke Consulting.

Andreas Lauschke
  • I think perhaps the confusion here suggests a possible reason why there isn't a BitRotate function built in--because there are no fixed length integers in Mathematica (apart from machine integers, which seem to be considered more of an implementation detail than a first class numeric type), it's a little unclear how wide the bit field should be. Daniel's answer seems correct to me, if one assumes that any leading zeros in the argument are not significant. Otherwise, one needs to specify the effective number of leading zeros explicitly. – Oleksandr R. Nov 02 '13 at 21:34
  • I also couldn't confirm your statement that BitShift* are implemented at the top level. They do not have any downvalues and do not produce any subsidiary calls in a trace. Are you sure it still applies for current versions of Mathematica? Back in version 3, I know that bit manipulation was sometimes done using top-level functions, but I think (hope) that this is no longer the case. – Oleksandr R. Nov 02 '13 at 21:41
  • @OleksandrR. (and Andreas L), I can confirm that the Mathematica BitShiftXXX routines are coded in C, not Mathematica. That was work I did sometime during version 4 development. I believe they were extended to work on packed arrays when applied to lists, meaning Compile should handle them quite fast. I have not tested this of late. On a separate note, I confess I did not check that the OP was using a fixed integer size, so our functions are indeed different and mine may well be incorrect. Not too hard to fix though. – Daniel Lichtblau Nov 03 '13 at 21:10
  • @OleksandrR. and Daniel, a) "I think perhaps the confusion here ..." I can't see any confusion, if we take an o/p's question literally and not start interpreting liberally. We have to assume an o/p means what he/she says. This assumption may be wrong, but we have to make that assumption, lest we can't address specific problems. – Andreas Lauschke Nov 04 '13 at 00:18
  • b) "because there are no fixed length integers in Mathematica (apart from machine integers... " the o/p mentions machine code and single-instruction execution, therefore my thinking in this post is directed towards machine integers. The whole bitshift/bitrotation discussion has no point if we're talking about M's infinite precision numbers. They're blissfully marvellous, but that's not what the o/p is asking about. We have to assume machine numbers, otherwise violating the o/p's quest. – Andreas Lauschke Nov 04 '13 at 00:18
  • c) "Daniel's answer seems correct to me, if one assumes that any leading zeros in the argument are not significant." which seems an invalid assumption to me, in fact, actually violating the o/p's quest: he specifically asks about IntegerDigits[x, 2, 32] (note the 32 in the 3rd argument). – Andreas Lauschke Nov 04 '13 at 00:18
  • d) "Otherwise, one needs to specify the effective number of leading zeros explicitly" which the o/p did -- explicitly, with IntegerDigits[x, 2, 32] – Andreas Lauschke Nov 04 '13 at 00:19
  • e) "I also couldn't confirm your statement that BitShift* are implemented at the top level. They do not have any downvalues and do not produce any subsidiary calls in a trace." which doesn't contradict my claim that they are implemented with top-level, symbolic code, C code implementations notwithstanding. If you benchmark against "roll your own" implementations, they are much slower, and as you can also see easily by inspecting the 6th list element of a compiled expression (using M9; it's the 4th in 4 and 5 and 6 and 7), they don't compile (neither into bytecode nor into C code). – Andreas Lauschke Nov 04 '13 at 00:20
  • f) "Are you sure it still applies for current versions of Mathematica?" yes. g) "Back in version 3, I know that bit manipulation was sometimes done using top-level functions, but I think (hope) that this is no longer the case." M3 was 1996, or thereabouts, thus pre-SpiceGirls-split-up and therefore doesn't count, and your hopes seem to be quashed by the fact that bit manipulation in the BitShift* functions even in M9 still doesn't happen at an actual bit-shift level (as the symbol name would suggest), as Dan can probably confirm to you in private conversation. – Andreas Lauschke Nov 04 '13 at 00:20
  • h) "meaning Compile should handle them quite fast" well, the term "fast" is relative and vague without specification and comparisons, and they don't Compile, neither into bytecode, nor into C code, and that is my machine-integers objective here; and compilation should be a viable option if the o/p is seeking very fast performance as indicated by the words "inefficient" and "machine code". – Andreas Lauschke Nov 04 '13 at 00:21
  • NestListWhile is a good example of a symbol for which compilation does NOTHING to speed it up (because its top-level, symbolic nature is so incredibly good), but we can't say the same about the non-Compiled performance of BitShift*. – Andreas Lauschke Nov 04 '13 at 00:21
  • i) "I have not tested this of late" do it. j) "I confess I did not check that the OP was using a fixed integer size, so our functions are indeed different and mine may well be incorrect." I believe a pre-rotation list of 32 binary digits, as indicated by RotateRight[IntegerDigits[x, 2, 32], k], is indeed KEY here. – Andreas Lauschke Nov 04 '13 at 00:22
  • @AndreasLauschke the basic assumption of the question, as I see it, is that a builtin function would use machine instructions for the shift/rotate operation. Since machine quantities are not first class types in Mathematica and the assumption clearly does not hold for arbitrary length integers, I think answers can legitimately choose either to improve the performance of OP's implementation or to describe a different but more general operation. – Oleksandr R. Nov 04 '13 at 15:07
  • I am also not too sure what one can really infer from the fact that it is possible to write faster bit shift functions for special (restricted length) arguments. Just because the builtins are not as fast as the raw machine instructions, it does not imply that they are somehow top-level code, since one must always pay something for generality. If the objective is simply to use the machine instructions then one can always use LibraryLink to write one's own effectively built-in functions. – Oleksandr R. Nov 04 '13 at 15:13
  • The meaning of "top level code", at least in house, is "written in Mathematica" (as opposed to C). The BitShiftXXX functions are written in C. They do not produce bytecode because they are evaluated directly by dedicated C code in the runtime library (same as for e.g. LinearSolve). – Daniel Lichtblau Nov 04 '13 at 16:53