Since there are several solutions to this problem, and the source data runs to hundreds of megabytes, I wanted to compare their relative performance. @Richard does not state whether the resulting 16-bit value is signed or unsigned; I have assumed unsigned, as did @creidhne and @josh in their answers and @Daniel Lichtblau in a comment. I also assumed the data file is effectively stored big-endian. I adapted @creidhne's example as follows.
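To make the byte-ordering assumption concrete, here is a small arithmetic check (not part of the original answer): for a stored byte pair {1, 2}, a big-endian read gives 1*256 + 2, while a little-endian read gives 2*256 + 1.

{256*1 + 2, 256*2 + 1}
(* {258, 513} *)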
Set up by creating a timing function and a 2 MB test file of 1,000,000 byte pairs. For the timing function I chose RepeatedTiming with a 5-second evaluation period; after many trial runs with different periods, I found that 5 seconds was sufficient to give consistently repeatable results.
(* timing wrapper: RepeatedTiming over a 5-second window, holding its argument unevaluated *)
ClearAll[myRepeatedTiming]
SetAttributes[myRepeatedTiming, HoldFirst];
myRepeatedTiming[exp_] := RepeatedTiming[exp, 5]

(* write 1,000,000 random 16-bit values (2 MB) to a temporary test file *)
file = FileNameJoin[{$TemporaryDirectory, "8,16-bit-compare"}];
BinaryWrite[file, RandomInteger[{0, 65535}, 1000000], "UnsignedInteger16"];
Close[file];
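As a quick sanity check (not in the original answer), the file written above should contain exactly 2,000,000 bytes:

FileByteCount[file]
(* 2000000 *)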
Then time each of four different methods of processing the test data file:
- From the answer by @creidhne: Read list directly as 16-bit unsigned integers.
- From the comment by @Daniel Lichtblau: Read list as 8-bit unsigned integers. Process in pairs, multiplying high byte by 256 and adding low byte.
- From the answer by @josh: Read list as 8-bit unsigned integers. Process in pairs, converting each byte to a list of binary digits, joining the lists—high byte first—and converting to integer.
- As suggested in the original post by @Richard: Read list as 8-bit unsigned integers. Process in pairs, left-shifting the high byte by 8 bits and adding the low byte. (A quick single-pair check of these conversions follows this list.)
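As a quick illustration before the timings (this check is not part of the original code), the three byte-pair conversions agree on an example pair {171, 205}:

{256*171 + 205,
 FromDigits[Join[IntegerDigits[171, 2], IntegerDigits[205, 2, 8]], 2],
 BitShiftLeft[171, 8] + 205}
(* {43981, 43981, 43981} *)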
I then ran timings for each method and displayed the average run time together with its ratio to the timing of the fastest. The display also compares the results from the four methods to show they are identical.
(* method 1 (@creidhne): read directly as big-endian 16-bit unsigned integers *)
{time16, result16} =
  myRepeatedTiming[
   BinaryReadList[file, "UnsignedInteger16", ByteOrdering -> 1]];

(* method 2 (@Daniel Lichtblau): read bytes; combine pairs as 256*high + low *)
{timeMultiply, resultMultiply} =
  myRepeatedTiming[
   BlockMap[256*First@# + Last@# &, BinaryReadList[file], 2]];

(* method 3 (@josh): read bytes; combine pairs via binary digits *)
{timeBits, resultBits} =
  myRepeatedTiming[
   BlockMap[
    FromDigits[
      Join[IntegerDigits[First@#, 2], IntegerDigits[Last@#, 2, 8]], 2] &,
    BinaryReadList[file], 2]];

(* method 4 (@Richard): read bytes; combine pairs as BitShiftLeft[high, 8] + low *)
{timeShift, resultShift} =
  myRepeatedTiming[
   BlockMap[BitShiftLeft[First@#, 8] + Last@# &, BinaryReadList[file], 2]];
Grid[{
{"", Item["Time\n(sec)", Alignment -> Center],
Item[" Time\nratios", Alignment -> Center]}
, {"16-bit:", time16, 1}
, {"8-bit multiply:", timeMultiply, timeMultiply/time16}
, {"IntegerDigits:", timeBits, timeBits/time16}
, {"8-bit shift:", timeShift, timeShift/time16}
, {}
, {"All equal?",
result16 == resultShift == resultMultiply == resultBits}
}
, Alignment -> {{Right, Left, "."}}
, Spacings -> 2
]
DeleteFile[file];
Timing results were:
                      Time       Time
                     (sec)     ratios
16-bit:          0.0045457          1
8-bit multiply:   0.201478    44.3227
IntegerDigits:    0.608403    133.841
8-bit shift:      0.611216     134.46

All equal?  True
It is unsurprising that the direct 16-bit read of the file was fastest, because it requires no post-processing. I was, however, quite surprised to see the 8-bit shift run so much slower than the 8-bit multiply, and equally surprised that the conversion to binary digits and back was no slower than the 8-bit shift.
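To probe the multiply-versus-shift gap further, here is a minimal follow-up sketch (not part of the benchmark above; bytes is a stand-in for BinaryReadList[file]) that times just the conversion step on in-memory data, taking the file read out of the comparison:

bytes = RandomInteger[{0, 255}, 2000000]; (* stand-in for BinaryReadList[file] *)
First@myRepeatedTiming[BlockMap[256*First@# + Last@# &, bytes, 2]]
First@myRepeatedTiming[BlockMap[BitShiftLeft[First@#, 8] + Last@# &, bytes, 2]]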