Your naive approach works for 1D lists, so one idea is to flatten the matrix into a vector, extract and zero the elements there, and then reshape back into a matrix:
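For a self-contained test, sample data can be set up along these lines (the dimensions and the number of positions are illustrative guesses, not the question's actual setup; the matrix a, dimensions m and n, and the list pos of {row, column} pairs are assumed throughout):

```mathematica
(* Hypothetical sample data; sizes are illustrative only *)
m = n = 3000;
a = RandomReal[1, {m, n}];
pos = DeleteDuplicates @ RandomInteger[{1, m}, {10^6, 2}];
```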
b = ArrayReshape[a, m n]; //AbsoluteTiming
newpos = {n, 1} . (Transpose[Developer`ToPackedArray @ pos] - {1, 0}); //AbsoluteTiming
e = b[[newpos]]; //AbsoluteTiming
b[[newpos]] = 0; //AbsoluteTiming
newa = ArrayReshape[b, {m, n}];//AbsoluteTiming
{0.110141, Null}
{0.014826, Null}
{1.03683, Null}
{1.03194, Null}
{0.067154, Null}
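The newpos line converts each {i, j} pair into its position in the flattened (row-major) vector: {n, 1}.({i, j} - {1, 0}) is n (i - 1) + j. A quick sanity check on a small 4×5 example:

```mathematica
mat = ArrayReshape[Range[20], {4, 5}];  (* 4×5, so n = 5 columns *)
{5, 1} . ({2, 3} - {1, 0})              (* 8, the linear index of {2, 3} *)
mat[[2, 3]] == Flatten[mat][[8]]        (* True *)
```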
To do a comparison, I will truncate the length of pos to 1000:
pos = pos[[;;1000]];
b = ArrayReshape[a, m n]; //AbsoluteTiming
newpos = {n, 1} . (Transpose[Developer`ToPackedArray @ pos] - {1, 0}); //AbsoluteTiming
e = b[[newpos]]; //AbsoluteTiming
b[[newpos]] = 0; //AbsoluteTiming
newa = ArrayReshape[b, {m, n}];//AbsoluteTiming
{0.091493, Null}
{0.001057, Null}
{0.963493, Null}
{1.02483, Null}
{0.066962, Null}
It is interesting that the extraction and assignment steps take essentially the same time whether pos has length 1000 or 10^6. Your "slow" approach:
val = Table[{i,j}=e; v=a[[i,j]]; a[[i,j]]=0; v, {e, pos}];//AbsoluteTiming
{2.81587, Null}
Check:
a == newa
SparseArray[val] == e
True
True
So, the timing is comparable when pos has a length of ~1000. On the other hand, your approach for a length of ~10^6 will be about 1000 times slower.
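If this becomes a common operation, the steps above can be wrapped into one function (extractAndZero is a hypothetical name, and this is just a sketch of the vector-based approach):

```mathematica
extractAndZero[a_?MatrixQ, pos_List] :=
 Module[{m, n, b, newpos, e},
  {m, n} = Dimensions[a];
  b = Flatten[a];  (* copy of a as a vector *)
  newpos = {n, 1} . (Transpose[Developer`ToPackedArray @ pos] - {1, 0});
  e = b[[newpos]];  (* extracted values *)
  b[[newpos]] = 0;  (* zero them out *)
  {e, ArrayReshape[b, {m, n}]}
  ]
```

Then {e, newa} = extractAndZero[a, pos] gives both the extracted values and the zeroed matrix in one call.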
Addendum
Another possibility is to create a mask and use it to zero out the positions. SparseArray[pos -> 0, {m, n}, 1] is an array with background value 1 and explicit zeros at pos, so multiplying it elementwise by a zeroes exactly those entries:
masked = SparseArray[pos -> 0, {m, n}, 1] a; //AbsoluteTiming
masked == newa
{0.104856, Null}
True
It is also possible to use this approach to recreate the e vector from before:
index = SparseArray[pos->1, {m, n}]; //AbsoluteTiming
v = index a; //AbsoluteTiming
(* Shift by Min[v] - 1 so every value at pos becomes positive and survives "NonzeroValues", then undo the shift: *)
values = SparseArray[index (v - Min[v] + 1)]["NonzeroValues"] + Min[v] - 1; //AbsoluteTiming
(* "NonzeroValues" is in row-major order; Ordering @ Ordering @ pos gives the rank of each position in that order, restoring the original order of pos: *)
newe = values[[Ordering @ Ordering @ pos]]; //AbsoluteTiming
newe == e
{0.030043, Null}
{0.008961, Null}
{0.06608, Null}
{0.323891, Null}
True
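The Ordering @ Ordering step deserves a comment: applied to a list, it returns the rank of each element in sorted order, which is what maps the row-major-sorted "NonzeroValues" back to the original order of pos. A small illustration:

```mathematica
list = {30, 10, 20};
Ordering[list]              (* {2, 3, 1}: positions that sort list *)
Ordering @ Ordering @ list  (* {3, 1, 2}: rank of each element *)
Sort[list][[Ordering @ Ordering @ list]] === list  (* True *)
```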
e = b[[newpos]] and b[[newpos]] = 0 are way too memory-inefficient. Even for a $10^4 \times 10^4$ matrix with density $10^{-3}$, they use up more than 1GB of RAM. My goal is to work with $10^7 \times 10^7$ sparse matrices. Hope someone provides a new answer : ). – Leo Jun 28 '19 at 17:15