Unfortunately, no, the code probably can't be improved much. Mathematica can be fast, but only when most of the time is spent inside its large, highly optimized built-in functions (such as Tally and Permutations in your case). Generally, the smaller the building blocks you use, the slower the program. Even Sort[list, Greater] is slower than plain Sort[list], because the fast internal comparisons are replaced with slow evaluations of a user-supplied function.
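To see the Sort effect for yourself, here is a rough timing sketch. The test list and its size are arbitrary choices of mine, and the actual numbers will depend on your machine and version:

```mathematica
(* Illustrative test data: 10^5 random integers (an arbitrary choice) *)
list = RandomInteger[10^6, 10^5];

(* Built-in ordering: comparisons stay inside the kernel *)
First@AbsoluteTiming[Sort[list];]

(* Custom ordering function: Greater is evaluated for each comparison *)
First@AbsoluteTiming[Sort[list, Greater];]

(* Same descending result for integers, using only built-in steps *)
First@AbsoluteTiming[Reverse@Sort[list];]
```

Each line returns the elapsed time in seconds. The third line shows the usual workaround: get the same result from built-ins alone instead of passing a custom comparison function.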
That said, it is possible to make the code several times faster at the cost of even more memory:
{Reverse[DeleteCases[#[[1]], 0]], #[[2]]} & /@
Tally@Table[Sort[lam + muPerm], {muPerm, Permutations@mu}]
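As a quick sanity check, the expression above can be run on a tiny input. The values of lam and mu below are made up for illustration and are not taken from your question:

```mathematica
(* Hypothetical small inputs, chosen only to make the result easy to verify by hand *)
lam = {2, 1, 0};
mu = {1, 0, 0};

(* Sort each sum so that equal multisets coincide, tally them,
   then drop zeros and put the parts in descending order *)
{Reverse[DeleteCases[#[[1]], 0]], #[[2]]} & /@
 Tally@Table[Sort[lam + muPerm], {muPerm, Permutations@mu}]
(* {{{3, 1}, 1}, {{2, 2}, 1}, {{2, 1, 1}, 1}} *)
```

Here mu has three distinct permutations, and each gives a different sum with lam, so every result appears with count 1.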
I also tried to optimize for memory, using my own compiled permutation generator and an Association instead of Tally to avoid storing everything at once. I failed miserably: although it uses only as much memory as needed, this version turned out to be ten times slower.
My advice to you is to code it in C or in Julia, which is as fast as C, as easy to use as Python, and has a Mathematica interface.
EDIT: no upvotes? OK, posting my slow yet memory-efficient code. It is based on Combinatorica's NextPermutation function, modified to support repeated entries and to stop after the last permutation. It is still about 20 times faster than the upvoted solution.
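nextPermutation = Compile[{{l, _Integer, 1}},
   Module[{nl = l, n = Length[l], i, j},
    (* find the rightmost position i with nl[[i]] < nl[[i + 1]] *)
    i = n - 1;
    While[i > 0 && nl[[i]] >= nl[[i + 1]], i--];
    (* no such position: l was the last permutation, return the sentinel {-1} *)
    If[i == 0, Return[{-1}]];
    (* find the rightmost entry larger than nl[[i]] and swap the two *)
    j = n;
    While[nl[[j]] <= nl[[i]], j--];
    {nl[[i]], nl[[j]]} = {nl[[j]], nl[[i]]};
    (* reverse the tail to get the smallest arrangement after position i *)
    Join[Take[nl, i], Reverse[Drop[nl, i]]]
   ]
  ];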
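perm = Sort[mu]; (* start from the smallest permutation *)
tally = <||>;
(* the sentinel {-1} ends the loop; this assumes mu has no negative entries *)
While[Min[perm] >= 0,
  item = Reverse@DeleteCases[Sort[lam + perm], 0];
  tally[item] = Lookup[tally, Key[item], 0] + 1;
  perm = nextPermutation[perm]
 ];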