I am not aware of any method that would let one make good use of a black box or API computing $f(a,e,n)=a^e\bmod n$ for $n$ of up to $2112$ bits, to efficiently compute $f(a,e,n)=a^e\bmod n$ with $n$ above that bound (like $4096$ bits), unless that bigger $n$ has a known factorization into pairwise coprime factors of at most $2112$ bits each (in which case the usual CRT technique applies and significantly helps).
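For the one case where the black box does help, here is a minimal sketch of the CRT technique, assuming $n=p\,q$ with $p$ and $q$ known, coprime, and each within the limit. The names `black_box_modexp`, `modexp_crt` and `LIMIT_BITS` are hypothetical; the black box is simulated with Python's built-in `pow`, and the modular inverse uses `pow(q, -1, p)` (Python 3.8+).

```python
LIMIT_BITS = 2112  # size limit of the black box, as in the text above


def black_box_modexp(a, e, n):
    """Hypothetical stand-in for the size-limited black box f(a, e, n)."""
    if n.bit_length() > LIMIT_BITS:
        raise ValueError("modulus too large for the black box")
    return pow(a % n, e, n)


def modexp_crt(a, e, p, q):
    """Compute a^e mod (p*q) with two black-box calls; p and q must be coprime."""
    xp = black_box_modexp(a % p, e, p)  # a^e mod p
    xq = black_box_modexp(a % q, e, q)  # a^e mod q
    # Garner recombination: find x in [0, p*q) with x = xp mod p and x = xq mod q.
    q_inv = pow(q, -1, p)               # q^-1 mod p
    h = ((xp - xq) * q_inv) % p
    return xq + q * h
```

Note that the reduction of $a$ modulo $p$ and $q$, and the final recombination, still need multi-precision arithmetic outside the black box. Also, for the RSA public key function the factors of $n$ are normally not available to the party doing the computation, which is why this does not solve the problem in general.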
That issue is encountered when one wants to compute the RSA public key function for a 4096-bit key on top of software (or an API to hardware) limited to 2048-bits-and-then-some.
Especially if $e$ is small (like $65537$, $17$, $3$, or $2$), it is sometimes possible to write a fast-enough software-only implementation in assembly language (which typically beats C by a decimal order of magnitude, and interpreted bytecode by much more). And for the purpose of signature verification, this is unquestionably safe, since only public values are manipulated.
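To illustrate why a small $e$ keeps a software-only implementation affordable, here is a minimal sketch for $e=65537=2^{16}+1$: it needs only 16 modular squarings and 1 modular multiplication, whatever the size of $n$. It is written in Python for readability only (the function name `modexp_65537` is hypothetical); a real implementation on a constrained platform would have to supply its own multi-precision multiplication and reduction.

```python
def modexp_65537(a, n):
    """Compute a^65537 mod n using 16 squarings and 1 multiplication."""
    a %= n
    x = a
    for _ in range(16):   # after the loop, x = a^(2^16) mod n
        x = (x * x) % n
    return (x * a) % n    # a^(2^16) * a = a^65537 mod n
```

For $e=3$ the same idea costs one squaring and one multiplication, and for $e=17$ four squarings and one multiplication.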
But even if $e$ is small, if the context is a JavaCard Smart Card without any way to evade the JavaCard Virtual Machine, I'm afraid there is no practical solution, unless execution time is not an issue.
0x010001) in an unsigned representation, but it is normally padded to 24 bits to fit into an N number of bytes. – Maarten Bodewes Oct 02 '13 at 15:23