A: It's so slow because each scene update synchronizes to the refresh rate of your monitor, and you need to do 1000 scene updates. On a 60 Hz monitor that takes at least 1000 / 60 seconds, or a bit less than 17 seconds. The additional 3 seconds are probably spent creating the primitives and updating the relevant data structures, plus a bit of bad luck whenever an operation takes slightly too long and misses a refresh synchronization.
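To make the arithmetic concrete, here is a tiny sketch of that lower bound. The function name is just for illustration, and it assumes exactly one scene update per monitor refresh:

```python
def min_seconds(updates, refresh_hz):
    """Lower bound on run time if every update waits for one monitor refresh."""
    return updates / refresh_hz

# 1000 primitive_cube_add-style calls on a 60 Hz monitor
print(round(min_seconds(1000, 60), 2))  # 16.67
```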
B: Find an approach that makes fewer calls to bpy.ops and instead calls functions that don't require screen updates. One possibility is to use the bmesh module to create the cubes and then do only one screen update, when you copy the bmesh to an object's mesh data. As mentioned in the comments on your question, this answer shows an example of using bmesh.
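A minimal sketch of the bmesh idea, not tested against your scene. The helper names, the 10 x 10 x 10 grid, and the spacing are my assumptions, and the object-linking line assumes Blender 2.8 or newer (older versions link through the scene instead):

```python
import itertools

def grid_locations(n_per_side=10, spacing=3.0):
    """Return n_per_side**3 (x, y, z) positions on a regular grid."""
    steps = range(n_per_side)
    return [(x * spacing, y * spacing, z * spacing)
            for x, y, z in itertools.product(steps, repeat=3)]

def build_cubes_object(locations, name="Cubes"):
    """Build every cube in one bmesh, then write it to a single object.

    Only runs inside Blender; the Blender modules are imported here so
    grid_locations() stays usable without them.
    """
    import bpy
    import bmesh
    from mathutils import Matrix

    bm = bmesh.new()
    for loc in locations:
        # Adds cube geometry to bm only; no operator call, no scene update
        bmesh.ops.create_cube(bm, size=1.0, matrix=Matrix.Translation(loc))

    mesh = bpy.data.meshes.new(name)
    bm.to_mesh(mesh)  # one copy from the bmesh into real mesh data
    bm.free()

    obj = bpy.data.objects.new(name, mesh)
    bpy.context.collection.objects.link(obj)  # assumes Blender 2.8+
    return obj
```

From Blender's Python console you would call `build_cubes_object(grid_locations())`. Note that this puts all 1000 cubes into one object.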
Since you want to create 1000 identical objects at different locations, I can't provide you with exact code, because you need to test which of several approaches is fastest. You can, for example, create just one cube and then duplicate and move it, or you can use numpy to create the data structure and then convert it to a mesh.
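As a rough illustration of the "build the data first, convert once" route, the sketch below uses plain Python lists rather than numpy to keep it short. The cube tables, the function name, and the grid layout are assumptions of mine; the Blender part runs only when bpy can be imported and assumes Blender 2.8+ for linking:

```python
# Corners of a unit cube centred on the origin, and its six quad faces
# (indices wound so the normals point outward).
CUBE_VERTS = [(-0.5, -0.5, -0.5), (0.5, -0.5, -0.5), (0.5, 0.5, -0.5), (-0.5, 0.5, -0.5),
              (-0.5, -0.5, 0.5), (0.5, -0.5, 0.5), (0.5, 0.5, 0.5), (-0.5, 0.5, 0.5)]
CUBE_FACES = [(0, 3, 2, 1), (4, 5, 6, 7), (0, 1, 5, 4),
              (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]

def cubes_pydata(locations):
    """Return (verts, faces) lists holding one cube per location."""
    verts, faces = [], []
    for lx, ly, lz in locations:
        base = len(verts)  # index of this cube's first vertex
        verts.extend((lx + x, ly + y, lz + z) for x, y, z in CUBE_VERTS)
        faces.extend(tuple(base + i for i in face) for face in CUBE_FACES)
    return verts, faces

# Hypothetical layout: a 10 x 10 x 10 grid, 3 units apart
locations = [(3.0 * x, 3.0 * y, 3.0 * z)
             for x in range(10) for y in range(10) for z in range(10)]
verts, faces = cubes_pydata(locations)

try:
    import bpy
except ImportError:
    bpy = None  # not running inside Blender

if bpy is not None:
    mesh = bpy.data.meshes.new("Cubes")
    mesh.from_pydata(verts, [], faces)  # vertices, edges (none given), faces
    mesh.update()
    obj = bpy.data.objects.new("Cubes", mesh)
    bpy.context.collection.objects.link(obj)  # assumes Blender 2.8+
```

Whether this beats the duplicate-and-move approach is something you would have to time yourself.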
One question you've left open is whether you want all 1000 cubes to be separate objects, or whether you could work with all of the cubes in one object. The latter approach is much faster, if it allows you to do what you want with the resulting mesh.