When you choose to link data between objects, you are telling Blender that these objects share the same data. It's not that Blender edits one object and then updates the others to match; they all reference the same single block of data.
To explain this further, imagine 3 objects, each with their own object data:

You can think of this as 6 units that Blender has to save: 3 object data blocks and 3 object blocks.
Now, if we were to link the data of these three objects, it would look like this:

Now there are only 4 units Blender has to save: 1 object data block and 3 object blocks. Obviously, only having to save 4 instead of 6 blocks of data will result in a smaller file.
The more objects that share the same data, the more memory is saved.
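To make the counting concrete, here is a small sketch in plain Python. It is only a toy model of the idea, not Blender's API: `MeshData`, `SceneObject` and `count_blocks` are made-up names, and a real .blend file stores much more than this.

```python
class MeshData:
    """Toy stand-in for an object data block (hypothetical, not Blender's API)."""
    def __init__(self, vertices):
        self.vertices = list(vertices)


class SceneObject:
    """Toy stand-in for an object block; it only points at its object data."""
    def __init__(self, name, data):
        self.name = name
        self.data = data


def count_blocks(objects):
    # One block per object, plus one block per distinct data instance.
    distinct_data = {id(obj.data) for obj in objects}
    return len(objects) + len(distinct_data)


quad = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]

# Three objects, each with its own copy of the data.
separate = [SceneObject(f"Obj{i}", MeshData(quad)) for i in range(3)]

# Three objects that all point at the same data.
shared_mesh = MeshData(quad)
linked = [SceneObject(f"Obj{i}", shared_mesh) for i in range(3)]

separate_count = count_blocks(separate)  # 3 object blocks + 3 data blocks
linked_count = count_blocks(linked)      # 3 object blocks + 1 data block
```

With these definitions `separate_count` is 6 and `linked_count` is 4, matching the two cases described above.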
When objects that share data are joined, you can imagine Blender first separating all the data so each object has its own object data (like image 1) and then joining the objects together. Yes, that would result in only 2 data blocks (1 object and 1 object data), but the object data would now contain 3 times the number of vertices (from merging the 3 separate object data blocks), resulting in the file size going back up.
To summarise, if objects are using linked data, the data is shared so the file will be smaller, but when objects are joined, any memory sharing is lost as all vertices (and any other data) are joined into one object data block.
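The effect of joining can be sketched the same way. Again, this is a toy model in plain Python under my own assumptions (the class names and the `join` helper are hypothetical and only mimic the behaviour described above, they are not how Blender implements joining):

```python
class MeshData:
    """Toy stand-in for an object data block (hypothetical)."""
    def __init__(self, vertices):
        self.vertices = list(vertices)


class SceneObject:
    """Toy stand-in for an object block that references its data."""
    def __init__(self, name, data):
        self.name = name
        self.data = data


def stored_vertices(objects):
    # Shared data is only stored once, so count each distinct data block once.
    distinct = {id(obj.data): obj.data for obj in objects}
    return sum(len(data.vertices) for data in distinct.values())


def join(objects, name="Joined"):
    # Every object contributes its own copy of the vertices to the new data block.
    merged = []
    for obj in objects:
        merged.extend(obj.data.vertices)
    return SceneObject(name, MeshData(merged))


quad = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
shared_mesh = MeshData(quad)
linked = [SceneObject(f"Obj{i}", shared_mesh) for i in range(3)]

before = stored_vertices(linked)         # one shared block of 4 vertices
joined = join(linked)
after = stored_vertices([joined])        # one block holding 3 x 4 vertices
```

Here `before` is 4 and `after` is 12: fewer blocks, but three times the vertex data.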
There shouldn't be any trouble with using linked data in another scene. The only issue that can occur is forgetting that the object is linked, editing it, and then realising that all the other objects have been edited too, which might have been unintended.
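That pitfall, and the usual way around it (giving one object its own copy of the data before editing, which is the idea behind Blender's Make Single User), can be illustrated with the same kind of toy model. The names below are hypothetical and this is plain Python, not Blender's API:

```python
import copy


class MeshData:
    """Toy stand-in for an object data block (hypothetical)."""
    def __init__(self, vertices):
        self.vertices = list(vertices)


class SceneObject:
    """Toy stand-in for an object block that references its data."""
    def __init__(self, name, data):
        self.name = name
        self.data = data


shared_mesh = MeshData([(0, 0, 0), (1, 0, 0), (1, 1, 0)])
a = SceneObject("A", shared_mesh)
b = SceneObject("B", shared_mesh)
c = SceneObject("C", shared_mesh)

# Give C its own private copy of the data before any editing happens.
c.data = copy.deepcopy(c.data)

# Editing A's data also changes B, because both reference the same block.
a.data.vertices[0] = (5, 5, 5)

b_first = b.data.vertices[0]  # follows the edit made through A
c_first = c.data.vertices[0]  # unchanged, C no longer shares the data
```

After the edit, `b_first` is `(5, 5, 5)` while `c_first` is still `(0, 0, 0)`. The trade-off is that C now stores its own data again, so the saving from sharing is lost for that object.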