For simplicity we assume that the room and the source are fixed.
A room's impulse response depends on a LOT of different factors, including the exact position, directivity and orientation of the microphone. So there isn't a single impulse response; there are infinitely many.
At the same time, the room transfer function has a very large number of degrees of freedom. Let's say we have a residential room with a reverb time of 0.3 s, sampled at 48 kHz. For a decent signal-to-noise ratio in the measurement (> 40 dB) you need an impulse response length of about 10,000 samples, which corresponds to a frequency resolution of about 5 Hz.
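As a rough check of those numbers, here is a minimal sketch. It assumes an ideal exponential decay (a straight line in dB), so the time to fall by 40 dB is 40/60 of the reverb time; the function name `ir_length_samples` is just made up for illustration.

```python
def ir_length_samples(rt60_s, decay_db, fs_hz):
    """Samples needed for the decay to drop by decay_db.

    Assumes the level falls linearly in dB, reaching -60 dB at rt60_s.
    """
    return round(rt60_s * decay_db / 60.0 * fs_hz)


fs_hz = 48_000
n_samples = ir_length_samples(rt60_s=0.3, decay_db=40.0, fs_hz=fs_hz)

# Frequency resolution of a transfer function computed from n_samples
resolution_hz = fs_hz / n_samples

print(n_samples, resolution_hz)
```

Under that assumption you get 9600 samples and 5 Hz, which is where the "about 10,000 samples" and "about 5 Hz" come from.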
A typical room transfer function consists of thousands of narrow peaks and notches that are quite pronounced, but whose exact location and gain depend A LOT on the exact position and directional characteristics of the microphone.
Even for microphone locations that are fairly close together, the individual transfer functions don't correlate well at all. The fine structure of impulse response and transfer function looks completely different if you move the mic by a few centimeters.
Hence inverting the transfer function is only useful if you ONLY want to use the signal recorded with the measurement microphone itself. At any other location it will just make things a lot worse. Simple example: if your measurement has a 15 dB deep notch at 2045 Hz, inverting the transfer function would require you to add 15 dB of gain. However, at a location 10 cm away you may already have a 10 dB peak at 2045 Hz, so adding the 15 dB of gain would result in an excess gain of 25 dB, which indeed sounds terrible.
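The same arithmetic written out, using only the numbers from the example (the variable names are just illustrative):

```python
# Levels at 2045 Hz, in dB relative to the desired flat response
notch_at_mic_db = -15.0   # measured at the microphone position
peak_nearby_db = 10.0     # what a position 10 cm away might show

# Inverting the measured transfer function means applying the opposite gain
correction_db = -notch_at_mic_db

# Gains in dB simply add
result_at_mic_db = notch_at_mic_db + correction_db
result_nearby_db = peak_nearby_db + correction_db

print(result_at_mic_db, result_nearby_db)
```

The correction is perfect at the measurement position (0 dB) and leaves a 25 dB peak at the nearby one.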
If "listening" implies "using your own ears in the room", then inverting a single microphone transfer function doesn't work. You have two ears, their directivity is quite a bit different from that of the measurement microphone AND you can't get both ears at the same location.
That's why most room EQ processes (like REW) will:
- Focus on information and a level of detail that is perceptually relevant but typically a lot less than the exact fine structure of impulse response and transfer function.
- Try to compensate for features that can be assumed to be reasonably consistent across the target area in the room.
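
To make these two points a bit more concrete, here is a minimal sketch of one way to do it: average the magnitude responses from several microphone positions, smooth the result in 1/3-octave bands, and only then derive a gain-limited correction. This is NOT how REW or any particular product works internally; the function names, the synthetic data, the 1/3-octave window and the boost/cut limits are all assumptions picked for illustration.

```python
import math
import random


def db_to_power(db):
    return 10.0 ** (db / 10.0)


def power_to_db(power):
    return 10.0 * math.log10(power)


def spatial_average_db(responses_db):
    """Power-average magnitude responses (one list of dB values per position)."""
    n_pos = len(responses_db)
    n_bins = len(responses_db[0])
    return [
        power_to_db(sum(db_to_power(r[i]) for r in responses_db) / n_pos)
        for i in range(n_bins)
    ]


def octave_smooth_db(freqs_hz, mag_db, fraction=3):
    """Average the power inside a 1/fraction-octave window around each bin."""
    half = 2.0 ** (1.0 / (2.0 * fraction))
    power = [db_to_power(m) for m in mag_db]
    smoothed = []
    for f in freqs_hz:
        band = [p for g, p in zip(freqs_hz, power) if f / half <= g <= f * half]
        smoothed.append(power_to_db(sum(band) / len(band)))
    return smoothed


def correction_db(smoothed_db, target_db, max_boost_db=6.0, max_cut_db=12.0):
    """Gain towards the target level, limited so narrow dips are not boosted hard."""
    return [min(max_boost_db, max(-max_cut_db, target_db - s)) for s in smoothed_db]


# Synthetic data (assumption): a broad bump around 60 Hz that all positions
# share, plus random fine structure that differs at every position.
random.seed(0)
freqs = [20.0 * 2.0 ** (k / 48.0) for k in range(480)]  # 20 Hz to about 20 kHz
shared = [6.0 * math.exp(-0.5 * (math.log2(f / 60.0) / 0.5) ** 2) for f in freqs]
positions = [[s + random.gauss(0.0, 6.0) for s in shared] for _ in range(8)]

averaged = spatial_average_db(positions)
smoothed = octave_smooth_db(freqs, averaged)
target = sum(smoothed) / len(smoothed)
eq = correction_db(smoothed, target)
```

The position-specific fine structure mostly averages out, while the shared bump survives the averaging and smoothing, so the resulting `eq` curve mainly acts on the feature that is consistent across the measured area.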