by Peter W. Mon Jun 26, 2017 7:57 am
I kinda-sorta think that this is an interesting discussion. But we need to be very careful in separating the actual physics from speculation. Accordingly, I cobbled a bit of a reality test:
One is in a live venue, and the proverbial cymbal crashes and, very nearly simultaneously (key words here when one is dealing in many Hz), the drumstick strikes the snare drum.
When the sound reaches you, who is to suggest that the wave-front reaches you ^ this way or / that way? Consider how far (vertically and horizontally, and offset) the snare drum is from the cymbal, how far either is from you, and whether the strikes on each were truly simultaneous or a few milliseconds apart.
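To put rough numbers on that, here is a minimal sketch of the arrival-time arithmetic. The stage layout, the 2 ms strike offset and the 343 m/s speed of sound are all assumptions picked for illustration, not measurements of any real venue, and the function name is made up for this example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 C


def arrival_time_ms(source_xyz, listener_xyz, strike_time_ms=0.0):
    """Time at which a strike is heard, in ms, over a straight-line path."""
    distance_m = math.dist(source_xyz, listener_xyz)
    return strike_time_ms + 1000.0 * distance_m / SPEED_OF_SOUND_M_S


# Hypothetical layout in metres (x across stage, y toward listener, z height).
listener = (0.0, 10.0, 1.2)
snare = (0.0, 0.0, 0.8)
cymbal = (0.6, -0.3, 1.5)

snare_ms = arrival_time_ms(snare, listener)
# Assume the cymbal is struck 2 ms after the snare.
cymbal_ms = arrival_time_ms(cymbal, listener, strike_time_ms=2.0)
offset_ms = cymbal_ms - snare_ms

print(f"snare {snare_ms:.2f} ms, cymbal {cymbal_ms:.2f} ms, offset {offset_ms:.2f} ms")
```

With these made-up positions the path difference alone adds a bit under a millisecond on top of the 2 ms strike offset, which is several full cycles at cymbal frequencies. Move the listener and the number changes.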
What strikes your ear is an amalgamation (also a key word, as compared to a mixture, meld, combination or alloy) of many sounds from many sources. There will also be, necessarily, interference waves, reinforcement waves and many other artifacts that may easily be described, but are tough to reproduce with any degree of accuracy. We strive to do so with a combination of machines, none of which is perfect and many of which are seriously flawed. Consider requiring 226 square inches of rubber, paper and glue (the area of two conventional 12" bass drivers) to reproduce the sounds of an organ Bombarde pipe.
So: If the basic position taken is that a wave-front on the "push" is more 'natural' than one on the "pull", who is to suggest that each transient as-recorded (or as it hits the microphone) is vibrating the diaphragm one way or the other at that moment? Now, move on to the tympanum. Who is to suggest that what is going on in *your* ear is on the 'innie' or the 'outie'? Presumably, what is happening in your ear is a close analog of what is going on in the speaker... Or not? Now, consider that we have two (2) ears. They are about 5" - 6" apart, or so. Or, about half a wavelength at roughly 1.1 to 1.35 kHz (at C5, 523 Hz, that spacing is closer to a quarter wavelength).
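A quick check of the ear-spacing arithmetic, as a sketch only: it assumes 343 m/s for the speed of sound and treats the 5" - 6" figure as a straight-line distance, ignoring the head in between. The function name is made up for this example.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 C
INCH_M = 0.0254             # metres per inch


def half_wavelength_frequency_hz(spacing_in):
    """Frequency whose half wavelength equals the given spacing in inches."""
    wavelength_m = 2.0 * spacing_in * INCH_M
    return SPEED_OF_SOUND_M_S / wavelength_m


for spacing in (5.0, 6.0):
    freq_hz = half_wavelength_frequency_hz(spacing)
    print(f'{spacing:.0f}" spacing -> half a wavelength at about {freq_hz:.0f} Hz')
```

Under those assumptions the answer lands somewhere around 1.1 to 1.35 kHz. Which note name that corresponds to depends on the octave-numbering convention in use. Either way, at that frequency one ear is on the "push" while the other is on the "pull".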
Leading to microphone placement: surely the microphones are some multiple or fraction of various wavelengths apart from each other, and so will be recording variances by their nature. We think of sound as 'square waves' while (mostly) forgetting that square waves are composed of many sine waves, each at its own frequency, each adding a subtle or not-so-subtle colorant to the tone.
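For anyone who wants to see the square-wave point in numbers, here is a minimal sketch that adds up the odd sine harmonics of the standard Fourier series for an ideal unit square wave. The 100 Hz fundamental and the harmonic counts are arbitrary choices for illustration, and the function name is made up for this example.

```python
import math


def square_wave_partial_sum(t, fundamental_hz, n_harmonics):
    """Sum the first n odd harmonics of the Fourier series of a unit square wave."""
    total = 0.0
    for i in range(n_harmonics):
        k = 2 * i + 1  # odd harmonic numbers: 1, 3, 5, ...
        total += math.sin(2.0 * math.pi * k * fundamental_hz * t) / k
    return 4.0 / math.pi * total


fundamental_hz = 100.0                     # arbitrary example frequency
quarter_period_s = 0.25 / fundamental_hz   # middle of the positive half-cycle

for n in (1, 5, 50):
    value = square_wave_partial_sum(quarter_period_s, fundamental_hz, n)
    print(f"{n:3d} harmonics: value at quarter period = {value:.3f}")
```

A single sine overshoots the flat top of the square wave; as more harmonics are added, the value at the middle of the half-cycle settles toward 1. Each of those harmonics has its own wavelength, so two microphones a fixed distance apart catch each one at a different point in its cycle.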
Point being: what hits the ear is a vast amalgam of 'ins' and 'outs' all at once. Even in a live venue. A recording/transcript (microphone-->electronics-->speaker) will mix up that amalgam to some degree. We respond by using a traditional configuration of multiple speakers to give us the best chance of capturing and repeating the original noise. There is nothing magical about it, nor is attempting to reason from the singular to the multiple useful, as the target is constantly moving - and in no particular pattern. The best we can do is minimize any artifacts added (or actual artifacts deleted) by our noise reproducer system.