Fenton, Steven Michael (2017) Audio Dynamics - Towards a Perceptual Model of Punch. Doctoral thesis, University of Huddersfield.

This thesis presents research towards the development of an objective model that predicts punch in musical signals. Punch is a term often used by engineers and producers to describe a particular perceptual sensation found in produced music. Listeners often characterise music as being punchy, yet the term is subjective, both in its meaning and in the auditory effect it has on the listener. An objective model of punch would therefore be useful for music classification and as a further metric in music production and mastering metering tools.

The literature reviewed within this body of work encompasses both subjective and objective audio evaluation methods, in addition to low-level signal extraction and measurement techniques. The review concludes that, whilst there has been a great deal of work in the area of semantic description and audio quality measurement, low-level analysis with respect to the perception of punch remains largely unexplored.

The project was completed in a number of phases, each designed to investigate the perceptual effects resulting from manipulation of test stimuli. The rationale behind this testing was to establish the key low-level descriptors relating to the punch attribute, with the aim of producing a final objective and perceptually based model. The listening tests in each phase were conducted according to the ITU-R BS.1534-1 recommendation.

The resulting objective model shows that listener perception of punch correlates strongly with signal onset times, octave frequency band, signal duration and dynamic range. The punch measure obtained using the model is named PM95, where 95 indicates the upper percentile used in the measurement.
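
As a rough illustration of the percentile idea behind PM95, the sketch below takes a list of per-frame punch values and returns their 95th percentile. How the thesis derives the per-frame values is not described here, so `punch_frames` is a hypothetical input, and the linear interpolation between ranks is an assumption rather than the thesis's stated method.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def pm95(punch_frames):
    """Hypothetical PM95: the 95th percentile of per-frame punch values."""
    return percentile(punch_frames, 95)


# Illustrative per-frame values only; these are not data from the thesis.
example_frames = [0.0, 1.0, 2.0, 3.0, 4.0]
example_pm95 = pm95(example_frames)
```

Using an upper percentile rather than the maximum makes the summary less sensitive to a single outlying frame, which is one plausible reason for such a choice.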

Secondary measures were also obtained as a result of the iterative approach adopted. These are the Inter-Band-Ratio (IBR), the Transient to Steady-state Ratio (TSR) and the Transient to Steady-state Ratio+Residual (TSR+R). These measures are useful in quantifying overall audio quality with respect to dynamic range across frequency bands, and they offer a more reliable metric for defining the overall compression applied to a piece of music. In addition, the latter two measures may be useful in highlighting perceptual masking artefacts.

The completed perceptual punch model was validated using the scores obtained from a large-scale, independently conducted forced pairwise comparison test using expert listeners and a wide range of musical stimuli. From the results obtained, the PM95 measure showed a ‘very strong’ positive correlation with listener punch perception, with both r and rho coefficients (0.849 and 0.833) being significant at the 0.01 level (2-tailed). The PM95M measure, which is the PM95 measure divided by the mean value of punch frames, also correlated very well with the perceptual punch scale, with both r and rho coefficients (0.707 and -0.750) being significant at the 0.05 level (2-tailed).
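
For readers unfamiliar with the two coefficients reported above, the following minimal sketch computes Pearson's r and Spearman's rho for paired lists of model scores and listener scores. The function names and the sample data are illustrative assumptions, not the thesis's analysis code or results, and the significance testing is not reproduced.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up scores for illustration; not data from the thesis.
model_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
listener_scores = [1.0, 4.0, 9.0, 16.0, 25.0]
r_value = pearson_r(model_scores, listener_scores)
rho_value = spearman_rho(model_scores, listener_scores)
```

In this made-up example the relationship is monotonic but not linear, so rho equals 1 while r falls slightly below 1. That difference is why reporting both coefficients is informative.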

A real-time implementation of the punch model (and other measures proposed in this thesis) could be utilised as extensions to the metrics currently being used in Music Information

