Information-Theoretic Listening
Paris Smaragdis
Machine Listening Group, MIT Media Lab
02/24/20

Outline
  Defining a global goal for computational audition
  Example 1: Developing a representation
  Example 2: Developing grouping functions
  Conclusions

Auditory Goals
  Goals of computational audition are all over the place; should they be?
  Lack of formal rigor in most theories
  Computational listening is fitting psychoacoustic experiment data

Auditory Development
  What really made audition?
  How did our hearing evolve?
  How did our environment shape our hearing?
  Can we evolve, rather than instruct, a machine to listen?

Goals of our Sensory System
  Distinguish independent events
    Object formation
    Gestalt grouping
  Minimize thinking and effort
    Perceive as few objects as possible
    Think as little as possible

Entropy Minimization as a Sensory Goal
  Long history between entropy and perception
    Barlow, Attneave, Atick, Redlich, etc.
  Entropy can measure statistical dependencies
  Entropy can measure economy in both thought (algorithmic entropy) and information (Shannon entropy)

What is Entropy?
  Shannon entropy: $H(x) = -\int P_x(x) \log P_x(x)\,dx$
  A measure of: order, predictability, information, correlations, simplicity, stability, redundancy, ...
  High entropy = little order
  Low entropy = lots of order

Representation in Audition
  Frequency decompositions
    Cochlear hint
    Easier to look at data!
  Sinusoidal bases
    Signal processing framework

Evolving a Representation
  Develop a basis decomposition
  Bases should be statistically independent
    Satisfaction of minimal entropy idea
  Decomposition should be data driven
    Account for different domains

Method
  Use bits of natural sounds to derive bases: reshape the signal $s$ ($k \times 1$) into a matrix $S$ ($n \times m$)
  Analyze these bits with ICA: $S = W \cdot X$, with $W(i)$ independent of $W(j)$ for all $i \neq j$

Results
  We obtain sinusoidal bases!
  Transform is driven by the environment
  Uniform procedure for different domains

Auditory Grouping
  Heuristics
    Hard to implement on computers
    Require even more heuristics to resolve ambiguity
  Weak definitions
  Bootstrapped to individual domains
  [Figure labels: Good Continuation, Common AM, Common FM; Vision Gestalt, Auditory Gestalt]

Method
  Goal: find the grouping that minimizes scene entropy
  Parameterized auditory scene: $s(t,n)$
  Density estimation: $P_s(i)$
  Shannon entropy calculation: $H(s) = -\sum_{i,\dots,j} P_s(i,\dots,j) \ln P_s(i,\dots,j)$
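
As a small worked illustration of the entropy calculation above (a toy case assumed here, not taken from the talk): suppose a two-component scene is quantized so that each component takes one of two equally likely values. If the components always move together, only two joint states occur, each with probability 1/2; if they are unrelated, all four joint states occur, each with probability 1/4.

```latex
\begin{align*}
H_{\text{grouped}}     &= -\sum_{k=1}^{2} \tfrac{1}{2} \ln \tfrac{1}{2} = \ln 2 \approx 0.69 \text{ nats} \\
H_{\text{independent}} &= -\sum_{k=1}^{4} \tfrac{1}{4} \ln \tfrac{1}{4} = \ln 4 \approx 1.39 \text{ nats}
\end{align*}
```

The dependent configuration has the lower joint entropy, which is the sense in which minimizing scene entropy favors parameter values where components share structure.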

Common Modulation - Frequency
  Scene description: $s(t,n) = \{\cos(f_1(t)\,t), \cos(f_2(t)\,t)\}$, with $f_1 = f_2$ if $n = 0.5$
  Entropy measurement
  [Figures; labels: Frequency, Time, n = 0.5]

Common Modulation - Amplitude
  Scene description: $s(t,n) = \{a_1(t)\cos(f_0 t), a_2(t)\cos(f_0 t)\}$, with $a_1 = a_2$ if $n = 0.5$
  Entropy measurement
  [Figures; labels: Sine 1 Amplitude, Sine 2 Amplitude, Time, n = 0.5]

Common Modulation - Onset/Offset
  Scene description
  Entropy measurement
  [Figures; labels: Sine 1 Amplitude, Sine 2 Amplitude, Time, n = 0.5]

Similarity/Proximity - Harmonicity I
  Scene description: $s(t,n) = \{\cos(f_0 t), \cos(n f_0 t)\}$
  Entropy measurement
  [Figures; labels: Frequency, Time]
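
The harmonicity scene {cos(f0 t), cos(n f0 t)} can be probed numerically. The following is a minimal sketch, not the talk's implementation: it assumes a plain 16 x 16 histogram as the density estimate, f0 = 1, and a fixed sampling step, and the function names `joint_entropy` and `harmonicity_entropy` are hypothetical.

```python
import math
from collections import Counter


def joint_entropy(xs, ys, bins=16):
    """Plug-in Shannon entropy (nats) of a 2-D histogram of paired samples in [-1, 1]."""
    def bin_index(v):
        # Map [-1, 1] onto 0..bins-1; the clamp keeps v = 1.0 in the last bin.
        return max(0, min(int((v + 1.0) / 2.0 * bins), bins - 1))

    counts = Counter((bin_index(x), bin_index(y)) for x, y in zip(xs, ys))
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())


def harmonicity_entropy(n, f0=1.0, num_samples=20000, dt=0.05):
    """Joint entropy of the scene {cos(f0 t), cos(n f0 t)} sampled at t = k * dt."""
    times = [k * dt for k in range(num_samples)]
    first = [math.cos(f0 * t) for t in times]
    second = [math.cos(n * f0 * t) for t in times]
    return joint_entropy(first, second)


if __name__ == "__main__":
    # Assumed test values: unison, octave, an inharmonic ratio, and a twelfth.
    for n in (1.0, 2.0, 1.0 + math.sqrt(2.0), 3.0):
        print(f"n = {n:.3f}  H = {harmonicity_entropy(n):.3f} nats")
```

Under these assumptions the unison (n = 1) gives the lowest entropy, the octave (n = 2) a higher one, and the inharmonic ratio the highest. Integer ratios confine the sample pairs to a thin curve in the joint histogram, while an inharmonic ratio spreads them over most of the bins.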

Similarity/Proximity - Harmonicity II
  Scene description: $s(t,n) = \{\cos(f_0 t), \cos(2 f_0 t), \cos(n f_0 t)\}$
  Entropy measurement
  [Figures; labels: Frequency, Time]

Simple Scene Analysis Example
  Simple scene: 5 sinusoids, 2 groups
  Simulated annealing algorithm
    Input: raw sinusoids
    Goal: entropy minimization
    Output: expected grouping

Important Notes
  No definition of time
  Developed a concept of frequency
  No parameter estimation requirement
    Operations on data, not parameters
  No parameter setting!

Conclusions
  Elegant and consistent formulation
  No constraint over data representation
  Uniform over different domains (cross-modal!)
  No parameter estimation
  No parameter tuning!
  Biological plausibility
    Barlow et al. ...
  Insight into perception development

Future Work
  Good cost function?
  Incorporate time
  Joint entropy vs. entropy of sums
  Shannon entropy vs. Kolmogorov complexity
  Joint statistics (cumulants, moments)
  Sounds have time dependencies I'm ignoring
  Generalize to include perceptual functions

Teasers
  Dissonance and Entropy
    $H(\text{5th} \mid \text{Pythagorean})$, $H(\text{5th} \mid \text{equal temperament})$
    $H(\text{Maj chord})$, $H(\text{Min chord})$, $H(\text{Dim chord})$
  Pitch Detection
    $\arg\min_{f} H(s(t), \cos(f(t)\,t))$
  Instrument Recognition
    $\arg\min_{\text{template}} H(s(t), \text{template}(t))$
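
The pitch-detection teaser, choosing the f that minimizes the joint entropy of the signal and a template cos(f t), can be sketched as a search over candidate frequencies. This is an illustrative sketch under simplifying assumptions: a constant f rather than a time-varying f(t), a signal scaled to [-1, 1], a histogram density estimate, and a hand-picked candidate grid. The names `joint_entropy` and `estimate_pitch` are hypothetical.

```python
import math
from collections import Counter


def joint_entropy(xs, ys, bins=16):
    """Plug-in Shannon entropy (nats) of a 2-D histogram of paired samples in [-1, 1]."""
    def bin_index(v):
        # Map [-1, 1] onto 0..bins-1; the clamp keeps v = 1.0 in the last bin.
        return max(0, min(int((v + 1.0) / 2.0 * bins), bins - 1))

    counts = Counter((bin_index(x), bin_index(y)) for x, y in zip(xs, ys))
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())


def estimate_pitch(signal, times, candidates):
    """Return the candidate f whose template cos(f t) has the lowest joint entropy with the signal."""
    def cost(f):
        template = [math.cos(f * t) for t in times]
        return joint_entropy(signal, template)

    return min(candidates, key=cost)


if __name__ == "__main__":
    times = [k * 0.05 for k in range(5000)]
    signal = [math.cos(2.0 * t) for t in times]   # assumed test tone at f = 2.0
    candidates = [0.5 * k for k in range(1, 9)]   # 0.5, 1.0, ..., 4.0 (zero excluded)
    print(estimate_pitch(signal, times, candidates))  # prints 2.0
```

A constant template (f = 0) would tie with the correct answer, because it adds no information to the joint histogram, so the candidate grid here excludes zero.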