Factor Analyzed Subspace Modeling and Selection

Jen-Tzung Chien and Chuan-Wei Ting


      We present a novel subspace modeling and selection approach for noisy speech recognition. For subspace modeling, we develop a factor analysis (FA) representation of noisy speech, which generalizes the signal subspace (SS) representation. Using FA, noisy speech is represented by the extracted common factors, the factor loading matrix and the specific factors. The observation space of noisy speech is accordingly partitioned into a principal subspace containing speech and noise and a minor subspace containing residual speech and residual noise. We minimize the energy of speech distortion in both the principal subspace and the minor subspace, so that clean speech is estimated together with its residual information. In addition, we explore optimal subspace selection by solving hypothesis testing problems. To select the subspace dimension, we test the equivalence of the eigenvalues in the minor subspace. In keeping with the FA assumptions, we also test the hypothesis that the specific factors/residual speech are uncorrelated. The subspace can then be partitioned at a consistent confidence level for rejecting the null hypothesis. Optimal solutions are obtained by likelihood ratio tests, whose test statistics approximately follow chi-square distributions. In the following table, we show some Aurora2 samples of noisy speech and of speech enhanced using the SS and FA subspace models under different SNR conditions.


SNR                  -5 dB   0 dB    5 dB    10 dB   15 dB   20 dB
Noisy Speech         [1][2]  [1][2]  [1][2]  [1][2]  [1][2]  [1][2]
SS Enhanced Speech   [1][2]  [1][2]  [1][2]  [1][2]  [1][2]  [1][2]
FA Enhanced Speech   [1][2]  [1][2]  [1][2]  [1][2]  [1][2]  [1][2]

[1]: Aurora2 utterance MKA_7ZZZ9Z6A "seven zero zero zero nine zero six" in the Subway environment

[2]: Aurora2 utterance MFG_Z558Z28A "zero five five eight zero two eight" in the Station environment
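
The subspace dimension selection described in the abstract can be illustrated with a minimal sketch. It assumes the classical likelihood ratio test for equality of the smallest covariance eigenvalues, which may differ in detail from the statistic used in the paper (for example, no small-sample correction is applied). The function names, the 5% significance level and the Wilson-Hilferty approximation of the chi-square quantile are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def equal_eigenvalue_statistic(minor_eigs, n_frames):
    """Likelihood ratio statistic for H0: all given eigenvalues are equal.

    Equals n * q * log(arithmetic mean / geometric mean), which is zero
    when the eigenvalues are identical and grows as they spread out.
    """
    q = len(minor_eigs)
    log_arith_mean = math.log(sum(minor_eigs) / q)
    log_geo_mean = sum(math.log(v) for v in minor_eigs) / q
    return n_frames * q * (log_arith_mean - log_geo_mean)


def chi2_quantile(prob, dof):
    """Approximate chi-square quantile (Wilson-Hilferty), stdlib only."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def select_dimension(eigs, n_frames, alpha=0.05):
    """Return the smallest principal-subspace dimension k for which
    equality of the remaining (minor) eigenvalues is not rejected."""
    eigs = sorted(eigs, reverse=True)
    p = len(eigs)
    for k in range(p - 1):
        minor = eigs[k:]
        q = len(minor)
        # Degrees of freedom of the classical equal-eigenvalue test.
        dof = (q + 2) * (q - 1) // 2
        stat = equal_eigenvalue_statistic(minor, n_frames)
        if stat <= chi2_quantile(1.0 - alpha, dof):
            return k
    # Every test rejected: keep all but the last dimension as principal.
    return p - 1
```

For example, `select_dimension([9.0, 4.0, 1.0, 1.0, 1.0], n_frames=200)` returns 2: the tests with k = 0 and k = 1 are rejected, while the three trailing eigenvalues are identical and give a statistic of zero.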