SIPCom8-4
Speech Processing
The course is given by Per Rubak (PR), Mads Græsbøll Christensen (MGC), Søren Holdt Jensen (SHJ), Chunjian Li (CL), and Zheng-Hua Tan (ZT). The course coordinator is Zheng-Hua Tan.
Literature:
Deller, Hansen, Proakis, Discrete-Time Processing of Speech Signals, 2nd edition, Wiley-IEEE Press, 1999.
---------------------------------------------
Lecture 1:
Time and place:
Tuesday, February 7th 2006
Lecture, 12:30 to 14:00 in A4-106
Problem solving, 14:14 to 16:15
Topics: Fundamentals of speech science, model of speech production
Lecturer: ZT
Preparation:
Literature: Deller, Hansen, Proakis, “Discrete-Time Processing of Speech Signals”, chapters 2 and 3 (pp. 81-85, 99-146, 151-201)
Exercise:
Speech files associated with the textbook and speech files for this exercise. Tool: Speech Filing System.
---------------------------------------------
Lecture 2:
Time and place:
Tuesday, February 14th 2006
Lecture, 12:30 to 14:00 in A4-106
Problem solving, 14:14 to 16:15
Topic:
Speech perception
Lecturer: PR
Slides
Exercise:
Solution
---------------------------------------------
Lecture 3:
Time and place:
Tuesday, February 21st 2006
Lecture, 12:30 to 14:00 in A4-106
Problem solving, 14:14 to 16:15
Topics:
Fundamentals of linear prediction-based speech coding.
Lecturer: MGC
Literature:
W. B. Kleijn and K. K. Paliwal, Eds., "Speech Coding and Synthesis", 1995, Chapters 1 (pp. 3-47) and 3 (pp. 79-119).
Exercises:
Click here.
---------------------------------------------
Lecture 4:
Time and place:
Tuesday, February 28th 2006
Lecture, 12:30 to 14:00 in A4-106
Problem solving, 14:14 to 16:15
Topics: Parametric speech and audio coding, relation to vector quantization, efficient implementation of perceptual distortion measures.
Lecturer:
Literature:
M. G. Christensen and S. H. Jensen, "On perceptual distortion minimization and nonlinear least-squares frequency estimation", IEEE Trans. on Audio, Speech and Language Processing, Volume 14, Issue 1, Jan. 2006, pp. 99-109 (pdf).
Exercises:
Click here.
---------------------------------------------
Lecture 5:
Time and place:
Tuesday, March 7th 2006
Lecture, 12:30 to 14:00 in A4-106
Problem solving, 14:14 to 16:15
Lecturer: MGC
Literature:
Exercises:
Click here.
---------------------------------------------
Lecture 6:
Time and place:
Tuesday, March 21st 2006
Lecture, 12:30 to 14:00 in NJ14 3-119
Problem solving, 14:14 to 16:15
Topics:
Speech enhancement, Single-Channel Spectral Subtraction based methods
Lecturer: SHJ / CL
Preparation:
Slides for MM6
Literature:
Deller, Hansen, Proakis, “Discrete-Time Processing of Speech Signals”, chapter 8
Supplementary Literature:
D. Ealey, H. Kelleher, and D. Pearce, "Harmonic Tunneling: tracking non-stationary noise during speech", Eurospeech 2001
R. Martin, "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics", IEEE Trans. Speech and Audio Processing, vol. 9, no. 5, July 2001, pp. 504-514
Exercise:
---------------------------------------------
Lecture 7:
Time and place:
Tuesday, March 28th 2006
Lecture, 12:30 to 14:00 in NJ14 3-119
Problem solving, 14:14 to 16:15
Topic:
Speech enhancement, Multi-Channel methods
Lecturer: SHJ / CL
Preparation:
Literature:
Deller, Hansen, Proakis, “Discrete-Time Processing of Speech Signals”, chapter 8
Exercise:
---------------------------------------------
Lecture 8:
Time and place:
Tuesday, April 4th 2006
Lecture, 12:30 to 14:00 in A4-108 (Note the room!)
Problem solving, 14:14 to 16:15
Topic:
Speech synthesis & Speech Recognition (DTW)
Lecturer: ZT
Preparation:
Literature:
D. O'Shaughnessy, "Interacting with Computers by Voice: Automatic Speech Recognition and Synthesis", Proceedings of the IEEE, 91 (9), Sept. 2003. (PDF. Password for opening the PDF file is the name of your specialisation excluding semester number, all in lowercase)
Deller, Hansen, Proakis, “Discrete-Time Processing of Speech Signals”, chapter 11.
Exercise:
Exercise 2: Use Matlab to implement Dynamic Time Warping to compare speech signals
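As a rough orientation for the Dynamic Time Warping exercise, the following is a minimal sketch in Python rather than Matlab. It is not the official solution. It assumes one scalar value per frame and uses the absolute difference as local cost; in practice each frame would typically be a feature vector with a vector distance. The function name `dtw_distance` and the simple three-way step pattern are illustrative choices, not taken from the course material.

```python
import math


def dtw_distance(x, y):
    """Accumulated DTW cost between two sequences of scalars.

    Assumptions (illustrative only): local cost is |a - b| and the
    allowed steps are horizontal, vertical, and diagonal.
    """
    n, m = len(x), len(y)
    # cost[i][j] holds the minimal accumulated cost of aligning
    # the first i samples of x with the first j samples of y.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y stays
                cost[i][j - 1],      # y advances, x stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A sequence and a time-stretched copy of it, for example `[1, 2, 3]` and `[1, 2, 2, 3]`, align with zero accumulated cost, which is the property that makes DTW useful for comparing utterances spoken at different rates.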
---------------------------------------------
Lecture 9:
Time and place:
Tuesday, April 11th 2006
Lecture, 12:30 to 14:00 in NJ14 3-119 (Note the room!)
Problem solving, 14:14 to 16:15
Topic:
Speech recognition, HMM
Lecturer: ZT
Preparation:
Literature:
Deller, Hansen, Proakis, “Discrete-Time Processing of Speech Signals”, chapter 12. Alternative: Rabiner, L.R., "A tutorial on hidden Markov models and selected applications in speech recognition", Proceedings of the IEEE, 77 (2), 1989, pp. 257 - 286. (PDF. Password for opening the PDF file is the name of your specialisation excluding semester number, all in lowercase)
Steve Young, et al., "The HTK Book" (PDF)
Exercise:
Exercise 1: Hidden Markov model and Viterbi decoding.
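For orientation on the Viterbi part of this exercise, here is a minimal sketch in Python of Viterbi decoding for a discrete-observation HMM. It is not the official solution. The function name `viterbi` and the dictionary-based model layout (`start_p`, `trans_p`, `emit_p`) are assumptions made for illustration. Probabilities are multiplied directly, which is fine for short sequences; for longer ones, log-probabilities are normally used to avoid underflow.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence and its probability.

    Assumed (hypothetical) layout: start_p[s], trans_p[r][s] and
    emit_p[s][o] are plain dictionaries of probabilities.
    """
    # delta[t][s]: probability of the best path that ends in state s at time t
    delta = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(obs)):
        delta.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((r, delta[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            delta[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Pick the best final state, then trace the path backwards.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]
```

The forward pass stores, for every time step and state, only the best-scoring predecessor, so decoding costs on the order of T·N² operations for T observations and N states instead of enumerating all N^T paths.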
---------------------------------------------
Lecture 10:
Time and place:
Tuesday, April 18th 2006
Lecture, 12:30 to 14:00 in A4-106 (Note the room!)
Problem solving, 14:14 to 16:15
Topic:
Large Vocabulary Continuous Speech Recognition (LVCSR) and More
Lecturer: ZT
Preparation:
Literature:
Steve Young, "A review of large-vocabulary continuous-speech recognition", IEEE Signal Processing Magazine, Sep 1996 (PDF. Password for opening the PDF file is the name of your specialisation excluding semester number, all in lowercase). Alternatively, Huang, Acero and Hon, Spoken Language Processing, Chapter 9.
Exercise:
Given at the lecture.