
Summer school


Multimedia Information and Signal Processing

Zheng-Hua Tan, Associate Professor, Ph.D.

+45 9940-8686

Office: Room A6-319, Niels Jernes Vej 12


Workload: 5 ECTS

Prerequisites: Basic knowledge in probability theory, linear algebra and computer programming.


Digital processing plays an increasingly important part in modern multimedia applications, as faster processors and high-bandwidth networks make many new applications possible. Most multimedia systems require reliable and efficient methods for extracting model parameters, for example for compression, enhancement or classification.
Understanding these parameter estimation and classification methods, and their limits, is therefore crucial for both the design and the evaluation of the entire multimedia system.

The purpose of the theme study is to estimate or extract relevant parameters or information from a multimedia signal, which can subsequently be used for automated classification or analysis. Examples of such multimedia signals include biometrics, images and video, and audio and speech signals; examples of the classification or analysis process include identity verification, speech recognition and music information retrieval.
At the end of the study, the students will carry out joint project work with the support of supervisors. The projects involve different methods for feature extraction, classification and analysis of multimedia data.

Prototypes of systems such as speaker identification, music classification and visual signature verification will be implemented on PCs or smartphones.


Topics covered include:
    Acquisition and representation of multimedia signals
    Feature extraction from speech, music, images, etc.
    Bayes decision theory: Bayes rule, loss function
    Supervised learning (of classification and regression functions): K-nearest neighbors, decision trees, linear regression, linear discriminant analysis
    Unsupervised learning (for clustering, density estimation and dimensionality reduction): K-means, Gaussian mixture model, principal component analysis
    Model selection: bias and variance, boosting and cross-validation
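
As an illustration of the "Bayes rule, loss function" topic above, the following is a minimal sketch of a minimum-risk Bayes decision for a two-class identity verification task. It is not taken from the course material: the class names, priors, likelihood values and loss table are all hypothetical numbers chosen for the example.

```python
# Minimal sketch of a minimum-risk Bayes decision.
# All class names and numbers below are hypothetical.

def posteriors(priors, likelihoods):
    """Bayes rule: P(class | x) = p(x | class) * P(class) / p(x)."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # p(x), the normalising constant
    return {c: joint[c] / evidence for c in joint}


def min_risk_action(post, loss):
    """Return the action with the smallest expected loss, plus all risks.

    loss[action][true_class] is the cost of taking `action` when the
    true class is `true_class`.
    """
    risk = {a: sum(loss[a][c] * post[c] for c in post) for a in loss}
    return min(risk, key=risk.get), risk


# Assumed priors and class-conditional likelihoods for one observation x.
priors = {"genuine": 0.9, "impostor": 0.1}
likelihoods = {"genuine": 0.2, "impostor": 0.6}
post = posteriors(priors, likelihoods)

# Asymmetric loss: accepting an impostor costs ten times more than
# rejecting a genuine user.
asymmetric_loss = {
    "accept": {"genuine": 0.0, "impostor": 10.0},
    "reject": {"genuine": 1.0, "impostor": 0.0},
}
# Zero-one loss: every error costs the same.
zero_one_loss = {
    "accept": {"genuine": 0.0, "impostor": 1.0},
    "reject": {"genuine": 1.0, "impostor": 0.0},
}

action_asym, risk_asym = min_risk_action(post, asymmetric_loss)
action_01, risk_01 = min_risk_action(post, zero_one_loss)
print(post, action_asym, action_01)
```

With these assumed numbers the posterior probability of "genuine" is 0.75, so zero-one loss leads to "accept" (the most probable class), while the asymmetric loss leads to "reject". The sketch shows how the loss function, and not only the posterior, determines the decision.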

Extensive course slides will be made available prior to the course. Additional readings:
[1]    F. Camastra and A. Vinciarelli, Machine Learning for Audio, Image and Video Analysis: Theory and Applications. Springer, 2008.
[2]    R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, Second Edition. Wiley-Interscience, 2001.
[3]    S. V. Vaseghi, Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications. Wiley, 2007.


Note: The schedule is indicative and subject to change; the readings are optional.


Lecture 1-2: Introduction (slides)

Readings: Reference [1] Chapters 1 & 4.

Project presentation (slides).

Lecture 3: Acquisition, representation and feature extraction of multimedia signals (slides)

Readings: Reference [1] Chapters 2 & 3.


Lecture 4: Decision tree and random forest (slides)

Readings: Reference [2] Chapter 8.


Lecture 5: Clustering and Gaussian mixture model (slides)

Readings: Reference [2] Chapter 10.


Lecture 6: Bayesian decision theory (slides)

Readings: Reference [1] Chapter 5 or Reference [2] Chapter 2.


Lecture 7: Parametric and nonparametric methods (slides)

Readings: Reference [2] Chapters 3 and 4.


Lecture 8: Supervised learning (slides)

Readings: Reference [2] Chapters 5 and 6.


Lecture 9: Unsupervised learning (slides)

Readings: Reference [2] Chapter 10.


Lecture 10: Model selection and applications (slides)

Readings: Reference [1] Chapter 7 and Part III.
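

To make the model selection topic of Lecture 10 concrete, here is a minimal sketch of k-fold cross-validation used to choose the number of neighbours in a K-nearest-neighbour classifier. It is an illustration under stated assumptions and not part of the course slides: the one-dimensional toy data, the function names and the candidate values of K are all hypothetical.

```python
from collections import Counter


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, near-equal folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(xs, ys, k, n_folds):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for train, test in k_fold_indices(len(xs), n_folds):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct += sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test)
    return correct / len(xs)


# Hypothetical toy data: one feature per sample, two classes.
xs = [0.1, 2.1, 0.4, 1.9, 0.3, 2.4, 1.2, 1.1, 0.2, 2.0]
ys = ["a", "b", "a", "b", "a", "b", "a", "b", "a", "b"]

# Evaluate each candidate K on held-out data and keep the best one.
scores = {k: cv_accuracy(xs, ys, k, n_folds=5) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Every sample is used for testing exactly once and is never in the training set of its own fold, so the scores estimate performance on unseen data. In practice the data would normally be shuffled or stratified before splitting; the contiguous folds here are a simplification.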