Workload: 5 ECTS
Prerequisites: Basic knowledge in probability theory, linear
algebra and computer programming.
Digital processing plays an increasingly important part in modern multimedia
applications, as faster processors and high-bandwidth networks make many new
applications possible. Most multimedia systems
require reliable and efficient methods for extracting different
model-parameters, for example for compression, for enhancement or for
Understanding the different methods for such parameter estimation and
classification, and their limits, is therefore crucial for both the design and
evaluation of the entire multimedia system.
The purpose of the theme study is to estimate or extract relevant parameters
or information from a multimedia signal, which can subsequently be used for
automated classification or analysis. Examples of such multimedia signals
include biometrics, images and video, audio and speech signals, and examples
of the classification or analysis process include identity verification,
speech recognition, and music information retrieval.
At the end of the study, the students will carry out joint project work with
the support of supervisors. The projects involve different methods for
feature extraction, classification and analysis of multimedia data.
Prototypes of systems such as speaker identification, music classification
and visual signature verification will be implemented on PCs or smartphones.
Topics covered include:
· Acquisition and representation of multimedia signals
· Feature extraction from speech, music, images, etc.
· Bayes decision theory: Bayes rule, loss function
· Supervised learning (of classification and regression
functions): K-nearest neighbors, decision trees, linear regression, linear
· Unsupervised learning (for clustering, density estimation
and dimensionality reduction): K-means, Gaussian mixture model, principal
· Model selection: bias and variance, boosting and
Extensive course slides will be made available prior to the course.
F. Camastra and A. Vinciarelli, Machine Learning for Audio, Image and Video
Analysis: Theory and Applications. Springer, 2008.
R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, Second
Edition. Wiley-Interscience, 2001.
S. V. Vaseghi, Multimedia Signal Processing: Theory and Applications in
Speech, Music and Communications. Wiley, 2007.
Note: The schedule is indicative and subject to change, and reading is optional.
Lecture 1-2: Introduction (slides)
Readings: Reference  Chapters 1
Project presentation (slides).
Lecture 3: Acquisition, representation and feature extraction
of multimedia signals (slides)
Readings: Reference  Chapters 2
Lecture 4: Decision tree and random forest (slides)
Readings: Reference  Chapter 8.
Lecture 5: Clustering and Gaussian mixture model (slides)
Readings: Reference  Chapter
Lecture 6: Bayesian decision theory (slides)
Readings: Reference  Chapter 5
or Reference  Chapter 2.
Lecture 7: Parametric and nonparametric
Readings: Reference  Chapters 3
Lecture 8: Supervised learning (slides)
Readings: Reference  Chapters 5
Lecture 9: Unsupervised learning (slides)
Readings: Reference  Chapter
Lecture 10: Model selection and applications
Readings: Reference  Chapter 7
and Part III.