

Machine Learning (Ph.D. Course), Spring 2011
Zheng-Hua Tan, Associate Professor, Ph.D.
+45 99408686, zt@es.aau.dk, http://kom.aau.dk/~zt
Office: Room A6319, Niels Jernes Vej 12, Aalborg University, Denmark


Machine learning is concerned with the development of computer programs that allow computers (or machines) to learn from examples or experience. Machine learning is interdisciplinary in nature, with roots in computer science, statistics and pattern recognition. In the past decade, the field has witnessed rapid theoretical advances and growing real-world applications. Successful applications include machine perception (speech recognition, computer vision), control (robotics), data mining, web search and text classification, time-series prediction, system modelling, bioinformatics, data compression, and many more. This course gives a comprehensive introduction to machine learning, both by presenting technologies proven valuable and by addressing specific problems such as pattern recognition and data mining. The course covers both the theory and the practice of machine learning, with an emphasis on the practical side, namely how to apply machine learning effectively to a variety of problems.
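To make the "learning from examples" idea concrete, a K-nearest-neighbour classifier, one of the supervised methods covered in the course, fits in a few lines. The sketch below is in Python/NumPy for illustration only; the course exercises themselves use Matlab and the Netlab toolbox.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)       # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority label

# Toy 2-D data: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [4.8, 5.1]])
y = np.array([0, 0, 1, 1])
label = knn_predict(X, y, np.array([0.3, 0.1]))       # a query near class 0
```

The same nearest-neighbour idea returns on DAY 2 under nonparametric methods.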
Topics will include:
· Supervised learning (of classification and regression functions): K-nearest neighbors, decision trees, naïve Bayes, support vector machines, logistic regression, evolutionary algorithms, Bayesian networks, hidden Markov models, neural networks, boosting
· Unsupervised learning and clustering: K-means, hierarchical clustering (agglomerative and divisive), principal component analysis, independent component analysis, the Expectation-Maximization algorithm
· Reinforcement learning

Prerequisites:

Time: May 23, 24, 26, 31, June 7, 9, 2011 (each day 09:15-16:15)

Place: Niels Jernes Vej 14, Room 4111 (the video conference auditorium), 9220 Aalborg. After 14:00, Room 4111 and Room 3119 will be available for exercises.

Information: This course consists of six days of lectures. Students are highly encouraged to do mini projects, either presented in the course or related to their own PhD research, and to hand in short reports by the end of February. Coffee and bread will be served at 10:00 in the morning; coffee and cake at 14:00 in the afternoon.

Note: The schedule is indicative and subject to change, and the reading is optional.

DAY 1

Lecture 1: Introduction (slides)
Readings: Chapters 1 and 2 of Alpaydin's book, or Chapter 1 of Bishop's book.
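DAY 1's parametric-methods material centres on estimators such as maximum likelihood; for a 1-D Gaussian, the ML estimates are simply the sample mean and the biased sample variance. A quick sketch in Python for illustration (the exercise itself uses Matlab and Netlab):

```python
import numpy as np

def gaussian_ml(x):
    """ML estimates for a 1-D Gaussian: mean and (biased) variance."""
    mu = x.mean()                    # maximizes the log-likelihood in mu
    var = ((x - mu) ** 2).mean()     # note: divides by N, not N - 1
    return mu, var

x = np.array([1.0, 2.0, 3.0, 4.0])
mu, var = gaussian_ml(x)             # mu = 2.5, var = 1.25
```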
Lecture 3: Parametric methods (ML, MAP & Bayesian learning) (slides)
Exercises for DAY 1: download dataset1_noisy and the Netlab toolbox, and do Exercise 1.

DAY 2

Lecture 4: Dimensionality reduction (slides)
Lecture 5: Clustering (slides)
Lecture 6: Nonparametric methods (Parzen windows and KNN) (slides)
Exercises for DAY 2: download the full dataset,
which is a Matlab-format version of THE MNIST DATABASE of handwritten digits by Yann LeCun and Corinna Cortes, and do Exercise 2: (1) from the 10-class database, choose three classes (5, 6 and 8) and reduce the dimension to 2; (2) perform 3-class classification based on the generated 2-dimensional data. You may want to use eigdec.m and pca.m in the Netlab toolbox and the LDA
code.

DAY 3

Lecture 7: Linear discrimination (slides)
Lecture 8: Support vector machines (slides)
Exercises for DAY 3: perform classification on the entire dataset with the algorithms introduced (using LDA for dimensionality reduction). As an option, you can perform the classification using LIBSVM – A Library for Support Vector
Machines.

DAY 4

Lecture 9: Multilayer perceptrons and evolutionary computation (slides)
Lecture 10: Time series models (slides)
Exercises for DAY 4: develop an MLP for the MNIST database using the dimension-reduced data from your work on DAY 2 and DAY 3. You can download the LDA-projected data here.
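The PCA step (eigdec.m and pca.m in Netlab) amounts to projecting the centred data onto the leading eigenvectors of its covariance matrix. A rough NumPy equivalent, as a sketch rather than the course code:

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X onto the d leading principal components."""
    Xc = X - X.mean(axis=0)                        # centre the data
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = evecs[:, np.argsort(evals)[::-1][:d]]    # d largest-eigenvalue directions
    return Xc @ top

X = np.random.randn(100, 30)    # stand-in for real feature vectors
Z10 = pca_project(X, 10)        # 10-dimensional data, as in the exercise
```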
Further, you can use 10-, 20- and 30-dimensional data generated by PCA and compare their performance (at the same time, try various MLP architectures).
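In NumPy terms, the forward pass of such a single-hidden-layer MLP looks like the sketch below. The tanh hidden layer and softmax output are assumptions made for illustration, not necessarily the Netlab defaults:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass: tanh hidden layer, softmax output (a sketch)."""
    h = np.tanh(x @ W1 + b1)             # hidden activations
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(10)                            # a 10-D PCA-reduced input
W1, b1 = rng.standard_normal((10, 20)), np.zeros(20)   # 20 hidden units
W2, b2 = rng.standard_normal((20, 3)), np.zeros(3)     # 3 classes (5, 6 and 8)
p = mlp_forward(x, W1, b1, W2, b2)                     # class probabilities
```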
Functions for MLP in the Netlab toolbox include mlp.m, mlptrain.m and mlpfwd.m.

DAY 5

Lecture 11: Graphical
models (Introduction, main slides, main slides - commented)
Readings: Chapter 8 of Bishop's book.
Exercises for DAY 5: choose your own images and apply a Markov random field for denoising using the Matlab code. Optionally, you can play with the Bayes Net Toolbox.

DAY 6

Lecture 12: Algorithm-independent machine
learning (slides)
Lecture 13: Reinforcement learning (slides)
Wrap-up.
Exercises for DAY 6: implement AdaBoost for the MNIST database, or improve the system that you have developed by choosing algorithms you like. A tutorial on AdaBoost is available here.
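The core AdaBoost loop, with single-feature decision stumps as the weak learner, can be sketched as follows. This is an illustrative Python sketch, not the course code; labels are assumed to be ±1, so a 10-class MNIST system would need a one-vs-rest wrapper or a multiclass variant:

```python
import numpy as np

def adaboost_train(X, y, n_rounds=10):
    """AdaBoost with threshold stumps on single features; y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                  # example weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                   # exhaustive stump search
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] <= t, 1, -1)
                    err = w[pred != y].sum()  # weighted error
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = max(err, 1e-10)                       # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)       # weight of this stump
        pred = s * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)              # up-weight mistakes
        w /= w.sum()
        stumps.append((alpha, j, t, s))
    return stumps

def adaboost_predict(stumps, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1) for a, j, t, s in stumps)
    return np.sign(score)

# Tiny 1-D example: threshold between 1 and 2 separates the classes
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_train(X, y, n_rounds=5)
```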
Textbooks:
Introduction to Machine Learning, Ethem Alpaydin, The MIT Press, USA, October 2004.
Pattern Classification, Second Edition, Richard O. Duda, Peter E. Hart, David G. Stork, Wiley-Interscience, USA, 2001.
