


Machine Learning
(Ph.D. Course)

Zheng-Hua Tan, Associate Professor, Ph.D.
+45 99408686, zt@es.aau.dk, http://kom.aau.dk/~zt
Office: Room B4202, Fredrik Bajers Vej 7, Aalborg University, Denmark

The course has been given every second year since 2007. Please refer to my TEACHING webpage for detailed information.

Machine learning is concerned with the development of computer programs that allow computers (or machines) to learn from examples or experience. Machine learning is interdisciplinary in nature, with roots in computer science, statistics and pattern recognition. In the past decade, the field has witnessed rapid theoretical advances and growing real-world application. Successful applications include machine perception (speech recognition, computer vision), control (robotics), data mining, web search and text classification, time-series prediction, system modelling, bioinformatics, data compression, and many more. This course gives a comprehensive introduction to machine learning, both by presenting technologies of proven value and by addressing specific problems such as pattern recognition and data mining. The course covers both the theory and the practice of machine learning, but with an emphasis on the practical side, namely how to apply machine learning effectively to a variety of problems.
Topics will include:
· Supervised learning (of classification and regression functions): K-nearest neighbors, decision trees, naïve Bayes, support vector machines, logistic regression, evolutionary algorithms, Bayesian networks, hidden Markov models, neural networks, boosting
· Unsupervised learning and clustering: K-means, hierarchical clustering (agglomerative and divisive), principal component analysis, independent component analysis, the Expectation-Maximization algorithm
· Reinforcement learning

Prerequisites:
Time: Every second year (Spring 2015, Spring 2013, Spring 2011, Fall 2009, Fall 2007).
Place: Aalborg University.
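As a small taste of the supervised methods listed above, a K-nearest-neighbors classifier can be written from scratch in a few lines. This is a sketch in Python with NumPy on toy data, not part of the course's Matlab/Netlab materials:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        # Euclidean distance from x to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        # Labels of the k closest training points
        nearest = y_train[np.argsort(dists)[:k]]
        # Majority vote
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

# Toy 2-class data: class 0 near the origin, class 1 near (5, 5)
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.5, 0.5], [5.5, 5.5]])

print(knn_predict(X_train, y_train, X_test, k=3))  # [0 1]
```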
Literature:
Machine Learning – A Probabilistic Perspective, Kevin P. Murphy, The MIT Press, 2012.
Introduction to Machine Learning, second edition, Ethem Alpaydin, The MIT Press, USA, October 2009.
Pattern Classification, second edition, Richard O. Duda, Peter E. Hart, David G. Stork, Wiley-Interscience, USA, 2001.

Note: The schedule is indicative and subject to change, and the reading is optional.

DAY 1
Lecture 1: Introduction (slides)
Readings: Chapters 1 and 2 of Alpaydin's book; or Chapter 1 of Bishop's book.
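As a numeric preview of the parametric methods covered in Lecture 3 (ML, MAP and Bayesian learning), the sketch below contrasts the ML and MAP estimates of a Gaussian mean. The Gaussian prior and the synthetic data are my own choices for illustration, not taken from the course slides or dataset1_noisy:

```python
import numpy as np

# Toy data from N(2, 1); in the course exercise you would use dataset1_noisy instead.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=20)

# Maximum-likelihood (ML) estimate of the mean: the sample average.
mu_ml = x.mean()

# MAP estimate under a Gaussian prior mu ~ N(mu0, sigma0^2), known noise variance sigma^2:
# mu_map = (sigma^2 * mu0 + n * sigma0^2 * mean(x)) / (n * sigma0^2 + sigma^2)
mu0, sigma0_sq, sigma_sq, n = 0.0, 1.0, 1.0, len(x)
mu_map = (sigma_sq * mu0 + n * sigma0_sq * x.mean()) / (n * sigma0_sq + sigma_sq)

print(mu_ml, mu_map)  # MAP is shrunk toward the prior mean mu0 = 0
```

As n grows, the MAP estimate converges to the ML estimate: the data overwhelms the prior.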
Lecture 3: Parametric methods (ML, MAP & Bayesian learning) (slides)
Exercises for DAY 1: download dataset1_noisy and the Netlab toolbox, and do Exercise 1.

DAY 2
Lecture 4: Dimensionality reduction (slides)
Lecture 5: Clustering (slides)
Lecture 6: Nonparametric methods (Parzen windows and KNN) (slides)
Exercises for DAY 2: download the full dataset, which is a Matlab-format version of THE MNIST DATABASE of handwritten digits by Yann LeCun and Corinna Cortes, and do Exercise 2: (1) from the 10-class database, choose three classes (5, 6 and 8) and then reduce the dimension to 2; (2) perform 3-class classification based on the generated 2-dimensional data. You may want to use eigdec.m and pca.m in the Netlab toolbox and the LDA
code.

DAY 3
Lecture 7: Linear discrimination (slides)
Lecture 8: Support vector machines (slides)
Exercises for DAY 3: perform classification for the entire dataset based on the algorithms introduced (using LDA for dimensionality reduction). As an option, you can perform the classification by using LIBSVM – A Library for Support Vector Machines.

DAY 4
Lecture 9: Multilayer perceptrons
and evolutionary computation (slides)
Lecture 10: Time series models (slides)
Exercises for DAY 4: develop an MLP for the MNIST database by using the dimension-reduced data from your work on DAY 2 and DAY 3. You can download the LDA-projected data here.
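The PCA step underlying these projections (done in the course with Netlab's pca.m and eigdec.m) can be sketched outside Matlab with plain NumPy. The data here is random stand-in data, not the MNIST digits:

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components (eigenvectors of the covariance)."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)             # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    top = eigvecs[:, ::-1][:, :n_components]   # keep the top-n_components directions
    return Xc @ top

# Stand-in for flattened digit images: 100 samples, 64 dimensions
X = np.random.default_rng(1).normal(size=(100, 64))
Z = pca_project(X, n_components=2)
print(Z.shape)  # (100, 2)
```

The first projected dimension carries the most variance, the second the next most, which is why two components often suffice for visualization.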
Further, you can use 10-, 20- and 30-dimensional data generated by PCA and compare their performance (at the same time, try various MLP architectures).
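The course exercise uses Netlab's MLP functions in Matlab; purely as a reference for what they do, here is a minimal one-hidden-layer MLP trained with gradient descent in NumPy, on toy 2-class data rather than the MNIST projections:

```python
import numpy as np

# Toy 2-class data: class 0 around (-2, -2), class 1 around (2, 2)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.repeat([0.0, 1.0], 50).reshape(-1, 1)

H = 8  # hidden units; the exercise suggests trying various architectures
W1 = rng.normal(0, 0.5, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    h = np.tanh(X @ W1 + b1)            # forward pass: hidden layer
    p = sigmoid(h @ W2 + b2)            # forward pass: output probability
    grad_out = (p - y) / len(X)         # backprop of the cross-entropy loss
    grad_h = (grad_out @ W2.T) * (1 - h ** 2)
    W2 -= 0.5 * h.T @ grad_out; b2 -= 0.5 * grad_out.sum(axis=0)
    W1 -= 0.5 * X.T @ grad_h;   b1 -= 0.5 * grad_h.sum(axis=0)

# Training accuracy after the final update
h = np.tanh(X @ W1 + b1)
p = sigmoid(h @ W2 + b2)
acc = float(((p > 0.5) == (y > 0.5)).mean())
print(acc)
```

Netlab's mlp.m, mlptrain.m and mlpfwd.m correspond roughly to the initialization, the training loop and the final forward pass above.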
Functions for MLP in the Netlab toolbox include mlp.m, mlptrain.m and mlpfwd.m.

DAY 5
Lecture 11: Graphical models (Introduction, main slides, main slides – commented)
Readings: Chapter 8 of Bishop's book.
Exercises for DAY 5: choose your own images and apply a Markov random field for denoising by using the Matlab code. Optionally, you can play with the Bayesian Net Toolbox.

DAY 6
Lecture 12: Algorithm-independent machine
learning (slides)
Lecture 13: Reinforcement learning (slides)
Wrap-up.
Exercises for DAY 6: implement AdaBoost for the MNIST database, or improve the system that you have developed by choosing algorithms you like. A tutorial on AdaBoost is available here.
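For orientation on the DAY 6 exercise, AdaBoost with decision stumps can be sketched compactly. This is a generic illustration on toy 1-D data with labels in {-1, +1}, not tuned for MNIST:

```python
import numpy as np

def adaboost_train(X, y, n_rounds=5):
    """AdaBoost with threshold stumps on 1-D data; y must be in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                  # sample weights, updated each round
    stumps = []                              # list of (threshold, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for thr in X:                        # candidate thresholds at the data points
            for pol in (1, -1):
                pred = pol * np.sign(X - thr + 1e-12)
                err = w[pred != y].sum()     # weighted error of this stump
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # weight of this weak learner
        pred = pol * np.sign(X - thr + 1e-12)
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified samples
        w /= w.sum()
        stumps.append((thr, pol, alpha))
    return stumps

def adaboost_predict(stumps, X):
    score = sum(alpha * pol * np.sign(X - thr + 1e-12) for thr, pol, alpha in stumps)
    return np.sign(score)

X = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost_train(X, y, n_rounds=5)
print(adaboost_predict(model, X))
```

For MNIST you would replace the 1-D stumps with stumps over individual pixel (or PCA/LDA) features and train one-vs-rest classifiers for the ten digits.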
