Q: What is the optimal classifier?
A: I had no idea what he meant by "optimal". I referred to the Bayes classifier with respect to the 0-1 loss function. He was still waiting for "the optimal classifier", so I wrote down the Bayes decision rule and transformed it into argmax {log p(y) + log p(x|y)}. That seemed to be what he wanted to hear.
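For reference (my notation, written out after the exam): with the 0-1 loss the Bayes classifier decides for the class with the highest posterior, and since p(x) does not depend on y and the logarithm is monotone,

$$ y^{*} = \arg\max_{y} p(y \mid x) = \arg\max_{y} \frac{p(y)\,p(x \mid y)}{p(x)} = \arg\max_{y} \left\{ \log p(y) + \log p(x \mid y) \right\}. $$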
Q: How can you calculate the priors?
A: Quite simply from the relative frequencies of the classes in the training data.
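In formulas (my sketch): with N_k training samples of class k out of N in total, the estimate of the prior is

$$ \hat{p}(y = k) = \frac{N_k}{N}. $$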
Q: What property must hold so that you can do this?
A: The training data have to be representative of the true class distribution.
Q: Regarding the blood-pressure / body temperature problem, is the 0-1-Loss function the right one?
A: No, because predicting healthy for an actually sick person would be fatal, so we need an asymmetric loss function.
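One way to write this down (a sketch; the concrete loss values are my own illustration): with a loss function l(y, ŷ), the optimal decision minimizes the expected loss,

$$ \hat{y} = \arg\min_{\hat{y}} \sum_{y} l(y, \hat{y})\, p(y \mid x), $$

where choosing l(sick, healthy) much larger than l(healthy, sick) makes the fatal error expensive.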
Q: How can you apply Bayes to the problem?
A: We can assume that the features are normally distributed → apply Gaussian classifier.
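For reference (standard notation, my addition): the Gaussian classifier models the class-conditional density as

$$ p(x \mid y) = \mathcal{N}(x; \mu_y, \Sigma_y) = \frac{1}{\sqrt{(2\pi)^{d}\,|\Sigma_y|}} \exp\!\left( -\tfrac{1}{2} (x - \mu_y)^{\top} \Sigma_y^{-1} (x - \mu_y) \right), $$

so plugging this into the argmax above gives a decision function that is quadratic in x.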
Q: How can you reduce the parameters of the Gaussian Classifier?
A: Assume independence of the features and use Naive Bayes. (not what he wanted to hear)
A: Use Naive Bayes with first-order dependencies. (not what he wanted to hear)
A: Use LDA so that the covariance matrix becomes the identity. (not what he wanted to hear)
A: Apply PCA. (☺☺☺)
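A rough parameter count to motivate the question (my own back-of-the-envelope, feature dimension d): a full Gaussian per class needs d parameters for the mean plus d(d+1)/2 for the covariance, while Naive Bayes factorizes the likelihood and gets by with 2d per class:

$$ p(x \mid y) = \prod_{i=1}^{d} p(x_i \mid y). $$

PCA attacks the problem differently, by transforming (and possibly shrinking) the feature space itself.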
Q: Explain PCA:
A: Projecting the features into a lower-dimensional subspace…
Q: No, not necessarily
A: The subspace can also have the same dimension. But after PCA the features are decorrelated and, under the Gaussian assumption, independent.
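A minimal numpy sketch of how I think of PCA (function and variable names are mine, not from the lecture):

```python
import numpy as np

def pca(X, m=None):
    """Project the rows of X onto the top-m principal components.
    With m equal to the full dimension this is only a decorrelating
    rotation, i.e. no dimensionality reduction takes place."""
    mu = X.mean(axis=0)
    Xc = X - mu                            # center the data
    cov = np.cov(Xc, rowvar=False)         # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)   # eigh since cov is symmetric
    order = np.argsort(eigval)[::-1]       # sort by decreasing variance
    W = eigvec[:, order[:m]]               # principal axes as columns
    return Xc @ W                          # decorrelated features
```

The projected features have a diagonal covariance matrix; independence then follows only under the Gaussian assumption mentioned above.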
Q: What property has to hold so you can apply Naive Bayes with First Order Dependencies?
A: Each feature may only depend on the feature directly before it (a first-order Markov assumption).
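The corresponding factorization (standard notation, my addition):

$$ p(x \mid y) = p(x_1 \mid y) \prod_{i=2}^{d} p(x_i \mid x_{i-1}, y). $$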
Q: For example?
A: A time series, e.g. the temperature over the course of a day.
Q: What do you do, when you cannot assume a Gaussian distribution?
A: Model the density as a Gaussian Mixture Model and estimate its parameters with the EM algorithm.
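The model, for reference (my notation; K is the number of components):

$$ p(x) = \sum_{k=1}^{K} p_k\, \mathcal{N}(x; \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} p_k = 1, \quad p_k \ge 0. $$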
Q: Can you compute / sensibly choose the number of components k in advance?
A: No.
Q: Explain the EM algorithm.
A: Explained by drawing a mixture of two Gaussians.
Q: What are p_k and p_ik?
A: I explained both.
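In formulas (as I would write it): p_k is the mixture weight of component k, and p_ik is the responsibility of component k for sample x_i, computed in the E-step as

$$ p_{ik} = \frac{p_k\, \mathcal{N}(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} p_j\, \mathcal{N}(x_i; \mu_j, \Sigma_j)}. $$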
Q: What can you also call p_k?
A: The contribution of the Gaussian to the GMM. (not what he wanted to hear)
Q: How else?
A: The weight of the Gaussian for each point. (not what he wanted to hear)
Q: How else?
A: No idea. → He told me it is also called the a priori probability.
Q: How do you update the parameters?
A: I explained it using the function I had drawn, which led to misunderstandings. Then I just wrote down the formulas for the mean and for p_k and he was happy.
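For completeness, the M-step formulas (standard EM updates for a GMM with N samples; the covariance update is included although only the mean and p_k came up):

$$ p_k = \frac{1}{N} \sum_{i=1}^{N} p_{ik}, \qquad \mu_k = \frac{\sum_{i=1}^{N} p_{ik}\, x_i}{\sum_{i=1}^{N} p_{ik}}, \qquad \Sigma_k = \frac{\sum_{i=1}^{N} p_{ik}\, (x_i - \mu_k)(x_i - \mu_k)^{\top}}{\sum_{i=1}^{N} p_{ik}}. $$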