Q: What is the optimal classifier?
A: I had no idea what he meant by "optimal". I referred to the Bayes classifier with respect to the 0-1 loss function. He was still waiting for "the optimal classifier", so I wrote down the Bayes decision rule and transformed it into argmax {log p(y) + log p(x|y)}. That seemed to be what he wanted to hear.
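For reference (my notation, written out after the exam): with the 0-1 loss the Bayes classifier decides for the class with the highest posterior, and since p(x) does not depend on y and the logarithm is monotone,

$$ y^{*} = \arg\max_{y} p(y \mid x) = \arg\max_{y} \frac{p(y)\,p(x \mid y)}{p(x)} = \arg\max_{y} \left\{ \log p(y) + \log p(x \mid y) \right\}. $$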
Q: How can you calculate the priors?
A: Quite simply from the relative frequencies of the classes in the training data.
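In formulas (my sketch): with N_k training samples of class k out of N in total, the estimate of the prior is

$$ \hat{p}(y = k) = \frac{N_k}{N}. $$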
Q: What property must hold so that you can do this?
A: The training data have to be representative of the true class distribution.
Q: Regarding the blood-pressure / body temperature problem, is the 0-1-Loss function the right one?
A: No, because predicting healthy for an actually sick person would be fatal, so we need an asymmetric loss function.
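One way to write this down (a sketch; the concrete loss values are my own illustration): with a loss function l(y, ŷ), the optimal decision minimizes the expected loss,

$$ \hat{y} = \arg\min_{\hat{y}} \sum_{y} l(y, \hat{y})\, p(y \mid x), $$

where choosing l(sick, healthy) much larger than l(healthy, sick) makes the fatal error expensive.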
Q: How can you apply Bayes to the problem?
A: We can assume that the features are normally distributed → apply Gaussian classifier.
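For reference (standard notation, my addition): the Gaussian classifier models the class-conditional density as

$$ p(x \mid y) = \mathcal{N}(x; \mu_y, \Sigma_y) = \frac{1}{\sqrt{(2\pi)^{d}\,|\Sigma_y|}} \exp\!\left( -\tfrac{1}{2} (x - \mu_y)^{\top} \Sigma_y^{-1} (x - \mu_y) \right), $$

so plugging this into the argmax above gives a decision function that is quadratic in x.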
Q: How can you reduce the parameters of the Gaussian Classifier?
A: Assume independence of the features and use Naive Bayes. (not what he wanted to hear)
A: Use Naive Bayes with first-order dependencies. (not what he wanted to hear)
A: Use LDA so that the covariance matrix becomes the identity. (not what he wanted to hear)
A: Apply PCA. (☺☺☺)
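A rough parameter count to motivate the question (my own back-of-the-envelope, feature dimension d): a full Gaussian per class needs d parameters for the mean plus d(d+1)/2 for the covariance, while Naive Bayes factorizes the likelihood and gets by with 2d per class:

$$ p(x \mid y) = \prod_{i=1}^{d} p(x_i \mid y). $$

PCA attacks the problem differently, by transforming (and possibly shrinking) the feature space itself.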
Q: Explain PCA:
A: Projecting the features into a lower-dimensional subspace…
Q: No, not necessarily
A: The subspace can also have the same dimension. But after PCA the features are decorrelated and, under the Gaussian assumption, independent.
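A minimal numpy sketch of how I think of PCA (function and variable names are mine, not from the lecture):

```python
import numpy as np

def pca(X, m=None):
    """Project the rows of X onto the top-m principal components.
    With m equal to the full dimension this is only a decorrelating
    rotation, i.e. no dimensionality reduction takes place."""
    mu = X.mean(axis=0)
    Xc = X - mu                            # center the data
    cov = np.cov(Xc, rowvar=False)         # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)   # eigh since cov is symmetric
    order = np.argsort(eigval)[::-1]       # sort by decreasing variance
    W = eigvec[:, order[:m]]               # principal axes as columns
    return Xc @ W                          # decorrelated features
```

The projected features have a diagonal covariance matrix; independence then follows only under the Gaussian assumption mentioned above.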
Q: What property has to hold so you can apply Naive Bayes with First Order Dependencies?
A: Each feature may only depend on the feature directly before it (a first-order Markov assumption).
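The corresponding factorization (standard notation, my addition):

$$ p(x \mid y) = p(x_1 \mid y) \prod_{i=2}^{d} p(x_i \mid x_{i-1}, y). $$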
Q: For example?
A: A time series, e.g. the temperature over the course of a day.
Q: What do you do, when you cannot assume a Gaussian distribution?
A: Model the density as a Gaussian Mixture Model and estimate its parameters with the EM algorithm.
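The model, for reference (my notation; K is the number of components):

$$ p(x) = \sum_{k=1}^{K} p_k\, \mathcal{N}(x; \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} p_k = 1, \quad p_k \ge 0. $$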
Q: Can you compute / sensibly choose the number of components k in advance?
A: No.
Q: Explain the EM algorithm.
A: Explained by drawing a mixture of two Gaussians.
Q: What are p_k and p_ik?
A: I explained both.
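In formulas (as I would write it): p_k is the mixture weight of component k, and p_ik is the responsibility of component k for sample x_i, computed in the E-step as

$$ p_{ik} = \frac{p_k\, \mathcal{N}(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} p_j\, \mathcal{N}(x_i; \mu_j, \Sigma_j)}. $$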
Q: What can you also call p_k?
A: The contribution of the Gaussian to the GMM. (not what he wanted to hear)
Q: How else?
A: The weight of the Gaussian for each point. (not what he wanted to hear)
Q: How else?
A: No idea. → He told me it is also called the a priori probability.
Q: How do you update the parameters?
A: I explained it using the function I had drawn, which led to misunderstandings. Then I just wrote down the formulas for the mean and for p_k and he was happy.
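For completeness, the M-step formulas (standard EM updates for a GMM with N samples; the covariance update is included although only the mean and p_k came up):

$$ p_k = \frac{1}{N} \sum_{i=1}^{N} p_{ik}, \qquad \mu_k = \frac{\sum_{i=1}^{N} p_{ik}\, x_i}{\sum_{i=1}^{N} p_{ik}}, \qquad \Sigma_k = \frac{\sum_{i=1}^{N} p_{ik}\, (x_i - \mu_k)(x_i - \mu_k)^{\top}}{\sum_{i=1}^{N} p_{ik}}. $$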