**Examiner:** Prof. Nöth

**Atmosphere:** Friendly. I was a bit early, and he joked that this would leave more time for the exam. When I made a face, he said that even 30 minutes would be enough to torture me.

**Questions:**

K Nearest Neighbor (from exercises)

- Q: (He drew a coordinate system with some points) What step is necessary before applying KNN itself?
- A: Normalize the data, e.g. to [-1, 1].
- Q: How does it work?
- A: Calculate the distances from the new point to all other points and take the k nearest neighbors. Then find the class that occurs most often among those neighbors. This is the class of the new point.
- Q: Calculations will take very long (multiple for-loops). How did we end up solving this in the exercises?
- A: We used matrix operations to get rid of some of the for-loops. To avoid matrices that are too big for memory, we split the calculations into chunks.
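
The steps from this block (normalize, compute distances, majority vote, process in chunks) can be sketched roughly as follows. This is not the exercise solution; the function names `normalize` and `knn_predict`, the `chunk_size` parameter, and the use of squared Euclidean distance are my own assumptions for illustration.

```python
import numpy as np

def normalize(train, test):
    """Scale every feature to [-1, 1] based on the training set's min and max."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return 2.0 * (train - lo) / span - 1.0, 2.0 * (test - lo) / span - 1.0

def knn_predict(train_x, train_y, test_x, k=3, chunk_size=256):
    """Majority vote among the k nearest training points (hypothetical helper)."""
    preds = np.empty(len(test_x), dtype=train_y.dtype)
    train_sq = (train_x ** 2).sum(axis=1)
    # Process the test points in chunks so the distance matrix stays small.
    for start in range(0, len(test_x), chunk_size):
        chunk = test_x[start:start + chunk_size]
        # Squared distances for the whole chunk at once: |a|^2 - 2ab + |b|^2
        chunk_sq = (chunk ** 2).sum(axis=1)
        d2 = chunk_sq[:, None] - 2.0 * chunk @ train_x.T + train_sq[None, :]
        nearest = np.argsort(d2, axis=1)[:, :k]
        # The vote itself still loops over the rows of the chunk.
        for i, labels_of_neighbors in enumerate(train_y[nearest]):
            labels, counts = np.unique(labels_of_neighbors, return_counts=True)
            preds[start + i] = labels[np.argmax(counts)]
    return preds

# Made-up toy data: two well-separated clusters.
train_x = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
train_y = np.array([0, 0, 0, 1, 1, 1])
test_x = np.array([[0.5, 0.5], [5.5, 5.5]])
train_n, test_n = normalize(train_x, test_x)
predictions = knn_predict(train_n, train_y, test_n, k=3, chunk_size=1)
```

With `chunk_size=1` the distance matrix has only one row at a time; larger chunks trade memory for fewer loop iterations.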

Bayes

- Q: Formulate the Bayes rule and name all parts.
- Q: Relation to KNN:
- A: The nearest-neighbor error is (asymptotically) at most twice the Bayes error rate. This refers to the error rate, not to the Bayes classifier itself (see http://cseweb.ucsd.edu/~elkan/151/nearestn.pdf for more info).
- Q: We talked about optimality of a classifier…
- A: A classifier is optimal with respect to something. Bayes for example is optimal with respect to the 0-1 loss function.
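
For reference, the Bayes rule in its usual textbook form, with the parts named in the comments. The notation ($y$ for the class, $\boldsymbol{x}$ for the feature vector) is my assumption and may differ from the lecture's.

```latex
% posterior = (class-conditional density * prior) / evidence
p(y \mid \boldsymbol{x}) = \frac{p(\boldsymbol{x} \mid y)\, p(y)}{p(\boldsymbol{x})},
\qquad
p(\boldsymbol{x}) = \sum_{y'} p(\boldsymbol{x} \mid y')\, p(y')

% Bayes decision rule: choose the class with the highest posterior.
% This is the rule that minimizes the expected 0-1 loss.
y^{*} = \operatorname*{argmax}_{y} \; p(y \mid \boldsymbol{x})
```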

Confusion Matrix

- Q: (He drew a confusion matrix similar to the ones in other braindumps) Does this classifier optimize for the Average Loss?
- A: No, FN and FP differ by an order of magnitude.
- Q: Imagine that this classifier is about a sickness. What class do you think is the one with the sickness? How many of those are actually sick?
- A: Hopefully the class with fewer members.
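
To illustrate how such a matrix is read, here is a small sketch. The counts are made up and are NOT the ones from the exam; the row/column layout (rows are the true class, columns the predicted class) and the helper `summarize` are assumptions as well.

```python
# Hypothetical counts, not the matrix drawn in the exam.
# Rows: true class, columns: predicted class; index 0 = healthy, 1 = sick.
confusion = [
    [900, 90],  # truly healthy: 900 true negatives, 90 false positives
    [1, 9],     # truly sick:      1 false negative,  9 true positives
]

def summarize(cm):
    """Read the basic quantities off a 2x2 confusion matrix."""
    tn, fp = cm[0]
    fn, tp = cm[1]
    return {
        "actually_sick": fn + tp,    # row sum of the sick class
        "predicted_sick": fp + tp,   # column sum of the sick prediction
        "false_positives": fp,
        "false_negatives": fn,
    }

summary = summarize(confusion)
```

In this made-up example the number of actually sick patients is the row sum of the sick class, and FP is far larger than FN, which is the kind of asymmetry the question was about.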

Viola Jones & Boosting

- Q: Imagine I'm the manager of a hospital. You want to apply some sort of Viola-Jones classifier to classify sick patients. How would you explain this to a manager?
- A: (He specifically did NOT want to hear buzzwords.) Use multiple methods in a row. The first one is cheap and will probably mark a lot of patients as sick even though they aren't. Only the ones that are marked as sick get follow-up tests. These tests are more expensive and time-consuming, but give a clearer picture of whether a patient really is sick.
- Q: How do boosting methods work in general?
- A: (Write and explain formulas)
- Q: What values can the error of a classifier take?
- A: 0 when every sample is classified correctly and 1 when no sample is classified correctly.
- Q: How does this affect the classifier weights?
- A: When everything is classified correctly, the weights go to +infinity; when everything is classified wrongly, the weights go to -infinity.
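
The behavior of the classifier weights at the extremes can be checked with a small sketch. It assumes the common discrete AdaBoost form, weight = 0.5 * ln((1 - error) / error); some formulations drop the factor 0.5, which does not change the limits. The function name `classifier_weight` is hypothetical.

```python
import math

def classifier_weight(error):
    """Weight of a weak classifier given its weighted error in [0, 1].

    Assumes the common AdaBoost form 0.5 * ln((1 - error) / error).
    """
    if error <= 0.0:
        return math.inf    # everything correct: weight grows without bound
    if error >= 1.0:
        return -math.inf   # everything wrong: weight goes to -infinity
    return 0.5 * math.log((1.0 - error) / error)

# Weights for a few error values between the two extremes.
weights = {e: classifier_weight(e) for e in (0.0, 0.1, 0.5, 0.9, 1.0)}
```

An error of 0.5 (random guessing for two classes) gives a weight of zero, errors below 0.5 give positive weights, and errors above 0.5 give negative weights.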

Support Vector Machine

- Q: Formulate the optimization problem of hard margin and soft margin SVM.
- Q: What is alpha?
- A: The normal vector of the hyperplane.
- Q: What is the cost function in the case of soft margin?
- A: The costs depend on the slack variables: the larger the slack variables, the higher the costs, since the slack variables move samples that lie inside the margin onto the margin border. If a sample is outside the margin, its cost is zero. If it is inside the margin, its cost is its distance to the margin border. All in all, the cost function looks something like a dead zone combined with an L1 norm.
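
For reference, the two optimization problems in their usual primal form. Following the answer above, $\boldsymbol{\alpha}$ is the normal vector of the hyperplane; the bias $\alpha_0$, the labels $y_i \in \{-1, +1\}$, the slack variables $\xi_i$ and the trade-off constant $C$ are notation I am assuming here, and the lecture may use different symbols.

```latex
% Hard margin: every sample has to lie on or outside the margin.
\min_{\boldsymbol{\alpha},\, \alpha_0} \; \tfrac{1}{2} \|\boldsymbol{\alpha}\|_2^2
\quad \text{s.t.} \quad
y_i \left( \boldsymbol{\alpha}^T \boldsymbol{x}_i + \alpha_0 \right) \ge 1 \quad \forall i

% Soft margin: slack variables allow violations, which are penalized.
\min_{\boldsymbol{\alpha},\, \alpha_0,\, \boldsymbol{\xi}} \;
\tfrac{1}{2} \|\boldsymbol{\alpha}\|_2^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad
y_i \left( \boldsymbol{\alpha}^T \boldsymbol{x}_i + \alpha_0 \right) \ge 1 - \xi_i,
\quad \xi_i \ge 0 \quad \forall i

% At the optimum each slack variable equals the hinge loss:
% zero outside the margin (dead zone), linear inside it (L1-like).
\xi_i = \max\!\left( 0,\; 1 - y_i \left( \boldsymbol{\alpha}^T \boldsymbol{x}_i + \alpha_0 \right) \right)
```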