Braindump Business Intelligence SS 2020

Taken from: https://pad.stuve.fau.de/p/businessintelligencess2020_2

Time: 90 min; seems sufficient, but keep an eye on the clock.

Number of pages: 18(!)

1. Preprocessing

Given: a data set with 5,050 examples; the task is to predict whether students pass the exam or not.

a) Explain what preprocessing is necessary to use a neural network on that data set (conversion into numeric values, missing values, etc.)

b) I. What is the problem with a data set in which the label "passed" has 5,000 examples and "not passed" has only 50?

II. Is accuracy a good metric here?

Solution: No, because the class ratio is imbalanced.

c) Your boss wants to do the following tasks:

1. Calculate the profit in the future

2. Put employees in predefined groups

Which technology can he use that is able to solve BOTH problems?

(In other words: name 2 models that can do regression AND classification)
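To illustrate the accuracy question from this section, here is a minimal sketch using the class counts stated in the task. The "always predict passed" baseline is a hypothetical classifier introduced only for illustration; it is not part of the exam.

```python
# Class counts taken from the task: 5,000 "passed" and 50 "not passed".
n_passed = 5000
n_not_passed = 50
total = n_passed + n_not_passed

# Hypothetical baseline: a classifier that always predicts "passed".
# It gets every "passed" example right and every "not passed" example wrong.
detected_not_passed = 0

accuracy = n_passed / total
recall_not_passed = detected_not_passed / n_not_passed

print(f"accuracy: {accuracy:.4f}")
print(f"recall for 'not passed': {recall_not_passed:.4f}")
```

The baseline scores above 99% accuracy while never identifying a single failing student, which is why accuracy alone says little on such a data set.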

2. Evaluation

Confusion matrix about people spending a lot in an online shop.

a) calculate precision, recall and F1 score

b) Argue which metric to choose when classifying customers who are likely to review my product positively, so that I can send them the product for free (precision!)

c) Your model shows a low training error and a high validation error. What might be the issue? What can you change to fix it?

Solution: overfitting

d) Given three ROC curves, which one is best?
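For the precision, recall and F1 calculation asked for in this section, the following sketch may help. The confusion-matrix counts are hypothetical, since the braindump does not record the exam's actual numbers, and the helper name `precision_recall_f1` is made up for this example. It assumes all denominators are non-zero.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts (NOT the exam's values); true negatives are not
# needed for any of the three metrics.
tp, fp, fn = 40, 10, 20

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision penalizes false positives, which is why it fits the free-product scenario: every customer wrongly predicted as a positive reviewer costs a product.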

3. Decision Trees

a) Read the classification from a given tree: which two groups are targeted? (The decision tree has the leaves "Will respond to campaign": Yes, No)

b) what to do if tree is too complex

c) given two trees, which is better

4. Neural Networks

Given a Neural Network

a) compute activation potential and activation value

b) calculate error signal and new weight

c) explain back propagation

d) explain black box property
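The calculations in a) and b) can be sketched for a single output neuron. This is a minimal sketch under stated assumptions: a sigmoid activation, the standard delta rule for an output neuron, and made-up inputs, weights, bias, target and learning rate. The exam's network and the exact formula used in the lecture are not recorded here and may differ.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation function (an assumption; the exam may use another)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values, not taken from the exam.
inputs = [1.0, 0.5]
weights = [0.4, -0.2]
bias = 0.1
target = 1.0
learning_rate = 0.5

# Activation potential: weighted sum of the inputs plus the bias.
potential = sum(w * x for w, x in zip(weights, inputs)) + bias

# Activation value: the potential passed through the activation function.
activation = sigmoid(potential)

# Error signal of an output neuron: output error times the sigmoid derivative.
error_signal = (target - activation) * activation * (1.0 - activation)

# New weights: old weight plus learning rate * error signal * input.
new_weights = [w + learning_rate * error_signal * x
               for w, x in zip(weights, inputs)]

print(f"potential={potential:.4f} activation={activation:.4f}")
print(f"error_signal={error_signal:.4f} new_weights={new_weights}")
```

Backpropagation repeats the error-signal step layer by layer from the output towards the input, with each hidden neuron's error signal derived from the signals of the neurons it feeds.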

5. SVMs

(SVM x-y-coordinate system with samples from two classes is shown. There is no boundary drawn. Although both classes are mostly separated to either left or right, one sample (which looks like an outlier) is placed in the middle, but is part of one of those classes.)

a) What happens if point x (= one support vector) is erased from data set?

Solution: The boundary moves toward the class that contained the outlier.

6. Social Media Mining

a) What is the difference between Social Media Mining and Social Media Analytics?

b) Given two WoM (word-of-mouth) values, explain which one indicates the better result of a marketing campaign

(it's not mentioned whether the marketing campaign is intended to result in more direct clicks or clicks through recommendation)

c) Difference between centrality and centralization

Solution: Centrality refers to the position of an individual actor. Centralization characterizes the total network.

d) Calculate closeness centralization of network

e) Argue which network is better! (all centrality and centralization measures (closeness centralization is from the last step) and the actual networks are given)
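The difference between centrality (per actor) and centralization (per network) from c) and d) can be sketched in code. This sketch assumes an undirected, connected graph with at least three nodes and uses Freeman's normalized closeness centralization; the lecture may use a different normalization. The example graphs and all function names are hypothetical.

```python
from collections import deque


def shortest_path_lengths(graph: dict, source) -> dict:
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closeness_centrality(graph: dict) -> dict:
    """Normalized closeness per actor: (n - 1) / sum of distances to all others."""
    n = len(graph)
    return {node: (n - 1) / sum(shortest_path_lengths(graph, node).values())
            for node in graph}


def closeness_centralization(graph: dict) -> float:
    """One value for the whole network (Freeman's normalization, assumed here)."""
    n = len(graph)
    centrality = closeness_centrality(graph)
    c_max = max(centrality.values())
    # Largest possible sum of differences, reached by a star network.
    max_sum = (n - 1) * (n - 2) / (2 * n - 3)
    return sum(c_max - c for c in centrality.values()) / max_sum


# Hypothetical example networks as adjacency lists.
star = {"A": ["B", "C", "D", "E"], "B": ["A"], "C": ["A"], "D": ["A"], "E": ["A"]}
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}

print(closeness_centrality(star))
print(closeness_centralization(star), closeness_centralization(ring))
```

A star network is maximally centralized because one actor is close to everyone, while in a ring every actor has the same closeness and the centralization is zero.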

7. Association rules

10,000 shoe sales were tracked. Left side: occurrence of single shoe pairs (in a basket) as a percentage. Right side: occurrence of two shoe pairs together (in a basket) as absolute counts. We are looking at Speedrunner, Endurance and Fighter (names of shoes).

a) Find the four 1-to-1 itemset association rules ({A} → {B}) from the given data. Calculate support and confidence.

b) Describe the steps for the FP-Growth algorithm.
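The support and confidence calculation from a) can be sketched as follows. Only the total of 10,000 baskets comes from the task; the single-shoe percentages and the pair counts below are hypothetical placeholders, because the braindump does not record the exam's values.

```python
total_baskets = 10_000  # stated in the task

# Hypothetical: share of baskets containing each shoe (the "left side").
single_share = {"Speedrunner": 0.30, "Endurance": 0.20, "Fighter": 0.10}

# Hypothetical: number of baskets containing both shoes (the "right side").
pair_counts = {
    ("Speedrunner", "Endurance"): 1200,
    ("Speedrunner", "Fighter"): 500,
}


def rule_metrics(antecedent: str, pair_count: int) -> tuple[float, float]:
    """Support and confidence of a rule {antecedent} -> {consequent}."""
    support = pair_count / total_baskets           # P(A and B)
    confidence = support / single_share[antecedent]  # P(A and B) / P(A)
    return support, confidence


# Every pair yields two rules, one per direction.
rules = {}
for (a, b), count in pair_counts.items():
    rules[(a, b)] = rule_metrics(a, count)
    rules[(b, a)] = rule_metrics(b, count)

for (a, b), (support, confidence) in rules.items():
    print(f"{{{a}}} -> {{{b}}}: support={support:.2f}, confidence={confidence:.2f}")
```

Both directions of a pair share the same support, but their confidence differs because it is divided by the support of the respective left-hand side.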