Assignment 13 - predict

Disclaimer: This thread was imported from the old forum, so some formatting may not be displayed correctly. The original thread starts with the second post of this thread.

Hi,

short question for Assignment 13:

the provided file contains the following code:

# predict survival for test dataset
test_label = linreg.predict(train_data)

# store labels in test_label.npy file which you should also submit in StudOn
np.save("test_label.npy", test_label, allow_pickle=True)

Is it intentional that the test labels are predicted using train_data?
If I got that right, we first calculate/train the weights on the training data set.
Then we use the predict function, with the weights we just trained, to predict the outcome for all other examples (where we do not know the outcome yet).

Therefore I would call
test_label = linreg.predict(test_data)
to calculate the outcome for the test data set.
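
In other words, something along these lines. This is just a sketch of my understanding, using scikit-learn's LinearRegression and np.load with assumed file names as stand-ins for whatever the template actually does:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # stand-ins for the template's arrays (file names are assumptions)
    train_data = np.load("train_data.npy")
    train_label = np.load("train_label.npy")
    test_data = np.load("test_data.npy")

    # fit the weights on the training set only
    linreg = LinearRegression()
    linreg.fit(train_data, train_label)

    # predict survival for the unlabeled test set
    test_label = linreg.predict(test_data)

    # store labels in test_label.npy, the file to submit in StudOn
    np.save("test_label.npy", test_label, allow_pickle=True)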

Opinions on this topic?
Thank you!

Best regards,

horscchtey

2 Likes

Hi,

can you create Assignment 13 in StudOn so I can hand in my solution? :)

Thank you very much!

1 Like

Yes, you are absolutely right, the test_label should be predicted based on the test_data!
Since this is a bug in the code template, you should not worry about it too much; we will take care of using the correct data during grading.

The assignment is now also available in StudOn.


Do note, however, that any normalization performed on train_set also has to be applied to test_set, otherwise simply switching to the correct data during grading will break the implementation. Right now this normalization of test_set can easily be forgotten, since that data is never used.

1 Like

Well, for this reason I would, as always, suggest commenting your code.

Let's just recap the normalization idea: you compute normalization constants on your training set and apply these constants both to your training set, which will then be perfectly normalized, and to your test set, which will be normalized only as well as the training-set statistics generalize.
You are given both a training set and a test set, so it should be clear how to normalize and that the test set must not be left out.
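
To make that concrete, here is a minimal sketch assuming plain z-score standardization; the exact scheme and file names in your solution may differ:

    import numpy as np

    # assumed file names; adjust to whatever the template actually loads
    train_data = np.load("train_data.npy")
    test_data = np.load("test_data.npy")

    # compute the normalization constants on the training set only ...
    mean = train_data.mean(axis=0)
    std = train_data.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features

    # ... and apply the very same constants to both sets
    train_data = (train_data - mean) / std
    test_data = (test_data - mean) / std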

If some student comes up with a weird scheme that normalizes the training data twice because it is used to predict the output labels, then you are right and we will have to take care of it.
But since you could not upload your code to StudOn until this bug was posted to the forum, I would assume quite a few students noticed it anyway.
Also, you don't strictly have to normalize for linear regression, since the weights and bias will absorb the scaling anyway (a quick check is sketched below).
Finally, if you simply do not normalize the test set, you are doing the normalization wrong regardless of this bug; the bug just makes it a bit harder to notice.
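
Regarding the point that linear regression does not strictly need normalization, here is a quick self-contained check with synthetic data; note that this only holds exactly for plain least squares without regularization, and scikit-learn's LinearRegression is used here as a stand-in for the template's model:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3)) * [1.0, 50.0, 0.01]  # features on very different scales
    y = X @ [2.0, -0.1, 300.0] + rng.normal(size=100)

    # standardize with the (training) statistics of X
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

    pred_raw = LinearRegression().fit(X, y).predict(X)
    pred_norm = LinearRegression().fit(X_norm, y).predict(X_norm)

    print(np.allclose(pred_raw, pred_norm))  # True: identical predictions with and without scaling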