Assignment 10

entropyGain

Disclaimer: This thread was imported from the old forum, so some formatting may not display correctly. The original thread starts with the second post of this thread.

Hi,

I don’t really get what “List assumes” is actually supposed to contain, and the only information we have * does not really help either, imo, when it comes to computing the entropyGain. Could someone maybe give me an example?

Wouldn’t it make more sense to pass all TableRows to the “entropyGain” method instead?

Do we actually have to implement it this way, using entropyGain as it is defined in the DesicionTable class, or can I just use my own entropyGain that works on TableRows instead of AttVals?

Best
Fuxicus

PS: these might be stupid questions :smiley:

  • “TODO should return Gain(attr) (for “target” as query-attribute) for the node in the tree where all “assumes” is satisfied”

It’s exactly what you did, if you did the last exercise sheet. You had to pick the attribute with the best information gain (let’s call this attribute A) and then
“split” the table on the values of A. In the next step you would first look at all entries (rows) where A=val1, then at the entries where A=val2, and so on.
If you do this a couple of times, you get a branch where A=val1, B=val1 and C=val2, and this assignment is represented here by “assumes”.
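To make that concrete, here is a minimal sketch of filtering the rows down to one branch. It assumes rows are maps from attribute name to value and each entry of the assume list is an assignment string like “Wind:=Weak” — these names and representations are illustrative, not the actual assignment classes:

```java
import java.util.*;

public class AssumesDemo {
    // Keep only the rows that satisfy every assumption made along the branch.
    static List<Map<String, String>> filterRows(List<Map<String, String>> rows,
                                                List<String> assumes) {
        List<Map<String, String>> remaining = new ArrayList<>();
        for (Map<String, String> row : rows) {
            boolean matches = true;
            for (String assumption : assumes) {
                // "Wind:=Weak" -> attribute "Wind", value "Weak"
                String[] parts = assumption.split(":=");
                if (!parts[1].equals(row.get(parts[0]))) {
                    matches = false;
                    break;
                }
            }
            if (matches) remaining.add(row);
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("Wind", "Weak", "Humidity", "High", "Play", "Yes"),
                Map.of("Wind", "Strong", "Humidity", "High", "Play", "No"),
                Map.of("Wind", "Weak", "Humidity", "Normal", "Play", "Yes"));
        // The branch Wind=Weak, Humidity=High leaves exactly one row.
        System.out.println(filterRows(rows,
                List.of("Wind:=Weak", "Humidity:=High")).size()); // 1
    }
}
```

So each recursion level adds one assignment to the list, and the “current table” is always just the filtered view of the original rows.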

Please tell me if that makes it clear; I could also give a more specific example.


Sure, I have to look at all attributes/columns in the given table at the start, and the table narrows down as I finish branching on each attribute.

But again, I still fail to map the [m]assumes[/m] variable into the formula. I’d prefer to see everything column-wise, as I count the occurrences of the given values of my target. In the context of columns and TableRows, what exactly is meant by [m]assumes[/m]?

Here’s the formula in the slides:
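(The formula image did not survive the forum import. Reconstructed from the surrounding description — entropy of the target minus the weighted entropies after splitting on the attribute — this is presumably the standard ID3 information gain:)

```latex
\mathrm{Gain}(S, A) \;=\; H(S) \;-\; \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v),
\qquad
H(S) \;=\; -\sum_{c} p_c \log_2 p_c
```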

Here’s the header of the method to be implemented:
[m]double entropyGain(Attribute attr, List assumes, Attribute target)[/m]

  • the first summand (information/entropy) can be calculated via the [m]target[/m] (in other words, it’s the entropy of the target with respect to the remaining TableRows)
  • for the second summand, I would look at the [m]attr[/m] variable
  • what am I missing?

Now this took me a while to understand, but thankfully I got it now:
Basically, you do not update your global member variable [m]rows[/m]. However, you’ll need the updated information to efficiently calculate the information gain.
Therefore you just collect all the assignments you make along one path in that [m]assumes[/m] list, and then filter out the entries of [m]rows[/m] that do not match the criteria given by [m]assumes[/m].
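Once the rows are filtered down to the current branch, the gain itself is just “entropy of the target minus the weighted entropies after splitting on attr”. A sketch under the same assumptions as before (rows as maps; the method names and row representation are illustrative, not the assignment’s API):

```java
import java.util.*;

public class GainSketch {
    // Entropy of the target column over the given rows: H = -sum p * log2(p).
    static double entropy(List<Map<String, String>> rows, String target) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> row : rows)
            counts.merge(row.get(target), 1, Integer::sum);
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / rows.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Gain(attr) = H(target) - sum over values v of attr:
    //              |rows with attr=v| / |rows| * H(target restricted to attr=v)
    static double entropyGain(List<Map<String, String>> rows,
                              String attr, String target) {
        Map<String, List<Map<String, String>>> split = new HashMap<>();
        for (Map<String, String> row : rows)
            split.computeIfAbsent(row.get(attr), k -> new ArrayList<>()).add(row);
        double remainder = 0.0;
        for (List<Map<String, String>> part : split.values())
            remainder += (double) part.size() / rows.size() * entropy(part, target);
        return entropy(rows, target) - remainder;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("Wind", "Weak", "Play", "Yes"),
                Map.of("Wind", "Weak", "Play", "Yes"),
                Map.of("Wind", "Strong", "Play", "No"),
                Map.of("Wind", "Strong", "Play", "No"));
        // Wind perfectly separates Play here, so the gain equals H(Play) = 1.0.
        System.out.println(entropyGain(rows, "Wind", "Play"));
    }
}
```

Note that the first parameter here is already the filtered row list; in the actual assignment you would derive it from the member variable [m]rows[/m] and [m]assumes[/m] instead of passing it in.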

What are you missing?

You’ll need the [m]assumes[/m] list to get the “remaining” table rows, as afaik it was not intended for you to modify the global member variable [m]rows[/m] along the way.


Thank you for your explanation. I kind of understand the concept now, I hope. I am certain that this argument is supposed to make the implementation more convenient.
I still don’t get its contents, though. As it’s the remaining rows (well, partly?), at this point I am only interested in the column of [m]attr[/m].
So the [m]assumes[/m] ([m]List[/m]) may look like this on a first call when focusing on Wind: [“Wind:=Weak”, “Wind:=Weak”, “Wind:=Weak”, “Wind:=Strong”, “Wind:=Weak”]
If that’s correct, it’s kind of confusing to use, as a single attribute name is the same for the whole column (always), and a [m]String[/m] [m]ArrayList[/m] would’ve done the job just fine.
(Sorry if I sound nitpicky, I just want to make sure I don’t mess up big time…)


Nope, you did not get my explanation; maybe it was too unclear.
For example, in your second step you have made the assumption [m]Wind:=Weak[/m] for calculating your subtree. Then [m]assumes[/m] just contains that very assignment and nothing more! It is for filtering your rows, not your columns.


Thanks, I got a lot further now! Any suggestions for constructing the tree?
I thought about recursion, but I ended up with a top-down loop which inserts dummy leaves, as the given classes require you to always construct a full tree with leaves (bottom-up)…


Recursion works wonderfully. The base case creates the leaves, and the recursive calls return the subtrees that get attached to the current node.
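A sketch of that recursion, with hypothetical node/leaf types (the real assignment classes differ; picking the split attribute by best entropyGain is stubbed out here):

```java
import java.util.*;

public class TreeSketch {
    interface Tree {}
    record Leaf(String label) implements Tree {}
    record Node(String attr, Map<String, Tree> children) implements Tree {}

    static Tree build(List<Map<String, String>> rows,
                      Set<String> attrs, String target) {
        Set<String> labels = new HashSet<>();
        for (Map<String, String> row : rows) labels.add(row.get(target));
        // Base case: all remaining rows agree on the target, or no
        // attributes are left (a majority vote would be proper here).
        if (labels.size() == 1 || attrs.isEmpty())
            return new Leaf(labels.iterator().next());
        // Stand-in for: pick the attribute with the highest entropyGain.
        String best = attrs.iterator().next();
        Map<String, List<Map<String, String>>> split = new HashMap<>();
        for (Map<String, String> row : rows)
            split.computeIfAbsent(row.get(best), k -> new ArrayList<>()).add(row);
        Set<String> rest = new HashSet<>(attrs);
        rest.remove(best);
        // Recursive calls return the subtrees attached under each value.
        Map<String, Tree> children = new HashMap<>();
        for (var e : split.entrySet())
            children.put(e.getKey(), build(e.getValue(), rest, target));
        return new Node(best, children);
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("Wind", "Weak", "Play", "Yes"),
                Map.of("Wind", "Strong", "Play", "No"));
        Tree t = build(rows, Set.of("Wind"), "Play");
        System.out.println(t instanceof Node); // root splits on Wind: true
    }
}
```

In the assignment’s setting you would extend [m]assumes[/m] by one assignment per recursive call instead of passing the split row lists around.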