How to use feature group?

Question

Let's say I have a data set like the following:

file    group_a_co_1  group_a_co_2  group_b_co_1  group_b_co_2
file_1  0.8           0.2           0.3           0.7
file_2  0.1           0.9           0.2           0.8
file_3  0.5           0.5           0.7           0.3
...

Is there a way to tell the model about the group structure here? Within each group the columns sum to 1: group_a_co_1 + group_a_co_2 = 1, and the same goes for group_b. My guess is that exposing this group information would improve my model's performance.

score 1 · Answer 1 · answered Dec 12 '19 at 12:03

The columns group_a_co_2 and group_b_co_2 are redundant. Because each group sums to 1, group_a_co_2 = 1 - group_a_co_1 and group_b_co_2 = 1 - group_b_co_1, so these columns add no information for the model and can be removed. Encoding the group constraint explicitly would only add more redundant information, so it will not improve your model further.
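As a rough illustration of the point above, here is a minimal Python sketch that checks the sum-to-1 constraint on the three rows from the question and then keeps all but the last column of each group. The `groups` mapping and the `drop_redundant` helper are hypothetical names introduced for this example, not part of any library.

```python
import math

columns = ["group_a_co_1", "group_a_co_2", "group_b_co_1", "group_b_co_2"]

# The three rows shown in the question.
rows = {
    "file_1": [0.8, 0.2, 0.3, 0.7],
    "file_2": [0.1, 0.9, 0.2, 0.8],
    "file_3": [0.5, 0.5, 0.7, 0.3],
}

# Hypothetical mapping: which column indices belong to which group.
groups = {"group_a": [0, 1], "group_b": [2, 3]}


def drop_redundant(rows, groups):
    """Keep all but the last column of every group whose columns sum to 1."""
    for values in rows.values():
        for idx in groups.values():
            if not math.isclose(sum(values[i] for i in idx), 1.0):
                raise ValueError("group does not sum to 1; dropping a column would lose information")
    keep = [i for idx in groups.values() for i in idx[:-1]]
    reduced = {name: [values[i] for i in keep] for name, values in rows.items()}
    return keep, reduced


keep, reduced = drop_redundant(rows, groups)
kept_columns = [columns[i] for i in keep]

print(kept_columns)       # ['group_a_co_1', 'group_b_co_1']
print(reduced["file_1"])  # [0.8, 0.3]
```

Each dropped value can still be recovered as 1 minus the sum of the kept columns in its group, which is why removing it loses nothing.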

Geert Immerzeel

So you mean I only need two columns, group_a_co_1 and group_b_co_1? – dgg32 Dec 12 '19 at 12:07
Yes, and that also resolves the multicollinearity. – Syenix Dec 13 '19 at 20:03