I have a model in which we want to optimize the probability of an outcome depending on the choice of a product (a personalized product for every client, chosen from three possibilities). The product in question is the second product a client chooses after already having a first one.
I have never optimized a probability with analytical models before, so I am somewhat in the dark on this topic.
What I have for now is this:
- Application dataset: clients with ONE product. The optimization is intended for THEM: they have a first product, and I want to model the next best product for each of them.
- Training dataset: clients with TWO OR MORE products. The training could be done on them, since I know which product was purchased first.
Both datasets have exogenous variables that could help with modelling, e.g. age, gender, etc.
I have thought of these possible ways to achieve the objective:
- Split the training dataset by the first purchased product, then estimate a personalized probability conditional on that first product. The sign and size of the coefficient of each possible second product would tell me which is the next best one for every first product (subset).
- Model the dataset without splitting anything, and read the results looking for optimal combinations.
- Model with interactions between first and second products to look for the best combinations.
- Model with white-box techniques (decision trees, logistic regression), so the combinations are traceable.
- Something different.
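To make the interaction idea concrete, here is a minimal sketch in Python with scikit-learn. It is only an illustration under assumptions: the outcome is taken to be binary, the column names (`age`, `gender`, `first_product`, `second_product`, `outcome`) and product labels are hypothetical, and synthetic random data stands in for the real datasets, so the resulting numbers mean nothing. The model is a logistic regression with a first-by-second product interaction; each client in the application dataset is then scored under every candidate second product and the one with the highest predicted probability is kept.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

PRODUCTS = ["A", "B", "C"]  # hypothetical labels for the three products
rng = np.random.default_rng(0)


def make_clients(n):
    """Synthetic stand-in for real client data (assumed columns)."""
    return pd.DataFrame({
        "age": rng.integers(18, 80, n),
        "gender": rng.choice(["F", "M"], n),
        "first_product": rng.choice(PRODUCTS, n),
    })


def add_pair(df):
    """Add an explicit first-by-second product interaction column."""
    out = df.copy()
    out["pair"] = out["first_product"] + ">" + out["second_product"]
    return out


# Training dataset: clients with two or more products and a known outcome.
train = make_clients(2000)
train["second_product"] = rng.choice(PRODUCTS, len(train))
train["outcome"] = rng.integers(0, 2, len(train))  # assumed binary outcome

features = ["age", "gender", "first_product", "second_product", "pair"]
categorical = ["gender", "first_product", "second_product", "pair"]
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = make_pipeline(preprocess, LogisticRegression(max_iter=1000))
model.fit(add_pair(train)[features], train["outcome"])

# Application dataset: clients with one product only.
application = make_clients(5)

# Score every client under each candidate second product.
scores = pd.DataFrame(index=application.index)
for product in PRODUCTS:
    candidate = add_pair(application.assign(second_product=product))
    scores[product] = model.predict_proba(candidate[features])[:, 1]

# Next best product = candidate with the highest predicted probability.
application["next_best"] = scores.idxmax(axis=1)
print(application.join(scores))
```

Because the fitted model is a plain logistic regression, the coefficients of the `pair` dummies remain inspectable, which keeps the combinations traceable. Fitting one model per first product instead would correspond to the split approach.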
My sample is, I think, large enough: 700,000 cases for people with ONE product and approximately 40,000 for people with TWO OR MORE products.
– Juan Esteban de la Calle Feb 09 '20 at 17:27