Once you have designed and finished your product, would you like to know how likely a given prospect is to buy it before investing sales effort in that prospect? Would you value an estimate of how many people in a given segment have a high probability of buying your product?

Logistic regression allows you to classify a person or entity into groups based on their characteristics. For many marketing problems, these groups are buyers and non-buyers. The sales department usually has a group of leads that have shown some interest in the product, but it may want to contact only those with the highest probability of buying, given the characteristics of each company. The marketing department, on the other hand, may want to estimate the proportion of potential buyers in different segments in order to allocate marketing resources.

Logistic regression also shows the relative weight of each characteristic taken into account when you classify prospects. Consider the following example: ready-to-eat pork that only needs to be warmed up. One variable that will influence the buying decision is the consumer’s occupation: consumers who are busier with work will look for products they can prepare faster. Members of certain religions won’t buy the product. What about fitness-conscious people? Will they buy the product? Maybe the pork has more fat than other meats, but they still value the flavor. And what about people with health conditions? In their case, the fat may weigh more heavily in the decision. With logistic regression, we assign a weight to these and other variables, combine them in the evaluation, and reach a conclusion about the probability of buying the product.

The formula for the model is the following:

p = 1 / (1 + e^-(β0 + β1x1 + β2x2 + … + βnxn))

In this equation:

p = probability of buying the product (dependent variable)

xi = Independent variable (occupation, religion, fitness awareness, health condition, … )

β0 = constant

βi = parameter that indicates the variable xi’s importance

A value of p > 0.5 classifies the person as a likely buyer. The closer p is to 1, the higher the person’s probability of buying the product.
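To make the equation concrete, here is a minimal Python sketch that turns a prospect's characteristics into a probability and applies the 0.5 cutoff. The variable names and coefficient values are invented for illustration only; real values would come from estimating the model on a sample.

```python
import math

def buying_probability(intercept, weights, features):
    """Logistic model: p = 1 / (1 + e^-(b0 + sum(bi * xi)))."""
    score = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients, invented for this example (not estimated from data).
INTERCEPT = -0.5
WEIGHTS = {"busy_with_work": 1.2, "fitness_aware": -0.4, "health_condition": -1.0}

# One prospect, with each characteristic coded as 1 (yes) or 0 (no).
prospect = {"busy_with_work": 1, "fitness_aware": 0, "health_condition": 0}

p = buying_probability(INTERCEPT, WEIGHTS, prospect)
label = "buyer" if p > 0.5 else "non-buyer"
print(f"p = {p:.3f} -> {label}")
```

A positive weight pushes the probability up and a negative weight pushes it down, which is how the model expresses the relative importance of each characteristic.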

To estimate the parameters, you need data from a sample of prospects and a statistical package (SPSS, SAS, R). With the sample, you can test the relevance of each variable and evaluate the goodness of fit of the model. You can measure the quality of the predictions as the percentage of correct predictions over the total sample. You can also evaluate it by comparing the predictions with the prospects’ actual buying behavior later on.
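As a rough illustration of what a statistical package does behind the scenes, the sketch below estimates the parameters with plain gradient ascent on the log-likelihood and then computes the percentage of correct predictions. This is a simplified stand-in, not the procedure used by SPSS, SAS or R, and the toy sample (two 0/1 characteristics and a bought/not-bought outcome) is invented.

```python
import math

def probability(betas, x):
    """p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))."""
    z = betas[0] + sum(b * v for b, v in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Estimate [b0, b1, ..., bn] by gradient ascent on the mean log-likelihood."""
    betas = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(betas)
        for x, y in zip(rows, labels):
            error = y - probability(betas, x)
            gradient[0] += error
            for i, v in enumerate(x):
                gradient[i + 1] += error * v
        betas = [b + learning_rate * g / len(rows) for b, g in zip(betas, gradient)]
    return betas

def percent_correct(betas, rows, labels, cutoff=0.5):
    """Share of cases where the predicted group matches the observed one."""
    hits = sum((probability(betas, x) > cutoff) == (y == 1) for x, y in zip(rows, labels))
    return 100.0 * hits / len(rows)

# Invented toy sample: [busy_with_work, health_condition] -> bought (1) or not (0).
rows = [[1, 0], [1, 0], [1, 0], [1, 0],
        [0, 0], [0, 0], [0, 0], [0, 0],
        [1, 1], [1, 1], [1, 1],
        [0, 1], [0, 1], [0, 1], [0, 1]]
labels = [1, 1, 1, 0,
          1, 0, 1, 0,
          0, 1, 0,
          0, 0, 0, 1]

betas = fit_logistic(rows, labels)
accuracy = percent_correct(betas, rows, labels)
print("estimated betas:", [round(b, 2) for b in betas])
print(f"correct predictions: {accuracy:.1f}%")
```

In practice a package also reports significance tests for each variable, which this sketch leaves out.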

The advantages of logistic regression compared with other classification techniques are:

1. A linear relationship between the probability and the variables is not required.

2. The dependent variable doesn’t need to be normally distributed.

3. The independent variables can be interval, ordinal (rankings) or nominal (names).

4. The independent variables don’t need to be normally distributed.

5. The group sizes of buyers and non-buyers (dependent variable) can be different.

6. The means and variances of the two groups can be different (equal variances are not required).

7. The error doesn’t need to be normally distributed.

Because it relies on fewer assumptions, the model can be applied to a wider range of situations.
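Nominal variables such as occupation or religion have no numeric scale, so before they enter the equation they are commonly converted into 0/1 indicator (dummy) columns, with one category left out as the baseline. The sketch below shows one way to do this; the category names are hypothetical.

```python
def dummy_code(value, categories, baseline):
    """Return a 0/1 indicator for every category except the baseline."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"is_{c}": int(value == c) for c in categories if c != baseline}

# Hypothetical occupation categories; "office" is the baseline (all indicators 0).
OCCUPATIONS = ["office", "student", "retired"]

row = dummy_code("student", OCCUPATIONS, baseline="office")
print(row)
```

Each indicator then gets its own β, which measures how that category shifts the probability relative to the baseline.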

The purpose of this article is to discuss the application of the model to the qualification of prospects. However, the data structure is similar to that of problems in other settings, so I performed a couple of tests to evaluate the model. The first test concerned the evaluation of a library’s service. The dependent variable is the level of satisfaction (very satisfied or not very satisfied) and the independent variables are the ratings given to different aspects of the library. I assigned cases with a predicted probability above 0.5 to the very satisfied group and those below 0.5 to the other group. Once I had estimated the parameters, I obtained the predicted probabilities for the sample and found that the model gave the correct prediction in 71% of the 390 cases.

In the same problem, I applied the model to 390 different cases from the same interviews (these cases were not used to estimate the parameters of the model) and I found that 72% of the predictions were correct.
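This kind of check, estimating on one set of cases and scoring on cases the model has never seen, can be sketched as follows. The data here are simulated and have nothing to do with the library survey; the two ratings, their -2 to +2 scale, the sample sizes in the code and the simple gradient-ascent fit are all assumptions made for the example.

```python
import math
import random

def probability(betas, x):
    z = betas[0] + sum(b * v for b, v in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit(rows, labels, learning_rate=0.5, epochs=300):
    """Gradient ascent on the mean log-likelihood (stand-in for a statistical package)."""
    betas = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(betas)
        for x, y in zip(rows, labels):
            error = y - probability(betas, x)
            grad[0] += error
            for i, v in enumerate(x):
                grad[i + 1] += error * v
        betas = [b + learning_rate * g / len(rows) for b, g in zip(betas, grad)]
    return betas

def percent_correct(betas, rows, labels, cutoff=0.5):
    hits = sum((probability(betas, x) > cutoff) == (y == 1) for x, y in zip(rows, labels))
    return 100.0 * hits / len(rows)

# Simulated survey: two ratings per respondent, centred on the neutral score (-2..+2).
rng = random.Random(42)
rows = [[rng.randint(-2, 2), rng.randint(-2, 2)] for _ in range(780)]
labels = [int(rng.random() < 1.0 / (1.0 + math.exp(-(a + b)))) for a, b in rows]

# The first half estimates the parameters; the second half is never used for fitting.
train_rows, test_rows = rows[:390], rows[390:]
train_labels, test_labels = labels[:390], labels[390:]

betas = fit(train_rows, train_labels)
in_sample = percent_correct(betas, train_rows, train_labels)
holdout = percent_correct(betas, test_rows, test_labels)
print(f"estimation sample: {in_sample:.1f}% correct")
print(f"holdout sample:    {holdout:.1f}% correct")
```

When the two percentages are close, as in the library test, it suggests the model is not simply memorizing the estimation sample.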

The second test concerns admission to graduate school. In this model, a probability higher than 0.5 predicts that the candidate will be admitted. The variables are GRE (Graduate Record Exam scores), GPA (grade point average) and the prestige of the undergraduate school.

After estimating the β parameters, I ran the model over the sample and found that it gave the correct prediction in 71% of the 400 cases.

If we modify the cutoff so that a probability of 0.7 or more counts as “success” (admitted) and a probability below 0.3 counts as “failure” (not admitted), the percentage of correct predictions increases. For this test, the percentage increased to 79%.
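The effect of the stricter cutoffs can be sketched with a small function. It assumes that cases with a probability between 0.3 and 0.7 are left unclassified and excluded from the percentage; the article does not state how those cases were handled, and the probabilities and outcomes below are invented.

```python
def banded_accuracy(probabilities, outcomes, low=0.3, high=0.7):
    """Percent correct among cases with p >= high (predict 1) or p < low (predict 0).

    Cases in between are treated as undecided and skipped (an assumption).
    Returns (percent_correct, number_of_undecided_cases).
    """
    decided = correct = 0
    for p, y in zip(probabilities, outcomes):
        if p >= high:
            predicted = 1
        elif p < low:
            predicted = 0
        else:
            continue
        decided += 1
        correct += int(predicted == y)
    percent = 100.0 * correct / decided if decided else float("nan")
    return percent, len(probabilities) - decided

# Invented predicted probabilities and observed outcomes (1 = admitted).
probs = [0.9, 0.8, 0.75, 0.6, 0.55, 0.45, 0.4, 0.25, 0.2, 0.1]
outcomes = [1, 1, 0, 0, 1, 1, 0, 0, 0, 1]

single_cutoff = banded_accuracy(probs, outcomes, low=0.5, high=0.5)
banded = banded_accuracy(probs, outcomes)
print("single 0.5 cutoff:", single_cutoff)
print("0.3 / 0.7 band:   ", banded)
```

The trade-off is that the higher percentage applies to fewer cases, since the undecided ones receive no prediction at all.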

Would you be interested in a tool that identifies the prospects with a higher probability of buying, taking into account the multiple factors related to them? If your answer is yes, you will need prospect data and a few hours of analysis. Contact Us.
