E-Commerce: Predicting Customer Purchase Incidence and Revenue
- Evie Wei
- Jan 24, 2021
- 4 min read
This project studied e-commerce website data over a period of one year. The objective was to gain insights into customer characteristics and behaviors and predict the transaction revenue of customers.
We used SAS, R, and Excel to conduct our analysis. SAS and R provided descriptive statistics of the variables for customer characteristics and behaviors. R enabled us to find the best-fitting model to predict revenue and describe the data. Excel gave us an overview of the large online-store dataset, which consists of 452,027 observations and 45 variables (including nested variables).
We cleaned the data by excluding variables with a large number of missing values and created new variables based on the original dataset. This process included categorizing sources, browsers and mediums to reduce levels of these variables, converting the time into the local time and creating variables based on it. This provided us with a cleaned dataset of 25 variables. We used session-level observations to conduct the analysis and built the prediction model in two stages.
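The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the SAS/R code used in the project; the function names, the `keep` threshold, the UTC offset, and the "12am-8am" bucket are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def collapse_levels(values, keep=3, other="Other"):
    """Keep the `keep` most frequent levels and map all others to `other`."""
    top = {level for level, _ in Counter(values).most_common(keep)}
    return [v if v in top else other for v in values]

def local_time_features(utc_epoch_seconds, utc_offset_hours):
    """Convert a UTC timestamp to local time and derive calendar features.

    `utc_offset_hours` is a hypothetical per-session offset; a real pipeline
    would look it up from the visitor's location.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.fromtimestamp(utc_epoch_seconds, tz=tz)
    if 8 <= local.hour < 16:
        day_part = "8am-4pm"
    elif local.hour >= 16:
        day_part = "4pm-12am"
    else:
        day_part = "12am-8am"  # assumed third bucket
    return {
        "weekday": WEEKDAYS[local.weekday()],
        "is_weekend": local.weekday() >= 5,
        "quarter": (local.month - 1) // 3 + 1,
        "day_part": day_part,
    }

# Toy data, not taken from the project dataset.
browsers = ["Chrome", "Chrome", "Safari", "Chrome", "Firefox", "Opera", "Safari"]
print(collapse_levels(browsers, keep=2))
print(local_time_features(1483272000, utc_offset_hours=-5))
```

Collapsing rare levels keeps the number of dummy variables in the later models manageable, and the derived time features correspond to the weekday, quarter, and time-of-day effects discussed further down.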
First, we built a basic picture of the users. The majority of consumers accessed the website on a desktop or tablet, used the Chrome browser, and found the website through Google or one of its services. Additionally, 43% of customers live in North America, and a large share of the traffic occurs in the morning and afternoon, from Monday to Friday, and from October to December.

Then we decided to conduct the analysis as a two-stage process. For the first stage, we chose a binary logit model to predict the purchase incidence of each observation. For this, we used only sessions without a bounce, since sessions with bounces cannot lead to a transaction.

We found that around 1.3% of sessions led to a purchase, and 2,178 sessions had a predicted purchase probability above 50%. The three factors with the greatest impact on purchase incidence were:
The purchasing odds of US users were 14.67 times those of non-US users. However, because of a significant interaction, this effect weakened as the number of page views in a session increased.
The purchasing odds of returning visitors were 13.68 times those of new visitors. This effect also weakened as page views increased.
The purchasing odds of mobile users were 51.3% lower than those of desktop users.
In addition, users were more likely to purchase on Fridays, between 8am and 4pm, and from April to June, as well as when they reached the website directly and used Google Chrome.
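To make the odds ratios above more tangible, here is a small Python sketch of how an odds ratio shifts a purchase probability, and how an interaction with page views can weaken it. The baseline probability and the interaction coefficient are hypothetical values chosen for illustration; the post does not report them. Only the odds ratios 14.67 and 0.487 (that is, 51.3% lower) come from the findings.

```python
import math

def prob_to_odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)

def apply_odds_ratio(base_prob, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    return odds_to_prob(prob_to_odds(base_prob) * odds_ratio)

def odds_ratio_with_interaction(main_or, interaction_coef, page_views):
    """Odds ratio of a dummy variable that interacts with page views.

    In a logit model with terms b*dummy + c*dummy*page_views, the odds
    ratio of the dummy at a given number of page views is exp(b + c*page_views).
    """
    return math.exp(math.log(main_or) + interaction_coef * page_views)

BASE_PROB = 0.013                # hypothetical baseline purchase probability
US_ODDS_RATIO = 14.67            # reported above
MOBILE_ODDS_RATIO = 1 - 0.513    # "51.3% lower" odds
INTERACTION_COEF = -0.05         # hypothetical, not reported in the post

print(apply_odds_ratio(BASE_PROB, US_ODDS_RATIO))
print(apply_odds_ratio(BASE_PROB, MOBILE_ODDS_RATIO))
for views in (0, 10, 20):
    print(views, odds_ratio_with_interaction(US_ODDS_RATIO, INTERACTION_COEF, views))
```

With a negative interaction coefficient, the odds ratio shrinks as page views grow, which is the pattern described for US users and returning visitors.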

For the second stage, we used a log-log regression model to predict the purchase amount of each observation. To predict the purchase amount, we conditioned on a purchase taking place, that is, we assumed a purchase probability of one. For this model, we included only sessions that had a transaction revenue, so that the dependent variable would be approximately normally distributed. We then applied both models to the test dataset to obtain predictions of purchase incidence and purchase amount.
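As a rough illustration of what a log-log regression does, the following Python sketch fits log(revenue) against log(page views) by ordinary least squares with a single predictor. The actual model was built in R with many more variables; the single predictor, the function names, and the synthetic data here are assumptions for the example. Note that the logarithm is only defined for positive values, which is one more reason such a model can only be fit on sessions with revenue.

```python
import math

def fit_log_log(x, y):
    """Least-squares fit of log(y) = a + b * log(x); returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_revenue(a, b, x):
    """Back-transform the fitted log revenue to the original scale."""
    return math.exp(a + b * math.log(x))

# Synthetic purchasing sessions: revenue = 2 * page_views ** 0.5 exactly.
page_views = [5, 10, 20, 40]
revenue = [2.0 * pv ** 0.5 for pv in page_views]

a, b = fit_log_log(page_views, revenue)
print("elasticity b:", b)
print("predicted revenue at 16 page views:", predict_revenue(a, b, 16))
```

In a log-log model, the slope `b` reads as an elasticity: a 1% increase in the predictor is associated with roughly a `b`% change in revenue.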

Based on this model, our findings were summarized in the following picture. From the bubbles, we can see that:
Purchases from new visitors would generate 23% less revenue than purchases from returning visitors.
Regarding browsers, revenue from Chrome was 104% higher than revenue from Android, and Chrome was the browser that generated the highest revenue.
Regarding time, purchasing on weekends would decrease revenue by 135% compared with purchasing on weekdays.
Purchases made in Quarters 2 and 3 (April-September) would bring the company 140% higher revenue than those made in Quarter 1 (January-March).
Revenue would be 20% higher for purchases made from 8am to 4pm than for those made from 4pm to 12am.
From the bar chart, we can see that:
If a session started from a direct source, the effect of an increase in page views would be 27% higher.
If a session was from the US, the effect of an additional hit per page view would be 135% higher.
In short, the more hits we get from users, the more revenue we can expect.
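For readers who want to relate percentage effects like these to regression coefficients, the sketch below shows the standard conversion for a dummy variable in a model whose dependent variable is log revenue: a coefficient `b` corresponds to a change of 100 * (exp(b) - 1) percent. The coefficient values used here are hypothetical, since the post reports only percentages, and the function names are made up for the example.

```python
import math

def pct_effect_from_coef(coef):
    """Percentage change in revenue implied by a dummy coefficient
    when the dependent variable is log(revenue)."""
    return 100.0 * (math.exp(coef) - 1.0)

def coef_from_pct_effect(pct):
    """Inverse conversion: the coefficient that implies a given percentage change."""
    if pct <= -100.0:
        raise ValueError("a decrease of 100% or more has no finite log coefficient")
    return math.log(1.0 + pct / 100.0)

# Hypothetical coefficients, for illustration only.
for coef in (-1.35, -0.26, 0.2, 0.71):
    print(coef, round(pct_effect_from_coef(coef), 1))

# Coefficient that would correspond to a 23% decrease.
print(coef_from_pct_effect(-23.0))
```

One property worth keeping in mind: under this conversion a decrease can never exceed 100%. A coefficient of -1.35, for example, corresponds to a decrease of about 74%.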

To obtain our final prediction at the customer level, we multiplied the purchase incidence prediction by the purchase amount prediction at the session level. Sessions with a bounce were assigned zero transaction revenue, as these sessions cannot lead to a transaction. Negative predictions were also set to zero. In the final step, we aggregated all sessions by customer to obtain our final prediction at the customer level.
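The combination step described above can be sketched in a few lines of Python. The field names (`customer_id`, `purchase_prob`, `predicted_amount`, `bounced`) and the toy sessions are assumptions for illustration, not the project's actual column names or data.

```python
from collections import defaultdict

def session_expected_revenue(purchase_prob, predicted_amount, bounced):
    """Expected revenue of one session: probability times amount,
    with bounced sessions and negative predictions set to zero."""
    if bounced:
        return 0.0
    return max(0.0, purchase_prob * predicted_amount)

def customer_predictions(sessions):
    """Sum the session-level expected revenue for each customer."""
    totals = defaultdict(float)
    for s in sessions:
        totals[s["customer_id"]] += session_expected_revenue(
            s["purchase_prob"], s["predicted_amount"], s["bounced"]
        )
    return dict(totals)

# Toy sessions, not taken from the project dataset.
sessions = [
    {"customer_id": "A", "purchase_prob": 0.5, "predicted_amount": 80.0, "bounced": False},
    {"customer_id": "A", "purchase_prob": 0.25, "predicted_amount": 40.0, "bounced": False},
    {"customer_id": "B", "purchase_prob": 0.9, "predicted_amount": 100.0, "bounced": True},
    {"customer_id": "C", "purchase_prob": 0.5, "predicted_amount": -10.0, "bounced": False},
]
print(customer_predictions(sessions))  # {'A': 50.0, 'B': 0.0, 'C': 0.0}
```

Customer A's two sessions contribute 40 and 10, the bounced session of customer B is zeroed out regardless of its predictions, and the negative prediction for customer C is set to zero.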
On the one hand, we found three main drivers of purchase incidence: sessions from the US, sessions from returning visitors, and sessions from non-mobile devices. On the other hand, the highest revenue was generated if the website was accessed from Google Chrome, between 8am and 4pm and in either the second or third quarter of the year.
Based on these insights, we recommend the following three points to the company. First, optimize the website to generate more hits and more revenue from mobile devices. Second, advertise the online store during the times of day and the quarters that generate high transaction revenue. Third, target returning visitors, as they are more likely to buy the products and their revenue per purchase is higher. These recommendations should help the company increase its revenue.
