
Marketing Data Science 101: Linear Regression

Marketers look at past trends to estimate upcoming traffic on a website, to decide how much revenue a given budget will fetch, or to work out how much to invest in a particular channel to reach a sales target. But does this always give the right number?

A big NO! Trend analysis can only give you a rough estimate. How much better would it be if the data scientist on your team built a regression model that automates these predictions and keeps the error in the estimated numbers small? Regression analysis makes your decisions more evidence based and tends to deliver better results in the long run. Machine learning goes beyond regression, too. A simple affinity analysis (also known as market basket analysis), in which we analyze which consumer behaviors occur together, tells you what else a customer is likely to shop for. That is just one easy example. Machine learning models also help in identifying market opportunities and in tailoring marketing strategies. What an amazing initiative it would be to make the most of the huge amount of data we already have.

Types of Analysis through Machine Learning 

Let us begin with the very basis of data science. Much like how we humans learn from our experiences and apply those lessons in the future, a machine can learn from past data, and this is referred to as Machine Learning. Machine learning models can be classified into three types based on the output required:

  1. Regression: The output is a continuous variable, e.g. the sales prediction for the next quarter.

  1. Classification: The output is a categorical value (for example 0 or 1), e.g. classifying your emails as spam or ham (not spam).

  1. Clustering: The output is a set of groups, e.g. segments of lookalike customers formed from their behavior or any other attribute present in the data.

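Machine learning algorithms can also be classified as supervised or unsupervised. Regression and classification are supervised, because they learn from examples with known outputs, while clustering is unsupervised.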

[Figure: Supervised vs Unsupervised Learning. Source: Dummies Notes]


Regression Analysis

Let’s look at linear regression, a very basic algorithm that can help you predict future sales for your business.

Linear Regression can be classified into two categories: 

  1. Simple linear regression 

  1. Multiple linear regression 

The sales column (feature) that we want to predict is known as the target or dependent variable, and all other features are called independent variables.

The simple linear regression model is based on the basic equation of a straight line, which describes the relationship between one independent variable and the dependent variable.

[Figure: Intro to Linear Regression. Source: Medium - Data Driven Investor]


Here the Y axis shows sales (the dependent variable) and the X axis shows your marketing budget (the independent variable).

The standard equation of linear regression is given by y = mx + c

Where ‘m’ is the slope and ‘c’ is the intercept of the straight line, based on which our model predicts the outcomes for y, the sales values.
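To make the equation concrete, here is a minimal sketch of a prediction with y = mx + c. The function name `predict_sales` and the slope and intercept values are made up for illustration; in practice they would come from fitting the model to your own data.

```python
def predict_sales(budget, slope, intercept):
    """Return predicted sales for a marketing budget using y = m*x + c."""
    return slope * budget + intercept


# Hypothetical values (assumptions, not fitted from real data):
# every extra unit of budget adds 3.0 units of sales,
# and sales are 50.0 when the budget is zero.
m = 3.0
c = 50.0

for budget in (10, 20, 30):
    print(f"budget={budget} -> predicted sales={predict_sales(budget, m, c)}")
```

The slope tells you how much sales change per unit of budget, and the intercept is the baseline sales with no budget at all.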

Now, in order to predict with minimal error, we need to find the line that fits our dataset best. This is called the ‘best fit line’ of our model.

To find the best fit line, we need to minimize the error, or residual, of each data point relative to the regression line. The residual of the first data point is e1 = y1 (actual) - y1 (predicted), and the same applies to every other point.

Using the Ordinary Least Squares (OLS) method, we minimize the RSS term:

RSS (residual sum of squares) = e1² + e2² + e3² + … + en²

Here each residual is simply the difference between the actual value of our dependent variable (sales) and the value of sales predicted by our model. Squaring the residuals keeps positive and negative errors from cancelling each other out.

Hence the linear regression model reduces the error term as far as a straight line allows and brings the predicted sales values as close as possible to the actual ones.
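As a rough illustration of Ordinary Least Squares for a single independent variable, the sketch below uses the standard closed-form formulas for the slope and intercept and then computes the RSS. The budget and sales figures are invented toy data, and the function names `fit_simple_ols` and `rss` are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of co-deviations of x and y / sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best fit line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rss(xs, ys, slope, intercept):
    """Residual sum of squares: sum of (actual - predicted)^2 over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Toy data (assumed for illustration): marketing budget vs. sales.
budgets = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 62.0, 85.0, 105.0]

m, c = fit_simple_ols(budgets, sales)
best_rss = rss(budgets, sales, m, c)
print(round(m, 3), round(c, 3), round(best_rss, 3))  # approx. 2.0 4.4 7.2

# Any other line gives a larger RSS on the same data.
print(rss(budgets, sales, 2.1, 4.4) > best_rss)
```

For this toy data the fitted line is roughly y = 2.0x + 4.4, and nudging the slope away from that value increases the RSS, which is exactly what "best fit" means here.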

In multiple linear regression, there are two or more independent features (columns), while there is still only one dependent variable. The equation extends to y = c + m1x1 + m2x2 + … + mkxk, with one slope per feature.
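The same least-squares idea carries over to several features. The sketch below solves the normal equations (XᵀX)b = Xᵀy in plain Python so that it stays self-contained; in practice a numerical library would normally do this step. The two channels (search and social spend), the data, and the function names are all assumptions made for illustration, and the toy sales figures were generated from y = 5 + 2·x1 + 3·x2 so the fit can be checked by eye.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_multiple_ols(rows, ys):
    """Return [intercept, slope_1, ..., slope_k] minimizing the RSS."""
    # Prepend a constant 1.0 to each row so the intercept is estimated too.
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data (assumed): (search spend, social spend) per period and resulting sales.
spend = [(10.0, 5.0), (20.0, 3.0), (30.0, 8.0), (40.0, 6.0), (50.0, 10.0)]
sales = [40.0, 54.0, 89.0, 103.0, 135.0]

coefficients = fit_multiple_ols(spend, sales)
print([round(v, 3) for v in coefficients])  # approx. [5.0, 2.0, 3.0]
```

Each slope is read as the change in sales per unit of that channel's spend while the other channel is held fixed.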

Assumptions of linear regression

To follow a straight-line path and form a best fit line for our model, linear regression relies on certain assumptions:

  1. There is a linear relation between the X (independent) and Y (dependent) variables

  1. Error terms are normally distributed or form a bell-shaped curve 

  1. Error terms are independent of each other 

  1. Error terms have a constant variance (homoscedasticity) 

    [Figure: Assumptions of Simple Linear Regression. Source: ProgramsBuzz]
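The assumptions about the error terms can be checked roughly from a fitted model's residuals. The sketch below is only a starting point and not a set of formal statistical tests: it assumes the residuals are already ordered by X (or by time), uses made-up residual values, and the helper names are hypothetical. It computes the mean residual, the Durbin-Watson statistic (values near 2 suggest little correlation between neighbouring errors; the statistic ranges from 0 to 4), and a simple ratio of residual variance between the second and first half of the data as a hint about constant variance. Normality is not checked here.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance of a list of numbers."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


def spread_ratio(residuals):
    """Variance of the second half of the residuals over that of the first half.

    A ratio far from 1 hints that the error variance is not constant.
    """
    half = len(residuals) // 2
    return variance(residuals[half:]) / variance(residuals[:half])


# Made-up residuals (actual - predicted), assumed to be ordered by X.
residuals = [0.6, -0.4, 0.5, -0.7, 0.3, -0.3]

print("mean residual:", round(mean(residuals), 3))
print("Durbin-Watson:", round(durbin_watson(residuals), 3))
print("spread ratio:", round(spread_ratio(residuals), 3))
```

Plotting the residuals against the predicted values is a common visual complement to numbers like these.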


In the marketing world, any number of factors (independent variables) can affect our sales values, so multiple linear regression is usually the better fit. The same assumptions apply to multiple linear regression, and when they hold, the model produces reliable sales estimates to support a data-backed decision!

At Merkle Sokrati, we take a holistic approach to understanding your requirements and matching them with the best possible measures. We are happy to provide services such as Data Management Platforms, Analytics Platforms, Web Personalization Platforms, and Tag Management services.

If your business needs help getting the most out of Performance Analytics, we’re here to help.