Multinomial Logistic Regression

I use the Cars dataset to apply multinomial logistic regression.

#load the libraries that we need.
library(data.table)
library(ggplot2)
library(ggridges)
library(nnet)
library(stargazer)
library(rpart)
library(rpart.plot)
library(parttree)

#read and assign the data. Please adjust the directory where the data lives.
dt_cars <- fread(file = "cars.csv", sep = ";")

#having a quick look into the dataset.
head(dt_cars)
                     VehicleName Standard SportsCar   SUV Wagon Minivan Pickup
                          <char>    <int>     <int> <int> <int>   <int>  <int>
1:              Acura 3.5 RL 4dr        1         0     0     0       0      0
2: Acura 3.5 RL w/Navigation 4dr        1         0     0     0       0      0
3:                     Acura MDX        0         0     1     0       0      0
4:  Acura NSX coupe 2dr manual S        0         1     0     0       0      0
5:          Acura RSX Type S 2dr        1         0     0     0       0      0
6:                  Acura TL 4dr        1         0     0     0       0      0
     AWD   RWD RetailPrice Dealer Cost EngineSize   Cyl    HP CityMPG HwyMPG
   <int> <int>       <int>       <int>      <num> <int> <int>   <int>  <int>
1:     0     0       43755       39014        3.5     6   225      18     24
2:     0     0       46100       41100        3.5     6   225      18     24
3:     1     0       36945       33337        3.5     6   265      17     23
4:     0     1       89765       79978        3.2     6   290      17     24
5:     0     0       23820       21761        2.0     4   200      24     31
6:     0     0       33195       30299        3.2     6   270      20     28
   Weight WheelBase Length Width
    <int>     <int>  <int> <int>
1:   3880       115    197    72
2:   3893       115    197    72
3:   4451       106    189    77
4:   3153       100    174    71
5:   2778       101    172    68
6:   3575       108    186    72
#create a "Drive" variable to indicate the drive type. The data has indicator columns for AWD and RWD but none for FWD, so the command first assigns "FWD" to every observation, then overwrites Drive with "AWD" or "RWD" wherever the corresponding indicator equals 1, and finally converts Drive to a factor.
dt_cars[, "Drive" := "FWD"][AWD==1, "Drive" := "AWD"][RWD==1, "Drive" := "RWD"][, "Drive" := as.factor(Drive)]
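The chained assignment above can be previewed on a tiny toy table; `toy` below is a hypothetical example, not part of the Cars data:

```r
library(data.table)

#toy indicator columns mimicking the AWD/RWD structure of the Cars data.
toy <- data.table(AWD = c(0, 1, 0), RWD = c(0, 0, 1))

#same chained pattern: default to FWD, overwrite where an indicator is 1, then convert to a factor.
toy[, Drive := "FWD"][AWD == 1, Drive := "AWD"][RWD == 1, Drive := "RWD"][, Drive := as.factor(Drive)]
toy$Drive #FWD AWD RWD
```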

\(\bullet\) The weight distribution of cars by drive type:

#the weight distribution by drive type. Since Drive is a factor, plot() produces a boxplot for each drive type.
plot(dt_cars$Drive,dt_cars$Weight,main="Weight distribution of cars by drive type",xlab="Drive",ylab="Weight")

#the same as before, but with colors added.
plot(dt_cars$Drive,dt_cars$Weight,col=factor(dt_cars$Drive),main="Weight distribution of cars by drive type",xlab="Drive",ylab="Weight")

#using ggplot to create a histogram of weight, with a unique outline color for each drive type (mapping fill instead of color would color the bars themselves).
ggplot(dt_cars,aes(x=Weight,color=Drive))+
  geom_histogram(bins = 40)

#The previous plot split into side-by-side facets, one per drive type.
ggplot(dt_cars,aes(x=Weight,color=Drive))+
  geom_histogram(bins = 40)+
  facet_grid(~Drive)

#The previous plot as a ridgeline plot.
ggplot(data = dt_cars, aes(x=Weight, y=Drive, fill = Drive)) +
  geom_density_ridges(bandwidth=30) +
  scale_fill_brewer(type = "qual", palette = 1) #"qual" stands for a qualitative palette. Since the aesthetic mapped above is fill, the scale has to be scale_fill_brewer; scale_colour_brewer would have no effect here.

#The previous plot as density ridgelines, with the median weight marked for each drive type.
ggplot(data = dt_cars, aes(x=Weight, y=Drive,vline_color=Drive)) +
  stat_density_ridges(quantile_lines = TRUE, quantiles = 2) #quantile_lines = TRUE draws the quantile lines, and quantiles = 2 breaks the data into two quantiles, so the single line drawn is the median (the 50th percentile). Mapping "vline_color" to Drive in the aesthetics above colors each vertical line by drive type.
Picking joint bandwidth of 236


\(\bullet\) Multinomial Logistic (MNL) Regression:

#make "RWD" the reference class. relevel() reorders the levels of Drive so that RWD comes first, and multinom() treats the first level as the reference.
dt_cars$type<-relevel(dt_cars$Drive,ref="RWD") 
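relevel only changes which level comes first; a minimal sketch on a standalone factor shows the reordering:

```r
#factor levels are alphabetical by default.
f <- factor(c("AWD", "FWD", "RWD"))
levels(f)                       #"AWD" "FWD" "RWD"

#relevel moves the chosen reference to the front without changing the data.
levels(relevel(f, ref = "RWD")) #"RWD" "AWD" "FWD"
```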

#run multinomial logistic regression, using the new variable created before.
model<-multinom(type~HP,data=dt_cars)
# weights:  9 (4 variable)
initial  value 468.008835 
iter  10 value 378.849870
final  value 378.849859 
converged
#results
stargazer(model,digits = 4,type="html")
                   Dependent variable:
                 -----------------------
                    AWD          FWD
                    (1)          (2)
----------------------------------------
HP               -0.0054***   -0.0194***
                 (0.0021)     (0.0023)
Constant          1.1774**     4.9953***
                 (0.5261)     (0.5279)
----------------------------------------
Akaike Inf. Crit. 765.6997     765.6997
----------------------------------------
Note:             *p<0.1; **p<0.05; ***p<0.01

The coefficients are interpreted relative to the reference class. For AWD: as horse power increases by one unit, the log-odds of AWD relative to RWD decrease by 0.0054. In other words, as horse power rises, a vehicle is relatively more likely to be RWD than AWD.

Both HP coefficients are significant at the 1% level.

\(\bullet\) Using the model before:

\[\begin{equation} x^{\prime}\beta^{AWD} = 1.177-0.005x \end{equation}\]

\[\begin{equation} x^{\prime}\beta^{FWD} = 4.995-0.019x \end{equation}\]


\(\bullet\) Predicting AWD and FWD class probabilities, and calculating the RWD class probability:

The multinomial logistic regression model is a log-linear model, such that:

\[\begin{equation} ln [\frac{\mathrm{P}(Drive=AWD \mid x)}{\mathrm{P}(Drive=RWD \mid x)}] = 1.177-0.005x= x^{\prime}\beta^{AWD} \end{equation}\]

and: \[\begin{equation} ln [\frac{\mathrm{P}(Drive=FWD \mid x)}{\mathrm{P}(Drive=RWD \mid x)}] = 4.995-0.019x = x^{\prime}\beta^{FWD} \end{equation}\]

The denominator is the reference we specified before. That is, the coefficients are interpreted relative to the reference, RWD.

Then, if we exponentiate each side, we have: \[\begin{equation} \frac{\mathrm{P}(Drive=AWD \mid x)}{\mathrm{P}(Drive=RWD \mid x)} = e^{1.177-0.005x} = e^{x^{\prime}\beta^{AWD}} \end{equation}\]

\[\begin{equation} \frac{\mathrm{P}(Drive=FWD \mid x)}{\mathrm{P}(Drive=RWD \mid x)} = e^{4.995-0.019x} = e^{x^{\prime}\beta^{FWD}} \end{equation}\]

In R, we can exponentiate the intercepts and the coefficients:

exp(coef(model))
    (Intercept)        HP
AWD     3.24581 0.9946333
FWD   147.71311 0.9807867
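These exponentiated coefficients are odds ratios: each extra unit of horse power multiplies the odds of AWD versus RWD by roughly 0.995 and the odds of FWD versus RWD by roughly 0.981. Over a larger horse power gap the effect compounds; for example, with the rounded FWD coefficient:

```r
#odds multiplier for FWD vs RWD when horse power increases by 100 units.
exp(-0.0194 * 100) #about 0.144, i.e. the FWD-vs-RWD odds shrink by roughly 86%.
```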

Now, we can write: \[\begin{equation} \frac{\mathrm{P}(Drive=AWD \mid x)+\mathrm{P}(Drive=FWD \mid x)}{\mathrm{P}(Drive=RWD \mid x)} = e^{x^{\prime}\beta^{AWD}}+e^{x^{\prime}\beta^{FWD}} \end{equation}\]

Since the class probabilities have to sum up to \(1\), that means: \[\begin{equation} \mathrm{P}(Drive=AWD \mid x) + \mathrm{P}(Drive=FWD \mid x) + \mathrm{P}(Drive=RWD \mid x) = 1. \end{equation}\]

Then: \[\begin{equation} \frac{1 - \mathrm{P}(Drive=RWD \mid x)}{\mathrm{P}(Drive=RWD \mid x)} = e^{x^{\prime}\beta^{AWD}}+e^{x^{\prime}\beta^{FWD}} \end{equation}\]

Then if we add \(1\) to both sides, we have: \[\begin{equation} \frac{1}{\mathrm{P}(Drive=RWD \mid x)} = 1+e^{x^{\prime}\beta^{AWD}}+e^{x^{\prime}\beta^{FWD}} \end{equation}\]


\[\begin{equation} \therefore \mathrm{P}(Drive=RWD \mid x) = \frac{1}{1+e^{x^{\prime}\beta^{AWD}}+e^{x^{\prime}\beta^{FWD}}} \end{equation}\]

Hence, we have: \[\begin{equation} \mathrm{P}(Drive=AWD \mid x) = \frac{e^{x^{\prime}\beta^{AWD}}}{1+e^{x^{\prime}\beta^{AWD}}+e^{x^{\prime}\beta^{FWD}}} \end{equation}\]


\[\begin{equation} \mathrm{P}(Drive=FWD \mid x) = \frac{e^{x^{\prime}\beta^{FWD}}}{1+e^{x^{\prime}\beta^{AWD}}+e^{x^{\prime}\beta^{FWD}}} \end{equation}\]
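The three formulas above can be wrapped in a small helper; mnl_probs is a hypothetical name, and the coefficients are the rounded estimates reported earlier:

```r
#class probabilities for a reference-coded multinomial logit.
#B holds one row of (intercept, slope) per non-reference class.
mnl_probs <- function(x, B) {
  eta <- exp(drop(B %*% c(1, x))) #e^{x'beta} for each non-reference class
  p_ref <- 1 / (1 + sum(eta))     #reference-class (RWD) probability
  c(RWD = p_ref, p_ref * eta)     #all classes share the same denominator
}

B <- rbind(AWD = c(1.1774, -0.0054), FWD = c(4.9953, -0.0194))
round(mnl_probs(200, B), 4) #RWD 0.1941, AWD 0.2139, FWD 0.5920
```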

In R, we can predict the probabilities for each observation:

prediction<-predict(model, dt_cars,type="prob") #type="prob" returns the class probabilities.
head(prediction)
        RWD       AWD       FWD
1 0.2600668 0.2515229 0.4884103
2 0.2600668 0.2515229 0.4884103
3 0.3781879 0.2949303 0.3268818
4 0.4517007 0.3079192 0.2403800
5 0.1939238 0.2145603 0.5915159
6 0.3931465 0.2984564 0.3083971

The probabilities must sum to \(1\) (i.e. each row has to sum to \(1\)).
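The summing-to-one property can be checked directly; this sketch assumes the prediction matrix created above:

```r
#every row of the predicted-probability matrix should sum to 1 (up to floating point).
stopifnot(isTRUE(all.equal(unname(rowSums(prediction)), rep(1, nrow(prediction)))))
```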

For logistic regression, the fitted values are the predicted class probabilities rather than conditional means, so, alternatively, we can use the fitted values of the model to obtain the same probabilities for all observations.

head(fitted(model))
        RWD       AWD       FWD
1 0.2600668 0.2515229 0.4884103
2 0.2600668 0.2515229 0.4884103
3 0.3781879 0.2949303 0.3268818
4 0.4517007 0.3079192 0.2403800
5 0.1939238 0.2145603 0.5915159
6 0.3931465 0.2984564 0.3083971

\(\bullet\) Example:

Assume we want to predict the probabilities for each drive type when the horse power equals \(200\).

The probability that the vehicle has the AWD drive type, conditional on horse power equal to \(200\), is: \[\mathrm{P}(Drive=AWD \mid HP=200) = \frac{e^{1.1774-0.0054*200}}{1+e^{1.1774-0.0054*200}+e^{4.9953-0.0194*200}}\] \[\approx 0.2139\]

We can calculate the probability for the other two classes: \[\mathrm{P}(Drive=FWD \mid HP=200) = \frac{e^{4.9953-0.0194*200}}{1+e^{1.1774-0.0054*200}+e^{4.9953-0.0194*200}}\] \[\approx 0.592 \]

\[\mathrm{P}(Drive=RWD \mid HP=200) = \frac{1}{1+e^{1.1774-0.0054*200}+e^{4.9953-0.0194*200}}\] \[\approx 0.1941\]

The sum of the probabilities is: \(0.2139+0.592+0.1941=1\).

That means, given a vehicle with a horse power of \(200\), the probabilities that it has AWD, FWD, or RWD are approximately \(21\%\), \(59\%\), and \(19\%\), respectively.

We can verify that using R:

predict(model, dt_cars[HP==200],type="prob")
         RWD       AWD       FWD
1  0.1939238 0.2145603 0.5915159
2  0.1939238 0.2145603 0.5915159
3  0.1939238 0.2145603 0.5915159
4  0.1939238 0.2145603 0.5915159
5  0.1939238 0.2145603 0.5915159
6  0.1939238 0.2145603 0.5915159
7  0.1939238 0.2145603 0.5915159
8  0.1939238 0.2145603 0.5915159
9  0.1939238 0.2145603 0.5915159
10 0.1939238 0.2145603 0.5915159
11 0.1939238 0.2145603 0.5915159
12 0.1939238 0.2145603 0.5915159
13 0.1939238 0.2145603 0.5915159
14 0.1939238 0.2145603 0.5915159
15 0.1939238 0.2145603 0.5915159
16 0.1939238 0.2145603 0.5915159
17 0.1939238 0.2145603 0.5915159

The manual calculation agrees with R's output to about two decimal places; the small discrepancy comes from rounding the coefficients to four digits.


\(\bullet\) Plot the horse power against the class probabilities:

#append the fitted class probabilities to the dataset as new columns.
dt_cars<-cbind(dt_cars, probability = fitted(model))

#have a look at the variables; ls() lists the column names alphabetically.
ls(dt_cars)
 [1] "AWD"             "CityMPG"         "Cyl"             "Dealer Cost"    
 [5] "Drive"           "EngineSize"      "HP"              "HwyMPG"         
 [9] "Length"          "Minivan"         "Pickup"          "RWD"            
[13] "RetailPrice"     "SUV"             "SportsCar"       "Standard"       
[17] "VehicleName"     "Wagon"           "Weight"          "WheelBase"      
[21] "Width"           "probability.AWD" "probability.FWD" "probability.RWD"
[25] "type"           
#plot horse power against the 3 class probabilities.
ggplot(dt_cars,aes(x=HP))+
  geom_line(aes(y=probability.AWD,color="AWD"))+
  geom_line(aes(y=probability.FWD,color="FWD"))+
  geom_line(aes(y=probability.RWD,color="RWD"))+
  labs(y="Probability") +
  scale_color_manual(name="Drive",values=c(AWD="red",FWD="blue",RWD="black"))+
  facet_grid(type~.) #placing the faceting variable before "~" arranges the facets as rows, so each panel spans the full horizontal width.



\(\bullet\) Classification Tree:

#classification tree for Drive on HP.
model_tree<-rpart(Drive~HP,data=dt_cars)

#plot the tree.
prp(model_tree)
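The fitted tree can also produce hard class predictions; a sketch that assumes model_tree and dt_cars from above:

```r
#predicted drive type per observation, taken from the leaf each car falls into.
tree_class <- predict(model_tree, type = "class")

#confusion table of tree predictions against the actual drive types.
table(Predicted = tree_class, Actual = dt_cars$Drive)
```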

#plot HP on the x-axis and its density on the y-axis, with vertical lines marking the tree's split points; the regions between the splits are colored by the predicted drive type.
ggplot(dt_cars,aes(x=HP))+
  geom_density()+
  geom_parttree(data=model_tree,aes(fill=Drive),alpha=0.5)+
  labs(y="Density")


\(\bullet\) Comparing between the classification tree chart and the MNL chart:

In the classification tree chart, the first region (lower levels of horse power) is green, representing FWD. That is consistent with the predicted probability in the MNL chart: at low levels of horse power, the blue line (FWD) lies above the other drive types (i.e. has the highest probability).

In the classification tree chart, the last region (higher levels of horse power) is blue, representing RWD. That is also consistent with the MNL chart: at higher levels of horse power, the black line (RWD) lies above the other drive types (i.e. has the highest probability).



\(\bullet\) Regression Tree:

#regression tree for CityMPG on HP.
reg_tree<-rpart(CityMPG~HP,data=dt_cars)

#plot the tree.
rpart.plot(reg_tree)

The average CityMPG for the whole data is \(20\).

For horse power greater than or equal to \(141\), the average CityMPG is \(19\), which covers \(85\%\) of the data, and so on.

As we go down the tree, the conditions become stricter, until it ends with five leaves, which combined represent \(100\%\) of the data.

  1. If \(HP \ge 273\), the average CityMPG = \(16\), and this represents \(21\%\) of the data.

  2. If \(183 \le HP < 273\), the average CityMPG = \(19\), and this represents \(46\%\) of the data.

  3. If \(141 \le HP < 183\), the average CityMPG = \(21\), and this represents \(18\%\) of the data.

  4. If \(113 \le HP < 141\), the average CityMPG = \(27\), and this represents \(11\%\) of the data.

  5. If \(HP < 113\), the average CityMPG = \(34\), and this represents \(4\%\) of the data.
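Each leaf can be reproduced by hand; for example, leaf 5 is simply the mean CityMPG among cars with horse power below 113 (this assumes dt_cars from above):

```r
#average CityMPG and data share for the HP < 113 leaf; should match the tree's 34 and 4%.
dt_cars[HP < 113, .(mean_CityMPG = mean(CityMPG), share = .N / nrow(dt_cars))]
```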

We can see the conditions using rpart.rules:

rpart.rules(reg_tree)
 CityMPG                      
      16 when HP >=        273
      19 when HP is 183 to 273
      21 when HP is 141 to 183
      27 when HP is 113 to 141
      34 when HP <  113       
#extract the average CityMPG from the leaves by inspecting the data frame stored in the fitted "rpart" tree. 
reg_tree$frame
     var   n  wt        dev     yval  complexity ncompete nsurrogate
1     HP 412 412 11214.9199 20.09951 0.502531728        0          0
2     HP 350 350  2548.8571 18.54286 0.068468519        0          0
4     HP 274 274  1441.5803 17.76277 0.032789694        0          0
8 <leaf>  85  85   324.8941 16.03529 0.001992630        0          0
9 <leaf> 189 189   748.9524 18.53968 0.001957478        0          0
5 <leaf>  76  76   339.4079 21.35526 0.007392736        0          0
3     HP  62  62  3030.2097 28.88710 0.064373128        0          0
6 <leaf>  44  44   427.1591 26.70455 0.004512745        0          0
7 <leaf>  18  18  1881.1111 34.22222 0.010000000        0          0
#From it, we can create another data frame that contains the "yval" entries of the leaf rows. 
#"yval" holds the fitted values, i.e. the average CityMPG within each leaf.
City<-data.frame(CityMPG=c(reg_tree$frame[4,5],reg_tree$frame[5,5],reg_tree$frame[6,5],reg_tree$frame[8,5],reg_tree$frame[9,5])) #the numbers in the square brackets are the row and column positions, respectively: rows 4, 5, 6, 8 and 9 of the printed frame are the "<leaf>" rows, and column 5 is "yval".
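Hard-coded row positions break if the tree changes; an alternative sketch that selects the leaf rows by the var column instead (assumes reg_tree from above):

```r
#"<leaf>" marks terminal nodes in the frame, so this picks exactly the five leaf means.
City <- data.frame(CityMPG = reg_tree$frame$yval[reg_tree$frame$var == "<leaf>"])
```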


#plot the data, the linear regression line, and horizontal lines that display the average CityMPG per leaf.
ggplot(dt_cars,aes(x=HP,y=CityMPG))+
  geom_point()+
  geom_smooth(formula=y~x,method = "lm")+ #"lm" is for linear model.
  geom_hline(yintercept = City$CityMPG,color="red",alpha=0.3)+ #"alpha" is for the intensity of the color. The higher, the more intense (less transparent) the color will be. 
  labs(y="City MPG")+
  scale_y_continuous(breaks = round(sort(c(seq(min(dt_cars$CityMPG,na.rm=TRUE), max(dt_cars$CityMPG,na.rm=TRUE), by=15), City$CityMPG)))) #display the per-leaf average CityMPG values on the axis via "breaks": build a sequence from the minimum to the maximum of CityMPG ("by" sets the step between breaks), combine it with the per-leaf averages, then sort the values and round them to the nearest integer.