Two-way Fixed Effects

According to the hedonic treadmill hypothesis, people have a relatively constant baseline level of “happiness” that, in the long run, is insensitive to significant life events. Brickman and Campbell proposed this hypothesis in 1971 (Brickman and Campbell, 1971).

I test the hedonic treadmill hypothesis using childbirth as a major life event. I examine how two things correlate with subjective well-being: adaptation to having a child in the years after the birth, and anticipation of a child in the years leading up to it.


\(\bullet\) Explanation:

I examine the effect of the birth of a child, as a major life event, on subjective well-being. I look at the years leading up to the childbirth (i.e., before the event) and the years following it (i.e., after the event), which represent anticipation of and adaptation to childbirth, respectively.

Following Clark et al. (2008), I define a childbirth as occurring in a given year when the number of children in that year is greater than the number of children in the previous year. Then, for the years after childbirth, I divide individuals into six groups:

1. Individuals who have had a child for one year or less;
2. Individuals who have had a child for 1-2 years;
3. Individuals who have had a child for 2-3 years;
4. Individuals who have had a child for 3-4 years;
5. Individuals who have had a child for 4-5 years;
6. Individuals who have had a child for five years or more.

Similarly, I apply the same grouping to the years before childbirth (i.e., the anticipation period). This grouping distinguishes the effect of childbirth on subjective well-being across different periods and shows how anticipation and adaptation vary over time.
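The sketch below illustrates the dummy construction in base R on a made-up two-person panel. The data, the names `toy` and `n_children`, and the helper `shift_within()` are all hypothetical; `n_children` plays the role of SOEP's `d11107`. It mirrors the two-year window logic of the `ch0` to `ch4` and `chm0` to `chm4` dummies that the R code in the Data part builds with `dplyr::lag()` and `dplyr::lead()`. Rows are assumed to be sorted by person and year.

```r
# Hypothetical toy panel (not SOEP data): two people observed 2001-2006.
# n_children stands in for the SOEP variable d11107 (number of children).
toy <- data.frame(
  pid        = rep(1:2, each = 6),
  syear      = rep(2001:2006, times = 2),
  n_children = c(0, 0, 1, 1, 1, 1,   # person 1: a birth in 2003
                 1, 1, 1, 2, 2, 2)   # person 2: a birth in 2004
)

# Shift x by k positions within each person (rows sorted by year).
# k > 0 looks back (like dplyr::lag), k < 0 looks ahead (like dplyr::lead).
shift_within <- function(x, id, k) {
  ave(x, id, FUN = function(v) {
    n <- length(v)
    if (k >= 0) c(rep(NA, min(k, n)), head(v, max(n - k, 0)))
    else c(tail(v, max(n + k, 0)), rep(NA, min(-k, n)))
  })
}

# Childbirth indicator as in Clark et al. (2008): more children than last year
toy$child <- as.integer(toy$n_children > shift_within(toy$n_children, toy$pid, 1))

# Two-year windows, mirroring ch0-ch4 (lags) and chm0-chm4 (leads)
for (n in 0:4) {
  lag_n   <- shift_within(toy$child, toy$pid, n)
  lag_n1  <- shift_within(toy$child, toy$pid, n + 1)
  lead_n  <- shift_within(toy$child, toy$pid, -n)
  lead_n1 <- shift_within(toy$child, toy$pid, -(n + 1))
  toy[[paste0("ch", n)]]  <- as.integer(lag_n == 1 | lag_n1 == 1)
  toy[[paste0("chm", n)]] <- as.integer(lead_n == 1 | lead_n1 == 1)
}

toy[toy$pid == 1, c("syear", "child", "ch0", "ch1", "chm0", "chm1")]
```

Because each group is a two-year window, the same person-year can belong to two adjacent groups. In the toy panel, 2004 (one year after person 1's birth) has both `ch0` and `ch1` equal to 1, and 2002 has both `chm0` and `chm1` equal to 1.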

We do not know at which point in time individuals, on average, begin to anticipate or plan to have a child. Individuals who have a child now did not necessarily have the same preferences five years ago, so it is hard to assume that all of them actually anticipated a child five years earlier. A child, however, affects the parents’ lives for years to come, so the effect after the birth might last longer than the anticipation effect before it.

Group 6, individuals who have had a child for five years or longer, is the appropriate group for testing the hedonic treadmill hypothesis, that is, for testing how sensitive subjective well-being is to childbirth in the long run.

Empirically, I use a two-way fixed effects model. I regress life satisfaction on dummies for each group and on other explanatory variables, namely employment status, household income, education, health, marital status, and age. I further split age into five groups: (1) 16-20 years; (2) 21-30 years; (3) 31-40 years; (4) 41-50 years; (5) 51-60 years.
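The five age groups correspond to right-closed intervals. The base-R check below uses made-up boundary ages. It confirms that `cut()` with breaks at 15, 20, 30, 40, 50, and 60 and `right = TRUE` places each boundary age in the intended group. Ages above 60 fall into a sixth group. Only the first five age dummies enter the regressions, so that sixth group acts as the omitted reference category.

```r
# Made-up ages at the group boundaries (illustration only)
ages <- c(16, 20, 21, 30, 31, 40, 41, 50, 51, 60, 61, 75)

# right = TRUE (the default) gives intervals (15,20], (20,30], ..., (60,Inf]
age_band <- cut(ages,
                breaks = c(15, 20, 30, 40, 50, 60, Inf),
                labels = c("16-20", "21-30", "31-40", "41-50", "51-60", "61+"),
                right = TRUE)

data.frame(age = ages, band = age_band)
```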

Equation (1) shows the empirical specification for estimating the adaptation period: \[\begin{equation} LS_{it}=\alpha'{\bf X_{it}}+ \sum_{n=0}^5 \beta_n {\bf C_{n,it}}+\theta_i+\lambda_t+\epsilon_{it}, \end{equation}\] where:

- \(LS_{it}\) is the overall life satisfaction of person \(i\) at time \(t\).
- \({\bf X_{it}}\) is a vector of control variables: number of years of education, labor force status, marital status, self-rated health status, the log of household income adjusted by household size, and the age-group dummies.
- \(C_{n,it}\) is a dummy equal to one if person \(i\) at time \(t\) belongs to group \(n\). For \(n=0,\dots,4\), this means the person had a child \(n\) to \(n+1\) years earlier; for \(n=5\), five or more years earlier.
- \(\theta_i\) is a set of individual dummies that control for time-invariant characteristics of person \(i\), such as genetics and stable personality traits.
- \(\lambda_t\) is a set of year dummies that control for shocks in year \(t\) that are common to all individuals, such as German reunification, the financial crisis, and the COVID-19 pandemic.
- \(\epsilon_{it}\) is the error term.

Standard errors are clustered at the individual level.
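The estimator behind equation (1) can be illustrated with simulated data. The base-R sketch below uses simulated data and hypothetical variable names, not the SOEP specification, and proceeds in three steps:

1. It fits a two-way fixed effects regression with person and year dummies.
2. It shows that the same coefficient follows from the two-way within transformation.
3. It computes a standard error clustered by person.

The sketch assumes a balanced panel, where simple double-demeaning is exact. `fixest::feols()` handles unbalanced panels such as the SOEP by iterating the demeaning and applies small-sample corrections, so its standard errors will differ slightly from this CR0 version.

```r
set.seed(1)

# Simulated balanced panel (hypothetical data, not SOEP): 200 people, 8 years
n_id <- 200
n_t  <- 8
panel  <- expand.grid(id = 1:n_id, year = 1:n_t)
theta  <- rnorm(n_id)  # person effects, theta_i
lambda <- rnorm(n_t)   # year effects, lambda_t

# x is correlated with theta_i, so omitting the person effects would bias OLS
panel$x <- 0.8 * theta[panel$id] + rnorm(nrow(panel))
panel$y <- 0.5 * panel$x + theta[panel$id] + lambda[panel$year] +
  rnorm(nrow(panel))

# (1) Least-squares dummy variables: one dummy per person and per year
fit_lsdv  <- lm(y ~ x + factor(id) + factor(year), data = panel)
beta_lsdv <- unname(coef(fit_lsdv)["x"])

# (2) Two-way within transformation; this simple form is exact only when balanced
demean2 <- function(v, i, t) v - ave(v, i) - ave(v, t) + mean(v)
xd <- demean2(panel$x, panel$id, panel$year)
yd <- demean2(panel$y, panel$id, panel$year)
beta_within <- sum(xd * yd) / sum(xd^2)

# Standard error clustered by person (CR0, no small-sample adjustment):
# Var = sum over persons g of (sum_t xd * u)^2, divided by (sum xd^2)^2
u <- yd - beta_within * xd
score_by_person <- tapply(xd * u, panel$id, sum)
se_cluster <- sqrt(sum(score_by_person^2)) / sum(xd^2)

round(c(beta_lsdv = beta_lsdv, beta_within = beta_within, se_cluster = se_cluster), 4)
```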

The coefficient \(\beta_5\) on the last group, \(C_{5,it}\), captures the effect of having had a child for five or more years on life satisfaction, and it therefore tests the hedonic treadmill hypothesis. If individuals return to their baseline in the long run, \(\beta_5\) should not differ significantly from zero.
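Written out, this is a standard two-sided test using the individual-clustered standard error. This is a generic formulation, not one specific to this study:

\[
H_0:\ \beta_5 = 0 \ \text{(full long-run adaptation)}, \qquad
H_1:\ \beta_5 \neq 0, \qquad
t = \frac{\hat{\beta}_5}{\widehat{\mathrm{se}}(\hat{\beta}_5)}.
\]

\(H_0\) is rejected at the 5% level when \(|t|\) exceeds the critical value, which is approximately 1.96 in large samples.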

Similarly, equation (2) shows the empirical specification for estimating the anticipation period: \[\begin{equation} LS_{it}=\alpha '{\bf X_{it}}+ \sum_{n=-5}^0 \beta_n {\bf C_{n,i,-t}}+\theta_i+\lambda_t+\epsilon_{it}, \end{equation}\]

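where all other variables are defined as in equation (1). \(C_{n,i,-t}\) denotes the six anticipation dummies for the years before childbirth, indexed by non-positive \(n\):

- \(n=0\): a childbirth in the current or the next year;
- \(n=-1,\dots,-4\): a childbirth \(|n|\) to \(|n|+1\) years ahead;
- \(n=-5\): a childbirth five or more years ahead.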


\(\bullet\) Data:

I use the teaching version of the German Socio-Economic Panel (SOEP) dataset. The SOEP is a representative panel dataset managed by the German Institute for Economic Research (DIW Berlin), in which individuals in households throughout Germany are surveyed yearly.

The data covers the years from 1984, the first year the survey started, until 2020. The teaching version contains 50% of the original SOEP data.

The variable of interest in the dataset is “overall life satisfaction,” which measures the general satisfaction with life of every household member aged 16 or older at the time of the survey (Grabka, 2020). For consistency, I restrict the sample to individuals at least 16 years old, which yields 59,268 individuals and 407,502 observations in total.

Respondents are asked to rate their general level of life satisfaction on a scale of 0 to 10, where 0 denotes complete dissatisfaction and 10 denotes complete satisfaction (Grabka, 2020).

Due to data protection agreements, I am not able to share the dataset. However, the data is available upon request from the German Institute for Economic Research (DIW Berlin).

#load the necessary packages
library(haven)
library(dplyr)
library(ggplot2)
library(fixest)

#import the dataset.
data<-read_dta("/soep-teaching-v37/pequiv.dta")

#arrange the dataset by personal id then by year.
data<-data %>%
  arrange(pid,syear)

#include only individuals who are at least 16 years old.
data<-subset(data,d11101>=16)

#count unique IDs
length(unique(data$pid))

#create a dummy equal to 1 if the number of children in a given year is greater than the
#number of children in the previous (observed) year; NA in each person's first year.
data<-data %>%
  group_by(pid) %>% #group by person ID so that lag() compares years of the same person.
  mutate(child_dummy=ifelse(d11107>dplyr::lag(d11107),1,0))

#dummy for the first group: a childbirth in the current or the previous year (0-1 years).
data <- data %>%
  group_by(pid) %>%
  mutate(ch0=ifelse(child_dummy==1|dplyr::lag(child_dummy)==1 ,1,0))

#variables for the next four groups of people: 
#who had a child birth for 1-2 years, 2-3 years, 3-4 years, and 4-5 years.
data<- data%>%
  group_by(pid) %>%
  mutate(ch1=ifelse(dplyr::lag(child_dummy,1)==1|dplyr::lag(child_dummy,2)==1,1,0))

data<- data%>%
  group_by(pid) %>%
  mutate(ch2=ifelse(dplyr::lag(child_dummy,2)==1|dplyr::lag(child_dummy,3)==1,1,0))

data<- data%>%
  group_by(pid) %>%
  mutate(ch3=ifelse(dplyr::lag(child_dummy,3)==1|dplyr::lag(child_dummy,4)==1,1,0))

data<- data%>%
  group_by(pid) %>%
  mutate(ch4=ifelse(dplyr::lag(child_dummy,4)==1|dplyr::lag(child_dummy,5)==1,1,0))

#Alternatively, for these four variables, a "for" loop can be used:
data <- data %>%
  group_by(pid)
for (i in 1:4) {
  data <- data %>%
    mutate(!!paste0("ch", i) := ifelse(dplyr::lag(child_dummy, i) == 1 |
                    dplyr::lag(child_dummy, i + 1) == 1, 1, 0))
}

#dummy for the last group who have a child for 5 years or more.
data<- data%>%
  group_by(pid) %>%
  mutate(ch_long=ifelse(dplyr::lag(child_dummy,5)==1|dplyr::lag(child_dummy,6)==1,1,0))

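#extend the flag to all years of a person who ever falls into the 5-6 year lag window
#(as written, this also flags that person's years before the birth).
data<- data%>%
  group_by(pid) %>%
  mutate(ch_long=replace(ch_long,any(ch_long==1),1))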

data$ch_long[(data$ch0==1|data$ch1==1|data$ch2==1|data$ch3==1|data$ch4==1)]<-0
data$ch_long[is.na(data$ch0)|is.na(data$ch1)|is.na(data$ch2)|is.na(data$ch3)|
               is.na(data$ch4)]<-NA

#similarly for leads
data <- data %>%
  group_by(pid) %>%
  mutate(chm0=ifelse(child_dummy==1|dplyr::lead(child_dummy)==1 ,1,0))

data<- data%>%
  group_by(pid) %>%
  mutate(chm1=ifelse(dplyr::lead(child_dummy,1)==1|dplyr::lead(child_dummy,2)==1,1,0))

data<- data%>%
  group_by(pid) %>%
  mutate(chm2=ifelse(dplyr::lead(child_dummy,2)==1|dplyr::lead(child_dummy,3)==1,1,0))

data<- data%>%
  group_by(pid) %>%
  mutate(chm3=ifelse(dplyr::lead(child_dummy,3)==1|dplyr::lead(child_dummy,4)==1,1,0))

data<- data%>%
  group_by(pid) %>%
  mutate(chm4=ifelse(dplyr::lead(child_dummy,4)==1|dplyr::lead(child_dummy,5)==1,1,0))

#Alternatively, a "for" loop is shorter
data <- data %>% 
  group_by(pid)
for (i in 0:4) {
  data <- data %>%
    mutate(!!paste0("chm", i) := ifelse(dplyr::lead(child_dummy, i) == 1 | 
                                        dplyr::lead(child_dummy, i + 1) == 1, 1, 0))
}

#last group
data<-data%>%
  group_by(pid) %>%
  mutate(chm_long=ifelse(dplyr::lead(child_dummy,5)==1|dplyr::lead(child_dummy,6)==1,1,0))

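#extend the flag to all years of a person who ever falls into the 5-6 year lead window
#(as written, this also flags that person's years after the birth).
data<- data%>%
  group_by(pid) %>%
  mutate(chm_long=replace(chm_long,any(chm_long==1),1))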

data$chm_long[(data$chm0==1|data$chm1==1|data$chm2==1|data$chm3==1|data$chm4==1)]<-0
data$chm_long[is.na(data$chm0)|is.na(data$chm1)|is.na(data$chm2)|is.na(data$chm3)|
  is.na(data$chm4)]<-NA

#substitute the negative values of the variables by NA:
#life satisfaction
data$p11101 <- replace(data$p11101, which(data$p11101 < 0), NA)      

#gender
data$d11102ll <- replace(data$d11102ll, which(data$d11102ll < 0), NA)      

#marital status
data$d11104 <- replace(data$d11104, which(data$d11104 < 0), NA)      

#education
data$d11109 <- replace(data$d11109, which(data$d11109 < 0), NA)      

#employment
data$e11102 <- replace(data$e11102, which(data$e11102 < 0), NA)      

#health status
data$m11126 <- replace(data$m11126, which(data$m11126 < 0), NA)      
data$m11126 <- 6 - data$m11126 #health status is coded in reverse (i.e., 1 is the best),
#so I reverse its order to be consistent with other variables.

#household pre-government income
data$i11101 <- replace(data$i11101, which(data$i11101 < 0), NA) 

#adjust household income by household size.
data$i11101<-data$i11101/data$d11106

#age
data$d11101 <- replace(data$d11101, which(data$d11101 < 0), NA)

#create age groups.
data$age_1<-ifelse(data$d11101>=16&data$d11101<=20,1,0)
data$age_2<-ifelse(data$d11101>=21&data$d11101<=30,1,0)
data$age_3<-ifelse(data$d11101>=31&data$d11101<=40,1,0)
data$age_4<-ifelse(data$d11101>=41&data$d11101<=50,1,0)
data$age_5<-ifelse(data$d11101>=51&data$d11101<=60,1,0)

However, ifelse is repetitive, so we can use the cut function instead:

#right-closed intervals (15,20], (20,30], ... reproduce the groups 16-20, 21-30, ..., 51-60;
#ages above 60 fall into age_6.
age_breaks <- c(15, 20, 30, 40, 50, 60, Inf)
age_labels <- c("age_1", "age_2", "age_3", "age_4", "age_5", "age_6")

data$age_group <- cut(data$d11101, breaks = age_breaks, labels = age_labels, right = TRUE)

#create (or overwrite) one 0/1 dummy per age group
for (lvl in age_labels) {
  data[[lvl]] <- as.integer(data$age_group == lvl)
}

#store a copy of the data as a data frame.
data_fr<-data.frame(data)

#recode the gender variable as a character label, to be used later for grouping.
data_fr$d11102ll<-as.character(data_fr$d11102ll)
data_fr$d11102ll[data_fr$d11102ll=="1"]<-"Male"
data_fr$d11102ll[data_fr$d11102ll=="2"]<-"Female"

Table (1) and figure (1) show the distribution of life satisfaction by gender.

#life satisfaction by gender.
table(data$d11102ll,data$p11101)

#in percentage
prop.table(table(data$d11102ll,data$p11101))*100

#rendered in html.

#plot life satisfaction by gender.
barplot(table(data$d11102ll,data$p11101)/sum(table(data$p11101))*100,main=
          "Distribution of life satisfaction",ylab="Percentage", beside=T,
        col=c("black","white"),
        sub = "Figure 1: Distribution of life satisfaction, in percentage, by gender.")
legend("top", c("Male","Female"),fill = c("black","white"))
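
Table 1: The distribution of life satisfaction by gender.

| Life satisfaction | Males (count) | Males (% of total) | Females (count) | Females (% of total) |
|---|---|---|---|---|
| 0 | 769 | 0.211 | 802 | 0.220 |
| 1 | 691 | 0.190 | 776 | 0.213 |
| 2 | 1,908 | 0.524 | 2,191 | 0.602 |
| 3 | 4,120 | 1.132 | 4,528 | 1.244 |
| 4 | 5,336 | 1.466 | 6,074 | 1.669 |
| 5 | 18,157 | 4.989 | 21,722 | 5.968 |
| 6 | 17,633 | 4.845 | 18,831 | 5.174 |
| 7 | 37,439 | 10.286 | 37,891 | 10.411 |
| 8 | 54,904 | 15.085 | 57,650 | 15.839 |
| 9 | 22,564 | 6.199 | 26,057 | 7.159 |
| 10 | 10,884 | 2.990 | 13,042 | 3.583 |
| Total | 174,405 | 47.917 | 189,564 | 52.082 |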

Figure 1: Distribution of life satisfaction, in percentage, by gender.

The percentage columns in table (1) show shares of the grand total. At every level of reported life satisfaction, the number of female respondents exceeds the number of male respondents. This partly reflects the larger number of female observations overall (52.1% versus 47.9%). The largest differences occur at the scale midpoint of 5 (3,565 more female respondents) and at 9 (3,493 more).
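The gaps in Table 1 can be recomputed directly from the counts. This base-R snippet copies the counts from Table 1 (variable names are illustrative). It gives the female-minus-male difference at each level and, for comparison, the shares within each gender.

```r
# Counts from Table 1 (life satisfaction levels 0 to 10)
ls_level <- 0:10
male   <- c(769, 691, 1908, 4120, 5336, 18157, 17633, 37439, 54904, 22564, 10884)
female <- c(802, 776, 2191, 4528, 6074, 21722, 18831, 37891, 57650, 26057, 13042)
grand_total <- sum(male) + sum(female)

diff_count <- female - male                  # female minus male respondents
diff_pp    <- 100 * diff_count / grand_total # gap in % of the grand total

# Shares within each gender remove the effect of the larger female sample
share_male   <- 100 * male / sum(male)
share_female <- 100 * female / sum(female)

data.frame(level = ls_level, diff_count, diff_pp = round(diff_pp, 3),
           share_male = round(share_male, 2), share_female = round(share_female, 2))
```

There are more female than male observations overall, so raw counts favor females at every level. Within-gender shares tell a different story at some levels. For example, a larger share of males (about 31.5%) than of females (about 30.4%) report 8.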

The most common (modal) response is 8 out of 10, reported by about 31% of all respondents. The smallest cell in the distribution is males reporting a life satisfaction of 1, at around 0.2% of the grand total. Figure (1) shows that the distribution of overall life satisfaction is skewed towards the upper end of the scale (i.e., negatively skewed): most individuals report a high score of overall life satisfaction.
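These summary statements can be checked against the pooled counts in Table 1. In this base-R sketch (names are illustrative), both the modal and the median responses are 8 and the mean is about 7.15. Roughly 72% of all responses are 7 or higher. A mean below the median is consistent with the negative skew visible in Figure 1.

```r
# Pooled counts (males + females) from Table 1, levels 0 to 10
ls_level <- 0:10
pooled <- c(769, 691, 1908, 4120, 5336, 18157, 17633, 37439, 54904, 22564, 10884) +
          c(802, 776, 2191, 4528, 6074, 21722, 18831, 37891, 57650, 26057, 13042)

mode_level   <- ls_level[which.max(pooled)]
cum_share    <- cumsum(pooled) / sum(pooled)
median_level <- ls_level[which(cum_share >= 0.5)[1]]  # first level reaching 50%
mean_ls      <- sum(ls_level * pooled) / sum(pooled)
share_7_plus <- sum(pooled[ls_level >= 7]) / sum(pooled)

c(mode = mode_level, median = median_level,
  mean = round(mean_ls, 2), share_7_plus = round(share_7_plus, 3))
```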


Number of observations for each group:

#no. of observations for each group
table(data$ch0,data$ch1,data$ch2,data$ch3,data$ch4,data$ch_long)

table(data$chm0,data$chm1,data$chm2,data$chm3,data$chm4,data$chm_long)

#unique IDs for last group. Can be generalized for all groups.
datatest<-filter(data,chm_long==1)
length(unique(datatest$pid))
Table 2: Number of unique individuals and number of observations for each group; adaptation period.

| Group | No. of unique individuals | No. of observations |
|---|---|---|
| 0-1 years | 7,527 | 19,289 |
| 1-2 years | 6,505 | 16,836 |
| 2-3 years | 5,641 | 14,706 |
| 3-4 years | 4,842 | 12,896 |
| 4-5 years | 4,242 | 11,483 |
| 5 years or longer | 3,103 | 28,808 |

Table 3: Number of unique individuals and number of observations for each group; anticipation period.

| Group | No. of unique individuals | No. of observations |
|---|---|---|
| 0-1 years | 7,527 | 20,662 |
| 1-2 years | 7,527 | 18,559 |
| 2-3 years | 5,965 | 15,119 |
| 3-4 years | 4,930 | 12,575 |
| 4-5 years | 4,123 | 10,470 |
| 5 years or longer | 2,462 | 21,818 |
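Dividing observations by individuals in the two tables above shows how many person-years each individual contributes to a group on average. The base-R sketch below copies the counts from those tables; names are illustrative.

```r
groups <- c("0-1", "1-2", "2-3", "3-4", "4-5", "5+")
counts <- data.frame(
  period       = rep(c("adaptation", "anticipation"), each = 6),
  group        = rep(groups, times = 2),
  individuals  = c(7527, 6505, 5641, 4842, 4242, 3103,
                   7527, 7527, 5965, 4930, 4123, 2462),
  observations = c(19289, 16836, 14706, 12896, 11483, 28808,
                   20662, 18559, 15119, 12575, 10470, 21818)
)
# Average number of person-years each individual contributes to a group
counts$obs_per_person <- round(counts$observations / counts$individuals, 2)
counts
```

The two-year groups average roughly 2.5 to 2.75 person-years per individual. This is consistent with each window spanning two survey years, with some individuals having more than one birth. The open-ended last group averages close to 9 person-years per individual.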

\(\bullet\) Estimation:

#running the model.
model1<-feols(p11101~ch0+ch1+ch2+ch3+ch4+ch_long+d11104+d11109+e11102+m11126+
                log(i11101)+age_1+age_2+age_3+age_4+age_5|pid+syear,
              data=subset(data,d11102ll==1))
model2<-feols(p11101~ch0+ch1+ch2+ch3+ch4+ch_long+d11104+d11109+e11102+m11126+
                log(i11101)+age_1+age_2+age_3+age_4+age_5|pid+syear,
              data=subset(data,d11102ll==2))

model3<-feols(p11101~chm0+chm1+chm2+chm3+chm4+chm_long+d11104+d11109+e11102+m11126+
                log(i11101)+age_1+age_2+age_3+age_4+age_5|pid+syear,
              data=subset(data,d11102ll==1))
model4<-feols(p11101~chm0+chm1+chm2+chm3+chm4+chm_long+d11104+d11109+e11102+m11126+
                log(i11101)+age_1+age_2+age_3+age_4+age_5|pid+syear,
              data=subset(data,d11102ll==2))

#Alternatively, I create a function to run a model based on given parameters,
#then loop through the different conditions to create each model
create_model <- function(data, condition, ch_vars) {
  formula <- as.formula(paste("p11101 ~", paste(ch_vars, collapse = "+"), 
                              "+ d11104 + d11109 + e11102 + m11126 + log(i11101) +",
                              "age_1 + age_2 + age_3 + age_4 + age_5 | pid + syear"))
  
  feols(formula, data = subset(data, d11102ll == condition))
}

ch_vars1 <- c("ch0", "ch1", "ch2", "ch3", "ch4", "ch_long")
ch_vars2 <- c("chm0", "chm1", "chm2", "chm3", "chm4", "chm_long")

models <- list()
models[[1]] <- create_model(data, 1, ch_vars1)
models[[2]] <- create_model(data, 2, ch_vars1)
models[[3]] <- create_model(data, 1, ch_vars2)
models[[4]] <- create_model(data, 2, ch_vars2)

#dict maps variable names to display labels (each label given once).
etable(model1,model2,model3,model4,digits=3,
       dict=c(p11101="Life Satisfaction",
              ch0="Child birth, 0-1 years",
              ch1="Child birth, 1-2 years",
              ch2="Child birth, 2-3 years",
              ch3="Child birth, 3-4 years",
              ch4="Child birth, 4-5 years",
              ch_long="Child birth, 5 years or more",
              chm0="Expecting child 0-1 years",
              chm1="Anticipating child 1-2 years",
              chm2="Anticipating child 2-3 years",
              chm3="Anticipating child 3-4 years",
              chm4="Anticipating child 4-5 years",
              chm_long="Anticipating child, 5 years or more",
              d11104="Marital status",
              d11109="Education",
              e11102="Employment status",
              m11126="Health status",
              'log(i11101)'="Log of household income",
              child_dummy="Having a child birth",
              age_1="age 16-20",
              age_2="age 21-30",
              age_3="age 31-40",
              age_4="age 41-50",
              age_5="age 51-60"),
       depvar=FALSE,headers="Life Satisfaction")
Results of two-way fixed effects model
Table 4: Results of two-way fixed effects model: adaptation and anticipation of a childbirth on overall life satisfaction.

The estimated effect of having had a child for five years or more on life satisfaction is -0.095 points for males and -0.066 points for females. Both estimates are relative to the omitted category, i.e., person-years not covered by any of the six dummies. However, the coefficient is significant only for males, at the 5% level.

These two coefficients test the hedonic treadmill hypothesis. The overall life satisfaction of a male who has had a child for five years or more is 0.095 points lower on the 0-10 scale. The 0.066-point decline for females is not statistically different from zero. Hence, we cannot reject the hedonic treadmill hypothesis for females, but we reject it for males at the 5% level.

Results for adaptation: Having a child affects overall life satisfaction in the early years. However, adaptation to childbirth starts to differ between males and females around the third year. After the third year, females show no significant changes in overall life satisfaction, which is consistent with complete adaptation to childbirth.

The trend for males, on the other hand, is different. After the second year, the changes in their life satisfaction are not significant, except for males who have had a child for 3-4 years or for five years or more.

For both genders, the life satisfaction of individuals who have had a child for one year or less increases the most relative to the other groups. After having a child for 1-2 years, life satisfaction declines for both genders, and the coefficients are significant at the 5% level.

Life satisfaction declines from the third year onward for both genders, but the decline is significant only for males. It is significant at the 10% level for males who have had a child for 3-4 years, and at the 5% level for males who have had a child for five years or longer.

Results for anticipation: The life satisfaction of individuals who expect a child in one year or less increases the most relative to the other groups, and the change is greater for females than males.

For males, life satisfaction does not change significantly when a child is expected more than a year ahead. The pattern differs for females:

- When a child is anticipated in 3-4 years, overall life satisfaction declines significantly, by 0.068 points.
- When the child is anticipated in 2-3 years, it increases significantly, by 0.147 points.
- When the child is anticipated in 1-2 years, it declines again, by 0.064 points.


The fixed effects model controls for time-invariant characteristics specific to each individual and for year-specific shocks that affect all individuals. In addition, the model controls for education, health, marital status, employment status, household income, and age.
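A small simulation shows that individual fixed effects absorb time-invariant characteristics. In this base-R sketch (simulated data; `trait`, `z`, and `x` are hypothetical), a variable that is constant within each person is a linear combination of the person dummies. As a result, `lm()` reports its coefficient as `NA`, while the coefficient on the time-varying regressor is still estimated.

```r
set.seed(2)

# Simulated balanced panel (hypothetical data): 200 people, 5 years
n_id <- 200
n_t  <- 5
panel <- expand.grid(id = 1:n_id, year = 1:n_t)
trait <- rnorm(n_id)               # time-invariant, e.g. a stable personality trait
panel$z <- trait[panel$id]         # constant within each person
panel$x <- rnorm(nrow(panel))      # time-varying regressor
panel$y <- 0.5 * panel$x + 2 * panel$z + rnorm(nrow(panel))

# z is placed after the person dummies, so lm() drops it as aliased;
# placed first, lm() would instead drop one of the person dummies
fit <- lm(y ~ x + factor(id) + factor(year) + z, data = panel)
coef(fit)[c("x", "z")]
```

The same logic explains why a time-invariant variable such as gender cannot enter the fixed effects model as a regressor. In the estimation above, gender enters instead through separate regressions for males and females.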

The question arises: what might explain the difference in the effect of childbirth on life satisfaction between males and females?

One possible explanation is single parenthood: the effect of having a child on life satisfaction may differ between a single parent and a parent whose partner shares the childcare responsibilities.

Another possible explanation is how childcare is divided within the household. If the father spends more time on childcare than the mother, his life satisfaction might change more than hers, and vice versa. This reasoning, however, may not apply to same-sex parents.

The difference in allocation of time to childcare between genders might explain the difference in life satisfaction between males and females.


References:

\(\bullet\) Brickman, P., & Campbell, D. T. (1971). Hedonic relativism and planning the good society. In Adaptation-level theory (pp. 287–301).

\(\bullet\) Clark, A. E., Diener, E., Georgellis, Y., & Lucas, R. E. (2008). Lags and leads in life satisfaction: A test of the baseline hypothesis. The Economic Journal, 118(529), F222–F243.

\(\bullet\) Grabka, M. M. (2020). SOEP-Core v35: Codebook for the PEQUIV file 1984-2018: CNEF variables with extended income information for the SOEP (tech. rep.). SOEP Survey Papers.