
<!-- README.md is generated from README.Rmd. Please edit that file -->

# fixes <a><img src="man/figures/logo.png" align="right" height="138" /></a>

<!-- badges: start -->

[![R-CMD-check](https://github.com/yo5uke/fixes/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/yo5uke/fixes/actions/workflows/R-CMD-check.yaml)
[![CRAN
status](https://www.r-pkg.org/badges/version/fixes)](https://CRAN.R-project.org/package=fixes)
<!-- badges: end -->

## Overview

> **Note**  
> The `fixes` package currently supports data with annual time intervals
> only.  
> For datasets with finer time intervals, such as monthly or quarterly
> data, I recommend creating a new column with sequential time numbers
> (e.g., 1, 2, 3, …) representing the time order.  
> This column can then be used for analysis.

The `fixes` package streamlines estimating and plotting event studies,
a method commonly used to assess the parallel trends assumption in
two-way fixed effects (TWFE) difference-in-differences (DID) analysis.
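
For the finer-than-annual case mentioned in the note above, the snippet below is a minimal base R sketch of building a sequential time column from monthly dates. The data frame `df_monthly` and its column names are hypothetical. The index counts calendar months elapsed since the first observed month, so a missing month leaves a gap rather than shifting later periods.

``` r
# Hypothetical monthly panel: two firms observed over six months.
df_monthly <- data.frame(
  firm_id = rep(1:2, each = 6),
  month   = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 6),
                times = 2)
)

# Convert each date to an absolute month count (years * 12 + month number),
# then shift so that the earliest month in the data becomes 1.
lt <- as.POSIXlt(df_monthly$month)
months_abs <- lt$year * 12 + lt$mon
df_monthly$time_index <- months_abs - min(months_abs) + 1
```

With such a column, the treatment date would presumably be given on the same scale, that is, as the index of the treatment month rather than as a calendar date.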

The package includes two main functions:

1.  `run_es()`: Accepts a data frame, generates lead and lag variables,
    and performs event study analysis. The function returns the results
    as a data frame.
2.  `plot_es()`: Creates plots using `ggplot2` based on the data frame
    generated by `run_es()`. Users can choose between a plot with
    `geom_ribbon()` or `geom_errorbar()` to visualize the results.

## Installation

You can install the package like so:

``` r
# install.packages("pak")
pak::pak("fixes")
```

or

``` r
install.packages("fixes")
```

To install the development version, install it from the GitHub
repository:

``` r
pak::pak("yo5uke/fixes")
```

## How to use

First, load the library.

``` r
library(fixes)
```

### Data frame

The data frame to be analyzed must include the following variables:

1.  A variable to identify individuals.
2.  A dummy variable indicating treated individuals (e.g.,
    `is_treated`).
3.  A variable representing time (e.g., `year`).
4.  An outcome variable.

For example, a data frame like the following:

| firm_id | state_id | year | is_treated |          y |
|--------:|---------:|-----:|-----------:|-----------:|
|       1 |       21 | 1980 |          1 |  0.8342158 |
|       1 |       21 | 1981 |          1 | -0.5354355 |
|       1 |       21 | 1982 |          1 |  1.1372828 |
|       1 |       21 | 1983 |          1 |  0.7339165 |
|       1 |       21 | 1984 |          1 |  1.4232840 |
|       1 |       21 | 1985 |          1 |  1.2783362 |
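
If you want a data frame of this shape to experiment with, the following base R sketch simulates one. Everything about the simulation is an assumption made for illustration: the number of firms and states, the 1980-2010 year range, the treatment year of 1998, and the size of the effect.

``` r
set.seed(123)  # arbitrary seed, for reproducibility only

n_firms <- 50
years   <- 1980:2010

# One row per firm-year.
df <- expand.grid(firm_id = seq_len(n_firms), year = years)

# Firm-level attributes (hypothetical): a state and a fixed treatment status.
firm_info <- data.frame(
  firm_id    = seq_len(n_firms),
  state_id   = sample(1:10, n_firms, replace = TRUE),
  is_treated = rbinom(n_firms, 1, 0.5)
)
df <- merge(df, firm_info, by = "firm_id")

# Outcome: noise plus an assumed jump of 1 for treated firms from 1998 onward.
df$y <- rnorm(nrow(df)) + 1 * (df$is_treated == 1 & df$year >= 1998)

# Sort by firm and year so the panel reads like the table above.
df <- df[order(df$firm_id, df$year), ]
```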

### `run_es()`

`run_es()` takes 12 arguments, including required variables and optional
specifications like covariates and clustering.

| Argument | Description |
|----|----|
| `data` | Data frame to be used. |
| `outcome` | Outcome variable. Can be specified as a raw variable or a transformation (e.g., `log(y)`). Provide it unquoted. |
| `treatment` | Dummy variable indicating the treated units. Provide it unquoted. Accepts both `0/1` and `TRUE/FALSE`. |
| `time` | Time variable. Provide it unquoted. |
| `timing` | Time value indicating when the treatment occurs. |
| `lead_range` | Number of pre-treatment periods to include (e.g., 3 = `lead3`, `lead2`, `lead1`). |
| `lag_range` | Number of post-treatment periods to include (e.g., 2 = `lag0`, `lag1`, `lag2`). |
| `covariates` | Additional covariates to include in the regression. **Must be a one-sided formula** (e.g., `~ x1 + x2`). |
| `fe` | Fixed effects to control for unobserved heterogeneity. **Must be a one-sided formula** (e.g., `~ id + year`). |
| `cluster` | Specifies clustering for standard errors. Can be a **character vector** (e.g., `c("id", "year")`) or a **formula** (e.g., `~ id + year`, `~ id^year`). |
| `baseline` | Relative time value to be used as the reference category. The corresponding dummy is excluded from the regression. **Must be within the specified lead/lag range.** |
| `interval` | Time interval between observations (e.g., `1` for yearly data, `5` for 5-year intervals). |
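
One plausible reading of `interval` is that it converts calendar distance into a count of periods, so that `baseline`, `lead_range`, and `lag_range` are expressed in periods rather than years. The base R arithmetic below sketches that interpretation for a hypothetical 5-year panel. It is an assumption about how the arguments relate, not a description of the package's internals.

``` r
# Hypothetical 5-year panel with treatment in 2000.
years    <- seq(1985, 2015, by = 5)
timing   <- 2000
interval <- 5

# Relative period = calendar distance from treatment, divided by the
# spacing between observations.
rel_period <- (years - timing) / interval

# Under this reading, baseline = -1 would refer to the 1995 observation.
baseline_year <- years[rel_period == -1]
```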

------------------------------------------------------------------------

#### Example: Without Covariates

``` r
event_study <- run_es(
  data       = df, 
  outcome    = y, 
  treatment  = is_treated, 
  time       = year, 
  timing     = 1998, 
  lead_range = 5, 
  lag_range  = 5, 
  fe         = ~ firm_id + year, 
  cluster    = ~ state_id, 
  baseline   = -1, 
  interval   = 1
)
```

***Note:*** The `fe` argument must be specified as a one-sided formula
(e.g., `~ firm_id + year`).  
The `cluster` argument can be specified either as a one-sided formula
(e.g., `~ state_id`) or as a character vector (e.g.,
`c("firm_id", "year")`).

The `run_es()` function returns a tidy data frame with estimated
event-study coefficients, confidence intervals, and metadata such as
relative timing and baseline identification[^1].

#### Example: With Covariates

``` r
event_study <- run_es(
  data       = df, 
  outcome    = y, 
  treatment  = is_treated, 
  time       = year, 
  timing     = 1998, 
  lead_range = 5, 
  lag_range  = 5, 
  covariates = ~ cov1 + cov2 + cov3, 
  fe         = ~ firm_id + year, 
  cluster    = ~ state_id, 
  baseline   = -1, 
  interval   = 1
)
```

You can use this result to create custom plots, or take advantage of the
built-in `plot_es()` function to visualize the estimates and confidence
intervals with minimal code.

### `plot_es()`

The `plot_es()` function creates a plot based on `ggplot2`.

`plot_es()` has 12 arguments.

| Argument | Description |
|----|----|
| `data` | Data frame created by `run_es()`. |
| `type` | Type of confidence interval visualization: `"ribbon"` (default) or `"errorbar"`. |
| `vline_val` | x-intercept of the vertical reference line (default: `0`). |
| `vline_color` | Color of the vertical reference line (default: `"#000"`). |
| `hline_val` | y-intercept of the horizontal reference line (default: `0`). |
| `hline_color` | Color of the horizontal reference line (default: `"#000"`). |
| `linewidth` | Width of the plot lines (default: `1`). |
| `pointsize` | Size of the points for the estimates (default: `2`). |
| `alpha` | Transparency level of the ribbon (default: `0.2`). |
| `barwidth` | Width of the error bars (default: `0.2`). |
| `color` | Color of the lines and points (default: `"#B25D91FF"`). |
| `fill` | Fill color of the ribbon (default: `"#B25D91FF"`). |

If the defaults suit you, just pass the data frame returned by
`run_es()` and the plot is ready.

``` r
plot_es(event_study)
```

![](README_files/figure-gfm/unnamed-chunk-5-1.png)<!-- -->

``` r
plot_es(event_study, type = "errorbar")
```

![](README_files/figure-gfm/unnamed-chunk-6-1.png)<!-- -->

``` r
plot_es(event_study, type = "errorbar", vline_val = -.5)
```

![](README_files/figure-gfm/unnamed-chunk-7-1.png)<!-- -->

Because `plot_es()` builds the plot with `ggplot2`, you can adjust the
details by adding further `ggplot2` components.

``` r
plot_es(event_study, type = "errorbar") + 
  ggplot2::scale_x_continuous(breaks = seq(-5, 5, by = 1)) + 
  ggplot2::ggtitle("Result of Event Study")
```

![](README_files/figure-gfm/unnamed-chunk-8-1.png)<!-- -->

## Planned Features

- Support for custom confidence level in `plot_es()` (e.g.,
  `conf_level = 0.90`)
- Support for faceted plots by subgroup (e.g., `facet_by = "group"`)

## Debugging

If you find an issue, please report it on the GitHub Issues page.

[^1]: Behind the scenes, estimation is performed using
    `fixest::feols()`.
