% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/stan_foot.R
\name{stan_foot}
\alias{stan_foot}
\title{Fit football models using CmdStan}
\usage{
stan_foot(
  data,
  model,
  predict = 0,
  ranking,
  dynamic_type,
  prior_par = list(ability = normal(0, NULL), ability_sd = cauchy(0, 5), home = normal(0,
    5)),
  home_effect = TRUE,
  norm_method = "none",
  ranking_map = NULL,
  method = "MCMC",
  ...
)
}
\arguments{
\item{data}{A data frame containing match data with columns:
\itemize{
  \item \code{periods}:  Time point of each observation (integer >= 1).
  \item \code{home_team}: Home team's name (character string).
  \item \code{away_team}: Away team's name (character string).
  \item \code{home_goals}: Goals scored by the home team (integer >= 0).
  \item \code{away_goals}: Goals scored by the away team (integer >= 0).
}}

\item{model}{A character string specifying the Stan model to fit. Options are:
\itemize{
  \item \code{"double_pois"}: Double Poisson model.
  \item \code{"biv_pois"}: Bivariate Poisson model.
  \item \code{"neg_bin"}: Negative Binomial model.
  \item \code{"skellam"}: Skellam model.
  \item \code{"student_t"}: Student's t model.
  \item \code{"diag_infl_biv_pois"}: Diagonal-inflated bivariate Poisson model.
  \item \code{"zero_infl_skellam"}: Zero-inflated Skellam model.
}}

\item{predict}{A non-negative integer specifying the number of out-of-sample matches for prediction. If \code{0} (the default), the function fits the model to the entire dataset without making predictions.}

\item{ranking}{An optional \code{"btdFoot"} class element or a data frame containing ranking points for teams with the following columns:
\itemize{
  \item \code{periods}: Time periods corresponding to the rankings (integer >= 1).
  \item \code{team}: Team names matching those in \code{data} (character string).
  \item \code{rank_points}: Ranking points for each team (numeric).
}}

\item{dynamic_type}{A character string specifying the type of dynamics in the model. Options are:
\itemize{
  \item \code{"weekly"}: Weekly dynamic parameters.
  \item \code{"seasonal"}: Seasonal dynamic parameters.
}}

\item{prior_par}{A list specifying the prior distributions for the parameters of interest:
  \itemize{
    \item \code{ability}: Prior distribution for team-specific abilities. Possible distributions are \code{normal}, \code{student_t}, \code{cauchy}, \code{laplace}. Default is \code{normal(0, NULL)}.
    \item \code{ability_sd}:  Prior distribution for the team-specific standard deviations. See the \code{prior} argument for more details. Default is \code{cauchy(0, 5)}.
    \item \code{home}: Prior distribution for the home effect (\code{home}). Applicable only if \code{home_effect = TRUE}. Only normal priors are allowed. Default is \code{normal(0, 5)}.
  }

  See the \pkg{rstanarm} package for more details on specifying priors.}

\item{home_effect}{A logical value indicating whether a home effect is included in the model. Default is \code{TRUE}.}

\item{norm_method}{A character string specifying the method used to normalize team-specific ranking points. Options are:
\itemize{
  \item \code{"none"}: No normalization (default).
  \item \code{"standard"}: Standardization (mean 0, standard deviation 1).
  \item \code{"mad"}: Median Absolute Deviation normalization.
  \item \code{"min_max"}: Min-max scaling to [0,1].
}}

\item{ranking_map}{An optional vector mapping ranking periods to data periods. If not provided and the number of ranking periods matches the number of data periods, a direct mapping is assumed.}

\item{method}{A character string specifying the method used to obtain the Bayesian estimates. Options are:
\itemize{
  \item \code{"MCMC"}: Markov chain Monte Carlo algorithm (default).
  \item \code{"VI"}: Automatic differentiation variational inference algorithms.
  \item \code{"pathfinder"}: Pathfinder variational inference algorithm.
  \item \code{"laplace"}: Laplace algorithm.
}}

\item{...}{Additional arguments passed to \code{\link[cmdstanr]{cmdstanr}} (e.g., \code{iter_sampling}, \code{chains}, \code{parallel_chains}).}
}
\value{
An object of class \code{"stanFoot"}, which is a list containing:
  \itemize{
    \item \code{fit}: The \code{CmdStanFit} object returned by \code{\link[cmdstanr]{cmdstanr}}.
    \item \code{data}: The input data.
    \item \code{stan_data}: The data list passed to Stan.
    \item \code{stan_code}: The Stan code of the underlying model.
    \item \code{stan_args}: The optional \code{\link[cmdstanr]{cmdstanr}} arguments passed via \code{...}.
    \item \code{alg_method}: The inference algorithm used to obtain the Bayesian estimates.
  }
}
\description{
Fits football goal-based models using Stan via the CmdStan backend.
Supported models include: double Poisson, bivariate Poisson, Skellam, Student's t, diagonal-inflated bivariate Poisson, zero-inflated Skellam, and negative Binomial.
}
\details{
Let \eqn{(y^{H}_{n}, y^{A}_{n})} denote the
observed number of goals scored by the home
and the away team in the \eqn{n}-th game,
respectively. A general bivariate Poisson model
allowing for goals' correlation
(Karlis & Ntzoufras, 2003) is the following:

\deqn{ Y^H_n, Y^A_n| \lambda_{1n}, \lambda_{2n}, \lambda_{3n}  \sim \mathsf{BivPoisson}(\lambda_{1n}, \lambda_{2n}, \lambda_{3n})}
\deqn{\log(\lambda_{1n})  = \mu+att_{h_n} + def_{a_n}}
\deqn{\log(\lambda_{2n})  = att_{a_n} + def_{h_n}}
\deqn{\log(\lambda_{3n})  =\beta_0,}

where the case \eqn{\lambda_{3n}=0} reduces to
the double Poisson model (Baio & Blangiardo, 2010).
 \eqn{\lambda_{1n}, \lambda_{2n}} represent the
 scoring rates for the home and the away team,
 respectively, where: \eqn{\mu} is the home effect;
 the parameters \eqn{att_T} and
  \eqn{def_T} represent the attack and the
  defence abilities,
respectively, for each team \eqn{T}, \eqn{T=1,\ldots,N_T};
the nested indexes \eqn{h_{n}, a_{n}=1,\ldots,N_T}
denote the home and the away team playing in the \eqn{n}-th game,
respectively. Attack/defence parameters are subject to a
sum-to-zero constraint to achieve identifiability and
are assigned weakly informative prior distributions:

\deqn{att_T \sim \mathrm{N}(\mu_{att}, \sigma_{att})}
\deqn{def_T \sim \mathrm{N}(\mu_{def}, \sigma_{def}),}

with hyperparameters \eqn{\mu_{att}, \sigma_{att}, \mu_{def}, \sigma_{def}}.
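
As a minimal sketch of the double Poisson case above (\eqn{\lambda_{3n}=0}),
the following base R code simulates the scores of three hypothetical matches.
The numerical values, the team indexes, and the centring step used to enforce
the sum-to-zero constraint are illustrative assumptions, not the internal
implementation of the package:

\preformatted{
set.seed(1)
n_teams <- 4
# Hypothetical abilities, centred so that they sum to zero
att <- rnorm(n_teams, 0, 0.3); att <- att - mean(att)
def <- rnorm(n_teams, 0, 0.3); def <- def - mean(def)
mu <- 0.3                           # hypothetical home effect
h <- c(1, 2, 3); a <- c(2, 3, 4)    # home and away team indexes
lambda1 <- exp(mu + att[h] + def[a])  # home scoring rates
lambda2 <- exp(att[a] + def[h])       # away scoring rates
y_home <- rpois(length(h), lambda1)
y_away <- rpois(length(h), lambda2)
}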

Instead of modelling the marginal number of goals,
an alternative is to model directly
the score difference \eqn{(y^{H}_{n}- y^{A}_{n})}.
We can use the Poisson-difference distribution
(or Skellam distribution) to model goal
difference in the \eqn{n}-th match (Karlis & Ntzoufras, 2009):

\deqn{y^{H}_{n}- y^{A}_{n}| \lambda_{1n}, \lambda_{2n} \sim PD(\lambda_{1n}, \lambda_{2n}),}

and the scoring rates \eqn{\lambda_{1n}, \lambda_{2n}} are
unchanged with respect to the bivariate/double Poisson model.
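
As a rough illustration, the difference of two independent Poisson counts
follows a Skellam distribution with mean \eqn{\lambda_{1}-\lambda_{2}} and
variance \eqn{\lambda_{1}+\lambda_{2}}. The base R sketch below checks this by
simulation; the two rates are hypothetical values chosen only for the example:

\preformatted{
set.seed(3)
lambda1 <- 1.5; lambda2 <- 1.1     # hypothetical scoring rates
d <- rpois(1e5, lambda1) - rpois(1e5, lambda2)
c(mean(d), lambda1 - lambda2)      # sample vs. theoretical mean
c(var(d), lambda1 + lambda2)       # sample vs. theoretical variance
}
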
If we want to use a continuous distribution, we can
use a Student's t distribution with 7 degrees of
freedom (Gelman, 2014):

\deqn{y^{H}_{n}- y^{A}_{n} \sim t(7, ab_{h_{n}}-ab_{a_{n}}, \sigma_y)}
\deqn{ab_t \sim \mathrm{N}(\mu + b \times {prior\_score}_t, \sigma_{ab}),}

where \eqn{ab_t} is the overall ability for
the \eqn{t}-th team, whereas \eqn{prior\_score_t}
is a prior measure of team's strength (for instance a
ranking).
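
A minimal base R sketch of this specification, assuming hypothetical values
for \eqn{\mu}, \eqn{b}, \eqn{\sigma_{ab}}, \eqn{\sigma_y} and for the prior
scores, draws the abilities and then the goal differences as a location-scale
Student's t with 7 degrees of freedom:

\preformatted{
set.seed(4)
prior_score <- c(1.2, 0.4, -0.3, -1.3)   # hypothetical ranking measure
mu <- 0; b <- 0.5                        # hypothetical regression terms
sigma_ab <- 0.2; sigma_y <- 1            # hypothetical scales
ab <- rnorm(4, mu + b * prior_score, sigma_ab)
h <- c(1, 2, 3); a <- c(2, 3, 4)         # home and away team indexes
# location + scale * standard t draw
y_diff <- (ab[h] - ab[a]) + sigma_y * rt(length(h), df = 7)
}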

These models rely on the assumption of static parameters.
However, we could assume dynamics in the attack/defence
abilities (Owen, 2011; Egidi et al., 2018; Macrì Demartino et al., 2024) in terms of weeks or seasons through the argument
\code{dynamic_type}. In such a framework, for a given
number of times \eqn{1, \ldots, \mathcal{T}}, the models
above would be unchanged, but the priors for the ability
parameters at each time \eqn{\tau, \tau=2,\ldots, \mathcal{T},} would be:

\deqn{att_{T, \tau} \sim \mathrm{N}({att}_{T, \tau-1}, \sigma_{att})}
\deqn{def_{T, \tau} \sim \mathrm{N}({def}_{T, \tau-1}, \sigma_{def}),}

whereas for \eqn{\tau=1} we have:

\deqn{att_{T, 1} \sim \mathrm{N}(\mu_{att}, \sigma_{att})}
\deqn{def_{T, 1} \sim \mathrm{N}(\mu_{def}, \sigma_{def}).}

Of course, the identifiability constraint must be imposed for
each time \eqn{\tau}.
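
The random-walk structure above can be sketched in base R as follows. The
number of teams and times, the evolution standard deviation, and the choice of
centring each row after drawing it are simplifying assumptions made only for
illustration:

\preformatted{
set.seed(2)
n_teams <- 4; n_times <- 3
sigma_att <- 0.2                     # hypothetical evolution sd
att <- matrix(NA_real_, n_times, n_teams)
att[1, ] <- rnorm(n_teams, 0, sigma_att)
att[1, ] <- att[1, ] - mean(att[1, ])        # sum-to-zero at time 1
for (tau in 2:n_times) {
  # each ability is centred on its value at the previous time
  att[tau, ] <- rnorm(n_teams, att[tau - 1, ], sigma_att)
  att[tau, ] <- att[tau, ] - mean(att[tau, ])  # sum-to-zero at time tau
}
rowSums(att)                         # numerically zero for every time
}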

The current version of the package also allows fitting a
diagonal-inflated bivariate Poisson model and a zero-inflated Skellam model, in the
spirit of Karlis & Ntzoufras (2003), to better capture draw occurrences. See the vignette for further details.
}
\examples{
\dontrun{
if (instantiate::stan_cmdstan_exists()) {
  library(dplyr)

  # Example usage with ranking
  data("italy")
  italy <- as_tibble(italy)
  italy_2021 <- italy \%>\%
    select(Season, home, visitor, hgoal, vgoal) \%>\%
    filter(Season == "2021")


  teams <- unique(italy_2021$home)
  n_rows <- 20

  # Create fake ranking
  ranking <- data.frame(
    periods = rep(1, n_rows),
    team = sample(teams, n_rows, replace = FALSE),
    rank_points = sample(0:60, n_rows, replace = FALSE)
  )

  ranking <- ranking \%>\%
    arrange(periods, desc(rank_points))


  colnames(italy_2021) <- c("periods", "home_team", "away_team", "home_goals", "away_goals")

  fit_with_ranking <- stan_foot(
    data = italy_2021,
    model = "diag_infl_biv_pois",
    ranking = ranking,
    home_effect = TRUE,
    prior_par = list(
      ability = student_t(4, 0, NULL),
      ability_sd = cauchy(0, 3),
      home = normal(1, 10)
    ),
    norm_method = "mad",
    iter_sampling = 1000,
    chains = 2,
    parallel_chains = 2,
    adapt_delta = 0.95,
    max_treedepth = 15
  )

  # Print a summary of the model fit
  print(fit_with_ranking, pars = c("att", "def"))



  ### Use Italian Serie A from 2000 to 2002

  data("italy")
  italy <- as_tibble(italy)
  italy_2000_2002 <- italy \%>\%
    dplyr::select(Season, home, visitor, hgoal, vgoal) \%>\%
    dplyr::filter(Season == "2000" | Season == "2001" | Season == "2002")

  colnames(italy_2000_2002) <- c("periods", "home_team", "away_team", "home_goals", "away_goals")

  ### Fit Stan models
  ## no dynamics, no predictions

  fit_1 <- stan_foot(
    data = italy_2000_2002,
    model = "double_pois"
  ) # double poisson
  print(fit_1, pars = c(
    "home", "sigma_att",
    "sigma_def"
  ))

  fit_2 <- stan_foot(
    data = italy_2000_2002,
    model = "biv_pois"
  ) # bivariate poisson
  print(fit_2, pars = c(
    "home", "rho",
    "sigma_att", "sigma_def"
  ))

  fit_3 <- stan_foot(
    data = italy_2000_2002,
    model = "skellam"
  ) # skellam
  print(fit_3, pars = c(
    "home", "sigma_att",
    "sigma_def"
  ))

  fit_4 <- stan_foot(
    data = italy_2000_2002,
    model = "student_t"
  ) # student_t
  print(fit_4, pars = c("beta"))

  ## seasonal dynamics, no prediction

  fit_5 <- stan_foot(
    data = italy_2000_2002,
    model = "double_pois",
    dynamic_type = "seasonal"
  ) # double poisson
  print(fit_5, pars = c(
    "home", "sigma_att",
    "sigma_def"
  ))

  ## seasonal dynamics, prediction for the last season

  fit_6 <- stan_foot(
    data = italy_2000_2002,
    model = "double_pois",
    dynamic_type = "seasonal",
    predict = 170
  ) # double poisson
  print(fit_6, pars = c(
    "home", "sigma_att",
    "sigma_def"
  ))

  ## other priors' options
  # double poisson with
  # student_t priors for teams abilities
  # and laplace prior for the hyper sds

  fit_p <- stan_foot(
    data = italy_2000_2002,
    model = "double_pois",
    prior_par = list(
      ability = student_t(4, 0, NULL),
      ability_sd = laplace(0, 1),
      home = normal(1, 10)
    )
  )

  print(fit_p, pars = c(
    "home", "sigma_att",
    "sigma_def"
  ))
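
  ## alternative inference algorithm
  # A sketch only: method = "VI" is one of the documented options
  # and returns an approximation to the posterior rather than MCMC draws.

  fit_vi <- stan_foot(
    data = italy_2000_2002,
    model = "double_pois",
    method = "VI"
  )

  # Elements of the returned list, as described in the Value section
  fit_vi$alg_method
  names(fit_vi$stan_data)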
}
}
}
\references{
Baio, G. and Blangiardo, M. (2010). Bayesian hierarchical model for the prediction of football
results. Journal of Applied Statistics 37(2), 253-264.

Egidi, L., Pauli, F., and Torelli, N. (2018). Combining historical data
and bookmakers' odds in modelling football scores. Statistical Modelling, 18(5-6), 436-459.

Gelman, A. (2014). Stan goes to the World Cup. From
"Statistical Modeling, Causal Inference, and Social Science" blog.

Macrì Demartino, R., Egidi, L. and Torelli, N. (2024). Alternative ranking measures to predict
international football results. Computational Statistics, 1-19.

Karlis, D. and Ntzoufras, I. (2003). Analysis of sports data by using bivariate Poisson models.
Journal of the Royal Statistical Society: Series D (The Statistician) 52(3), 381-393.

Karlis, D. and Ntzoufras, I. (2009). Bayesian modelling of football outcomes: Using
the Skellam's distribution for the goal difference. IMA Journal of Management Mathematics 20(2), 133-145.

Owen, A. (2011). Dynamic Bayesian forecasting models
of football match outcomes with estimation of the
evolution variance parameter. IMA Journal of Management Mathematics, 22(2), 99-113.
}
\author{
Leonardo Egidi \email{legidi@units.it}, Roberto Macrì Demartino \email{roberto.macridemartino@deams.units.it}, and Vasilis Palaskas \email{vasilis.palaskas94@gmail.com}.
}
