% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read-bgen.R
\name{snp_readBGEN}
\alias{snp_readBGEN}
\title{Read BGEN files into a "bigSNP"}
\usage{
snp_readBGEN(
  bgenfiles,
  backingfile,
  list_snp_id,
  ind_row = NULL,
  bgi_dir = dirname(bgenfiles),
  read_as = c("dosage", "random"),
  ncores = 1
)
}
\arguments{
\item{bgenfiles}{Character vector of paths to files with extension ".bgen".
The corresponding ".bgen.bgi" index files must exist.}

\item{backingfile}{The path (without extension) for the backing files (".bk"
and ".rds") that are created by this function for storing the
\link[=bigSNP-class]{bigSNP} object.}

\item{list_snp_id}{List (of the same length as the number of BGEN files) of
character vectors of SNP IDs to read. These should be in the form
\code{"<chr>_<pos>_<a1>_<a2>"} (e.g. \code{"1_88169_C_T"} or \code{"01_88169_C_T"}).
If you have only one BGEN file, just wrap your vector of IDs with \code{list()}.
\strong{This function assumes that these IDs uniquely identify variants.}}

\item{ind_row}{An optional vector of the row indices (individuals) that
are used. If not specified, all rows are used. \strong{Don't use negative indices.}
You can access the sample IDs corresponding to the genotypes from the \emph{.sample}
file, and use e.g. \code{match()} to get indices corresponding to the ones you want.}

\item{bgi_dir}{Directory of the index files. Default is the directory of \code{bgenfiles}.}

\item{read_as}{How to read BGEN probabilities? Currently implemented:
\itemize{
\item as dosages (rounded to two decimal places), the default,
\item as hard calls, randomly sampled based on those probabilities
(similar to PLINK option '\verb{--hard-call-threshold random}').
}}

\item{ncores}{Number of cores used. Default doesn't use parallelism.
You may use \code{\link[=nb_cores]{nb_cores()}}.}
}
\value{
The path to the RDS file \verb{<backingfile>.rds} that stores the \code{bigSNP}
object created by this function. Note that this function creates another
file (\emph{.bk}) which stores the values of the FBM (\verb{$genotypes}). The \verb{$map}
component of the \code{bigSNP} object stores some information on the variants
(including allele frequencies and INFO scores computed from the probabilities).
However, it does not have a \verb{$fam} component; you should use the individual
IDs in the \emph{.sample} file (filtered with \code{ind_row}) to add external information
on the individuals.\cr
\strong{You shouldn't read from BGEN files more than once.} Instead, use
\link{snp_attach} to load the "bigSNP" object in any R session from backing files.
}
\description{
Function to read the UK Biobank BGEN files into a \link[=bigSNP-class]{bigSNP}.
}
\details{
For more information on this format, please visit
\href{https://bitbucket.org/gavinband/bgen/}{BGEN webpage}.

This function is designed to read UK Biobank imputation files. It assumes
that variant data have been compressed with zlib, that there are only 2 possible
alleles, and that each probability is stored on 8 bits. For example, if you
use \emph{qctool} to generate your own BGEN files, please make sure you are using
options '\verb{-ofiletype bgen_v1.2 -bgen-bits 8}'.
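
As a rough illustration of what these assumptions imply (a sketch only, not the
code used internally by this function), two 8-bit values per genotype can be
turned into three probabilities, a dosage rounded to two decimal places, and a
randomly sampled hard call. The example values and the choice of which allele
is counted are assumptions made for this sketch.

\preformatted{
# Hypothetical stored values (integers between 0 and 255) for three genotypes
x_aa <- c(255L, 0L, 51L)    # encodes P(a1/a1)
x_ab <- c(0L, 255L, 102L)   # encodes P(a1/a2)

# With 8 bits, each stored value corresponds to a probability x / 255
p_aa <- x_aa / 255
p_ab <- x_ab / 255
p_bb <- 1 - p_aa - p_ab     # the third probability is implied

# Expected allele count, rounded to two decimal places
# (counting the second allele is an assumption of this sketch)
dosage <- round(p_ab + 2 * p_bb, 2)

# Hard calls randomly sampled according to the three probabilities
set.seed(1)
hard <- vapply(
  seq_along(dosage),
  function(i) sample(0:2, 1, prob = c(p_aa[i], p_ab[i], p_bb[i])),
  integer(1)
)
}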

If the format is not the expected one, this will result in an error or even
a crash of your R session. Another common source of error is corrupted
files; e.g. if using UK Biobank files, compare the result of \code{\link[tools:md5sum]{tools::md5sum()}}
with the checksums at \url{https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=998}.
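
A minimal sketch of such a checksum comparison is given below. To keep it
runnable, it uses an empty temporary file as a stand-in for a BGEN file, and
the MD5 checksum of an empty file as a placeholder for the published value;
in practice, you would use your own path and the checksum listed for that file.

\preformatted{
# Stand-in for a BGEN file: an empty temporary file, so that this sketch runs
bgen_path <- tempfile(fileext = ".bgen")
file.create(bgen_path)

# Placeholder reference checksum (the MD5 of an empty file);
# replace it with the value published for your file
md5_ref <- "d41d8cd98f00b204e9800998ecf8427e"

# tools::md5sum() returns a named character vector, so names are dropped
md5_obs <- unname(tools::md5sum(bgen_path))

if (!identical(md5_obs, md5_ref))
  stop("Checksum mismatch: the file may be corrupted.")
}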

You can look at some example code from my papers on how to use this function:
\itemize{
\item \url{https://github.com/privefl/paper-misspec/blob/main/code/prepare-genotypes.R}
\item \url{https://github.com/privefl/paper-ldpred2/blob/master/code/prepare-genotypes.R#L1-L62}
\item \url{https://github.com/privefl/paper4-bedpca/blob/master/code/missing-values-UKBB.R#L34-L75}
}
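
The following sketch shows one possible way to prepare \code{list_snp_id} and
\code{ind_row} with base R. The variant table, the sample IDs, the file paths
and the backing file name are all hypothetical; the call to this function is
left commented out because it requires real BGEN and index files.

\preformatted{
# Hypothetical table of variants to read (positions stored as integers
# so that they are not pasted in scientific notation)
info <- data.frame(
  chr = c(1L, 1L, 2L),
  pos = c(88169L, 91536L, 10352L),
  a1  = c("C", "G", "T"),
  a2  = c("T", "T", "TA")
)

# IDs in the form "<chr>_<pos>_<a1>_<a2>", one character vector per BGEN file
ids <- with(info, paste(chr, pos, a1, a2, sep = "_"))
list_snp_id <- unname(split(ids, info$chr))

# Hypothetical sample IDs, in the same order as in the .sample file
sample_ids <- c(1001, 1002, 1003, 1004)
keep_ids   <- c(1003, 1001)
ind_row <- match(keep_ids, sample_ids)
stopifnot(!anyNA(ind_row))   # all requested individuals must be found

# rds <- bigsnpr::snp_readBGEN(
#   bgenfiles   = c("chr1.bgen", "chr2.bgen"),   # hypothetical paths
#   backingfile = "ukbb_subset",                 # hypothetical name
#   list_snp_id = list_snp_id,
#   ind_row     = ind_row
# )
# obj <- bigsnpr::snp_attach(rds)
}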
}
