% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/extract_ip_features.R
\name{extract_ip_features}
\alias{extract_ip_features}
\title{Extract IP Address Features}
\usage{
extract_ip_features(ip_addresses, error_on_invalid = FALSE)
}
\arguments{
\item{ip_addresses}{A character vector of IP address strings.}

\item{error_on_invalid}{Logical flag indicating how to handle invalid IP addresses. If \code{TRUE}, the function throws an error upon encountering any invalid IP address; if \code{FALSE} (the default), invalid IP addresses are replaced with \code{NA} and a warning is issued.}
}
\value{
A data frame with the following columns:
\describe{
  \item{\code{ip_version}}{A character vector indicating the IP version; either \code{"IPv4"} or \code{"IPv6"}. Invalid addresses are set to \code{NA}.}
  \item{\code{ip_v4_octet1}}{The first octet of an IPv4 address, extracted from the IP string and converted to a numeric value.}
  \item{\code{ip_v4_octet2}}{The second octet of an IPv4 address, converted to a numeric value.}
  \item{\code{ip_v4_octet3}}{The third octet of an IPv4 address, converted to a numeric value.}
  \item{\code{ip_v4_octet4}}{The fourth octet of an IPv4 address, converted to a numeric value.}
  \item{\code{ip_v4_octet1_has_leading_zero}}{An integer flag indicating whether the first octet of an IPv4 address includes a leading zero.}
  \item{\code{ip_v4_octet2_has_leading_zero}}{An integer flag indicating whether the second octet includes a leading zero.}
  \item{\code{ip_v4_octet3_has_leading_zero}}{An integer flag indicating whether the third octet includes a leading zero.}
  \item{\code{ip_v4_octet4_has_leading_zero}}{An integer flag indicating whether the fourth octet includes a leading zero.}
  \item{\code{ip_leading_zero_count}}{An integer count of how many octets in an IPv4 address contain leading zeros.}
  \item{\code{ip_v4_numeric_vector}}{The 32-bit integer representation of an IPv4 address, computed as \eqn{(A * 256^3) + (B * 256^2) + (C * 256) + D}, where \eqn{A}, \eqn{B}, \eqn{C}, and \eqn{D} are the first through fourth octets.}
  \item{\code{ip_v6_numeric_approx_vector}}{An approximate numeric conversion of an IPv6 address. This value is computed from the eight hextets and is intended for interval comparisons only; precision may be lost for large values (above \eqn{2^{53}}).}
  \item{\code{ip_is_palindrome}}{An integer value indicating whether the entire IP address string is a palindrome (i.e., it reads the same forwards and backwards).}
  \item{\code{ip_entropy}}{A numeric value representing the Shannon entropy of the IP address string, computed over the distribution of its characters. Higher entropy values indicate a more varied (less repetitive) pattern.}
}
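
The IPv4 columns above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper name \code{ipv4_sketch} is hypothetical, and a leading zero is assumed to mean an octet of more than one character that starts with \code{"0"}.
\preformatted{
# Hypothetical helper: illustrates the documented IPv4 columns only.
ipv4_sketch <- function(ip) {
  octets <- strsplit(ip, ".", fixed = TRUE)[[1]]  # four octet strings
  values <- as.numeric(octets)                    # numeric octets A, B, C, D
  # Assumption: "leading zero" = more than one character, starting with "0"
  has_zero <- as.integer(nchar(octets) > 1 & startsWith(octets, "0"))
  list(
    octets = values,
    has_leading_zero = has_zero,
    leading_zero_count = sum(has_zero),
    # (A * 256^3) + (B * 256^2) + (C * 256) + D, kept as a double because
    # values above 2^31 - 1 do not fit in R's 32-bit integer type
    numeric = sum(values * 256^(3:0))
  )
}

ipv4_sketch("192.168.001.1")$numeric             # 3232235777
ipv4_sketch("192.168.001.1")$leading_zero_count  # 1
}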
}
\description{
This function extracts a set of features from a vector of IP address strings to support feature engineering in credit-scoring datasets. It processes both IPv4 and IPv6 addresses and returns a data frame of derived features. The features include IP version classification, an octet-level breakdown for IPv4 addresses (with both string- and numeric-based octets), leading-zero checks, a numeric conversion of the address, a basic approximation of IPv6 numeric values, pattern metrics such as a palindrome check and Shannon entropy, multicast status, and a Hilbert curve encoding for IPv4 addresses.
}
\details{
The function follows these steps:
\itemize{
  \item \strong{Validation:} Each IP address is checked against regular expressions for both IPv4 and IPv6. An address that matches neither pattern is deemed invalid. Depending on the value of \code{error_on_invalid}, invalid entries are either replaced with \code{NA} (with a warning) or cause an error.
  \item \strong{IP Version Identification:} The function determines whether an IP address is IPv4 or IPv6.
  \item \strong{IPv4 Feature Extraction:}
    \itemize{
      \item The IPv4 addresses are split into four octets.
      \item For each octet, both the raw (string) and numeric representations are extracted.
      \item The presence of leading zeros is checked for each octet, and the total count of octets with leading zeros is computed.
      \item The full IPv4 address is converted to a 32-bit numeric value.
      \item Hilbert curve encoding is applied to the numeric value, yielding two dimensions that can be used as features in modeling.
    }
  \item \strong{IPv6 Feature Extraction:} For IPv6 addresses, an approximate numeric conversion is performed to allow for coarse interval analysis.
  \item \strong{Pattern Metrics:} Independent of IP version, the function computes:
    \itemize{
      \item A palindrome check on the entire IP string.
      \item The Shannon entropy of the IP string to capture the diversity of characters.
    }
}
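
The pattern metrics can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper names \code{is_palindrome_sketch} and \code{shannon_entropy_sketch} are hypothetical, and the entropy is assumed to use base-2 logarithms (bits), which this documentation does not specify.
\preformatted{
# Hypothetical helpers; not part of the package API.
is_palindrome_sketch <- function(ip) {
  chars <- strsplit(ip, "")[[1]]            # split into single characters
  as.integer(identical(chars, rev(chars)))  # 1 if it reads the same reversed
}

shannon_entropy_sketch <- function(ip) {
  chars <- strsplit(ip, "")[[1]]
  p <- as.numeric(table(chars)) / length(chars)  # character frequencies
  -sum(p * log2(p))                              # assumption: log base 2
}

is_palindrome_sketch("1.2.2.1")      # 1
is_palindrome_sketch("192.168.1.1")  # 0
shannon_entropy_sketch("1.1.1.1")    # about 0.985
}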
}
\examples{
# Load the package's sample dataset
data(featForge_sample_data)

# Extract IP features and combine them with the original IP column
result <- cbind(
  data.frame(ip = featForge_sample_data$ip),
  extract_ip_features(featForge_sample_data$ip)
)
print(result)
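
# Sketch of the documented handling of invalid input. The address vector
# below is made up for illustration and is not part of the sample data.
mixed_ips <- c("192.168.1.1", "not_an_ip")

# Default (error_on_invalid = FALSE): per the argument description, the
# invalid entry is replaced with NA and a warning is issued.
extract_ip_features(mixed_ips)

# error_on_invalid = TRUE: per the argument description, an error is raised;
# tryCatch() keeps the example running.
tryCatch(
  extract_ip_features(mixed_ips, error_on_invalid = TRUE),
  error = function(e) message("Invalid IP detected: ", conditionMessage(e))
)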

}
