Maps incoming dataset column names to standardized names used throughout the package. Supports exact matching and optional fuzzy matching for unmatched columns.
Usage
standardize_column_names(
  data,
  mapping = default_column_mappings,
  fuzzy_match = TRUE,
  fuzzy_threshold = 0.3,
  interactive = FALSE
)
Arguments
- data
A data frame with raw column names
- mapping
Named list where names are standard column names and values are character vectors of acceptable aliases. Default uses default_column_mappings.
- fuzzy_match
Logical. If TRUE, attempts fuzzy matching for unmapped columns using string distance. Default TRUE.
- fuzzy_threshold
Numeric. Maximum string distance (0-1) for fuzzy matching. Lower values require closer matches. Default 0.3.
- interactive
Logical. If TRUE and fuzzy matches are found, prompts the user to confirm each one. Default FALSE (fuzzy matches are accepted automatically).
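The way mapping, fuzzy_match and fuzzy_threshold interact can be illustrated with a minimal sketch. It is not the package's implementation: sketch_match_column and example_mapping are hypothetical names, the alias lists are invented for illustration, and the distance used here (Levenshtein distance from utils::adist, divided by the longer string's length) is an assumption, since the page does not say which string distance the function uses.

```r
# Hypothetical mapping in the documented shape: names are standard column
# names, values are character vectors of accepted aliases.
example_mapping <- list(
  patient_id = c("PatientID", "patient"),
  organism   = c("Organism", "bacteria"),
  antibiotic = c("Drug", "Antibiotic")
)

# Hypothetical helper (not part of the package): exact lookup first,
# then an optional fuzzy fallback on a 0-1 normalized edit distance.
sketch_match_column <- function(col, mapping, fuzzy_match = TRUE,
                                fuzzy_threshold = 0.3) {
  # 1. Exact match against each standard name and its aliases
  #    (case-sensitive here, which is an assumption).
  for (standard in names(mapping)) {
    if (col %in% c(standard, mapping[[standard]])) {
      return(list(standard = standard, method = "exact"))
    }
  }
  if (!fuzzy_match) {
    return(list(standard = NA_character_, method = "none"))
  }

  # 2. Fuzzy match: flatten all candidate strings and remember which
  #    standard name each candidate belongs to.
  candidates <- unlist(lapply(names(mapping), function(s) c(s, mapping[[s]])))
  owners <- rep(names(mapping), times = lengths(mapping) + 1L)

  # Levenshtein distance scaled by the longer string, so values lie in 0-1.
  d <- drop(utils::adist(col, candidates, ignore.case = TRUE))
  d <- d / pmax(nchar(col), nchar(candidates))

  best <- which.min(d)
  if (d[best] <= fuzzy_threshold) {
    return(list(standard = owners[best], method = "fuzzy"))
  }
  list(standard = NA_character_, method = "none")
}

exact_hit <- sketch_match_column("Drug", example_mapping)
fuzzy_hit <- sketch_match_column("Organsim", example_mapping)  # misspelled
no_hit    <- sketch_match_column("Ward", example_mapping)
```

In this sketch a lower fuzzy_threshold rejects the misspelled "Organsim" (normalized distance 0.25), which mirrors the documented behaviour that lower values require closer matches.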
Value
A list with components:
- data: Data frame with standardized column names
- mapping_log: List documenting which columns were mapped and how
- unmapped: Character vector of columns that could not be mapped
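A short sketch of how the returned list might be consumed. The result object below is a hand-built stand-in with the three documented components, not real output of standardize_column_names(). The inner layout of mapping_log (one entry per standard name with original and method fields) and the idea that unmapped columns keep their raw names in data are assumptions made only for illustration.

```r
# Hand-built stand-in for the return value (structure of mapping_log is assumed).
result <- list(
  data = data.frame(patient_id = 1:3, organism = "E. coli", Ward = "ICU"),
  mapping_log = list(
    patient_id = list(original = "PatientID", method = "exact"),
    organism   = list(original = "Organsim",  method = "fuzzy")
  ),
  unmapped = "Ward"
)

# Report columns that could not be mapped before continuing.
if (length(result$unmapped) > 0) {
  message("Unmapped columns: ", paste(result$unmapped, collapse = ", "))
}

# Collect the raw names that were renamed through fuzzy matching, since
# auto-accepted fuzzy matches are the ones most worth reviewing by hand.
fuzzy_hits <- Filter(function(entry) entry$method == "fuzzy", result$mapping_log)
fuzzy_originals <- vapply(fuzzy_hits, function(entry) entry$original, character(1))

clean_data <- result$data
```

Checking unmapped and the fuzzy entries of mapping_log after each call is a cheap safeguard when interactive = FALSE, because fuzzy matches are then accepted without confirmation.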
Examples
if (FALSE) { # \dontrun{
raw_data <- data.frame(
  PatientID = 1:10,
  Organism = rep("E. coli", 10),
  Drug = rep("Ampicillin", 10)
)
result <- standardize_column_names(raw_data)
clean_data <- result$data
} # }