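Identifies and removes duplicate rows from a data frame. By default, rows are compared across all columns (exact duplicates); use subset to compare only selected columns. The keep argument controls which occurrence of each duplicated row is retained: the first, the last, or none.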

Usage

remove_duplicate_rows(data, keep = "first", subset = NULL, report = TRUE)

Arguments

data

Data frame. The dataset to check for duplicate rows.

keep

Character. Which duplicate to keep: "first" (default), "last", or "none".

subset

Character vector. Column names to check for duplicates. If NULL, checks all columns. Default NULL.

report

Logical. If TRUE, prints a detailed duplicate report. Default TRUE.

Value

A data frame with duplicate rows removed according to keep and subset.

Examples

if (FALSE) { # \dontrun{
# Remove exact duplicates (all columns)
clean_data <- remove_duplicate_rows(data)

# Remove duplicates based on specific columns
clean_data <- remove_duplicate_rows(
  data,
  subset = c("patient_id", "date_of_culture", "organism_normalized")
)
} # }
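
The three keep options can be illustrated with base R alone. The sketch below is not this package's implementation; it uses duplicated() to show what each option plausibly does, under the assumption that "none" drops every row that has a duplicate. The toy data frame, its values, and the ward column are hypothetical.

```r
# Hypothetical toy data; rows 1-2 and rows 4-5 share the same key columns.
df <- data.frame(
  patient_id = c(1, 1, 2, 3, 3),
  organism_normalized = c("E. coli", "E. coli", "K. pneumoniae",
                          "S. aureus", "S. aureus"),
  ward = c("A", "B", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# Columns used to decide whether two rows are duplicates (like subset).
key <- df[, c("patient_id", "organism_normalized"), drop = FALSE]

dup_first <- duplicated(key)                   # TRUE for every occurrence after the first
dup_last  <- duplicated(key, fromLast = TRUE)  # TRUE for every occurrence before the last

keep_first <- df[!dup_first, ]                 # analogous to keep = "first"
keep_last  <- df[!dup_last, ]                  # analogous to keep = "last"
keep_none  <- df[!(dup_first | dup_last), ]    # assumed meaning of keep = "none"
```

With this data, keep_first and keep_last each retain three rows but differ in which ward value survives for patient 1, while keep_none retains only the row for patient 2.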