Removes duplicate rows within groups. Two modes are available: event-aware (the default) and generic (used when key_cols is supplied).
Usage
prep_deduplicate_events(
data,
event_col = "event_id",
organism_col = "organism_normalized",
antibiotic_col = "antibiotic_normalized",
key_cols = NULL,
keep = "first"
)
Arguments
- data
Data frame.
- event_col
Character. Event ID column (event-aware mode). Default "event_id".
- organism_col
Character. Organism column (event-aware mode). Default "organism_normalized".
- antibiotic_col
Character. Antibiotic column (event-aware mode). Default "antibiotic_normalized".
- key_cols
Character vector. When supplied, switches to generic mode and uses these columns for duplicate detection. NULL uses event-aware mode. Default NULL.
- keep
Character. "first", "last", or "none" (drop all duplicates). Default "first".
Details
- Event-aware (default)
Groups by event_col + organism_col + antibiotic_col and keeps the first or last antibiotic test per group. Requires event_col to be present.
- Generic (key_cols supplied)
Groups by key_cols and drops duplicates. When keep = "none", every copy of a duplicated row is removed, so only rows whose key is unique remain. Replaces the former remove_duplicate_rows() helper.
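The keep semantics described above can be illustrated with a minimal base R sketch. This is not the package's implementation: dedupe_sketch() and the toy data frame are hypothetical, introduced only to show how "first", "last", and "none" differ when rows share the same key columns.

```r
# Hypothetical helper that mimics the documented `keep` options using
# base R's duplicated(). It is an illustration, not the package's code.
dedupe_sketch <- function(data, key_cols, keep = c("first", "last", "none")) {
  keep <- match.arg(keep)
  keys <- data[, key_cols, drop = FALSE]

  dup_from_start <- duplicated(keys)                   # TRUE for 2nd and later copies
  dup_from_end   <- duplicated(keys, fromLast = TRUE)  # TRUE for all but the last copy

  drop <- switch(
    keep,
    first = dup_from_start,
    last  = dup_from_end,
    none  = dup_from_start | dup_from_end              # every copy of a duplicated key
  )
  data[!drop, , drop = FALSE]
}

# Toy data (made up): rows 1 and 2 share the same event, organism, and antibiotic.
toy <- data.frame(
  event_id              = c("E1", "E1", "E2", "E2"),
  organism_normalized   = c("Escherichia coli", "Escherichia coli",
                            "Klebsiella pneumoniae", "Klebsiella pneumoniae"),
  antibiotic_normalized = c("Ciprofloxacin", "Ciprofloxacin",
                            "Meropenem", "Gentamicin"),
  result                = c("R", "S", "S", "S"),
  stringsAsFactors      = FALSE
)
keys <- c("event_id", "organism_normalized", "antibiotic_normalized")

first_kept <- dedupe_sketch(toy, keys, keep = "first")  # keeps row 1, drops row 2
last_kept  <- dedupe_sketch(toy, keys, keep = "last")   # keeps row 2, drops row 1
none_kept  <- dedupe_sketch(toy, keys, keep = "none")   # drops rows 1 and 2
```

If the function behaves as documented, prep_deduplicate_events(toy, keep = "last") would be the event-aware counterpart of the second call, and passing key_cols = keys would select generic mode with the same grouping columns.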