Implements the first part of the YLL pipeline. Takes $D_L$ (deaths by infectious syndrome, a single scalar), disaggregates it into age x sex strata using observed proportions from the patient death cohort, and multiplies by the India life expectancy for each stratum to yield the base YLL block $\sum_x D_L^x \, e_*^x$ (Equation 10).

Usage

compute_base_yll_from_dl(
  dl,
  patient_data,
  patient_col,
  outcome_col,
  death_value = "Death",
  age_bin_col,
  sex_col,
  syndrome_col = NULL,
  syndrome_name = NULL,
  pathogen_col = NULL,
  facility_col = NULL,
  facility_name = NULL,
  use_sex = TRUE,
  le_path = system.file("extdata", "life_table_india.csv", package = "anumaan"),
  male_value = "Male",
  female_value = "Female",
  age_bin_map = c(`<1` = "0-1"),
  stratify_by = NULL
)

Arguments

dl: Numeric scalar. $D_L$, the number of deaths for syndrome L (e.g. 45.2 deaths per 1000 incidences).
patient_data: Data frame. Patient-level records (one row per patient x antibiotic test or per patient x pathogen). Used to derive observed age (x sex) proportions from the death cohort.
patient_col: Character. Unique patient identifier column in patient_data.
outcome_col: Character. Final-outcome column in patient_data.
death_value: Character (scalar or vector). Value(s) in outcome_col indicating a fatal outcome. Default "Death".
age_bin_col: Character. Column in patient_data containing GBD-standard age bin labels (e.g. "0-1", "1-5", ..., "85+"). Use age_bin_map to recode non-standard labels (default remaps "<1" -> "0-1").
sex_col: Character. Sex column in patient_data.
syndrome_col: Character or NULL. Syndrome column in patient_data (e.g. "infectious_syndrome"). Required when syndrome_name is not NULL.
syndrome_name: Character or NULL. If supplied, patient_data is filtered to rows where syndrome_col == syndrome_name before computing proportions and outputs (e.g. "Bloodstream infection"). NULL uses all syndromes.
pathogen_col: Character or NULL. Pathogen (organism) column. When supplied, per-pathogen age proportions are computed and the per_pathogen and by_pathogen_age_sex outputs are populated.
facility_col: Character or NULL. Facility identifier column. When supplied, per-facility proportions are computed and by_facility is populated.
facility_name: Character or NULL. If provided, filters patient_data to the specified facility before computation.
use_sex: Logical. Whether to disaggregate by sex as well as age (uses sex-specific life expectancy). Default TRUE.
le_path: Character. Path to the India life expectancy xlsx file. Defaults to the bundled inst/extdata copy.
male_value: Character. Value in sex_col for males. Default "Male".
female_value: Character. Value in sex_col for females. Default "Female". All other values map to "Combined".
age_bin_map: Named character vector. Remaps non-standard age bin labels before joining to the life table. Default c("<1" = "0-1").
stratify_by: Character vector or NULL. Additional columns from patient_data to include in the stratified output.

Value

A named list:

total: Scalar: total base YLL = $\sum_x D_L^x e_*^x$.
by_age_sex: Data frame: base YLL by age bin x sex.
per_pathogen: Data frame: base YLL per pathogen K, computed using each pathogen's own death age distribution (only when pathogen_col is supplied).
by_pathogen_age_sex: Data frame: base YLL by pathogen x age bin x sex (only when pathogen_col is supplied).
by_facility: Data frame: base YLL by facility (only when facility_col is supplied).
by_syndrome_pathogen: Data frame: base YLL by syndrome x pathogen (only when both syndrome_col and pathogen_col are supplied).
stratified: Data frame: base YLL aggregated by stratify_by (only when stratify_by is supplied).
disaggregated_dl: Data frame: the full expanded table with columns D_x_L ($D_L^x$), proportion ($\hat{p}_x$), life_expectancy ($e_*^x$), and yll_contribution ($D_L^x \times e_*^x$) for every age x sex stratum. This is the row-level audit trail.

Details

Formula: $$ \text{base\_YLL}_L = \sum_x D_L^x \, e_*^x $$ where $$D_L^x = D_L \times \hat{p}_x$$ and $\hat{p}_x$ is the observed proportion of deaths in age bin $x$ (within sex $s$ when use_sex = TRUE), estimated from patient_data (filtered to deaths, and optionally to syndrome_name).

D_L note: For 1000-incidence normalisation: $$D_L = 1000 \times \text{death\_rate}_L,\quad \text{death\_rate}_L = \frac{\#\text{deaths}\mid L}{\#\text{incidence}_L}$$ The caller supplies this scalar directly via dl.

Per-pathogen / per-facility YLL: age (x sex) proportions are re-computed within each subgroup so that each subgroup's YLL reflects its own age structure, scaled by the shared dl.

References

Bhaswati Ganguli. DALY Methodology for AMR (YLD notes). March 2026.

Antimicrobial Resistance Collaborators. Global burden of bacterial antimicrobial resistance in 2019. Lancet. 2022.

Examples

if (FALSE) { # \dontrun{
result <- compute_base_yll_from_dl(
  dl = 45.2,
  patient_data = bsi_data,
  outcome_col = "final_outcome",
  death_value = "Death",
  age_bin_col = "Age_bin",
  sex_col = "gender",
  syndrome_col = "infectious_syndrome",
  syndrome_name = "Bloodstream infection",
  pathogen_col = "organism_name",
  facility_col = "center_name",
  le_path = here::here(
    "anumaan", "inst", "extdata",
    "life_expectancy_all.xlsx"
  )
)

result$total
result$by_age_sex
result$per_pathogen
result$by_syndrome_pathogen
result$disaggregated_dl
} # }