Skip to contents

For each pathogen \(k\) with \(n_k\) antibiotic classes, enumerates all \(2^{n_k}\) binary resistance profiles \(\delta \in \{0,1\}^{n_k}\) and estimates their probabilities by solving a simplex-constrained weighted least-squares Quadratic Programme (GBD equation 7.5.1.3):

Usage

compute_resistance_profiles(
  marginal_output,
  coresistance_output,
  pathogens = NULL,
  top_n_pathogens = NULL,
  exclude_near_zero = TRUE,
  top_n_classes = NULL,
  sigma_sq = 1,
  ridge = 1e-08,
  pathogen_col = "organism_name",
  antibiotic_class_col = "antibiotic_class",
  facility_col = NULL,
  facility_name = NULL,
  outcome_col = NULL,
  outcome_value = NULL,
  n_cores = 1L
)

Arguments

marginal_output

List returned by compute_marginal_resistance().

coresistance_output

List returned by compute_pairwise_coresistance().

pathogens

Character vector. Pathogen(s) to process. NULL (default) processes every pathogen in marginal_output.

top_n_pathogens

Integer or NULL (default). When set, only the top top_n_pathogens pathogens ranked by total isolates tested (sum of n_tested across all antibiotic classes, descending) are processed. Applied after the pathogens argument filter. Useful for focusing on the most data-rich pathogens – e.g. top_n_pathogens = 5 runs profiles for only the 5 most-tested pathogens.

exclude_near_zero

Logical. If TRUE (default), antibiotic classes that appear in marginal_output$near_zero for a given pathogen are excluded from profile enumeration.

top_n_classes

Integer or NULL (default). When set, only the top top_n_classes antibiotic classes ranked by n_tested (descending) are kept per pathogen before profile enumeration. Useful for capping the combinatorial explosion (2^n profiles) for pathogens tested against many drug classes – e.g. top_n_classes = 5 gives at most 32 profiles. Applied after exclude_near_zero.

sigma_sq

Positive numeric. Assumed variance for each constraint (uniform). Default 1.

ridge

Positive numeric. Ridge term added to the QP Hessian for numerical stability. Default 1e-8.

pathogen_col

Character. Column name for pathogens. Must match the column used in Steps 1-2. Default "organism_name".

antibiotic_class_col

Character. Column name for antibiotic classes. Default "antibiotic_class".

facility_col

Character or NULL. Column identifying the facility/site. When provided together with facility_name, filters marginal_output$marginal and marginal_output$near_zero to the specified facility before profile enumeration. For this to work, compute_marginal_resistance() must have been called with the same facility_col argument. Both must be supplied together or both left NULL. Default NULL.

facility_name

Character or NULL. Facility value to retain. Default NULL.

outcome_col

Character or NULL. Column containing patient outcomes. When provided together with outcome_value, filters marginal_output$marginal and marginal_output$near_zero to the specified outcome before profile enumeration. compute_marginal_resistance() must have been called with the same outcome_col argument. Both must be supplied together or both left NULL. Default NULL.

outcome_value

Character or NULL. Outcome value to retain (e.g. "discharged", "dead"). Default NULL.

n_cores

Integer. Number of CPU cores for parallel computation. Default 1L (sequential).

Value

Named list, one entry per pathogen, each a list with:

profiles

Data frame: profile (character label, e.g.\ "RSR"), probability (\(\hat{p}_\delta\)), and one binary (0/1) integer column per antibiotic class indicating whether that class is resistant (1) or susceptible (0) in each profile.

classes

Character vector of antibiotic class names used (alphabetical; bit 0 = classes[1]).

n_classes

Integer. Number of classes used.

constraint_residuals

Named numeric vector of \(m_i^\top \hat{p} - v_i\) for each constraint. Small absolute values indicate good constraint satisfaction.

Details

$$ \hat{p} = \arg\min_{p \in \Delta_{2^n}} \sum_{i=1}^{m} \frac{(m_i^\top p - v_i)^2}{\sigma_i^2} $$

where \(m = n(n+1)/2\) data-derived linear constraints encode n marginal resistance rates and n(n-1)/2 pairwise co-resistance rates, and \(\Delta\) is the standard probability simplex.

Constraint rows in M

Marginal (rows 1 to n)

Row \(d\): \(M_{d,\delta} = 1\) iff \(\delta_d = 1\) (i.e., class \(d\) is resistant in profile \(\delta\)). Constraint: \(\sum_\delta M_{d,\delta}\,p_\delta = \hat{r}_{kd}\).

Pairwise (rows n+1 to m)

Row \((d_1,d_2)\): \(M_{d_1 d_2,\delta} = 1\) iff \(\delta_{d_1} = 1 \land \delta_{d_2} = 1\). Constraint: \(\sum_\delta M_{d_1 d_2,\delta}\,p_\delta = \hat{r}_{k,d_1 d_2}\). When a pairwise estimate is unavailable (too few co-tested isolates), the product of marginals (independence assumption) is used as fallback.

The QP is solved via quadprog::solve.QP. A small ridge term (ridge) is added to the Hessian to guarantee strict positive-definiteness. On solver failure the pathogen gets a uniform distribution over all profiles.

Performance Notes

This function has been optimized for speed with vectorized profile generation and label creation (10-100x faster than previous versions). However, computational complexity is still exponential in the number of classes:

  • n <= 14: Fast (seconds to minutes)

  • n = 15: Moderate (minutes)

  • n >= 16: Slow and memory-intensive (use top_n_classes)

For large datasets with many pathogens, consider using top_n_classes to limit each pathogen to its most-tested classes (e.g., top_n_classes = 12).

Examples

if (FALSE) { # \dontrun{
marg <- compute_marginal_resistance(amr_clean)
co_res <- compute_pairwise_coresistance(marg)
rp <- compute_resistance_profiles(marg, co_res)

# All profiles and probabilities for K. pneumoniae
rp[["Klebsiella pneumoniae"]]$profiles

# Constraint residuals (quality check)
rp[["Klebsiella pneumoniae"]]$constraint_residuals

# Single pathogen
rp_kp <- compute_resistance_profiles(
  marg, co_res,
  pathogens = "Klebsiella pneumoniae"
)
} # }