Identifies neighborhoods in TCR space containing more TCRs than expected by chance under a null model of independent VDJ rearrangement. For each TCR, counts how many other TCRs fall within a set of fixed TCRdist radii, and compares the observed count to the Poisson expectation derived from background distributions.
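As a rough illustration of the comparison described above, the upper-tail Poisson probability of seeing at least the observed number of neighbors can be computed with base R's ppois. This is a minimal sketch rather than the package's implementation: the helper name poisson_clump_pvalue and the example counts are hypothetical, and any multiple-testing adjustment is left out.

```r
# Hypothetical helper (not part of the package): upper-tail Poisson p-value,
# i.e. P(X >= observed) where X ~ Poisson(expected).
poisson_clump_pvalue <- function(observed, expected) {
  # ppois(q, lambda, lower.tail = FALSE) returns P(X > q), so shift by one
  # to include the observed count itself.
  ppois(observed - 1, lambda = expected, lower.tail = FALSE)
}

# Made-up numbers for illustration: 8 neighbors observed, 1.5 expected.
p <- poisson_clump_pvalue(observed = 8, expected = 1.5)
print(p)
```

A small p-value here means the neighborhood holds more TCRs than the background expectation would explain.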
Usage
find_clumping(
  tcr_df,
  organism,
  radii = c(24L, 48L, 72L, 96L),
  num_random_samples = 50000L,
  pvalue_threshold = 1,
  verbose = TRUE,
  clusters_gex = NULL,
  bg_tcrs = NULL,
  preserve_vj_pairings = FALSE
)
Arguments
- tcr_df
A data.frame with columns va, ja, cdr3a, cdr3a_nucseq, vb, jb, cdr3b, cdr3b_nucseq.
- organism
Character string. Organism key (e.g. "human", "mouse").
- radii
Integer vector. TCRdist radii to test. Default c(24L, 48L, 72L, 96L).
- num_random_samples
Integer. Number of random background chains per chain type. Default 50000L.
- pvalue_threshold
Numeric. Maximum adjusted p-value to include in results. Default 1.0 (include all).
- verbose
Logical. Print progress messages. Default TRUE.
- clusters_gex
Integer vector of length nrow(tcr_df), or NULL. If provided, also tests for TCR clumps within each GEX cluster. Default NULL.
- bg_tcrs
Optional data.frame. If provided, used for background generation instead of tcr_df. Default NULL.
- preserve_vj_pairings
Logical. Preserve V-J pairings in background resampling. Default FALSE.
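Because tcr_df must carry all eight columns listed above, a quick pre-flight check can catch a malformed input before the call. The sketch below is an assumption-laden illustration: missing_tcr_columns is a hypothetical helper, not a package function, and the toy data.frame holds placeholder strings rather than real gene names or sequences.

```r
# Column names taken from the tcr_df argument description.
required_cols <- c("va", "ja", "cdr3a", "cdr3a_nucseq",
                   "vb", "jb", "cdr3b", "cdr3b_nucseq")

# Hypothetical helper: return the required columns absent from a data.frame.
missing_tcr_columns <- function(df) {
  setdiff(required_cols, names(df))
}

# Toy input with placeholder values; only the column names matter here.
toy_df <- data.frame(
  va = "x", ja = "x", cdr3a = "x",
  vb = "x", jb = "x", cdr3b = "x",
  stringsAsFactors = FALSE
)

# Both nucleotide-sequence columns are absent from toy_df.
missing <- missing_tcr_columns(toy_df)
print(missing)
```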
Value
A list with four elements:
- results_df
A data.frame sorted by pvalue_adj with columns: clump_type, clone_index (0-based), nbr_radius, pvalue_adj, num_nbrs, expected_num_nbrs, raw_count, va, ja, cdr3a, vb, jb, cdr3b, clumping_group, clonotype_fdr_value.
- is_clumped
Logical vector of length nrow(tcr_df).
- clusters
Integer vector of length nrow(tcr_df). 0 = not clumped, positive = cluster ID.
- all_raw_pvalues
Numeric matrix (nrow(tcr_df) x length(radii)).
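One detail worth handling carefully is that clone_index is 0-based while R indexes rows from 1. The sketch below shows one way to work with the returned list. The result object is a hand-built stand-in with made-up values and only a few of the documented columns, and the 0.05 cutoff is an arbitrary choice for illustration.

```r
# Hand-built stand-in for the returned list (all values are made up).
result <- list(
  results_df = data.frame(
    clone_index = c(2L, 0L),     # 0-based, as documented
    nbr_radius  = c(24L, 48L),
    pvalue_adj  = c(0.001, 0.2)
  ),
  is_clumped = c(FALSE, FALSE, TRUE, FALSE),
  clusters   = c(0L, 0L, 1L, 0L)
)

# Keep rows at or below a chosen adjusted p-value cutoff.
keep <- result$results_df$pvalue_adj <= 0.05
sig <- result$results_df[keep, , drop = FALSE]

# Convert the 0-based clone_index to a 1-based row index into tcr_df.
sig$tcr_row <- sig$clone_index + 1L

# Count clones per cluster, ignoring 0 (not clumped).
cluster_sizes <- table(result$clusters[result$clusters > 0])
print(sig)
print(cluster_sizes)
```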
Details
The pipeline:
1. Estimate per-TCR background frequency distributions via shuffled chain resampling.
2. Assign alpha/beta chain groups for same-chain masking.
3. Find all neighbors within max(radii) using tcrdist_radius_neighbors.
4. Run Poisson tests at each radius (C++ via rcpp_poisson_test_loop).
5. Perform single-linkage clustering of significant clumps.
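The final single-linkage step can be pictured as finding connected components among the significant clones. The sketch below is one plain-R way to do that with a simple union-find. It assumes that two significant clones are linked whenever they are neighbors, which the text above does not spell out, and single_linkage_labels, edges, and significant are hypothetical names rather than package internals.

```r
# Hypothetical helper: label connected components among significant clones.
# n           number of clones
# edges       two-column matrix of 1-based neighbor pairs
# significant logical vector of length n marking clones that passed the test
single_linkage_labels <- function(n, edges, significant) {
  storage.mode(edges) <- "integer"  # keep all indices integer-typed
  parent <- seq_len(n)              # each clone starts as its own root

  # Follow parent links until reaching a clone that is its own parent.
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Merge the components of every pair of significant neighbors.
  for (k in seq_len(nrow(edges))) {
    a <- edges[k, 1]
    b <- edges[k, 2]
    if (significant[a] && significant[b]) {
      ra <- find_root(a)
      rb <- find_root(b)
      if (ra != rb) parent[rb] <- ra
    }
  }

  # 0 for non-significant clones, consecutive positive IDs otherwise.
  roots <- vapply(seq_len(n), find_root, integer(1))
  labels <- integer(n)
  sig_roots <- unique(roots[significant])
  labels[significant] <- match(roots[significant], sig_roots)
  labels
}

# Toy example: clones 1-2-3 form a chain, clone 4 is significant but isolated,
# and clone 5 is not significant, so the 4-5 edge is ignored.
edges <- matrix(c(1L, 2L,
                  2L, 3L,
                  4L, 5L), ncol = 2, byrow = TRUE)
significant <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
labels <- single_linkage_labels(5L, edges, significant)
print(labels)
```

The labeling follows the convention documented for the clusters element: 0 for clones that are not clumped and positive integers for cluster IDs.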
Examples
if (FALSE) { # \dontrun{
result <- find_clumping(tcr_df, "human")
result$results_df
sum(result$is_clumped)
} # }