Computes pairwise TCRdist distances for all pairs (i, j) where
i < j and the distance is at most threshold, then returns a
symmetric sparse matrix in compressed-column format (dgCMatrix).
Usage
tcrdist_sparse(
tcrs,
organism,
threshold,
components = "all",
weight_cdr3 = WEIGHT_CDR3_REGION,
gap_penalty_cdr3 = GAP_PENALTY_CDR3_REGION
)
Arguments
- tcrs
A data.frame with at least the following columns:
va: Character. Alpha-chain V-gene allele.
cdr3a: Character. Alpha-chain CDR3 amino acid sequence.
vb: Character. Beta-chain V-gene allele.
cdr3b: Character. Beta-chain CDR3 amino acid sequence.
- organism
Character string. Organism key, e.g. "human".
- threshold
Numeric. Maximum distance to include. Pairs with distance strictly greater than threshold are omitted. Must be >= 0.
- components
Character. Which distance components to include. See tcrdist_matrix for details. Default "all".
- weight_cdr3
Integer. CDR3 distance weight. Defaults to WEIGHT_CDR3_REGION (3L).
- gap_penalty_cdr3
Integer. CDR3 gap penalty. Defaults to GAP_PENALTY_CDR3_REGION (12L).
Value
A symmetric sparse matrix of class dgCMatrix with dimensions
N x N. Off-diagonal entries (i, j) and (j, i) are present for all pairs
within the threshold. The diagonal is structural zero (not stored).
Returns an all-zero N x N dgCMatrix if no pairs satisfy the
threshold.
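The layout of the returned matrix can be illustrated without running the distance computation. The following is a minimal sketch that builds a symmetric dgCMatrix by hand with the Matrix package; the pair indices and distances are made-up values, not output of tcrdist_sparse.

```r
library(Matrix)

# Hypothetical pairs (i < j) that fell within some threshold, with made-up distances.
i <- c(1L, 1L, 3L)
j <- c(2L, 4L, 4L)
d <- c(12, 36, 24)
n <- 4L

# Mirror every pair so that both (i, j) and (j, i) are stored.
sp <- sparseMatrix(i = c(i, j), j = c(j, i), x = c(d, d), dims = c(n, n))

class(sp)        # sparse numeric matrix in compressed-column form
isSymmetric(sp)  # both triangles hold the same values
nnzero(sp)       # two stored entries per pair
diag(sp)         # nothing is stored on the diagonal

# Recover each pair once by keeping only the upper triangle.
pairs <- as.data.frame(summary(sp))
pairs <- pairs[pairs$i < pairs$j, ]

# With no qualifying pairs the result is an all-zero N x N matrix.
empty <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                      dims = c(n, n))
```

The same summary() idiom should apply to a matrix returned by tcrdist_sparse when a list of neighbour pairs is needed instead of a matrix.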
Details
Three-stage early termination is used internally:
1. If the V-region distance sum alone exceeds threshold, skip.
2. If V-region + CDR3-alpha distance exceeds threshold, skip.
3. If the full distance exceeds threshold, skip.
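The staged pruning described above can be sketched in plain R. This is an illustration only, not the package's internal code: the three component matrices and the helper pairs_within() are hypothetical, the values are made up, and the sketch assumes every component distance is non-negative, so a partial sum that already exceeds the threshold cannot drop back below it.

```r
# Hypothetical precomputed component distances for N = 3 receptors (made-up values).
# v_dist: summed V-region distance; cdr3a_dist / cdr3b_dist: weighted CDR3 distances.
v_dist     <- matrix(c(0, 10, 60,  10, 0, 20,  60, 20, 0), 3, 3)
cdr3a_dist <- matrix(c(0,  6, 12,   6, 0, 40,  12, 40, 0), 3, 3)
cdr3b_dist <- matrix(c(0,  9,  3,   9, 0, 15,   3, 15, 0), 3, 3)

# Hypothetical helper: collect all pairs i < j whose total distance is <= threshold.
pairs_within <- function(threshold, v, a, b) {
  out <- list()
  n <- nrow(v)
  for (i in seq_len(n - 1L)) {
    for (j in (i + 1L):n) {
      d <- v[i, j]
      if (d > threshold) next   # stage 1: V-region sum alone is too large
      d <- d + a[i, j]
      if (d > threshold) next   # stage 2: V-region + CDR3-alpha is too large
      d <- d + b[i, j]
      if (d > threshold) next   # stage 3: full distance is too large
      out[[length(out) + 1L]] <- data.frame(i = i, j = j, dist = d)
    }
  }
  do.call(rbind, out)           # NULL when no pair qualifies
}

res <- pairs_within(50, v_dist, cdr3a_dist, cdr3b_dist)
```

With these toy values, pair (1, 3) is dropped at stage 1, pair (2, 3) at stage 2, and only pair (1, 2) survives with a total of 25. Checking the cheapest component first means most distant pairs never reach the CDR3 comparison.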
For threshold = Inf all pairs are evaluated (equivalent to a full
dense matrix stored as sparse), which serves as a correctness check against
tcrdist_matrix.
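The shape of such a check can be sketched with toy data. In the sketch below, dense is a made-up stand-in for a full distance matrix and sparse_from_dense() is a hypothetical helper standing in for a thresholded sparse result; in practice the two objects would come from tcrdist_matrix and from tcrdist_sparse with threshold = Inf.

```r
library(Matrix)

# Toy dense reference: symmetric distances with a zero diagonal (made-up values).
dense <- matrix(c(0, 25, 75,  25, 0, 75,  75, 75, 0), 3, 3)

# Hypothetical stand-in for a thresholded sparse result:
# keep off-diagonal entries whose distance is at most `threshold`.
sparse_from_dense <- function(d, threshold) {
  keep <- which(d <= threshold & row(d) != col(d), arr.ind = TRUE)
  sparseMatrix(i = keep[, 1], j = keep[, 2], x = d[keep], dims = dim(d))
}

sp_all <- sparse_from_dense(dense, Inf)

# Largest absolute disagreement between the two representations.
max_diff <- max(abs(as.matrix(sp_all) - dense))
```

A max_diff of zero indicates that the sparse path reproduces the dense result entry for entry.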
Examples
# \donttest{
tcrs <- data.frame(
va = c("TRAV1-1*01", "TRAV1-1*01"),
cdr3a = c("CAVRDSSYKLIF", "CAVRDSSYKLIF"),
vb = c("TRBV19*01", "TRBV19*01"),
cdr3b = c("CASSIRSSYEQYF", "CASSIRSYEQYF"),
stringsAsFactors = FALSE
)
sp <- tcrdist_sparse(tcrs, "human", threshold = 50)
# }