K-nearest-neighbors by TCRdist with optional group masking
Source: R/wrappers_neighbors.R, tcrdist_knn.Rd

For each of the N input TCRs, finds the K nearest neighbors by TCRdist
distance. TCRs sharing the same agroups or bgroups value are
excluded from each other's neighborhood (same-group masking). When
agroups and bgroups are NULL (default), every TCR has
its own unique group, so no masking occurs.
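The masking rule described above can be illustrated with a minimal base R sketch that works on a precomputed distance matrix. This is not the package's implementation: masked_knn and the toy distances are hypothetical, and the sketch assumes that a pair is masked when it shares either an agroups or a bgroups value, and that a TCR is never its own neighbor.

```r
# Illustrative sketch only: same-group masked K-nearest-neighbors on a
# precomputed distance matrix. masked_knn() is a hypothetical helper, not
# part of the package.
masked_knn <- function(d, K, agroups = NULL, bgroups = NULL) {
  n <- nrow(d)
  # NULL groups: every item gets its own group, so only the item itself
  # is excluded from its neighborhood.
  if (is.null(agroups)) agroups <- seq_len(n)
  if (is.null(bgroups)) bgroups <- seq_len(n)

  # TRUE where item j must be excluded from item i's neighborhood
  # (assumption: sharing either group value masks the pair).
  masked <- outer(agroups, agroups, "==") | outer(bgroups, bgroups, "==")
  d[masked] <- Inf

  idx <- matrix(NA_integer_, n, K)
  dst <- matrix(NA_real_, n, K)
  for (i in seq_len(n)) {
    nbrs <- order(d[i, ])[seq_len(K)]  # ascending distance
    idx[i, ] <- nbrs
    dst[i, ] <- d[i, nbrs]
  }
  list(knn_indices = idx, knn_distances = dst)
}

# Toy symmetric distances between four items (made-up values).
d <- matrix(c( 0, 12, 48, 90,
              12,  0, 60, 84,
              48, 60,  0, 30,
              90, 84, 30,  0), nrow = 4, byrow = TRUE)

unmasked <- masked_knn(d, K = 1L)
masked   <- masked_knn(d, K = 1L, agroups = c(1L, 1L, 2L, 3L))
```

In the toy data, items 1 and 2 are each other's nearest neighbor when no groups are given. Once they share an agroups value, they are skipped for each other and both fall back to item 3.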
Usage
tcrdist_knn(
tcrs,
organism,
K,
agroups = NULL,
bgroups = NULL,
sort_nbrs = TRUE,
components = "all",
weight_cdr3 = WEIGHT_CDR3_REGION,
gap_penalty_cdr3 = GAP_PENALTY_CDR3_REGION
)
Arguments
- tcrs
  A data.frame with at least the following columns:
    va: Character. Alpha-chain V-gene allele.
    cdr3a: Character. Alpha-chain CDR3 amino acid sequence.
    vb: Character. Beta-chain V-gene allele.
    cdr3b: Character. Beta-chain CDR3 amino acid sequence.
- organism
  Character string. Organism key, e.g. "human".
- K
  Integer. Number of nearest neighbors to return per TCR. Must satisfy 1 <= K <= N - 1.
- agroups
  Integer vector of length N, or NULL. Alpha-chain group assignments. TCRs with the same value are masked from each other. NULL assigns each TCR a unique group (no masking).
- bgroups
  Integer vector of length N, or NULL. Beta-chain group assignments. Same semantics as agroups.
- sort_nbrs
  Logical. If TRUE (default), sort each row's K neighbors by ascending distance.
- components
  Character. Which distance components to include. See tcrdist_matrix for details. Default "all".
- weight_cdr3
  Integer. CDR3 distance weight. Defaults to WEIGHT_CDR3_REGION (3L).
- gap_penalty_cdr3
  Integer. CDR3 gap penalty. Defaults to GAP_PENALTY_CDR3_REGION (12L).
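Because agroups and bgroups are integer vectors of length N, labels stored as strings need to be converted first. The sketch below shows one way to do that in base R, together with a check of the documented bound on K. The clone_id labels are hypothetical and stand in for whatever grouping the data carries.

```r
# Hypothetical per-TCR clone labels (not a column the function requires).
clone_id <- c("cloneA", "cloneA", "cloneB", "cloneC", "cloneB")

# Map each distinct label to an integer code, in order of first appearance.
groups <- as.integer(factor(clone_id, levels = unique(clone_id)))
# groups is 1 1 2 3 2

# Check the documented constraint 1 <= K <= N - 1 before calling the function.
N <- length(clone_id)
K <- 2L
stopifnot(K >= 1L, K <= N - 1L)
```

The resulting groups vector could then be supplied as agroups, bgroups, or both, depending on which chain the grouping refers to.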
Value
A list with two elements:
- knn_indices
  Integer matrix (N x K). 1-based row indices of the K nearest neighbors for each TCR.
- knn_distances
  Numeric matrix (N x K). Corresponding TCRdist distances.
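The two matrices line up element by element, so they can be flattened into a long table with one row per (query, neighbor) pair. The sketch below uses a hand-built list with the documented shape rather than real output. The values and the edges data frame are illustrative assumptions.

```r
# Mock result with the documented structure (N = 3, K = 2); values are made up.
knn <- list(
  knn_indices   = matrix(c(2L, 3L,
                           1L, 3L,
                           1L, 2L), nrow = 3, byrow = TRUE),
  knn_distances = matrix(c(12, 48,
                           12, 60,
                           48, 60), nrow = 3, byrow = TRUE)
)

N <- nrow(knn$knn_indices)
K <- ncol(knn$knn_indices)

# as.vector() walks a matrix column by column, so query indices cycle 1..N
# within each neighbor rank.
edges <- data.frame(
  query    = rep(seq_len(N), times = K),
  rank     = rep(seq_len(K), each = N),
  neighbor = as.vector(knn$knn_indices),
  distance = as.vector(knn$knn_distances)
)
edges <- edges[order(edges$query, edges$rank), ]
```

Since knn_indices holds 1-based row indices, a lookup such as tcrs[edges$neighbor, ] would recover the neighbor TCRs from the input data frame.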
Examples
# \donttest{
tcrs <- data.frame(
va = c("TRAV1-1*01", "TRAV1-1*01", "TRAV12-2*01"),
cdr3a = c("CAVRDSSYKLIF", "CAVRDSSYKLIF", "CAVSANSGTYF"),
vb = c("TRBV19*01", "TRBV19*01", "TRBV20-1*01"),
cdr3b = c("CASSIRSSYEQYF", "CASSIRSYEQYF", "CSARDRTGNTIYF"),
stringsAsFactors = FALSE
)
knn <- tcrdist_knn(tcrs, "human", K = 1L)
# }