Two input modes are supported: a KNN path (tcr_df + organism) and a PCA path (pca_embeddings). See Details.
Usage
compute_tcrdist_umap(
  tcr_df = NULL,
  organism = NULL,
  pca_embeddings = NULL,
  n_components = 2L,
  n_neighbors = 15L,
  min_dist = 0.1,
  spread = 1,
  metric = "euclidean",
  n_threads = 1L,
  seed = NULL,
  cluster = FALSE,
  clustering_resolution = 1,
  clustering_method = NULL,
  ...
)
Arguments
- tcr_df
Data.frame with TCR columns (va, cdr3a, vb, cdr3b; plus ja, jb for group masking).
- organism
Character string ("human" or "mouse"). Required when tcr_df is used.
- pca_embeddings
Numeric matrix (N x D). Pre-computed kernel PCA embeddings. If provided, the PCA path is used.
- n_components
Integer. Number of UMAP output dimensions. Default 2L.
- n_neighbors
Integer. Number of nearest neighbors. Default 15L.
- min_dist
Numeric. UMAP min_dist. Default 0.1.
- spread
Numeric. UMAP spread parameter. Default 1.0.
- metric
Character. Distance metric for UMAP neighbor search (PCA path only). Default "euclidean".
- n_threads
Integer. Threads for UMAP optimization. Default 1L.
- seed
Integer or NULL. Random seed. Default NULL.
- cluster
Logical. If TRUE, run graph-based clustering on the KNN graph (KNN path only; requires igraph). Default FALSE.
- clustering_resolution
Numeric. Resolution for community detection. Default 1.0.
- clustering_method
Character or NULL. "leiden", "louvain", or NULL (try Leiden first). Default NULL.
- ...
Additional arguments passed to umap (PCA path only).
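The way the arguments select an input mode can be sketched as a small helper. This is only an illustration of the rules stated above (pca_embeddings selects the PCA path; organism is required with tcr_df). The function name resolve_input_mode and its error messages are hypothetical and are not part of the package.

```r
# Hypothetical helper, for illustration only; not the package's own code.
resolve_input_mode <- function(tcr_df = NULL, organism = NULL,
                               pca_embeddings = NULL) {
  if (!is.null(pca_embeddings)) {
    # Documented rule: if pca_embeddings is provided, the PCA path is used.
    stopifnot(is.matrix(pca_embeddings), is.numeric(pca_embeddings))
    return("pca")
  }
  if (!is.null(tcr_df)) {
    # Documented rule: organism is required when tcr_df is used.
    if (is.null(organism)) {
      stop("organism is required when tcr_df is supplied")
    }
    match.arg(organism, c("human", "mouse"))  # errors on any other value
    return("knn")
  }
  stop("supply either tcr_df + organism or pca_embeddings")
}

resolve_input_mode(pca_embeddings = matrix(0, nrow = 2, ncol = 2))
resolve_input_mode(tcr_df = data.frame(va = "x"), organism = "mouse")
```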
Value
A named list. Elements depend on the input path:
KNN path (tcr_df + organism):
- embeddings
Numeric matrix (N x n_components).
- knn_indices
Integer matrix (N x K). 1-based.
- knn_distances
Numeric matrix (N x K).
- knn_graph
Sparse dgCMatrix (N x N). Fuzzy simplicial set.
- clusters
Integer vector (0-based) or NULL.
- nndists
Numeric vector. Weighted NN distances.
- n_neighbors
Final K (may have been increased for connectivity).
- n_components
Integer.
- method
"knn".
PCA path (pca_embeddings):
- embeddings
Numeric matrix (N x n_components).
- pca_embeddings
The PCA input matrix.
- n_components
Integer.
- method
"pca".
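Because clusters is 0-based while knn_indices is 1-based, downstream code has to treat the two differently. The sketch below uses a hand-made list that only mimics the documented element names. The values are invented for illustration and do not come from the package.

```r
# Mock result carrying a few of the documented element names; values are made up.
res <- list(
  embeddings  = matrix(c(0.1, 0.4, -0.3, 0.9, 1.2, -0.7), ncol = 2),
  knn_indices = matrix(c(2L, 1L, 1L, 3L, 3L, 2L), ncol = 2),  # 1-based row ids
  clusters    = c(0L, 0L, 1L),                                 # 0-based labels
  method      = "knn"
)

# clusters may be NULL (for example when cluster = FALSE), so guard first.
if (!is.null(res$clusters)) {
  # Shift the 0-based labels by one before using them as R indices.
  cluster_id <- res$clusters + 1L
  cluster_sizes <- tabulate(cluster_id)
}

# knn_indices is already 1-based, so it can index rows of embeddings directly.
first_nn_coords <- res$embeddings[res$knn_indices[, 1], , drop = FALSE]
```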
Details
KNN path (recommended): supply tcr_df and organism. Computes TCRdist K-nearest-neighbors with group masking (clones sharing an identical alpha or beta chain are excluded from each other's neighborhoods), builds a fuzzy simplicial set graph, and runs UMAP from the precomputed KNN via umap. This preserves the TCRdist metric faithfully.
PCA path: supply pca_embeddings (an N x D matrix, e.g. from compute_tcrdist_kernel_pca()$embeddings). Runs standard UMAP in Euclidean space on the PCA coordinates.
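The group-masking idea can be illustrated with a dense distance matrix in base R. This is a minimal sketch under stated assumptions, not the package's implementation: masked_knn is a hypothetical helper, the distances are toy values rather than TCRdist, and a full N x N matrix is used only for clarity.

```r
# Hypothetical helper: K nearest neighbors after masking clones that share
# an identical alpha or beta chain. Not the package's implementation.
masked_knn <- function(d, alpha, beta, k) {
  n <- nrow(d)
  # TRUE where two clones share a chain; the diagonal (self) is TRUE as well.
  same_chain <- outer(alpha, alpha, "==") | outer(beta, beta, "==")
  d_masked <- d
  d_masked[same_chain] <- Inf
  idx <- matrix(0L, nrow = n, ncol = k)
  for (i in seq_len(n)) {
    # If fewer than k unmasked clones remain, masked ones (Inf) fill the row.
    idx[i, ] <- order(d_masked[i, ])[seq_len(k)]
  }
  dst <- matrix(d_masked[cbind(rep(seq_len(n), k), as.vector(idx))], nrow = n)
  list(idx = idx, dist = dst)
}

# Toy data: clones 1 and 2 share an alpha chain, clones 2 and 3 share a beta chain.
alpha <- c("A1", "A1", "A2", "A3", "A4")
beta  <- c("B1", "B2", "B2", "B3", "B4")
d <- as.matrix(dist(c(0, 1, 2, 10, 11)))  # stand-in distances, not TCRdist

knn <- masked_knn(d, alpha, beta, k = 2)
knn$idx[2, ]  # clone 2 skips clones 1 and 3 despite their small distances
```

In this toy case clone 2 ends up with the distant clones 4 and 5 as neighbors, because its two closest clones share a chain with it and are masked.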
When cluster = TRUE, Leiden (or Louvain) community detection is
performed on the fuzzy KNN graph (KNN path only; requires igraph).
Examples
# \donttest{
data(dash)
sub <- dash[1:200, ]
# KNN path (recommended): uses TCRdist directly
umap <- compute_tcrdist_umap(sub, "mouse", seed = 42)
dim(umap$embeddings) # 200 x 2
#> [1] 200 2
# With clustering
umap <- compute_tcrdist_umap(sub, "mouse", seed = 42, cluster = TRUE)
table(umap$clusters)
#>
#>  0  1  2  3  4  5  6  7  8  9 10
#> 27  9 22 35 11  9 21 28 14 16  8
# PCA path: from pre-computed kernel PCA
pca <- compute_tcrdist_kernel_pca(sub, "mouse", n_components = 20L)
umap <- compute_tcrdist_umap(pca_embeddings = pca$embeddings, seed = 42)
# }