Constructs a TCRrep S4 object from a clonotype data frame and
optional parameters. This is the recommended way to create a TCRrep
instance (following Bioconductor convention, rather than calling
new("TCRrep", ...) directly).
Usage
TCRrep(
  clone_df,
  organism = "human",
  chains = "AB",
  deduplicate = TRUE,
  metric = "tcrdist",
  compute_distances = FALSE,
  weight_cdr3 = WEIGHT_CDR3_REGION,
  gap_penalty_cdr3 = GAP_PENALTY_CDR3_REGION,
  weight_v_region = WEIGHT_V_REGION,
  gap_penalty_v_region = GAP_PENALTY_V_REGION
)
Arguments
- clone_df
A data.frame of clonotypes. Required columns depend on the chains argument:
"AB": requires va, cdr3a, vb, cdr3b.
"A": requires va, cdr3a.
"B": requires vb, cdr3b.
"GD": requires va, cdr3a, vb, cdr3b.
- organism
Character string. Organism key recognised by load_gene_database, e.g. "human" or "mouse". For gamma-delta TCRs use "human_gd" or "mouse_gd".
- chains
Character string. One of "AB" (default), "A", "B", or "GD".
- deduplicate
Controls clone deduplication (matching tcrdist3 behavior). Chain columns are always included in grouping automatically.
TRUE (default): Deduplicate using chain columns plus subject if present. Within-subject duplicates are merged and count values summed.
FALSE: No deduplication; clone_df is stored as-is.
Character vector: Additional grouping columns beyond the chain columns. For example, c("subject") groups by chain columns + subject (same as default when subject exists); character(0) groups by chain columns only.
- metric
Character string. Distance metric to use. One of "tcrdist" (default) or "hamming".
- compute_distances
Logical. If TRUE and nrow(clone_df) > 0, compute the pairwise distance matrix immediately and store it in the paired_dist slot. Defaults to FALSE.
- weight_cdr3
Integer. Weight applied to CDR3 distances. Defaults to WEIGHT_CDR3_REGION (3L).
- gap_penalty_cdr3
Integer. Gap penalty for CDR3 alignments. Defaults to GAP_PENALTY_CDR3_REGION (12L).
- weight_v_region
Integer. Weight applied to V-region distances. Defaults to WEIGHT_V_REGION (1L).
- gap_penalty_v_region
Integer. Gap penalty for V-region alignments. Defaults to GAP_PENALTY_V_REGION (4L).
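The grouping described for the deduplicate argument can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes a clone_df with subject and count columns, uses made-up rows, and reproduces only the documented grouping and count summing. How the constructor treats other columns during deduplication is not covered here.

```r
# Hypothetical input: the same paired clone appears twice in subject "s1"
# and once in subject "s2".
clones <- data.frame(
  va      = c("TRAV1-1*01", "TRAV1-1*01", "TRAV1-1*01"),
  cdr3a   = c("CAVRDSSYKLIF", "CAVRDSSYKLIF", "CAVRDSSYKLIF"),
  vb      = c("TRBV19*01", "TRBV19*01", "TRBV19*01"),
  cdr3b   = c("CASSIRSSYEQYF", "CASSIRSSYEQYF", "CASSIRSSYEQYF"),
  subject = c("s1", "s1", "s2"),
  count   = c(2L, 3L, 1L),
  stringsAsFactors = FALSE
)

chain_cols <- c("va", "cdr3a", "vb", "cdr3b")

# Analogue of deduplicate = TRUE: group by chain columns plus subject
# and sum the count column within each group.
by_subject <- aggregate(
  clones["count"],
  by  = clones[c(chain_cols, "subject")],
  FUN = sum
)

# Analogue of deduplicate = character(0): group by chain columns only,
# collapsing identical clones across subjects.
chains_only <- aggregate(
  clones["count"],
  by  = clones[chain_cols],
  FUN = sum
)
```

In this sketch by_subject keeps one row per subject, while chains_only collapses all three input rows into a single clone.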
Details
Chain columns (va, cdr3a, vb, cdr3b) supplied as factors are
automatically coerced to character. The organism is validated against the
bundled gene database on construction.
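The factor coercion mentioned above is roughly equivalent to the following base R step. This is a sketch under the assumption that only the four chain columns are touched; the constructor's internal code may differ, and the input rows are made up.

```r
# Hypothetical input in which the chain columns arrive as factors.
clones <- data.frame(
  va    = factor(c("TRAV1-1*01", "TRAV1-1*01")),
  cdr3a = factor(c("CAVRDSSYKLIF", "CAVRDSSYKLIF")),
  count = c(1L, 2L)
)

# Restrict to the chain columns that are actually present.
chain_cols <- intersect(c("va", "cdr3a", "vb", "cdr3b"), names(clones))

# Convert only those chain columns that are factors; leave the rest untouched.
is_fac <- vapply(clones[chain_cols], is.factor, logical(1))
clones[chain_cols[is_fac]] <- lapply(clones[chain_cols[is_fac]], as.character)
```

Converting up front avoids comparing integer factor codes instead of the underlying gene names or sequences.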
Gamma-delta TCRs.
For gamma-delta TCRs, use chains = "GD" and specify the gamma-delta
organism database: organism = "human_gd" or organism = "mouse_gd".
Gamma chain genes map to the va/cdr3a columns; delta chain
genes map to the vb/cdr3b columns.
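A gamma-delta input might be laid out as follows. This is a hedged sketch: the gene names and CDR3 sequences are illustrative placeholders and may not match entries in the bundled "human_gd" database. The constructor call is guarded so that the snippet also runs when the package is not attached.

```r
# Illustrative gamma-delta clonotypes (placeholder values, not real data):
# gamma chain in va/cdr3a, delta chain in vb/cdr3b.
gd_clones <- data.frame(
  va    = c("TRGV9*01", "TRGV9*01"),
  cdr3a = c("CALWEVQELGKKIKVF", "CALWEVRELGKKIKVF"),
  vb    = c("TRDV2*01", "TRDV2*01"),
  cdr3b = c("CACDTVGGYTDKLIF", "CACDTLGGYTDKLIF"),
  stringsAsFactors = FALSE
)

# Construct the object only if TCRrep() is available in this session.
if (exists("TCRrep", mode = "function")) {
  gd_obj <- TCRrep(gd_clones, organism = "human_gd", chains = "GD")
}
```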
Examples
# \donttest{
tcrs <- data.frame(
  va = c("TRAV1-1*01", "TRAV1-1*01"),
  cdr3a = c("CAVRDSSYKLIF", "CAVRDSSYKLIF"),
  vb = c("TRBV19*01", "TRBV19*01"),
  cdr3b = c("CASSIRSSYEQYF", "CASSIRSYEQYF"),
  stringsAsFactors = FALSE
)
# Basic construction (deduplicates by default)
obj <- TCRrep(tcrs, organism = "human")
# With distance computation
obj <- TCRrep(tcrs, organism = "human", compute_distances = TRUE)
dim(obj@paired_dist) # 2 x 2
#> [1] 2 2
# Chain columns only (collapse across subjects)
obj <- TCRrep(tcrs, organism = "human", deduplicate = character(0))
# No deduplication
obj <- TCRrep(tcrs, organism = "human", deduplicate = FALSE)
# }