
Constructs a TCRrep S4 object from a clonotype data frame and optional parameters. Following Bioconductor convention, this constructor is the recommended way to create a TCRrep instance; avoid calling new("TCRrep", ...) directly.

Usage

TCRrep(
  clone_df,
  organism = "human",
  chains = "AB",
  deduplicate = TRUE,
  metric = "tcrdist",
  compute_distances = FALSE,
  weight_cdr3 = WEIGHT_CDR3_REGION,
  gap_penalty_cdr3 = GAP_PENALTY_CDR3_REGION,
  weight_v_region = WEIGHT_V_REGION,
  gap_penalty_v_region = GAP_PENALTY_V_REGION
)

Arguments

clone_df

A data.frame of clonotypes. Required columns depend on the chains argument:

"AB"

Requires va, cdr3a, vb, cdr3b.

"A"

Requires va, cdr3a.

"B"

Requires vb, cdr3b.

"GD"

Requires va, cdr3a, vb, cdr3b (gamma chain in va/cdr3a, delta chain in vb/cdr3b).
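The column requirements above can be checked before calling TCRrep(). The base-R sketch below is illustrative only: required_columns() and missing_columns() are hypothetical helpers transcribed from the table, not part of the package API.

```r
# Hypothetical helper: required clone_df columns for each `chains` value,
# transcribed from the argument table (not package code).
required_columns <- function(chains = c("AB", "A", "B", "GD")) {
  chains <- match.arg(chains)
  switch(chains,
    AB = c("va", "cdr3a", "vb", "cdr3b"),
    A  = c("va", "cdr3a"),
    B  = c("vb", "cdr3b"),
    GD = c("va", "cdr3a", "vb", "cdr3b")  # gamma -> va/cdr3a, delta -> vb/cdr3b
  )
}

# Hypothetical helper: required columns that are absent from a data frame.
missing_columns <- function(clone_df, chains = "AB") {
  setdiff(required_columns(chains), names(clone_df))
}

beta_only <- data.frame(vb = "TRBV19*01", cdr3b = "CASSIRSSYEQYF")
missing_columns(beta_only, chains = "B")   # character(0): nothing missing
missing_columns(beta_only, chains = "AB")  # "va" "cdr3a"
```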

organism

Character string. Organism key recognised by load_gene_database, e.g. "human" or "mouse". For gamma-delta TCRs use "human_gd" or "mouse_gd".

chains

Character string. One of "AB" (default), "A", "B", or "GD".

deduplicate

Controls clone deduplication, matching tcrdist3 behavior. The chain columns are always included in the grouping key. Accepts one of:

TRUE (default)

Group by the chain columns plus subject, if a subject column is present. Duplicate clones within a subject are merged into a single row and their count values summed.

FALSE

No deduplication; clone_df is stored as-is.

Character vector

Additional grouping columns beyond the chain columns. For example, c("subject") groups by the chain columns plus subject (equivalent to TRUE when a subject column exists), while character(0) groups by the chain columns only, collapsing identical clones across subjects.
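The grouping rule amounts to a group-by-and-sum. The base-R sketch below illustrates it under stated assumptions: dedup_clones() is a hypothetical helper, it assumes a numeric count column, and, unlike whatever TCRrep() does internally, it keeps only the key columns and count.

```r
# Hypothetical illustration of the deduplication rule, not the package's code:
# group rows by the chain columns plus any extra columns and sum `count`.
dedup_clones <- function(clone_df,
                         chain_cols = c("va", "cdr3a", "vb", "cdr3b"),
                         extra_cols = intersect("subject", names(clone_df))) {
  keys <- c(chain_cols, extra_cols)
  # Build `count ~ va + cdr3a + ...` and sum count within each key combination
  stats::aggregate(
    stats::reformulate(keys, response = "count"),
    data = clone_df,
    FUN = sum
  )
}

clones <- data.frame(
  subject = c("s1", "s1", "s2"),
  va      = "TRAV1-1*01",
  cdr3a   = "CAVRDSSYKLIF",
  vb      = "TRBV19*01",
  cdr3b   = "CASSIRSSYEQYF",
  count   = c(2L, 3L, 4L)
)
nrow(dedup_clones(clones))                             # 2: one row per subject
nrow(dedup_clones(clones, extra_cols = character(0)))  # 1: collapsed across subjects
```

The default extra_cols mirrors deduplicate = TRUE, and passing character(0) mirrors deduplicate = character(0).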

metric

Character string. Distance metric to use. One of "tcrdist" (default) or "hamming".
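For intuition, the Hamming distance counts positions at which two equal-length sequences differ. The sketch below is a generic base-R illustration, not the package's implementation; how TCRrep() handles CDR3s of unequal length under metric = "hamming" is not described on this page, so the helper simply refuses them.

```r
# Minimal Hamming distance between two equal-length amino-acid strings.
# Generic illustration only; unequal lengths are rejected rather than aligned.
hamming <- function(a, b) {
  stopifnot(is.character(a), is.character(b), nchar(a) == nchar(b))
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  sum(x != y)  # number of mismatched positions
}

hamming("CASSIRSSYEQYF", "CASSLRSSYEQYF")  # 1 (I -> L at position 5)
```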

compute_distances

Logical. If TRUE and nrow(clone_df) > 0, compute the pairwise distance matrix immediately and store it in the paired_dist slot. Defaults to FALSE.

weight_cdr3

Integer. Weight applied to CDR3 distances. Defaults to WEIGHT_CDR3_REGION (3L).

gap_penalty_cdr3

Integer. Gap penalty for CDR3 alignments. Defaults to GAP_PENALTY_CDR3_REGION (12L).

weight_v_region

Integer. Weight applied to V-region distances. Defaults to WEIGHT_V_REGION (1L).

gap_penalty_v_region

Integer. Gap penalty for V-region alignments. Defaults to GAP_PENALTY_V_REGION (4L).
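The four parameters above are weights and gap penalties for two region types. The arithmetic sketch below assumes the scheme of the original TCRdist, in which each region contributes weight times a mismatch score plus gap penalty times the number of gaps. That combination rule is an assumption, not a description of this package's code, and region_distance() is a hypothetical helper.

```r
# Hypothetical scoring sketch, ASSUMING the original TCRdist scheme:
#   region distance = weight * mismatch_score + gap_penalty * n_gaps
# The values mirror the documented defaults.
defaults <- list(weight_cdr3 = 3L, gap_penalty_cdr3 = 12L,
                 weight_v_region = 1L, gap_penalty_v_region = 4L)

region_distance <- function(mismatch_score, n_gaps, weight, gap_penalty) {
  weight * mismatch_score + gap_penalty * n_gaps
}

# Beta chains from the Examples: identical V genes, and CDR3s that differ only
# by one extra S, i.e. one gap and (assumed) zero mismatch score.
v_term    <- region_distance(0L, 0L, defaults$weight_v_region,
                             defaults$gap_penalty_v_region)
cdr3_term <- region_distance(0L, 1L, defaults$weight_cdr3,
                             defaults$gap_penalty_cdr3)
v_term + cdr3_term  # 12 under these assumptions
```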

Value

A valid TCRrep S4 object.

Details

Any of the columns va, cdr3a, vb, and cdr3b that is a factor is automatically coerced to character. The organism is validated against the bundled gene database at construction time.
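TCRrep() performs this coercion itself. The sketch below only shows what it amounts to, which can be useful when the same data frame is reused elsewhere; to_character_cols() is a hypothetical helper, not package API.

```r
# Sketch of the documented coercion: convert any factor chain columns to
# character. to_character_cols() is hypothetical, not part of the package.
to_character_cols <- function(df, cols = c("va", "cdr3a", "vb", "cdr3b")) {
  for (col in intersect(cols, names(df))) {
    if (is.factor(df[[col]])) df[[col]] <- as.character(df[[col]])
  }
  df
}

tcrs_f <- data.frame(
  vb = "TRBV19*01", cdr3b = "CASSIRSSYEQYF",
  stringsAsFactors = TRUE  # deliberately create factor columns
)
sapply(to_character_cols(tcrs_f), class)  # both "character"
```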

Gamma-delta TCRs. For gamma-delta TCRs, use chains = "GD" and specify the gamma-delta organism database: organism = "human_gd" or organism = "mouse_gd". Gamma chain genes map to the va/cdr3a columns; delta chain genes map to the vb/cdr3b columns.
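If gamma-delta data arrive with gamma- and delta-named columns, they can be mapped onto the required layout before construction. In the sketch below the source column names (v_gamma, cdr3_gamma, v_delta, cdr3_delta) and the sequences are illustrative assumptions, and whether these particular genes are present in the bundled human_gd database is also assumed. The TCRrep() call runs only if the function is available in the session.

```r
# Hypothetical input with gamma/delta-named columns; names and sequences are
# illustrative assumptions, so adapt them to your own data.
gd_raw <- data.frame(
  v_gamma    = "TRGV9*01",
  cdr3_gamma = "CALWEVQELGKKIKVF",
  v_delta    = "TRDV2*01",
  cdr3_delta = "CACDTLGDTGSDKLIF"
)

# Gamma -> va/cdr3a, delta -> vb/cdr3b, as described above.
gd <- data.frame(
  va    = gd_raw$v_gamma,
  cdr3a = gd_raw$cdr3_gamma,
  vb    = gd_raw$v_delta,
  cdr3b = gd_raw$cdr3_delta
)

# Build the object only when TCRrep() is available in this session.
if (exists("TCRrep", mode = "function")) {
  obj_gd <- TCRrep(gd, organism = "human_gd", chains = "GD")
}
```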

Examples

# \donttest{
tcrs <- data.frame(
    va    = c("TRAV1-1*01", "TRAV1-1*01"),
    cdr3a = c("CAVRDSSYKLIF", "CAVRDSSYKLIF"),
    vb    = c("TRBV19*01", "TRBV19*01"),
    cdr3b = c("CASSIRSSYEQYF", "CASSIRSYEQYF"),
    stringsAsFactors = FALSE
)

# Basic construction (deduplicates by default)
obj <- TCRrep(tcrs, organism = "human")

# With distance computation
obj <- TCRrep(tcrs, organism = "human", compute_distances = TRUE)
dim(obj@paired_dist)  # 2 x 2: the clones differ in cdr3b, so both are kept
#> [1] 2 2

# Group by chain columns only, collapsing across subjects (matters only when clone_df has a subject column)
obj <- TCRrep(tcrs, organism = "human", deduplicate = character(0))

# No deduplication
obj <- TCRrep(tcrs, organism = "human", deduplicate = FALSE)
# }