Converts a data.frame from common TCR analysis tools to tcrdistR's canonical
column names (va, cdr3a, vb, cdr3b). Supports
auto-detection of column naming patterns from scRepertoire, scirpy,
dandelion, and tcrdist3, as well as custom column mappings.
Usage
as_tcr_df(
  df,
  col_map = NULL,
  format = NULL,
  normalize_genes = TRUE,
  drop_incomplete = TRUE
)
Arguments
- df
A data.frame containing TCR data.
- col_map
Named character vector mapping tcrdistR canonical names to the column names in df. For example, c(va = "alpha_v_gene", cdr3a = "alpha_cdr3"). If NULL, the format is auto-detected.
- format
Character string or NULL. Force a specific format instead of auto-detecting:
  "screpertoire": scRepertoire CTgene/CTaa concatenated columns.
  "scirpy", "dandelion": scirpy/dandelion IR_VJ_1_*/IR_VDJ_1_* columns.
  "tcrdist3": tcrdist3 v_a_gene/cdr3_a_aa columns.
If NULL (default), auto-detection is used.
- normalize_genes
Logical. If TRUE (default), gene names without an allele suffix get *01 appended.
- drop_incomplete
Logical. If TRUE (default), rows with NA or empty strings in required chain columns (va, cdr3a, vb, cdr3b) are dropped with a message.
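The behaviour described for normalize_genes and drop_incomplete can be approximated in base R. The sketch below is illustrative only: add_default_allele() and drop_incomplete_rows() are hypothetical helpers, not tcrdistR functions, the order of the two steps is an assumption, and the toy rows are made-up values.

```r
# Hypothetical helpers that mimic the documented behaviour.
# They are NOT part of tcrdistR; the package's own code may differ.

# Append "*01" to gene names that carry no allele suffix.
add_default_allele <- function(genes) {
  has_allele <- grepl("*", genes, fixed = TRUE)
  needs_suffix <- !is.na(genes) & nzchar(genes) & !has_allele
  genes[needs_suffix] <- paste0(genes[needs_suffix], "*01")
  genes
}

# Drop rows with NA or empty strings in any required chain column.
drop_incomplete_rows <- function(df,
                                 required = c("va", "cdr3a", "vb", "cdr3b")) {
  flags <- lapply(df[required], function(col) is.na(col) | col == "")
  incomplete <- Reduce(`|`, flags)
  n_dropped <- sum(incomplete)
  if (n_dropped > 0) {
    message("dropping ", n_dropped, " incomplete row(s)")
  }
  df[!incomplete, , drop = FALSE]
}

# Toy input: row 2 has an empty cdr3b, row 3 has a missing va.
toy <- data.frame(
  va    = c("TRAV1-1", "TRAV12-2*02", NA),
  cdr3a = c("CAVRDSSYKLIF", "CAVNTGNQFYF", "CAVSDGQKLLF"),
  vb    = c("TRBV5-1", "TRBV6-5", "TRBV7-9"),
  cdr3b = c("CASSIRSSYEQYF", "", "CASSLGQAYEQYF"),
  stringsAsFactors = FALSE
)

complete <- drop_incomplete_rows(toy)
complete$va <- add_default_allele(complete$va)
complete$vb <- add_default_allele(complete$vb)
print(complete)
```

Only the first toy row survives, and both of its gene names gain the *01 suffix. Names that already carry an allele, such as TRAV12-2*02, are left unchanged by add_default_allele().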
Value
A data.frame with tcrdistR canonical column names (va, cdr3a, vb, cdr3b). For
scRepertoire input, the example output also includes ja and jb. Extra
columns from the input are preserved.
Examples
# Custom column mapping
df <- data.frame(
  alpha_v = "TRAV1-1", alpha_cdr3 = "CAVRDSSYKLIF",
  beta_v = "TRBV5-1", beta_cdr3 = "CASSIRSSYEQYF"
)
as_tcr_df(df, col_map = c(va = "alpha_v", cdr3a = "alpha_cdr3",
                          vb = "beta_v", cdr3b = "beta_cdr3"))
#> as_tcr_df: detected format 'custom'
#>           va        cdr3a         vb         cdr3b
#> 1 TRAV1-1*01 CAVRDSSYKLIF TRBV5-1*01 CASSIRSSYEQYF
# scirpy / dandelion format (auto-detected)
df <- data.frame(
  IR_VJ_1_v_call = "TRAV1-1", IR_VJ_1_junction_aa = "CAVRDSSYKLIF",
  IR_VDJ_1_v_call = "TRBV5-1", IR_VDJ_1_junction_aa = "CASSIRSSYEQYF"
)
as_tcr_df(df)
#> as_tcr_df: detected format 'scirpy'
#>           va        cdr3a         vb         cdr3b
#> 1 TRAV1-1*01 CAVRDSSYKLIF TRBV5-1*01 CASSIRSSYEQYF
# scRepertoire format (auto-detected)
df <- data.frame(
  CTgene = "TRAV1-1.TRAJ33.TRAC_TRBV5-1.None.TRBJ2-7.TRBC2",
  CTaa = "CAVRDSSYKLIF_CASSIRSSYEQYF"
)
as_tcr_df(df)
#> as_tcr_df: detected format 'screpertoire'
#>           va        ja        cdr3a         vb         jb         cdr3b
#> 1 TRAV1-1*01 TRAJ33*01 CAVRDSSYKLIF TRBV5-1*01 TRBJ2-7*01 CASSIRSSYEQYF
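For readers wondering how the concatenated scRepertoire strings in the last example relate to the output columns, here is a minimal base R sketch. split_ct_gene() is a hypothetical helper, not a tcrdistR function. It assumes the layout seen in the example (alpha chain as V.J.C, beta chain as V.D.J.C, chains joined by "_") and does not handle cells with more than one chain per locus.

```r
# Hypothetical helper; tcrdistR's own parsing may differ.
# Assumed layout: "<alphaV>.<alphaJ>.<alphaC>_<betaV>.<betaD>.<betaJ>.<betaC>"
split_ct_gene <- function(ct_gene) {
  chains <- strsplit(ct_gene, "_", fixed = TRUE)[[1]]    # alpha, beta
  alpha  <- strsplit(chains[1], ".", fixed = TRUE)[[1]]  # V, J, C
  beta   <- strsplit(chains[2], ".", fixed = TRUE)[[1]]  # V, D, J, C
  c(va = alpha[1], ja = alpha[2], vb = beta[1], jb = beta[3])
}

genes <- split_ct_gene("TRAV1-1.TRAJ33.TRAC_TRBV5-1.None.TRBJ2-7.TRBC2")

# CTaa holds the two CDR3 amino-acid sequences, alpha first, joined by "_".
cdr3 <- strsplit("CAVRDSSYKLIF_CASSIRSSYEQYF", "_", fixed = TRUE)[[1]]

print(genes)
print(cdr3)
```

The sketch returns bare gene names. In the example output above, the *01 suffix comes from normalize_genes = TRUE.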