Alignment
The align command aligns raw sequencing reads to reference V, D, J and C genes of T- and B-cell receptors. It has the following syntax:
mixcr align [options] input_file1 [input_file2] output_file.vdjca
MiXCR accepts fasta, fastq and fastq.gz input, as well as paired-end fastq and fastq.gz. For paired-end reads, two input files should be specified.
Command line parameters
The following table describes the command line options for align:
| Option | Default value | Description |
|---|---|---|
| -h, --help | | Print help message. |
| -r {file} --report ... | | Report file name. If this option is not specified, no report file will be produced. |
| -c {chain} --chains ... | ALL | Target immunological chain list separated by “,”. Available values: IGH, IGL, IGK, TRA, TRB, TRG, TRD, IG (for all immunoglobulin chains), TCR (for all T-cell receptor chains), ALL (for all chains). It is highly recommended to use the default value for this parameter in most cases at the align step. Filtering is also possible at the export step. |
| -s {speciesName} --species ... | HomoSapiens | Species (organism). Possible values: hsa (or HomoSapiens) and mmu (or MusMusculus), or any species that was provided during import of segments (see import segments). |
| -p {parameterName} --parameters ... | default | Preset of parameters. Possible values: default and rna-seq. The rna-seq preset is specifically optimized for analysis of RNA-Seq data (see below). |
| -i, --diff-loci | | Accept alignments with different loci of V and J genes (by default such alignments are dropped). |
| -t {numberOfThreads} --threads ... | number of available CPU cores | Number of processing threads. |
| -n {numberOfReads} --limit ... | | Limit the number of sequences that will be analysed (only the first -n sequences will be processed from the input file(s)). |
| -a, --save-description | | Copy read(s) description line from .fastq or .fasta to .vdjca file (can then be exported with -descrR1 and -descrR2 options in exportAlignments action). |
| -v, --write-all | | Write alignment results for all input reads, including empty results for non-aligned reads. This option also turns off the “same locus filter”, so --diff-loci has no effect if this option is specified. |
| -g, --save-reads | | Copy read(s) from .fastq or .fasta to .vdjca file (this is required for exporting reads aggregated by clones; see this section). |
| --not-aligned-R1 | | Write all not aligned reads (R1) to the specified file. |
| --not-aligned-R2 | | Write all not aligned reads (R2) to the specified file. |
| -Oparameter=value | | Overrides the default value of an aligner parameter (see next subsection). |
All parameters are optional.
Aligner parameters
MiXCR uses a wide range of parameters that control aligner behaviour. There are global parameters and gene-specific parameters organized in groups: vParameters, dParameters, jParameters and cParameters. Each group may contain further subgroups of parameters, and so on. To override a parameter value, use -O followed by the fully qualified parameter name and the parameter value (e.g. -Ogroup1.group2.parameter=value).
One of the key MiXCR features is the ability to specify particular gene regions which will be extracted from the reference and used as targets for alignment. Each sequencing read is then aligned to these extracted reference regions. The parameters responsible for target gene regions are:
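The fully qualified -O naming scheme described above can be sketched with a small helper that flattens a nested dictionary of parameter groups into -O arguments. This is only an illustration: the helper names `flatten_overrides` and `build_align_command` and the file names are hypothetical, and the sketch only assembles an argument list; it does not run MiXCR.

```python
def flatten_overrides(overrides, prefix=""):
    """Turn a nested dict of aligner parameters into a list of -O arguments.

    Nested dict keys are joined with "." to form the fully qualified name,
    e.g. {"vParameters": {"geneFeatureToAlign": "VTranscript"}} becomes
    "-OvParameters.geneFeatureToAlign=VTranscript".
    """
    args = []
    for name, value in overrides.items():
        qualified = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Descend into a parameter subgroup.
            args.extend(flatten_overrides(value, qualified))
        else:
            # Write booleans in lower case (true/false), as in the tables.
            if isinstance(value, bool):
                value = str(value).lower()
            args.append(f"-O{qualified}={value}")
    return args


def build_align_command(inputs, output, overrides):
    """Assemble the argument list for `mixcr align` (hypothetical helper)."""
    return ["mixcr", "align", *flatten_overrides(overrides), *inputs, output]


# Hypothetical file names, used only for illustration.
overrides = {
    "maxHits": 3,
    "vParameters": {"geneFeatureToAlign": "VTranscript"},
    "jParameters": {"parameters": {"mapperKValue": 7}},
}
command = build_align_command(
    ["input_R1.fastq", "input_R2.fastq"], "output.vdjca", overrides
)
print(" ".join(command))
```

Keeping the overrides in a nested structure mirrors the group/subgroup layout of the aligner parameters, so a typo in a group name is easier to spot than in a long command line.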
| Parameter | Default value | Description |
|---|---|---|
| vParameters.geneFeatureToAlign | VRegion | Region in V gene which will be used as target in align. |
| dParameters.geneFeatureToAlign | DRegion | Region in D gene which will be used as target in align. |
| jParameters.geneFeatureToAlign | JRegion | Region in J gene which will be used as target in align. |
| cParameters.geneFeatureToAlign | CExon1 | Region in C gene which will be used as target in align. |
It is important to specify these gene regions so that they fully cover the target clonal gene region which will be used in assemble (e.g. CDR3).
One can override default gene regions in the following way:
mixcr align -OvParameters.geneFeatureToAlign=VTranscript input_file1 [input_file2] output_file.vdjca
Other global aligner parameters are:
| Parameter | Default value | Description |
|---|---|---|
| minSumScore | 120.0 | Minimal total alignment score value of V and J genes. |
| maxHits | 5 | Maximal number of hits for each gene type: if the input sequence aligns to more than maxHits targets, then only the top maxHits hits will be kept. |
| minimalClonalSequenceLength | 12 | Minimal clonal sequence length (e.g. minimal sequence of CDR3 to be used for clone assembly). |
| vjAlignmentOrder (only for single-end analysis) | VThenJ | Order in which V and J genes are aligned in the target (possible values JThenV and VThenJ). This parameter affects only single-read alignments and alignments of overlapped paired-end reads. Non-overlapping paired-end reads are always processed in VThenJ mode. JThenV can be used for short reads (~100bp) with full (or nearly full) J gene coverage. |
| relativeMinVFR3CDR3Score (only for paired-end analysis) | 0.7 | Relative minimal alignment score of the FR3+VCDR3Part region for V gene. A V hit will be kept only if its FR3+VCDR3Part part aligns with a score greater than relativeMinVFR3CDR3Score * maxFR3CDR3Score, where maxFR3CDR3Score is the maximal alignment score for the FR3+VCDR3Part region among all V hits for the current input read pair. |
| readsLayout (only for paired-end analysis) | Opposite | Relative orientation of paired reads. Available values: Opposite, Collinear, Unknown. |
One can override these parameters in the following way:
mixcr align -OmaxHits=3 input_file1 [input_file2] output_file.vdjca
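The relativeMinVFR3CDR3Score rule can be read as a simple relative threshold. The sketch below is a plain restatement of the rule as described in the table, not MiXCR's actual code; the function name and the gene names and scores in the example are made up for illustration.

```python
def filter_v_hits_by_fr3_cdr3(hits, relative_min_score=0.7):
    """Keep V hits whose FR3+VCDR3Part score is above the relative threshold.

    hits: list of (gene_name, fr3_cdr3_score) tuples for one read pair.
    A hit is kept only if its score is strictly greater than
    relative_min_score * (best FR3+VCDR3Part score among all hits).
    """
    if not hits:
        return []
    max_score = max(score for _, score in hits)
    threshold = relative_min_score * max_score
    return [(name, score) for name, score in hits if score > threshold]


# Hypothetical V hits; the names and scores are illustrative only.
v_hits = [("V-a", 100.0), ("V-b", 85.0), ("V-c", 69.0), ("V-d", 50.0)]
kept = filter_v_hits_by_fr3_cdr3(v_hits)
print(kept)  # V-a and V-b pass the 0.7 * 100.0 threshold
```

Lowering the parameter (e.g. with -OrelativeMinVFR3CDR3Score=...) widens the set of V hits that survive this step; raising it keeps only hits close to the best one.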
V, J and C aligners parameters
MiXCR uses the same type of aligner to align V, J and C genes (KAligner from MiLib; the idea of KAligner is inspired by this article). Its parameters are placed in the parameters subgroup and can be overridden using e.g. -OjParameters.parameters.mapperKValue=7. The following parameters for V, J and C aligners are available:
| Parameter | Default V value | Default J value | Default C value | Description |
|---|---|---|---|---|
| mapperKValue | 5 | 5 | 5 | Length of seeds used in aligner. |
| floatingLeftBound | true | true | false | Specifies whether the left bound of the alignment is fixed or floating: if floatingLeftBound is set to false, the left bound of either target or query will be aligned. Default values are suitable in most cases. |
| floatingRightBound | true | true | false | Specifies whether the right bound of the alignment is fixed or floating: if floatingRightBound is set to false, the right bound of either target or query will be aligned. Default values are suitable in most cases. If your target molecules have no primer sequences in J Region (e.g. library was amplified using primer to the C region) you can change the value of this parameter for J gene to false to increase J gene identification accuracy and overall specificity of alignments. |
| minAlignmentLength | 15 | 15 | 15 | Minimal length of aligned region. |
| maxAdjacentIndels | 2 | 2 | 2 | Maximum number of indels between two seeds. |
| absoluteMinScore | 40.0 | 40.0 | 40.0 | Minimal score of alignment: alignments with a smaller score will be dropped. |
| relativeMinScore | 0.87 | 0.87 | 0.87 | Minimal relative score of alignments: if the alignment score is smaller than relativeMinScore * maxScore, where maxScore is the best score among all alignments for a particular gene type (V, J or C) and input sequence, it will be dropped. |
| maxHits | 7 | 7 | 7 | Maximal number of hits: if the input sequence aligns with more than maxHits queries, only the top maxHits hits will be kept. |
These parameters can be overridden like in the following example:
mixcr align -OvParameters.parameters.minAlignmentLength=30 \
-OjParameters.parameters.relativeMinScore=0.7 \
input_file1 [input_file2] output_file.vdjca
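To make the interplay of absoluteMinScore, relativeMinScore and maxHits concrete, here is a minimal sketch of the three filters as described in the table. It assumes that maxScore is taken over all hits before any filtering and that the filters are applied in the order shown; the actual order inside MiXCR is not stated here, and the function name, gene names and scores are hypothetical.

```python
def filter_hits(hits, absolute_min_score=40.0, relative_min_score=0.87, max_hits=7):
    """Apply the score and hit-count filters to hits of one gene type.

    hits: list of (gene_name, score) tuples for one input sequence.
    Assumption: maxScore is the best score among all hits, computed
    before any hit is dropped.
    """
    if not hits:
        return []
    max_score = max(score for _, score in hits)
    kept = [
        (name, score)
        for name, score in hits
        # Drop hits scoring below the absolute minimum ...
        if score >= absolute_min_score
        # ... or below the fraction of the best score.
        and score >= relative_min_score * max_score
    ]
    # Keep only the top max_hits hits, best first.
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:max_hits]


# Hypothetical J hits; names and scores are illustrative only.
j_hits = [("J-b", 190.0), ("J-a", 200.0), ("J-c", 150.0), ("J-d", 35.0)]
print(filter_hits(j_hits))              # J-a and J-b survive (threshold 0.87 * 200)
print(filter_hits(j_hits, max_hits=1))  # only the best hit, J-a
```

With relativeMinScore lowered to 0.7, as in the command above, J-c would also be kept, since 150 is above 0.7 * 200.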
Scoring used in aligners is specified by the scoring subgroup of parameters. It contains the following parameters:
| Parameter | Default value | Description |
|---|---|---|
| subsMatrix | | Substitution matrix. Available types: |
| gapPenalty | -12 | Penalty for gap. |
Scoring parameters can be overridden in the following way:
mixcr align -OvParameters.parameters.scoring.gapPenalty=-20 input_file1 [input_file2] output_file.vdjca
mixcr align -OvParameters.parameters.scoring.subsMatrix=simple(match=4,mismatch=-11) \
input_file1 [input_file2] output_file.vdjca
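As an illustration of how a simple substitution matrix combines with a single gap penalty, the sketch below scores an already aligned pair of sequences. It assumes that simple(match=..., mismatch=...) means one score for identical letters and one for all other pairs, and that gapPenalty is charged once per gapped position. The match and mismatch values are taken from the override example above, not from the defaults, and the function name and sequences are hypothetical.

```python
def score_alignment(aligned_query, aligned_target, match=4, mismatch=-11, gap_penalty=-12):
    """Score a gapped pairwise alignment with a simple matrix and linear gaps.

    Both strings must have the same length; "-" marks a gap.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for q, t in zip(aligned_query, aligned_target):
        if q == "-" or t == "-":
            score += gap_penalty   # every gapped position costs the same
        elif q == t:
            score += match
        else:
            score += mismatch
    return score


# 7 matches, 1 gap and 1 mismatch: 7 * 4 - 12 - 11 = 5
print(score_alignment("ACGT-ACGT", "ACGTTACCT"))
```

A more negative gapPenalty, such as the -20 used in the first command above, makes the aligner prefer mismatches over opening gaps.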
D aligner parameters
The following parameters can be overridden for the D aligner:
| Parameter | Default value | Description |
|---|---|---|
| absoluteMinScore | 30.0 | Minimal score of alignment: alignments with smaller scores will be dropped. |
| relativeMinScore | 0.85 | Minimal relative score of alignment: if the alignment score is smaller than relativeMinScore * maxScore, where maxScore is the best score among all alignments for a particular sequence, it will be dropped. |
| maxHits | 3 | Maximal number of hits: if the input sequence aligns with more than maxHits queries, only the top maxHits hits will be kept. |
One can override these parameters like in the following example:
mixcr align -OdParameters.absoluteMinScore=10 input_file1 [input_file2] output_file.vdjca
Scoring parameters for the D aligner are the following:
| Parameter | Default value | Description |
|---|---|---|
| type | affine | Type of scoring. Possible values: affine, linear. |
| subsMatrix | | Substitution matrix. Available types: |
| gapOpenPenalty | -10 | Penalty for gap opening. |
| gapExtensionPenalty | -1 | Penalty for gap extension. |
These parameters can be overridden in the following way:
mixcr align -OdParameters.scoring.gapExtensionPenalty=-5 input_file1 [input_file2] output_file.vdjca
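The difference between gapOpenPenalty and gapExtensionPenalty can be illustrated by scoring an already aligned pair of sequences. This sketch assumes one common convention for affine gaps: the first position of a gap run costs gapOpenPenalty and every further position of the same run costs gapExtensionPenalty. Whether MiXCR uses exactly this convention is an assumption, and the match and mismatch values, function name and sequences are hypothetical placeholders.

```python
def affine_alignment_score(aligned_query, aligned_target, match, mismatch,
                           gap_open=-10, gap_extension=-1):
    """Score a gapped pairwise alignment with affine gap penalties.

    Both strings must have the same length; "-" marks a gap.
    Assumed convention: first gapped position of a run costs gap_open,
    each following position of the same run costs gap_extension.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    previous = None  # "q" or "t" if the previous column was a gap in query/target
    for q, t in zip(aligned_query, aligned_target):
        if q == "-" or t == "-":
            current = "q" if q == "-" else "t"
            # Continue an existing gap run, or open a new one.
            score += gap_extension if current == previous else gap_open
            previous = current
        else:
            score += match if q == t else mismatch
            previous = None
    return score


# 5 matches and one gap run of length 3: 5 * 5 + (-10 - 1 - 1) = 13
# match=5 and mismatch=-9 are illustrative values, not documented defaults.
print(affine_alignment_score("ACG---TT", "ACGTTATT", match=5, mismatch=-9))
```

Under this convention, making gapExtensionPenalty more negative (as with -5 in the command above) penalizes long gaps more strongly while leaving the cost of a single-position gap unchanged.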