Export
To export alignments or clones from a binary file
(.vdjca or .clns) to a human-readable text file, use the
exportAlignments and exportClones commands, respectively. The
syntax for these commands is:
mixcr exportAlignments [options] alignments.vdjca alignments.txt
mixcr exportClones [options] clones.clns clones.txt
The resulting tab-delimited text file contains columns with different types of information. If no options are specified, the default set of columns, which is sufficient in most cases, is exported. The possible columns include (see below for details): aligned sequences, qualities, all hits or just the best hit for V, D, J and C genes, the corresponding alignments, nucleotide and amino acid sequences of gene regions present in the sequence, etc. For clones, additional columns such as clone count and clone fraction are available.
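Because the export is plain tab-delimited text, it can be read with any standard CSV reader. The following is a minimal Python sketch, not part of MiXCR; the sample content and the column headers (`Clone count`, `Best V hit`, `Best J hit`) are assumptions made for illustration, and a real file will carry whatever headers the chosen fields produce.

```python
import csv
import io

# Hypothetical sample of an exported file; headers and values are illustrative only.
SAMPLE = (
    "Clone count\tBest V hit\tBest J hit\n"
    "7213\tIGHV4-34*00\tIGHJ4*00\n"
    "2951\tIGHV2-23*00\tIGHJ6*00\n"
)


def read_export(handle):
    """Return the rows of a tab-delimited export as a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_export(io.StringIO(SAMPLE))
# Sum the (assumed) "Clone count" column across all rows.
total = sum(int(row["Clone count"]) for row in rows)
print(len(rows), total)
```

For a real file, one would pass `open("clones.txt", newline="")` instead of the in-memory sample.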
One can customize the list of fields that will be exported by passing
parameters to export commands. For example, in order to export just
clone count, best hits for V and J genes with corresponding alignments
and CDR3 amino acid sequence, one can do:
mixcr exportClones -count -vHit -jHit -vAlignment -jAlignment -aaFeature CDR3 clones.clns clones.txt
The columns in the resulting file will be exported in the exact same
order as parameters in the command line. The list of available fields
will be reviewed in the next subsections. For convenience, MiXCR
provides two predefined sets of fields for exporting: min (will
export minimal required information about clones or alignments) and
full (used by default); one can use these sets by specifying
--preset option:
mixcr exportClones --preset min clones.clns clones.txt
One can add additional columns to a preset in the following way:
mixcr exportClones --preset min -qFeature CDR2 clones.clns clones.txt
One can also put all export fields in a file, one per line, for example:
-vHits
-dHits
-feature CDR3
...
and pass this file to the export command:
mixcr exportClones --preset-file myFields.txt clones.clns clones.txt
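When exports are scripted, the fields file and the command line can be generated together. The sketch below is a hedged illustration in Python: the helper names `write_fields_file` and `export_clones_command` are hypothetical, and the code only builds the argument list for the `--preset-file` call shown above without running MiXCR.

```python
import os
import tempfile


def write_fields_file(path, fields):
    """Write one export field per line, mirroring the listing above."""
    with open(path, "w") as handle:
        for field in fields:
            handle.write(field + "\n")


def export_clones_command(fields_file, clns_path, out_path):
    """Build the argument list for the exportClones call with --preset-file."""
    return ["mixcr", "exportClones", "--preset-file", fields_file, clns_path, out_path]


# Field names are taken from the "Available fields" tables of this page.
fields = ["-count", "-vHit", "-jHit", "-aaFeature CDR3"]
workdir = tempfile.mkdtemp()
fields_path = os.path.join(workdir, "myFields.txt")
write_fields_file(fields_path, fields)

command = export_clones_command(fields_path, "clones.clns", "clones.txt")
print(" ".join(command))
```

With MiXCR installed and on the `PATH`, the list could be passed to `subprocess.run(command, check=True)`.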
Command line parameters
The list of command line parameters for both exportAlignments and
exportClones is the following:
| Option | Description |
|---|---|
| -h, --help | print help message |
| -f, --fields | list available fields that can be exported |
| -p, --preset | select predefined set of fields to export (full or min) |
| -pf, --preset-file | load file with a list of fields to export |
| -lf, --list-fields | list available fields that can be exported |
| -s, --no-spaces | output short versions of column headers, which facilitates analysis with Pandas, R/DataFrames or other data table processing libraries |
The following parameters are available only for exportClones:
| Option | Description |
|---|---|
| -c, --chains | Limit output to specific locus (e.g. TRA or IGH). Clone fractions will be recalculated accordingly. |
| -o, --filter-out-of-frames | Exclude out-of-frame clones (fractions will be recalculated). |
| -t, --filter-stops | Exclude sequences containing stop codons (fractions will be recalculated). |
| -m, --minimal-clone-count | Filter clones by minimal read count. |
| -q, --minimal-clone-fraction | Filter clones by minimal clone fraction. |
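Several of these filters state that fractions are recalculated. The page does not spell out the arithmetic, so the Python sketch below rests on an assumption: that each remaining clone's fraction becomes its count divided by the total count of the clones that survive the filter. The function name and the record layout are hypothetical.

```python
def filter_and_renormalize(clones, keep):
    """Keep clones that pass `keep` and recompute fractions over the survivors.

    Assumption: the recalculated fraction is count / (sum of remaining counts).
    """
    kept = [dict(clone) for clone in clones if keep(clone)]
    total = sum(clone["count"] for clone in kept)
    for clone in kept:
        clone["fraction"] = clone["count"] / total if total else 0.0
    return kept


# Illustrative records; real exports carry these values in table columns.
clones = [
    {"id": 0, "count": 60, "chain": "IGH"},
    {"id": 1, "count": 30, "chain": "TRA"},
    {"id": 2, "count": 20, "chain": "IGH"},
]

# Comparable in spirit to limiting the output to one locus with -c IGH.
igh_only = filter_and_renormalize(clones, lambda clone: clone["chain"] == "IGH")
for clone in igh_only:
    print(clone["id"], clone["fraction"])
```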
Available fields
The following fields can be exported both for alignments and clones:
| Field | Description |
|---|---|
| -vHit | Best V hit. |
| -dHit | Best D hit. |
| -jHit | Best J hit. |
| -cHit | Best C hit. |
| -vHits | All V hits. |
| -dHits | All D hits. |
| -jHits | All J hits. |
| -cHits | All C hits. |
| -vHitsWithScore | All V hits with scores. |
| -dHitsWithScore | All D hits with scores. |
| -jHitsWithScore | All J hits with scores. |
| -cHitsWithScore | All C hits with scores. |
| -vAlignment | Best V alignment. |
| -dAlignment | Best D alignment. |
| -jAlignment | Best J alignment. |
| -cAlignment | Best C alignment. |
| -vAlignments | All V alignments. |
| -dAlignments | All D alignments. |
| -jAlignments | All J alignments. |
| -cAlignments | All C alignments. |
| -nFeature [feature] | Nucleotide sequence of specified gene feature. |
| -qFeature [feature] | Quality of sequences of specified gene feature. |
| -aaFeature [feature] | Amino acid sequence of specified gene feature. |
| -aaFeatureFromLeft [feature] | Amino acid sequence of specified gene feature (translated from leftmost nucleotide). |
| -aaFeatureFromRight [feature] | Amino acid sequence of specified gene feature (translated from rightmost nucleotide). |
| -avrgFeatureQuality [feature] | Average quality of sequence of specified gene feature. |
| -minFeatureQuality [feature] | Minimal quality of sequence of specified gene feature. |
| -defaultAnchorPoints | Outputs a list of default anchor points (see table below for the list of anchor points and format). |
| -lengthOf [feature] | Outputs length of specified gene feature. |
| -positionOf [anchorPoint] | Outputs position of specified anchor point in the clonal sequence or aligned read. |
| -vBestIdentityPercent | Alignment identity percent of the best V hit. Percent Identity = (Matches x 100)/Length of aligned region (with gaps) |
| -dBestIdentityPercent | Alignment identity percent of the best D hit. |
| -jBestIdentityPercent | Alignment identity percent of the best J hit. |
| -cBestIdentityPercent | Alignment identity percent of the best C hit. |
| -vIdentityPercents | Alignment identity percents for all V hits. |
| -dIdentityPercents | Alignment identity percents for all D hits. |
| -jIdentityPercents | Alignment identity percents for all J hits. |
| -cIdentityPercents | Alignment identity percents for all C hits. |
| -vFamily | Best V hit family. |
| -dFamily | Best D hit family. |
| -jFamily | Best J hit family. |
| -cFamily | Best C hit family. |
| -vFamilies | All V hit families. |
| -dFamilies | All D hit families. |
| -jFamilies | All J hit families. |
| -cFamilies | All C hit families. |
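The identity formula quoted for -vBestIdentityPercent, (Matches x 100)/Length of aligned region (with gaps), can be checked by hand on a small gapped alignment. The Python sketch below is an illustration of that formula only; the function name, the `-` gap symbol and the input representation (two equal-length gapped strings) are assumptions, not MiXCR's internal implementation.

```python
def identity_percent(aligned_a, aligned_b, gap="-"):
    """Percent identity = matches * 100 / length of the aligned region, gaps included."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    # A column counts as a match only if both symbols are equal and not a gap.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return matches * 100.0 / len(aligned_a)


# Six alignment columns: four matches, one mismatch, one gap.
value = identity_percent("ACGT-A", "ACCTTA")
print(round(value, 2))
```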
The following fields are specific for alignments:
| Field | Description |
|---|---|
| -sequence | Aligned sequence (initial read), or 2 sequences in case of paired-end reads. |
| -quality | Initial read quality, or 2 qualities in case of paired-end reads. |
| -readId | Index of source read (e.g. in .fastq file) for alignment. |
| -targets | Number of targets, i.e. 1 in case of single reads and 2 in case of paired-end reads. |
| -descrR1 | Description line from initial .fasta or .fastq file of the first read (only available if --save-description was used in align command). |
| -descrR2 | Description line from initial .fastq file of the second read (only available if --save-description was used in align command). |
| -cloneId [file] | Id of clone that aggregated this alignment. The index file must be specified (this file can be built with --index [file] option for assemble command). For examples see the section "Exporting reads aggregated by clones". |
| -cloneIdWithMappingType [file] | Id of clone that aggregated this alignment with additional information about mapping type. The index file must be specified (this file can be built with --index [file] option for assemble command). For examples see the section "Exporting reads aggregated by clones". |
The following fields are specific for clones:
| Field | Description |
|---|---|
| -count | Clone count. |
| -fraction | Clone fraction. |
| -sequence | Clonal sequence (or several sequences in case of multi-featured assembling). |
| -quality | Clonal sequence quality (or several qualities in case of multi-featured assembling). |
| -targets | Number of targets, i.e. number of gene regions used to assemble clones. |
| -readIds [file] | IDs of reads that were aggregated by clone. The index file must be specified (this file can be built with --index [file] option for assemble command). For examples see the section "Exporting reads aggregated by clones". |
Default anchor point positions
Positions of anchor points produced by the -defaultAnchorPoints option are output as a colon-separated list.
If an anchor point is not covered by the target sequence, nothing is printed for it, but the flanking colons are
preserved to maintain positions in the array. See example:
:::::::::108:117:125:152:186:213:243:244:
If there are several target sequences (e.g. paired-end reads or a multi-part clonal sequence), the array is output for each target sequence. In this case the arrays are separated by commas:
2:61:107:107:118:::::::::::::,:::::::::103:112:120:147:181:208:238:239:
Even if there are no anchor points in one of the parts:
:::::::::::::::::,:::::::::108:117:125:152:186:213:243:244:
The following table shows the correspondence between anchor points and positions in the default anchor point array:
| Anchor point | Zero-based position | One-based position |
|---|---|---|
| V5UTRBeginTrimmed | 0 | 1 |
| V5UTREnd / L1Begin | 1 | 2 |
| L1End / VIntronBegin | 2 | 3 |
| VIntronEnd / L2Begin | 3 | 4 |
| L2End / FR1Begin | 4 | 5 |
| FR1End / CDR1Begin | 5 | 6 |
| CDR1End / FR2Begin | 6 | 7 |
| FR2End / CDR2Begin | 7 | 8 |
| CDR2End / FR3Begin | 8 | 9 |
| FR3End / CDR3Begin | 9 | 10 |
| VEnd / PSegmentBegin | 10 | 11 |
| VEndTrimmed | 11 | 12 |
| DBeginTrimmed | 12 | 13 |
| DBegin / PSegmentEnd | 13 | 14 |
| DEnd / PSegmentBegin | 14 | 15 |
| DEndTrimmed | 15 | 16 |
| JBeginTrimmed | 16 | 17 |
| JBegin / PSegmentEnd | 17 | 18 |
| CDR3End / FR4Begin | 18 | 19 |
| FR4End | 19 | 20 |
| CBegin | 20 | 21 |
| CExon1End | 21 | 22 |
Positions of anchor points like VEnd are printed only if the corresponding P-segment was detected in the sequence. In this case, for example, the P-segment of the V gene can be found between the positions of VEnd and VEndTrimmed.
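The colon- and comma-separated format lends itself to simple parsing. The following Python sketch is a hedged illustration: the function names are hypothetical, and because the example strings above appear to contain fewer fields than the table has rows, the naming step takes the list of anchor point names as an argument instead of hard-coding one.

```python
def parse_anchor_points(value):
    """Parse a -defaultAnchorPoints cell into one list of positions per target.

    Empty fields (anchor point not covered by the target) become None.
    """
    targets = []
    for part in value.split(","):
        targets.append([int(field) if field else None for field in part.split(":")])
    return targets


def name_positions(positions, names):
    """Pair positions with caller-supplied anchor point names, skipping missing ones."""
    return {name: pos for name, pos in zip(names, positions) if pos is not None}


# First example string from this section (single target).
positions = parse_anchor_points(":::::::::108:117:125:152:186:213:243:244:")[0]
# Index 9 is the zero-based position of "FR3End / CDR3Begin" in the table above.
print(positions[9])
```

Keeping the raw list indexed by zero-based position makes it straightforward to apply whichever version of the anchor point table matches the file at hand.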
Examples
Export only best V, D, J hits and best V hit alignment from .vdjca
file:
mixcr exportAlignments -vHit -dHit -jHit -vAlignment input.vdjca test.txt
| Best V hit | Best D hit | Best J hit | Best V alignment |
|---|---|---|---|
| IGHV4-34*00 | | IGHJ4*00 | \|262\|452\|453\|47\|237\|SC268GSC271ASC275G\|956.1,58\|303\|450\|56\|301\|SG72TSA73CSG136TSA144CSA158CSG171T\|331.0\| |
| IGHV2-23*00 | IGHD2*21 | IGHJ6*00 | \|262\|452\|453\|47\|237\|SC268GSC271ASC275G\|956.1,58\|303\|450\|56\|301\|SG72TSA73CSG136TSA144CSA158CSG171T\|331.0\| |
The alignment syntax is described in the appendix.
Exporting well formatted alignments for manual inspection
MiXCR can export the alignments produced by the align
step as pretty-formatted text for manual inspection. This helps in examining
the produced alignments and the structure of the library, and so in optimizing
analysis parameters and the library preparation protocol. To export pretty-formatted
alignments, use the exportAlignmentsPretty command:
mixcr exportAlignmentsPretty --skip 1000 --limit 10 input.vdjca test.txt
This will export 10 results after skipping the first 1000 records and place
the result into the test.txt file. Skipping the first records is often useful
because the first sequences in a fastq file may have lower quality than
average reads, so the first results are not representative. The last
parameter with the output file name can be omitted to print the result directly
to the standard output stream (to the console), like this:
mixcr exportAlignmentsPretty --skip 1000 --limit 10 input.vdjca
Here is a summary of command line options:
| Option | Description |
|---|---|
| -h, --help | print help message |
| -n, --limit | limit number of alignments; no more than the provided number of results will be output |
| -s, --skip | number of results to skip |
| -t, --top | output only top hits for V, D, J and C genes |
| --cdr3-contains | output only those alignments whose CDR3 contains the specified nucleotides (e.g. --cdr3-contains TTCAGAGGAGC) |
| --read-contains | output only those alignments for which the corresponding reads contain the specified nucleotides (e.g. --read-contains ATGCTTGCGCGCT) |
| --verbose | use more verbose format for alignments (see below for example) |
Results produced by this command have the following structure:
>>> Read id: 1
5'UTR><L1
Quality 88888888888888888888888887888888888888888888888888888888888888888888888887888878
Target0 0 AAGGCCTTTCCACTTGGTGATCAGCACTGAGCACAGAGGACTCACCATGGAGTTGGGGCTGAGCTGGGTTTTCCTTGTTG 79
IGHV3-7*00 54 aaggcctttccacttggtgatcagcactgagcacagaggactcaccatggaAttggggctgagctgggttttccttgttg 133
L1><L2 L2><FR1
Quality 88888888887888888888888888888889989989989889999997999999989999999999999999999899
Target0 80 CTATTTTAGAAGGTGTCCAGTGTGAGGTGAAGTTGGTGGAGTCTGGGGGAGGCCTGGTCCAGCCTGGGGGGTCCCTGAGA 159
IGHV3-7*00 134 ctattttagaaggtgtccagtgtgaggtgCagCtggtggagtctgggggaggcTtggtccagcctggggggtccctgaga 213
FR1><CDR1 CDR1><FR2
Quality 999999999999999999999999999999999999999999999 9999999999999999999999999999999999
Target0 160 CTCTCCTGTGAAGCCTCCGGATTCACCTTTAGTAGTTATTGGATG-GCATGGGTCCGCCAGGGTCCAGGGCAGGGGCTGG 238
IGHV3-7*00 214 ctctcctgtgCagcctcTggattcacctttagtagCtattggatgAgc-tgggtccgccaggCtccagggAaggggctgg 292
FR2><CDR2 CDR2><FR3
Quality 99999999999999999999999999999999999799999999999999999999999999998999899898999999
Target0 239 AATGGGTGGGCAACATAAGGCCGGATGGAAGTGAGAGTTGGTACTTGGAGTCTGTGATGGGGCGATTCATGATATCTAGA 318
IGHV3-7*00 293 aGtgggtggCcaacataaAgcAAgatggaagtgagaAAtACtaTGtggaCtctgtgaAgggCcgattcaCCatCtcCaga 372
FR3><CDR3
Quality 99899899999999988989999889979988888888878878788888888878888888778788888888878888
Target0 319 GACAACGCCAAGAAGTCACTTTATCTGCAAATGGACAGCCTGAGAGTCGAGGACACGGCCGTCTATTATTGTGCGACTTC 398
IGHV3-7*00 373 gacaacgccaagaaCtcactGtatctgcaaatgAacagcctgagagCcgaggacacggcTgtGtattaCtgtgcga 448
IGHD3-10*00 12 ttc 14
CDR3><FR4
Quality 88888788888888888888888787788777887787777877777877787787877878788788777767778788
Target0 399 GGAGGAGCCGGAGGACTACTGGGGCCAGGGAGCCCTGGTCACCGTCTCCTCGGCTTCCACCAAGGGCCCATCGGTCTTCC 478
IGHD3-10*00 15 gg-ggag 20
IGHJ4*00 8 gactactggggccagggaAccctggtcaccgtctcctc 45
IGHG4*00 0 cttccaccaagggcccatcggtcttcc 26
IGHG3*00 0 cttccaccaagggcccatcggtcttcc 26
IGHG2*00 0 cCtccaccaagggcccatcggtcttcc 26
IGHG1*00 0 cCtccaccaagggcccatcggtcttcc 26
IGHGP*00 194 AgcCtccaccaagggcccatcggtcttcc 222
Quality 87370
Target0 479 CCTTG 483
IGHG4*00 27 ccCtg 31
IGHG3*00 27 ccCtg 31
IGHG2*00 27 ccCtg 31
IGHG1*00 27 ccCtg 31
IGHGP*00 223 ccCtg 227
Using the --verbose option will produce alignments in a slightly different format:
>>> Read id: 12343 <--- Index of analysed read in input file >>> Target sequences (input sequences): Sequence0: <--- Read 1 from paired-end read Contains features: CDR1, VRegionTrimmed, L2, L, Intron, VLIntronL, FR1, Exon1, <--- Gene features VExon2Trimmed found in read 1 0 TCTTGGGGGATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCT 79 <--- Sequyence & quality FGGEGGGGGDG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9 of read 1 80 CTATTAAGAGGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACT 159 F9,A,95AFE,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++< 160 CTCCTGTGCAGCGTCGGGATGCACATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGG 239 <++*++0++2A:ECE5EC5**2@C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/))) 240 GGGTGCGTGGTAGATGGGAA 259 )9:.)))*1)12***-/).) Sequence1: <--- Read 2 from paired-end read Contains features: JCDR3Part, DCDR3Part, DJJunction, CDR2, JRegionTrimmed, CDR3, VDJunction, VJJunction, VCDR3Part, ShortCDR3, FR4, FR3 0 CGAGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCG 79 **0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+ 80 ATTCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGT 159 +++<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@ 160 ATTACTGTGCGAGAGGTCAACAGGGTGACTATGTCTACGGTAGGGACGTCGGGGGCCAAGGGACCACGGTCACCGTCTCC 239 ,@;FCF@+F@FGGF9FD,F>>+B:=,,=><GFCGGCFEGFF?+=B+7EF>+FFA,8F<E:,5+GDFFE,@F?,,7GGDFE 240 TCAGGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCTCTGCGTTGATACCACTGGCAGCTC 300 C,FGGGEFCCGEEGGCFCC:8FGEGGGE@DFB-GFGGGGF@GFGFE<,GFCCFCAGC@CCC >>> Gene features that can be extracted from this (paired-)read: <--- For paired-end reads JCDR3Part, CDR1, VRegionTrimmed, L2, DCDR3Part, VDJTranscriptWithout5UTR, Exon2, L, some gene features DJJunction, Intron, FR2, CDR2, VDJRegion, JRegionTrimmed, CDR3, VDJunction, VJJunction, can be extracted by VLIntronL, FR1, 
VCDR3Part, ShortCDR3, Exon1, FR4, VExon2Trimmed, FR3 merging sequence information >>> Alignments with V gene: IGHV3-33*00 (total score = 1638.0) <--- Alignment of both reads with IGHV3-33 Alignment of Sequence0 (score = 899.0): <--- Alignment of IGHV3-33 with read 1 from paired-end read 65 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAGCTGGGTTTTCCTCGTTGCTCTTTTAAGA 144 <--- Germline ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| |||||| 9 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCTCTATTAAGA 88 <--- Read DG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9F9,A,95AF <--- Quality score 145 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGC 224 |||||||||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||| |||||||||||||||| 89 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACTCTCCTGTGC 168 E,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<<++*++0++ 225 AGCGTCTGGATTCACCTTCA-GTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTG 300 |||||| |||| || | ||| | ||||||||| || |||||| ||||||||| ||||| | ||||||||| ||||| 169 AGCGTCGGGATGCA-CATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGGGGGTG 244 2A:ECE5EC5**2@ C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/))))9:.) 
Alignment of Sequence1 (score = 739.0): <--- Alignment of IGHV3-33 with read 2 from paired-end read 279 AGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGAT 358 ||||||| |||||| ||||||||| ||||||||||||| ||||||||||||| ||||||||||| ||||||||||||||| 2 AGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCGAT 81 0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+++ 359 TCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTAT 438 |||||||| |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| ||||| 82 TCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGTAT 161 +<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@,@ 439 TACTGTGCGAGAG 451 ||||||||||||| 162 TACTGTGCGAGAG 174 ;FCF@+F@FGGF9 IGHV3-30*00 (total score = 1582.0) <--- Alternative hit for V gene Alignment of Sequence0 (score = 885.0): 65 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAGCTGGGTTTTCCTCGTTGCTCTTTTAAGA 144 ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| |||||| 9 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCTCTATTAAGA 88 DG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9F9,A,95AF 145 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGC 224 |||||||||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||| |||||||||||||||| 89 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACTCTCCTGTGC 168 E,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<<++*++0++ 225 AGCCTCTGGATTCACCTTCA-GTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTG 300 ||| || |||| || | ||| | ||||||||| || |||||| ||||||||| ||||| | ||||||||| ||||| 169 AGCGTCGGGATGCA-CATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGGGGGTG 244 2A:ECE5EC5**2@ C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/))))9:.) 
Alignment of Sequence1 (score = 697.0): 279 AGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGAT 358 ||||||| |||||| ||||||||| ||||||| |||| ||||||||||||| ||||||||||| ||||||||||||||| 2 AGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCGAT 81 0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+++ 359 TCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTAT 438 |||||||| |||||||||||||||||||||||||||||||||||||||| ||||||||||| |||||||||||| ||||| 82 TCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGTAT 161 +<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@,@ 439 TACTGTGCGAGAG 451 ||||||||||||| 162 TACTGTGCGAGAG 174 ;FCF@+F@FGGF9 >>> Alignments with D gene: IGHD4-17*00 (total score = 40.0) Alignment of Sequence1 (score = 40.0): 7 GGTGACTA 14 |||||||| 183 GGTGACTA 190 :=,,=><G IGHD4-23*00 (total score = 36.0) Alignment of Sequence1 (score = 36.0): 0 TGACTACGGT 9 || ||||||| 191 TGTCTACGGT 200 FCGGCFEGFF IGHD2-21*00 (total score = 35.0) Alignment of Sequence1 (score = 35.0): 13 GGTGACT 19 ||||||| 183 GGTGACT 189 :=,,=>< >>> Alignments with J gene: IGHJ6*00 (total score = 172.0) Alignment of Sequence1 (score = 172.0): 22 GGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCA 61 ||||||| ||||| |||||||||||||||||||||||||| 203 GGACGTCGGGGGCCAAGGGACCACGGTCACCGTCTCCTCA 242 =B+7EF>+FFA,8F<E:,5+GDFFE,@F?,,7GGDFEC,F >>> Alignments with C gene: No hits.
Exporting reads aggregated by clones
MiXCR can preserve the mapping between initial reads and final clonotypes. There are several ways to access this information.
In any case, one first needs to specify the additional option --index for the assemble command:
mixcr assemble --index index_file alignments.vdjca output.clns
This will tell MiXCR to store mapping in the file index_file (actually two files will be created: index_file and index_file.p both of which are used to store the index; in further options one should specify only index_file without .p extension and MiXCR will automatically read both required files). Now one can use index_file in order to access this information. For example using -cloneId option for exportAlignments command:
mixcr exportAlignments -p min -cloneId index_file alignments.vdjca alignments.txt
will print additional column with id of the clone which contains corresponding alignment:
| Best V hit | Best D hit | ... | CloneId |
|---|---|---|---|
| IGHV4-34*00 | | ... | 321 |
| IGHV2-23*00 | IGHD2*21 | ... | |
| IGHV4-34*00 | IGHD2*21 | ... | 22143 |
| ... | ... | ... | ... |
For more information one can export mapping type as well:
mixcr exportAlignments -p min -cloneIdWithMappingType index_file alignments.vdjca alignments.txt
which will give something like:
| Best V hit | Best D hit | ... | Clone mapping |
|---|---|---|---|
| IGHV4-34*00 | | ... | 321:core |
| IGHV2-23*00 | IGHD2*21 | ... | dropped |
| IGHV4-34*00 | IGHD2*21 | ... | 22143:clustered |
| IGHV4-34*00 | IGHD2*21 | ... | 23:mapped |
| ... | ... | ... | ... |
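Values in the clone mapping column combine a clone id and a mapping type, or hold a bare status such as dropped. The Python sketch below shows one way to split them; it is a hedged illustration based only on the sample values in the table above, and the function name and the handling of empty cells are assumptions.

```python
from collections import Counter


def parse_clone_mapping(value):
    """Split a clone mapping cell into (clone_id, mapping_type).

    "321:core" -> (321, "core"); "dropped" -> (None, "dropped"); "" -> (None, None).
    """
    value = value.strip()
    if not value:
        return None, None
    if ":" not in value:
        return None, value
    clone_id, mapping_type = value.split(":", 1)
    return int(clone_id), mapping_type


# Sample values copied from the table above.
cells = ["321:core", "dropped", "22143:clustered", "23:mapped"]
parsed = [parse_clone_mapping(cell) for cell in cells]

# Tally how many alignments fall into each mapping type.
by_type = Counter(mapping_type for _, mapping_type in parsed)
print(dict(by_type))
```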
One can also export all read IDs that were aggregated by each clone. To do this, use the -readIds export option of the exportClones action:
mixcr exportClones -c IGH -p min -readIds index_file clones.clns clones.txt
This will add a column with full enumeration of all reads that were absorbed by particular clone:
| Clone ID | Clone count | Best V hit | ... | Reads |
|---|---|---|---|---|
| 0 | 7213 | IGHV4-34*00 | ... | 56,74,92,96,101,119,169,183... |
| 1 | 2951 | IGHV2-23*00 | ... | 46,145,194,226,382,451,464... |
| 2 | 2269 | IGHV4-34*00 | ... | 58,85,90,103,113,116,122,123... |
| 3 | 124 | IGHV4-34*00 | ... | 240,376,496,617,715,783,813... |
| ... | ... | ... | ... | ... |
Note that the resulting txt file may be very large, since all read numbers that were successfully assembled will be printed.
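The comma-separated read list can be inverted into a lookup from read id to clone id. The following Python sketch is a hedged illustration: the function name is hypothetical, and the input rows are shortened sample values modelled on the table above, not real output.

```python
def reads_to_clones(rows):
    """Build a {read_id: clone_id} lookup from (clone_id, reads_cell) pairs."""
    mapping = {}
    for clone_id, reads_cell in rows:
        for read_id in reads_cell.split(","):
            if read_id:
                mapping[int(read_id)] = clone_id
    return mapping


# Illustrative (clone id, reads column) pairs.
rows = [
    (0, "56,74,92,96"),
    (1, "46,145,194"),
]
lookup = reads_to_clones(rows)
print(lookup[92], lookup[145])
```

For large files it may be preferable to stream the rows instead of holding the whole lookup in memory.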
Finally, one can export the reads aggregated by each clone into a separate .fastq file. To do this, one first needs to specify the additional -g option for the align command:
mixcr align -g input.fastq alignments.vdjca.gz
With this option MiXCR will store the original reads in the .vdjca file. One can then export the reads corresponding to a particular clone with the exportReadsForClones command. For example, to export all reads that were assembled into the first clone (the clone with cloneId = 0):
mixcr exportReadsForClones index_file alignments.vdjca.gz 0 reads.fastq.gz
This will create file reads_clns0.fastq.gz (or two files reads_clns0_R1.fastq.gz and reads_clns0_R2.fastq.gz if the original data were paired) with all reads that were aggregated by the first clone. One can export reads for several clones at a time:
mixcr exportReadsForClones index_file alignments.vdjca.gz 0 1 2 33 54 reads.fastq.gz
This will create several files (reads_clns0.fastq.gz, reads_clns1.fastq.gz etc.) for each clone with cloneId equal to 0, 1, 2, 33 and 54 respectively.
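When post-processing these files in a script, it helps to predict their names. The Python sketch below reproduces the naming pattern seen in the examples above (reads_clns0.fastq.gz, reads_clns0_R1.fastq.gz); splitting the output name at its first dot is an assumption inferred from those examples, and the function name is hypothetical.

```python
import os


def clone_read_files(output_name, clone_ids, paired=False):
    """Predict per-clone file names for an exportReadsForClones output name.

    Assumption: "_clns<ID>" (plus "_R1"/"_R2" for paired data) is inserted
    before the first dot of the file name.
    """
    directory, base = os.path.split(output_name)
    stem, dot, extension = base.partition(".")
    suffixes = ["_R1", "_R2"] if paired else [""]
    names = []
    for clone_id in clone_ids:
        for suffix in suffixes:
            name = f"{stem}_clns{clone_id}{suffix}{dot}{extension}"
            names.append(os.path.join(directory, name))
    return names


single = clone_read_files("reads.fastq.gz", [0, 1, 2, 33, 54])
paired = clone_read_files("reads.fastq.gz", [0], paired=True)
print(single[0], paired)
```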