Hi,
I am learning bioinformatics and have a basic question about bowtie2.
I have two different sets of single-end sequencing reads:
- set #1: 10,000,000 reads
- set #2: 12,000,000 reads
I concatenated these two datasets together (22,000,000 reads) and assembled them with Trinity (100,000 contigs).
Now I am trying to find out which contigs come from which dataset.
To do that, I built a bowtie2 index from the contigs and aligned the reads of the first set against it, and then did the same with the second set separately:
- indexed contigs VS set #1
- indexed contigs VS set #2
I used the bowtie2 --end-to-end option and the --al option in order to output the contig sequences that aligned to the reads, and it returned:
- overall alignment rate of contigs VS set #1 = 80% (8,000,000 sequences)
- aligned contig file set #1: 15,000,000 sequences
- overall alignment rate of contigs VS set #2 = 78% (9,360,000 sequences)
- aligned contig file set #2: 19,000,000 sequences
How can I end up with more contigs than Trinity produced, and even more sequences than there are reads in each set?
I am clearly doing something wrong.
How can I obtain the sequences of the contigs that come from set #1 and of those that come from set #2?
Is --al the right option?
Should I start digging into the SAM file instead?
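To make that last question concrete, this is roughly what I imagine digging into the SAM file would look like. It is only a sketch in Python, not something I have run on my real data: the function name and the example records are made up by me, and it assumes the standard SAM layout (header lines start with @, the FLAG is in column 2 with bit 0x4 meaning "unmapped", and the reference name, here the contig, is in column 3).

```python
from collections import Counter


def count_reads_per_contig(sam_lines):
    """Count mapped reads per reference (contig) name in SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        contig = fields[2]      # column 3: reference sequence name
        if flag & 0x4 or contig == "*":
            continue  # this read did not align to any contig
        counts[contig] += 1
    return counts


# Made-up records for illustration only (hypothetical read and contig names).
example = [
    "@SQ\tSN:contig_1\tLN:500",
    "@SQ\tSN:contig_2\tLN:300",
    "read_a\t0\tcontig_1\t10\t42\t50M\t*\t0\t0\tACGT\tIIII",
    "read_b\t16\tcontig_1\t80\t42\t50M\t*\t0\t0\tACGT\tIIII",
    "read_c\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
counts = count_reads_per_contig(example)
print(sorted(counts.items()))
```

Running something like this on the SAM output for set #1 and again for set #2 would give me two lists of contig names that I could compare, if that is a sensible way to go.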
Thanks in advance for your help!