If anyone has worked with a report (native format) generated by Novoalign, please help me with the following questions. The datasets are Illumina paired-end reads.
A) Below is a snapshot of a Novoalign report (native format) for one Illumina read pair.
Code:
@0:1:1:34:429 L GAAGNAAAAATAAAAGCATTAGNAGAAATTTGTACA IIII$IIIII&IIIIIIIIIII$IIIIIIIIIIIII U 14 91 >gi|9629357:1-9117 2177 F . 2308 R
@0:1:1:34:429 R TNCTTATTAAGCNCTCTGAAATNNANNNNTTTTCTC I$IIIIIIIIII$IIIIIIIII$$'$$$$IIIIIII U 126 91 >gi|9629357:1-9117 2308 R . 2177 F 25A>G 36G>A
Which columns in the report correspond to the following?
1. Aligned Sequence
2. Aligned Offset
3. Pair Sequence
4. Pair Offset
5. Mismatches
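To make the columns easier to discuss, here is a minimal Python sketch that splits the left-read record on whitespace and attaches a label to each field. The column names in `ASSUMED_COLUMNS` are my own guess at the layout and are not confirmed; please correct them if they are wrong.

```python
# Left-read record copied from the report snapshot.
record = (
    "@0:1:1:34:429 L GAAGNAAAAATAAAAGCATTAGNAGAAATTTGTACA "
    "IIII$IIIII&IIIIIIIIIII$IIIIIIIIIIIII U 14 91 "
    ">gi|9629357:1-9117 2177 F . 2308 R"
)

# ASSUMPTION: these labels are my guess at the column order, not confirmed.
ASSUMED_COLUMNS = [
    "read_header", "read_type", "read_sequence", "base_qualities",
    "status", "alignment_score", "alignment_quality",
    "aligned_sequence", "aligned_offset", "strand",
    "pair_sequence", "pair_offset", "pair_strand", "mismatches",
]

def label_fields(line):
    """Split a record into at most 14 fields and pair each with a label.

    maxsplit=13 keeps any trailing mismatch entries together in one field.
    Records with fewer fields simply get fewer labels.
    """
    fields = line.split(None, 13)
    return dict(zip(ASSUMED_COLUMNS, fields))

labelled = label_fields(record)
for number, (name, value) in enumerate(labelled.items(), start=1):
    print(number, name, value)
```

Under that guess, the left read has aligned offset 2177 and pair offset 2308, and it has no mismatch column because nothing follows the pair strand.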
B) I am also looking for the start and end positions of the aligned reads. Is that information available in this report?
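If the offset column turns out to be the 1-based leftmost position of the alignment on the reference (an assumption on my part), then for a read aligned without gaps the end position could be derived from the read length. A rough sketch of that calculation, with a hypothetical helper `ungapped_span`:

```python
def ungapped_span(offset, read_sequence):
    """Return (start, end) for a read aligned without insertions or deletions.

    ASSUMPTION: `offset` is the 1-based leftmost reference position.
    Gapped alignments would need the indel entries taken into account.
    """
    start = offset
    end = offset + len(read_sequence) - 1
    return start, end

# Offsets and sequences taken from the two records in the snapshot.
left_span = ungapped_span(2177, "GAAGNAAAAATAAAAGCATTAGNAGAAATTTGTACA")
right_span = ungapped_span(2308, "TNCTTATTAAGCNCTCTGAAATNNANNNNTTTTCTC")
print(left_span, right_span)
```

I would still like to know whether the report gives the end position directly, or whether it has to be computed like this.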
C) At the end of the report there are three columns of data:
# Fragment Length Distribution
# From To Count
# 27 29 4
# 30 32 30
# 33 35 141
# 36 38 696
# 39 41 1136
# ... etc.
Does this mean that there are 4 reads from positions 27 to 29, and so on?
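For anyone who wants to look at the table programmatically, here is a small Python sketch that parses the rows shown above into (from, to, count) tuples. Given the "Fragment Length Distribution" header, the From/To columns may be fragment-length bins rather than reference positions, but that is only my assumption. The function name `parse_distribution` is my own.

```python
# Rows copied from the end of the report (only the ones shown above).
table_text = """\
# From To Count
# 27 29 4
# 30 32 30
# 33 35 141
# 36 38 696
# 39 41 1136
"""

def parse_distribution(text):
    """Return a list of (from, to, count) tuples, skipping non-numeric rows."""
    rows = []
    for line in text.splitlines():
        parts = line.lstrip("#").split()
        if len(parts) == 3 and all(p.isdigit() for p in parts):
            rows.append(tuple(int(p) for p in parts))
    return rows

rows = parse_distribution(table_text)
total_shown = sum(count for _, _, count in rows)
for low, high, count in rows:
    print(f"{low}-{high}: {count}")
print("total in rows shown:", total_shown)
```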
D) Finally, here are the report statistics.
# Paired Reads: 9686877
# Pairs Aligned: 6253455
# Read Sequences: 19373754
# Aligned: 14102273
# Unique Alignment: 14102068
# Gapped Alignment: 875179
# Quality Filter: 248607
# Homopolymer Filter: 1306
I understand that 2 × Paired Reads = Read Sequences. Please help me understand why 2 × Pairs Aligned < Aligned. Also, adding Gapped Alignment to Unique Alignment does not give Aligned.
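The arithmetic behind question D, written out as a short Python check. The values come straight from the statistics above; the variable names are mine. The numbers would be consistent with some reads aligning without their mate, and with Gapped Alignment being a subset of Aligned rather than a separate category, but I am not sure that is the right interpretation.

```python
# Values copied from the report statistics.
paired_reads = 9686877
pairs_aligned = 6253455
read_sequences = 19373754
aligned = 14102273
unique_alignment = 14102068
gapped_alignment = 875179

# Two reads per pair: this one matches exactly.
reads_from_pairs = 2 * paired_reads

# Reads accounted for by aligned pairs, and the aligned reads left over.
reads_in_aligned_pairs = 2 * pairs_aligned
aligned_outside_pairs = aligned - reads_in_aligned_pairs

# Unique + Gapped overshoots Aligned, so they cannot be disjoint parts of it.
unique_plus_gapped = unique_alignment + gapped_alignment
overshoot = unique_plus_gapped - aligned

# Aligned reads that are not counted as unique.
aligned_not_unique = aligned - unique_alignment

print(reads_from_pairs == read_sequences)  # True
print(reads_in_aligned_pairs)              # 12506910
print(aligned_outside_pairs)               # 1595363
print(unique_plus_gapped, overshoot)       # 14977247 874974
print(aligned_not_unique)                  # 205
```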
Please advise.