Hi, I am a first-year PhD student trying my hand at bioinformatics. I assembled about 190,000 454 reads with mira3 and I need some help going through the output.
The assembly info file indicates that I have 21,769 contigs, but the result FASTA file contains around 24,000 sequences. I am assuming that a FASTA header with 'c' before the number (for example, >myProject_c1203) indicates a contig that was assembled. What are the sequences whose headers have 's' or 'lrc' there instead (for example, >myproject_lrc2938)?
Does mira3 discard sequences that it thinks are of low quality or too short? I noticed that the number of reads assembled is 119,794 (2,508 singlets), whereas the number of reads I fed into mira3 was around 190,000.
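In case it helps anyone reproduce the numbers, here is a minimal Python sketch of how the FASTA headers could be tallied by their letter tag. It assumes every header looks like `>project_<tag><number>` (as in the examples above). The function name `count_header_tags` and the sample lines are made up for illustration and are not part of mira3.

```python
import re
from collections import Counter

# Assumed header shape: >project_<tag><number>, e.g. >myProject_c1203.
# The tag is the run of letters right before the trailing digits.
TAG_PATTERN = re.compile(r"_([A-Za-z]+)\d+$")


def count_header_tags(fasta_lines):
    """Count FASTA header lines grouped by their letter tag (c, s, lrc, ...)."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].split()
        name = parts[0] if parts else ""  # drop any description after the ID
        match = TAG_PATTERN.search(name)
        counts[match.group(1) if match else "other"] += 1
    return counts


# Illustrative input only; real use would pass an open file handle instead.
example_lines = [
    ">myProject_c1203",
    "ACGTACGT",
    ">myproject_lrc2938",
    "GGCCTTAA",
]
for tag, n in sorted(count_header_tags(example_lines).items()):
    print(tag, n)
```

Running it on the real result file (for example `count_header_tags(open("results.fasta"))`, where the file name is a placeholder) should show how the roughly 24,000 sequences split across the tags, which can then be compared with the 21,769 contigs reported in the assembly info file.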