I ran BLAST on virus contigs and obtained hits matching viruses (strains/isolates) in the database. I would like to calculate the percentage of reads that align to those viruses. Can someone guide me on how to do this?
-
You can download the virus sequences in FASTA format and use e.g. Bowtie2 to align the reads locally. To do that, you need to build an index first. The log output of Bowtie2 tells you how many reads mapped.
After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
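To turn the idxstats numbers into percentages, something like the following could work. It is only a sketch: it assumes the usual tab-separated idxstats layout (reference name, sequence length, mapped count, unmapped count, with a final `*` line for unplaced reads), and the function name `percent_mapped` and the sample counts are made up for illustration.

```python
def percent_mapped(idxstats_text):
    """Return (per-reference percentages, overall percentage) of mapped reads.

    idxstats_text is the text printed by `samtools idxstats file.bam`.
    """
    mapped = {}
    unmapped_total = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, n_mapped, n_unmapped = line.split("\t")
        # Unmapped reads can appear on any line, including the final "*" line.
        unmapped_total += int(n_unmapped)
        if name != "*":
            mapped[name] = int(n_mapped)
    total = sum(mapped.values()) + unmapped_total
    if total == 0:
        return {}, 0.0
    per_ref = {name: 100.0 * n / total for name, n in mapped.items()}
    overall = 100.0 * sum(mapped.values()) / total
    return per_ref, overall


# Hypothetical idxstats output, not real data.
SAMPLE = "virusA\t5000\t60\t0\nvirusB\t8000\t20\t0\n*\t0\t0\t120\n"

if __name__ == "__main__":
    per_ref, overall = percent_mapped(SAMPLE)
    for name, pct in per_ref.items():
        print(f"{name}\t{pct:.2f}%")
    print(f"all viruses\t{overall:.2f}%")
```

Note that idxstats counts alignment records, so if the BAM contains secondary or supplementary alignments the percentages will be slightly inflated.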
-
1. What format are your blast results in (html, xml, text)? You may be able to parse that result file if all you want to know is how many sequences hit a "virus".
2. If you did the blast locally do you have a sequence file with all "virus" sequences available? You will be able to use that file as an input for bowtie2 and follow the path @Michael.Ante suggested.
3. Are you comfortable using command line (e.g. linux) applications?
-
Originally posted by kaps:
Thanks Michael,
I have not used Bowtie/samtools before. How do I start off?

You should have a look at the Bowtie2 homepage. It explains in detail how the programs work. At the end of the manual is a "Lambda phage example", which overlaps quite a bit with your problem. It also has a SAMtools downstream section...
Cheers,
Michael
-
I am getting the error below:
samtools idxstats lib4seq.sorted.bam
[bam_idxstats] fail to load the index.
What could be the problem?
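A note on that message: idxstats reads its numbers from the BAM index, so it usually means there is no index file next to the BAM. Running `samtools index lib4seq.sorted.bam` on the coordinate-sorted file should create one. A small sketch of a check you could run first is below; the helper names `find_bam_index` and `index_command` are hypothetical, and the list of index file names is an assumption based on the usual conventions (`.bam.bai`, `.bai`, `.bam.csi`).

```python
import os


def find_bam_index(bam_path):
    """Return the path of an existing index for bam_path, or None if none is found."""
    stem = os.path.splitext(bam_path)[0]
    # Assumed naming conventions for BAM indexes.
    for candidate in (bam_path + ".bai", stem + ".bai", bam_path + ".csi"):
        if os.path.exists(candidate):
            return candidate
    return None


def index_command(bam_path):
    """Command (as an argument list) that builds the index for a sorted BAM."""
    return ["samtools", "index", bam_path]


if __name__ == "__main__":
    bam = "lib4seq.sorted.bam"
    if find_bam_index(bam) is None:
        print("No index found; run:", " ".join(index_command(bam)))
```

Once the index exists, the same `samtools idxstats lib4seq.sorted.bam` call should work.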
-
After getting the samtools idxstats output (number of mapped vs. unmapped reads), is it possible to extract the reads that mapped from the raw read files? How is it done?
-
If you had used the "--un-conc" and "--al-conc" options (http://bowtie-bio.sourceforge.net/bo...output-options), the read pairs that failed to align concordantly and those that did align could have been written to separate files when you did the alignment.
1. You could repeat the bowtie2 alignment with the above parameters added to your original list (easier), OR
2. Identify the read IDs of sequences that mapped and use a tool like seqtk to extract the mapped reads (e.g. seqtk subseq in.fq name.lst > out.fq).
Use @Michael.Ante's easy suggestion below.
Last edited by GenoMax; 05-12-2015, 04:34 AM.
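For option 2, the name list for seqtk could be built roughly like this. It is a sketch, not a tested pipeline: it assumes you feed it the text output of `samtools view lib4seq.sorted.bam`, and the function name `mapped_read_names` is made up here. A read counts as mapped when bit 0x4 of the SAM FLAG field is not set.

```python
def mapped_read_names(sam_lines):
    """Return the unique names of mapped reads, in order of first appearance."""
    names = []
    seen = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment record
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # bit 0x4 set: read is unmapped
        if qname not in seen:
            seen.add(qname)
            names.append(qname)
    return names


if __name__ == "__main__":
    import sys

    # Example: samtools view lib4seq.sorted.bam | python this_script.py > name.lst
    for name in mapped_read_names(sys.stdin):
        print(name)
```

The resulting name.lst can then go into the seqtk subseq call above. Whether the names match your FASTQ headers exactly (e.g. /1 and /2 suffixes on paired reads) depends on how the reads were named, so check a few by hand.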
-
You can use samtools view to extract the mapped/unmapped reads by filtering the 'unmapped' flag:
Code:
samtools view -F 4 -bh lib4seq.sorted.bam > lib4seq.sorted.mapped.bam
samtools view -f 4 -bh lib4seq.sorted.bam > lib4seq.sorted.unmapped.bam