Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating percentage of reads aligning to a subject

    I have done a blast on virus contigs and obtained hits matching to virus (strains/isolates) in the database. I would like to calculate the percentage of reads that are aligning to the viruses in the database. Can someone guide me on how to do this?

  • #2
    You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
    After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).

    Comment


    • #3
      Thanks Michael,

      I have not used Bowtie/ samtools before. How do I start off?

      Comment


      • #4
        1. What format are your blast results in (html, xml, text)? You may be able to parse that result file if all you want to know is how many sequences hit a "virus".

        2. If you did the blast locally do you have a sequence file with all "virus" sequences available? You will be able to use that file as an input for bowtie2 and follow the path @Michael.Ante suggested.

        3. Are you comfortable using command line (e.g. linux) applications?

        Comment


        • #5
          1. blast results are in txt format
          2. yes. sequence file is available ( database file?)
          3. am fairly comfortable with command line
          4. I would prefer the command prompt option as opposed to logging on the cluster (my internet is erratic)
          Last edited by kaps; 04-11-2015, 05:05 AM.

          Comment


          • #6
            Originally posted by kaps View Post
            Thanks Michael,

            I have not used Bowtie/ samtools before. How do I start off?
            Hi Kaps,

            you should have a look at the Bowtie2 homepage. There, it is explained in detail how the programs work. At the end of the manual is a "Lambda phage example", which has quite an overlap to your problem. It also has a SAMtools downstream section...

            Cheers,
            Michael

            Comment


            • #7
              Hi, Michael

              Thanks for pointing this to me!

              Comment


              • #8
                Originally posted by Michael.Ante View Post
                You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                Hello Michael, when I try samtools idxstats,
                I am getting a comment as below;

                samtools idxstats lib4seq.sorted.bam
                [bam_idxstats] fail to load the index.

                what could be the problem?

                Comment


                • #9
                  Have you created the index with
                  Code:
                  samtools index lib4seq.sorted.bam
                  ?
                  If yes how does your bam-dile header looks like?
                  Code:
                   samtools view -H lib4seq.sorted.bam

                  Comment


                  • #10
                    I had not created the index,
                    I can now see the statistics in the index file!

                    Thanks

                    Comment


                    • #11
                      Originally posted by Michael.Ante View Post
                      You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                      After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                      In a case where my index file has several sequences for different strains/isolates of the same virus which may be treated as duplicates, how do I restrict bowtie2 to do the alignment once?

                      Comment


                      • #12
                        Originally posted by Michael.Ante View Post
                        You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                        After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                        Hello,

                        After getting the samtools idxstats (on number of mapped vs unmapped reads), is it possible to extract/select reads that mapped from the raw read files/query? how is it done?

                        Comment


                        • #13
                          If you had used the "--un-conc and --al-conc" options (http://bowtie-bio.sourceforge.net/bo...output-options) the unmapped reads could have been written to separate files when you did the alignment.


                          1. You could repeat bowtie2 alignment with above parameters added to your original list (easier) OR
                          2. Identify read ID's of sequences that mapped and use a tool like seqtk to extract the mapped reads (e.g. seqtk subseq in.fq name.lst > out.fq)


                          Use @Michael.Ante's easy suggestion below
                          Last edited by GenoMax; 05-12-2015, 04:34 AM.

                          Comment


                          • #14
                            You can use samtools view to extract the mapped/unmapped reads by filtering the 'unmapped' flag:

                            Code:
                            samtools view -F 4 -bh lib4seq.sorted.bam > lib4seq.sorted.mapped.bam
                            samtools view -f 4 -bh lib4seq.sorted.bam > lib4seq.sorted.unmapped.bam
                            Samtools view will help a lot; just have a look at some tutorials, for instance Dave's wiki

                            Comment


                            • #15
                              if i want to convert lib4seq.sorted.mapped.bam to a fastq file (creating 2 files for paired end) do i need to sort this bam file again?
                              Last edited by kaps; 05-19-2015, 02:39 AM. Reason: error correction

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Best Practices for Single-Cell Sequencing Analysis
                                by seqadmin



                                While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
                                06-06-2024, 07:15 AM
                              • seqadmin
                                Latest Developments in Precision Medicine
                                by seqadmin



                                Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

                                Somatic Genomics
                                “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
                                05-24-2024, 01:16 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 06-07-2024, 06:58 AM
                              0 responses
                              13 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 06-06-2024, 08:18 AM
                              0 responses
                              23 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 06-06-2024, 08:04 AM
                              0 responses
                              20 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 06-03-2024, 06:55 AM
                              0 responses
                              14 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X