  • wmseq
    Member
    • May 2011
    • 71

    Which BAM file do I need to count reads?

    Hi everyone,
    After converting my bowtie SAM files to BAM, I have no idea which BAM file should be used to count reads, because there are two of them: a .bam file and a .bam.sorted file. Could anyone help me out?

    Thanks a lot!!

    Richard
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    I assume you mean counting via htseq-count or something like that. In that case, use whichever file is name (rather than coordinate) sorted. Bowtie produces name-sorted output (as opposed to tophat, which defaults to coordinate sorting things, though you can disable this behavior).

    • Heisman
      Senior Member
      • Dec 2010
      • 534

      #3
      I don't quite understand your question. Could you provide the line of code you used to run bowtie? I use Bowtie2 and I believe when it's finished aligning it provides output stating how many reads did and did not align uniquely or more than once.

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        I'll add that if you need to see how a file was sorted, just
        Code:
        samtools view -H file.bam | grep "@HD"
        and see if it says "unsorted", "queryname", or "coordinate". Practically speaking, "unsorted" is usually sufficient and you likely don't need to actually have "queryname" there (I'm sure some aligner or other actually interleaves paired-reads, but I've never seen it).
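To avoid reading the @HD line by eye, the sort order can be extracted from it. This is a minimal sketch rather than an official samtools feature: the function name sam_sort_order is made up for illustration, and it assumes the header follows the SAM specification, where the SO: tag on the @HD line holds the sort order. A header with no @HD line or no SO: tag is reported as "unknown".

```shell
# Hypothetical helper: read SAM header text on stdin and print the SO value.
sam_sort_order() {
    awk -F'\t' '
        $1 == "@HD" {
            # Scan the tab-separated tags of the @HD line for SO:
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
        }
        END { if (!found) print "unknown" }
    '
}
```

It would be used as: samtools view -H file.bam | sam_sort_order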

        • wmseq
          Member
          • May 2011
          • 71

          #5
          Thanks Devon!
          Yes, I will count them with htseq-count and bedtools multicov. If I understand you correctly, the .bam file is what I need. Does that mean these two tools automatically use the sorted BAM file and its index internally?

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            htseq-count doesn't perform random access, so it won't use the index (you can't index a non-coordinate sorted BAM file anyway). I've never used bedtools multicov, so I don't know what it should be fed as input.

            • wmseq
              Member
              • May 2011
              • 71

              #7
              Although the BAM file has been sorted by running the samtools sort command, why is the sorted BAM file still kept, and what is the purpose of storing it?

              • wmseq
                Member
                • May 2011
                • 71

                #8
                I am sorry, Devon!
                I got it. The BAM file is sorted so that it can be indexed.

                • crazyhottommy
                  Senior Member
                  • Apr 2012
                  • 187

                  #9
                  Originally posted by wmseq
                  Although the BAM file has been sorted by running the samtools sort command, why is the sorted BAM file still kept, and what is the purpose of storing it?
                  samtools sort does not sort in place, so it generates a new sorted file. Use that file for htseq-count, but remember to sort it by name (-n flag), since htseq-count needs a name-sorted file.
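A minimal sketch of the name sort mentioned above, assuming a recent samtools release in which sort writes its result to the file given with -o. The function name name_sort_bam and the SAMTOOLS override variable are invented for this example and are not part of samtools.

```shell
# SAMTOOLS may be overridden, e.g. with the full path to a local install.
SAMTOOLS="${SAMTOOLS:-samtools}"

# Hypothetical wrapper: $1 = input SAM/BAM, $2 = name-sorted output BAM.
# The input file is left untouched; a new file is written.
name_sort_bam() {
    "$SAMTOOLS" sort -n -o "$2" "$1"
}
```

For example, name_sort_bam name.bam name_sorted.bam leaves name.bam as it was and writes name_sorted.bam. The result could then be given to htseq-count with "-f bam -r name", assuming an HTSeq version that supports those options.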

                  • wmseq
                    Member
                    • May 2011
                    • 71

                    #10
                    crazyhottommy,
                    You mean that I need the file "name_sorted.bam" for htseq-count, not the file "name.bam" from the samtools view command?

                    • dpryan
                      Devon Ryan
                      • Jul 2011
                      • 3478

                      #11
                      Sorting is for sorting. If you sort by coordinate, then you can create an index to quickly randomly seek to a given portion of the file. You can also name sort, which is really the ideal input to htseq-count. A name-sorted BAM file can't be indexed (I assume this throws an error). You can also have a simple unsorted file. Normally, those actually work fine for use in htseq-count, you just need mates in a pair to be next to each other.

                      If you have single-end reads, then any BAM file (sorted or not) will work for htseq-count.
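The "mates next to each other" requirement can be checked directly on the alignment records. This is a rough sketch, assuming standard SAM columns (QNAME in column 1, FLAG in column 2). The function name mates_adjacent is hypothetical, and secondary and supplementary alignments are simply skipped, which may not match how htseq-count itself treats them.

```shell
# Hypothetical check: read SAM records on stdin; exit status 0 if every
# paired read is immediately followed by its mate, 1 otherwise.
mates_adjacent() {
    awk -F'\t' '
        /^@/ { next }                              # skip header lines
        {
            flag = $2
            if (flag % 2 == 0) next                # 0x1 unset: single-end read
            if (int(flag / 256) % 2 == 1) next     # 0x100: secondary alignment
            if (int(flag / 2048) % 2 == 1) next    # 0x800: supplementary alignment
            if (pending == "") { pending = $1; next }   # first mate seen
            if ($1 == pending) { pending = ""; next }   # second mate follows it
            bad = 1; exit                          # a different read came in between
        }
        END { if (bad || pending != "") exit 1 }
    '
}
```

A possible use: samtools view file.bam | mates_adjacent && echo "mates are adjacent". A file containing only single-end reads always passes, in line with the remark that any BAM file works in that case.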

                      • wmseq
                        Member
                        • May 2011
                        • 71

                        #12
                        Devon,
                        After I ran the following commands, I got three outputs. Two of them (0_1Q_3.sam and 0_1Q_3_sorted.bam) are actually folders, containing a file named 0_1Q_3 and a file named 0_1Q_3_sorted respectively; the third is the file 0_1Q_3_sorted.bam.bai. That is why I am not sure which file I need.

                        Code:
                        /home/wenfu/bin/samtools import /media/wenfu/LaCie/my_rnaseq_dat/Amhg45.fa 0_1Q_3.sam 0_1Q_3.bam
                        /home/wenfu/bin/samtools sort 0_1Q_3.bam 0_1Q_3_sorted
                        /home/wenfu/bin/samtools index 0_1Q_3_sorted.bam

                        • dpryan
                          Devon Ryan
                          • Jul 2011
                          • 3478

                          #13
                          For htseq-count, 0_1Q_3.bam would work and the sorted file wouldn't, since you coordinate sorted it (as I mentioned earlier, if you have single-end reads, they both will work). htseq-count needs mates to be next to each other in a file in order to work, so if you feed it a coordinate-sorted file (e.g., 0_1Q_3_sorted.bam), you'll get a lot of warnings and incorrect counts if you have paired-end reads. BTW, in the future, just do this:

                          Code:
                          samtools view -bS 0_1Q_3.sam | samtools sort - 0_1Q_3.sorted
                          samtools index 0_1Q_3.sorted.bam
                          Just give htseq-count the SAM file and then delete it. There's no need to use the old import command, which is just an alias for the "view" command and probably needs an indexed fasta file.
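The commands above use the older "output prefix" form of samtools sort. As a hedged sketch for more recent samtools releases, where the output file is given with -o and SAM input is accepted directly, the same coordinate sort and index step might look like this. The function name sort_and_index and the SAMTOOLS override variable are assumptions made for this example.

```shell
# SAMTOOLS may be overridden, e.g. with the full path to a local install.
SAMTOOLS="${SAMTOOLS:-samtools}"

# Hypothetical wrapper: $1 = input SAM (or BAM), $2 = coordinate-sorted BAM.
# The index is only built if sorting succeeded.
sort_and_index() {
    "$SAMTOOLS" sort -o "$2" "$1" && "$SAMTOOLS" index "$2"
}
```

For the files in this thread the call would be sort_and_index 0_1Q_3.sam 0_1Q_3.sorted.bam, which should also leave 0_1Q_3.sorted.bam.bai next to the sorted file.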

                          • wmseq
                            Member
                            • May 2011
                            • 71

                            #14
                            Thanks a lot, Devon!!
                            Is the "-" after sort and before 0_1Q_3.sorted necessary?

                            • dpryan
                              Devon Ryan
                              • Jul 2011
                              • 3478

                              #15
                              Yes, it means "standard input", which is needed for the pipe to work.
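The same convention is shared by many command-line tools, not just samtools. A tiny illustration using the standard sort utility, with made-up read names as input:

```shell
# "-" in place of a file name tells sort to read from standard input,
# i.e. from whatever the pipe delivers.
printf 'read_b\nread_a\n' | sort -
```

Here sort has no file on disk to open; the two lines arrive through the pipe, just as the BAM stream from samtools view does in the command above.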
