
  • Which bam file is what I need to count reads?

    Hi every one,
    After converting my bowtie SAM files to BAM files, I have no idea which BAM file should be used to count reads, because there are two of them: a .bam file and a .bam.sorted file. Could anyone help me out?

    Thanks a lot!!

    Richard

  • #2
    I assume you mean counting via htseq-count or something like that. In that case, use whichever file is name (rather than coordinate) sorted. Bowtie produces name-sorted output (as opposed to tophat, which defaults to coordinate sorting things, though you can disable this behavior).



    • #3
      I don't quite understand your question. Could you provide the line of code you used to run bowtie? I use Bowtie2 and I believe when it's finished aligning it provides output stating how many reads did and did not align uniquely or more than once.



      • #4
        I'll add that if you need to see how a file was sorted, just
        Code:
        samtools view -H file.bam | grep "@HD"
        and see if it says "unsorted", "queryname", or "coordinate". Practically speaking, "unsorted" is usually sufficient and you likely don't need to actually have "queryname" there (I'm sure some aligner or other actually interleaves paired-reads, but I've never seen it).
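The header check above can be reduced to just the sort-order value. This is only a rough sketch: the helper name `sort_order` is made up for illustration, and it reads header text on standard input, so in practice the output of `samtools view -H file.bam` would be piped into it. It prints "unknown" when there is no @HD line or no SO tag.

```shell
# Hypothetical helper: print the SO (sort order) value from SAM header text on stdin.
sort_order() {
    awk -F'\t' '
        $1 == "@HD" {
            # Scan the tab-separated tags on the @HD line for SO:<value>
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
        }
        END { if (!found) print "unknown" }
    '
}

# Usage (assumes samtools is on PATH and file.bam exists):
# samtools view -H file.bam | sort_order
```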



        • #5
          Thanks Devon!
          Yes, I will count them with htseq-count and bedtools multicov. If I understand you correctly, the .bam file is what I need. Does that mean these two tools automatically use the sorted BAM file and its index internally?



          • #6
            HTSeq-count doesn't perform random access, so it won't use the index (you can't index a non-coordinate sorted BAM file anyway). I've never used bedtools multicov, so I don't know what it should be fed as input.



            • #7
              Although the BAM file has been sorted after running the samtools sort command, why is the sorted BAM file still kept, and what is the purpose of storing it?



              • #8
                I am sorry, Devon!
                I got it. The BAM file is sorted so that it can be indexed.



                • #9
                  Originally posted by wmseq
                  Although the BAM file has been sorted after running the samtools sort command, why is the sorted BAM file still kept, and what is the purpose of storing it?
                  samtools sort does not sort in place, so it generates a new sorted file. Use that file for HTSeq-count, but remember to sort it by name (the -n flag), since HTSeq needs a name-sorted SAM file.
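As a minimal sketch of the name sort mentioned above: the wrapper below assumes samtools 1.3 or later, where the output file is given with `-o` (the commands elsewhere in this thread use the older `samtools sort in.bam out_prefix` form). The function name `name_sort_bam` and the output file name in the usage line are made up for illustration.

```shell
# Hypothetical wrapper: name-sort an alignment file so that mates end up adjacent.
# Assumes samtools 1.3 or later; $1 is the input file, $2 is the output BAM.
name_sort_bam() {
    samtools sort -n -o "$2" "$1"
}

# Usage (input name from the thread; output name is an assumption):
# name_sort_bam 0_1Q_3.bam 0_1Q_3.namesorted.bam
```

The input file is left untouched, which matches the point above that sorting always writes a new file.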



                  • #10
                    crazyhottommy,
                    You mean that I need the file "name_sorted.bam" for HTSeq-count, not the file "name.bam" from the samtools view command?



                    • #11
                      Sorting is for sorting. If you sort by coordinate, then you can create an index to quickly randomly seek to a given portion of the file. You can also name sort, which is really the ideal input to htseq-count. A name-sorted BAM file can't be indexed (I assume this throws an error). You can also have a simple unsorted file. Normally, those actually work fine for use in htseq-count, you just need mates in a pair to be next to each other.

                      If you have single-end reads, then any BAM file (sorted or not) will work for htseq-count.
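One rough way to test the "mates next to each other" condition is to scan the SAM records directly. The sketch below is an assumption-laden illustration, not something htseq-count itself provides: the helper name `mates_adjacent` is made up, it reads SAM text on standard input, and it simply skips secondary (0x100) and supplementary (0x800) records. It prints "yes" when every paired read is immediately followed by a record with the same read name, and "no" otherwise. Single-end records are ignored, so a single-end file always yields "yes".

```shell
# Hypothetical helper: report whether paired-end mates sit on consecutive lines.
mates_adjacent() {
    awk -F'\t' '
        /^@/ { next }                                      # skip header lines
        int($2 / 256) % 2 || int($2 / 2048) % 2 { next }   # skip secondary/supplementary
        $2 % 2 == 0 { next }                               # unpaired read: nothing to check
        pending == "" { pending = $1; next }               # first mate seen, wait for the second
        $1 != pending { bad = 1; exit }                    # next paired record has another name
        { pending = "" }                                   # pair completed
        END { print ((bad || pending != "") ? "no" : "yes") }
    '
}

# Usage (assumes samtools is on PATH and file.bam exists):
# samtools view file.bam | mates_adjacent
```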



                      • #12
                        Devon,
                        After I ran the following commands, I got three outputs. Two of them (0_1Q_3.sam and 0_1Q_3_sorted.bam) are in fact folders, containing a file named 0_1Q_3 and a file named 0_1Q_3_sorted respectively; the third is the file 0_1Q_3_sorted.bam.bai. That is why I am not sure which file I need.

                        $/home/wenfu/bin/samtools import /media/wenfu/LaCie/my_rnaseq_dat/Amhg45.fa 0_1Q_3.sam 0_1Q_3.bam

                        $/home/wenfu/bin/samtools sort 0_1Q_3.bam 0_1Q_3_sorted

                        $/home/wenfu/bin/samtools index 0_1Q_3_sorted.bam



                        • #13
                          For htseq-count, 0_1Q_3.bam would work and the sorted file wouldn't, since you coordinate sorted it (as I mentioned earlier, if you have single-end reads, they both will work). htseq-count needs mates to be next to each other in a file in order to work, so if you feed it a coordinate-sorted file (e.g., 0_1Q_3_sorted.bam), you'll get a lot of warnings and incorrect counts if you have paired-end reads. BTW, in the future, just do this:

                          Code:
                          samtools view -bS 0_1Q_3.sam | samtools sort - 0_1Q_3.sorted
                          samtools index 0_1Q_3.sorted.bam
                          Just give htseq-count the SAM file and then delete it. There's no need to use the old import command, which is just an alias for the "view" command and probably needs an indexed fasta file.



                          • #14
                            Thanks a lot, Devon!!
                            Is the "-" following sort and before 0_1Q_3.sorted necessary?



                            • #15
                              Yes, it means "standard input", which is needed for the pipe to work.
