  • Which BAM file do I need to count reads?

    Hi everyone,
    After converting my bowtie SAM files to BAM files, I don't know which BAM file should be used to count reads, because there are two BAM files: a .bam file and a .bam.sorted file. Could anyone help me?

    Thanks a lot!!

    Richard

  • #2
    I assume you mean counting via htseq-count or something like that. In that case, use whichever file is name-sorted (rather than coordinate-sorted). Bowtie produces name-sorted output (as opposed to TopHat, which coordinate-sorts its output by default, though you can disable this behavior).

    • #3
      I don't quite understand your question. Could you provide the command you used to run bowtie? I use Bowtie2, and I believe that when it finishes aligning it reports how many reads aligned zero times, exactly once, or more than once.

      • #4
        I'll add that if you need to see how a file was sorted, just
        Code:
        samtools view -H file.bam | grep "@HD"
        and see whether the SO field says "unsorted", "queryname", or "coordinate". Practically speaking, "unsorted" is usually sufficient, and you likely don't need "queryname" there (I'm sure some aligner or other actually separates paired reads, but I've never seen it).
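As a rough illustration of what that grep is looking for: the sort order is recorded in the SO field of the @HD header line, and the SAM specification lists unknown (the default), unsorted, queryname, and coordinate as valid values. The minimal Python sketch below pulls it out of header text, assuming you already have the header as a string (for example, saved from samtools view -H); the function name sort_order is hypothetical.

```python
def sort_order(header_text: str) -> str:
    """Return the SO value from the @HD line, or 'unknown' if it is absent."""
    for line in header_text.splitlines():
        if not line.startswith("@HD"):
            continue
        # Header fields after the record type are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            if tag == "SO":
                return value
    # The SAM spec treats a missing SO tag as "unknown".
    return "unknown"


header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
print(sort_order(header))  # coordinate
```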

        • #5
          Thanks Devon!
          Yes, I will count them with htseq-count and bedtools multicov. If I understand you correctly, the .bam file is what I need. Does that mean these two tools automatically use the sorted BAM file and the BAM index internally?

          • #6
            htseq-count doesn't perform random access, so it won't use the index (you can't index a BAM file that isn't coordinate-sorted anyway). I've never used bedtools multicov, so I don't know what it expects as input.
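To see why an index only makes sense for coordinate-sorted data: once records are ordered by (reference, position), the records for a region sit in one contiguous stretch that can be found by binary search instead of a full scan. The sketch below is a deliberately simplified stand-in; real BAM indexes (.bai) use a binning scheme over compressed file offsets rather than a Python list, the records are made up, and it only finds reads that start inside the region, not every read that overlaps it.

```python
from bisect import bisect_left, bisect_right

# Hypothetical coordinate-sorted records: (reference, 1-based start, read name).
records = [
    ("chr1", 100, "r1"),
    ("chr1", 150, "r2"),
    ("chr1", 400, "r3"),
    ("chr2", 50, "r4"),
    ("chr2", 300, "r5"),
]

# The sort key; binary search is only valid because the list is ordered by it.
keys = [(ref, pos) for ref, pos, _ in records]


def reads_starting_in(ref, start, end):
    """Return names of reads whose start position lies in [start, end] on ref."""
    lo = bisect_left(keys, (ref, start))
    hi = bisect_right(keys, (ref, end))
    return [name for _, _, name in records[lo:hi]]


print(reads_starting_in("chr1", 120, 500))  # ['r2', 'r3']
```

If the same records were name-sorted, the positions would be scrambled and the binary search would return wrong slices, which is the intuition behind not indexing name-sorted files.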

            • #7
              If the BAM file has already been sorted by running the samtools sort command, why is a separate sorted BAM file kept, and what is the purpose of storing it?

              • #8
                Sorry, Devon!
                I get it now: the BAM file is sorted so that it can be indexed.

                • #9
                  Originally posted by wmseq
                  If the BAM file has already been sorted by running the samtools sort command, why is a separate sorted BAM file kept, and what is the purpose of storing it?
                  samtools sort does not sort in place, so it generates a new sorted file. Use that file for htseq-count, but remember to sort by name (the -n flag), since that is what HTSeq needs (a name-sorted file).
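A toy comparison of the two orders, using made-up paired-end records: sorting by coordinate can interleave different pairs, while sorting by name puts both mates of a pair next to each other. This only illustrates the idea; samtools' actual ordering rules differ in detail (for example, coordinate sorting follows the reference order in the header rather than alphabetical order, and its name comparison is not plain string order).

```python
# Hypothetical paired-end alignments: (read name, reference, position).
alignments = [
    ("readA", "chr1", 100),
    ("readB", "chr1", 120),
    ("readA", "chr1", 350),
    ("readB", "chr1", 400),
]

# Coordinate order: by reference, then position (the idea behind plain samtools sort).
by_coord = sorted(alignments, key=lambda a: (a[1], a[2]))
# Name order: by read name, so both mates end up together (the idea behind samtools sort -n).
by_name = sorted(alignments, key=lambda a: a[0])

print([a[0] for a in by_coord])  # ['readA', 'readB', 'readA', 'readB']
print([a[0] for a in by_name])   # ['readA', 'readA', 'readB', 'readB']
```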

                  • #10
                    crazyhottommy,
                    Do you mean that I need the file "name_sorted.bam" for htseq-count, rather than the file "name.bam" from the samtools view command?

                    • #11
                      Sorting is for sorting. If you sort by coordinate, then you can create an index to quickly seek to any given portion of the file. You can also name-sort, which is really the ideal input for htseq-count. A name-sorted BAM file can't be indexed (I assume this throws an error). You can also have a simple unsorted file. Normally, those actually work fine with htseq-count; you just need the two mates of a pair to be next to each other.

                      If you have single-end reads, then any BAM file (sorted or not) will work for htseq-count.
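The "mates next to each other" requirement can be written as a simple check. This sketch assumes you have the read names in file order and that every pair contributes exactly two records (no secondary or supplementary alignments); the function name mates_adjacent is hypothetical and not part of HTSeq.

```python
def mates_adjacent(read_names):
    """Return True if the records form consecutive pairs sharing a read name.

    Simplifying assumption: each pair has exactly two records and there are
    no unpaired, secondary, or supplementary alignments.
    """
    if len(read_names) % 2:
        return False
    for i in range(0, len(read_names), 2):
        if read_names[i] != read_names[i + 1]:
            return False
    return True


print(mates_adjacent(["readA", "readA", "readB", "readB"]))  # True
print(mates_adjacent(["readA", "readB", "readA", "readB"]))  # False
```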

                      • #12
                        Devon,
                        After running the following commands, I got three outputs. Two of them (0_1Q_3.sam and 0_1Q_3_sorted.bam) are folders containing a file named 0_1Q_3 and a file named 0_1Q_3_sorted, respectively; the third is a file named 0_1Q_3_sorted.bam.bai. That is why I am not sure which file I need.

                        $ /home/wenfu/bin/samtools import /media/wenfu/LaCie/my_rnaseq_dat/Amhg45.fa 0_1Q_3.sam 0_1Q_3.bam
                        $ /home/wenfu/bin/samtools sort 0_1Q_3.bam 0_1Q_3_sorted
                        $ /home/wenfu/bin/samtools index 0_1Q_3_sorted.bam

                        • #13
                          For htseq-count, 0_1Q_3.bam would work and the sorted file wouldn't, since you coordinate sorted it (as I mentioned earlier, if you have single-end reads, they both will work). htseq-count needs mates to be next to each other in a file in order to work, so if you feed it a coordinate-sorted file (e.g., 0_1Q_3_sorted.bam), you'll get a lot of warnings and incorrect counts if you have paired-end reads. BTW, in the future, just do this:

                          Code:
                          samtools view -bS 0_1Q_3.sam | samtools sort - 0_1Q_3.sorted
                          samtools index 0_1Q_3.sorted.bam
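                          Just give htseq-count the SAM file and then delete it. There's no need to use the old import command, which is just an alias for the "view" command and probably needs an indexed fasta file. (This is the older samtools sort syntax, where the last argument is an output prefix to which ".bam" is appended; newer samtools releases instead take the output file name via -o.)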
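The "warnings and incorrect counts" failure mode for coordinate-sorted paired-end input can be sketched like this: a reader that assumes the next record is the mate will mispair reads, whereas holding unmatched reads in a set until their mate appears works in either order, at the cost of memory when mates are far apart. This illustrates the general idea under simplifying assumptions (exactly two records per pair, hypothetical function names); it is not HTSeq's actual code.

```python
def pair_consecutive(names):
    """Naively treat each two consecutive records as one pair."""
    return [(names[i], names[i + 1]) for i in range(0, len(names) - 1, 2)]


def pair_buffered(names):
    """Hold each read until its mate shows up, regardless of file order."""
    waiting = set()
    pairs = []
    for name in names:
        if name in waiting:
            waiting.remove(name)
            pairs.append((name, name))
        else:
            waiting.add(name)
    return pairs


coord_sorted = ["readA", "readB", "readA", "readB"]
print(pair_consecutive(coord_sorted))  # [('readA', 'readB'), ('readA', 'readB')]
print(pair_buffered(coord_sorted))     # [('readA', 'readA'), ('readB', 'readB')]
```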

                          • #14
                            Thanks a lot, Devon!!
                            Is the "-" following sort and before 0_1Q_3.sorted necessary?

                            • #15
                              Yes, it means "standard input", which is needed for the pipe to work.
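The "-" convention can be sketched in Python: many command-line tools treat a file name of "-" as "read from standard input" so they can sit at the end of a pipe. The helper names open_input and count_alignments are hypothetical, and the SAM handling is reduced to skipping blank lines and "@" header lines.

```python
import io
import sys


def open_input(path):
    """Return a readable text stream: standard input for "-", else the named file."""
    if path == "-":
        return sys.stdin
    # The caller is responsible for closing a real file.
    return open(path)


def count_alignments(stream):
    """Count SAM alignment lines, skipping blank lines and '@' header lines."""
    return sum(1 for line in stream if line.strip() and not line.startswith("@"))


# Stand-in for piped SAM text; in a real pipe this would be sys.stdin.
demo = io.StringIO("@HD\tVN:1.0\tSO:unsorted\nr1\t0\tchr1\t100\nr2\t16\tchr1\t200\n")
print(count_alignments(demo))  # 2
```

At the end of a pipe such as samtools view -h file.bam | python your_script.py, calling count_alignments(open_input("-")) would consume the piped SAM text instead of a named file.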
