Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filter BAM/SAM based on read length

    Dear All,

    I have a BAM file that contains reads with different length.
    Could you please tell me how can I filter only those reads that are longer/shorter than a given value and write it in a new BAM file?

    I have checked existing forum topics, but I did not find any that could help me.
    I also tried NGSutils, but it just does not work ( installation failed, the commands/programs i.e. bamutils are "not found").

    Unfortunately I have no programming/bioinfo background. I'm just a biologist who have to do everything , So if it is possible please be a bit more detailed.

    Thank you in advance,

  • #2
    You can do that with Reformat:

    reformat.sh in=x.sam out=y.sam minlength=50 maxlength=200

    It works with bam as well if samtools is installed and in the path. Reformat is in the BBMap package. To install that, you just unzip it and untar it.

    Comment


    • #3
      Hi,

      Thank you for the quick response.
      I downloaded the BBMap tar from here: http://sourceforge.net/projects/bbma...e=typ_redirect

      I used tar -xzf BBMap.tar.gz command.

      Do I have to install it somehow? Because when i try to run it the way you wrote is not working. (I have samtools and Java installed )

      I tried this command ( in the directory where the program was extracted):
      sh /home/user/Downloads/bbmap/reformat.sh in=/home/user/Desktop/allbams/file.bam out=/home/user/Desktop/fileout.bam minlength=1 maxlength=41
      then it gave me the following error:
      /home/user/Downloads/bbmap/reformat.sh: 4: /home/user/Downloads/bbmap/reformat.sh: Syntax error: "(" unexpected

      Do I have to put these programs to PATH? Or how exacty do I have to run them?

      Sorry for the trouble,

      Comment


      • #4
        No install is required for BBMap tools. Brian uses bash shell by default.

        Can you try:

        Code:
        $ /home/user/Downloads/bbmap/reformat.sh in=/home/user/Desktop/allbams/file.bam out=/home/user/Desktop/fileout.bam minlength=1 maxlength=41
        If you use "sh" (this is ubuntu?) then it will use dash.

        Comment


        • #5
          Thank you for the response GenoMax.
          The code helped. However, I have samtools installed but the program gave an error message with my BAM files. Now i try to convert them to SAM to see if it works.

          Comment


          • #6
            Ah yes. You have to convert the BAM file to SAM. I only focused on the error you had seen.

            Comment


            • #7
              Originally posted by krapulaxdoctor View Post
              Thank you for the response GenoMax.
              The code helped. However, I have samtools installed but the program gave an error message with my BAM files. Now i try to convert them to SAM to see if it works.
              Can you post the error message? If samtools is installed and in the path, it should work fine.

              Comment


              • #8
                Yes, Brian, here it is:

                java -ea -Xmx200m -cp /home/user/Downloads/bbmap/current/ jgi.ReformatReads in=/home/user/Desktop/allbams/RNA.bam out=/home/user/Desktop/RNAout.bam minlength=1 maxlength=41
                Executing jgi.ReformatReads [in=/home/user/Desktop/allbams/RNA.bam, out=/home/user/Desktop/RNA.bam, minlength=1, maxlength=41]

                Found samtools.
                Input is being processed as unpaired
                [E::sam_parse1] missing SAM header
                [W::sam_read1] parse error at line 3
                [main_samview] truncated file.
                Exception in thread "Thread-4" java.lang.RuntimeException: java.io.IOException: Broken pipe
                at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:31)
                Caused by: java.io.IOException: Broken pipe
                at java.io.FileOutputStream.writeBytes(Native Method)
                at java.io.FileOutputStream.write(FileOutputStream.java:345)
                at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
                at stream.ReadStreamByteWriter.writeSam(ReadStreamByteWriter.java:533)
                at stream.ReadStreamByteWriter.processJobs(ReadStreamByteWriter.java:86)
                at stream.ReadStreamByteWriter.run2(ReadStreamByteWriter.java:41)
                at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:27)
                Exception in thread "main" java.lang.RuntimeException: Writing to a terminated thread.
                at stream.ConcurrentGenericReadOutputStream.write(ConcurrentGenericReadOutputStream.java:198)
                at stream.ConcurrentGenericReadOutputStream.addDisordered(ConcurrentGenericReadOutputStream.java:193)
                at stream.ConcurrentGenericReadOutputStream.add(ConcurrentGenericReadOutputStream.java:97)
                at jgi.ReformatReads.process(ReformatReads.java:892)
                at jgi.ReformatReads.main(ReformatReads.java:46)

                Hope it helps.

                Comment


                • #9
                  Those errors indicate that the bam file is broken. Most of that text comes from Reformat, but these 3 lines:
                  Code:
                  [E::sam_parse1] missing SAM header
                  [W::sam_read1] parse error at line 3
                  [main_samview] truncated file.
                  ...come from samtools. Are you able to convert it to a sam successfully?

                  Comment


                  • #10
                    Yes,
                    samtools can convert them into SAM with the:
                    $ samtools view -h -o out.sam in.bam
                    BBmap can deal with these SAM files.

                    Comment


                    • #11
                      Actually, you were right. There must be a problem with the header. It can be converted to SAM but I cant convert it back to BAM. Also, I cant convert it into BED. Do you know any way to fix the headers? ( unfortunately I have only these files with the crappy header ) Or to generate new header for the SAM files from scratch?
                      Sorry if I ask too many questions.

                      Comment


                      • #12
                        Can you post the first few lines of your sam file?

                        Code:
                        $ samtools view -h your.bam | more

                        Comment


                        • #13
                          @SQ SN:chr1 LN:195471971
                          @SQ SN:chr2 LN:182113224
                          @SQ SN:chr3 LN:160039680
                          @SQ SN:chr4 LN:156508116
                          @SQ SN:chr5 LN:151834684
                          @SQ SN:chr6 LN:149736546
                          @SQ SN:chr7 LN:145441459
                          @SQ SN:chr8 LN:129401213
                          @SQ SN:chr9 LN:124595110
                          @SQ SN:chr10 LN:130694993
                          @SQ SN:chr11 LN:122082543
                          @SQ SN:chr12 LN:120129022
                          @SQ SN:chr13 LN:120421639
                          @SQ SN:chr14 LN:124902244
                          @SQ SN:chr15 LN:104043685
                          @SQ SN:chr16 LN:98207768
                          @SQ SN:chr17 LN:94987271
                          @SQ SN:chr18 LN:90702639
                          @SQ SN:chr19 LN:61431566
                          @SQ SN:chrM LN:16299
                          @SQ SN:chrX LN:171031299
                          @SQ SN:chrY LN:91744698
                          HISEQLN:122:HCW3JADXX:2:2109:6425:52017 272 chr1 3001923 0
                          43M * 0 0 TCTGATTATTATGTGTCAGGAGGAATTTCTTTTCTGGTCCAAT
                          JJJJJJJJIIIJJJJJJJJJJJJJJJJIJJHHHHHFFFFFCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0
                          XG:i:0 NM:i:0 MD:Z:43 YT:Z:UU NH:i:20 CC:Z:= CP:i:48081495 XS:A:- HI:i:0

                          Comment


                          • #14
                            Well, it's missing the HD line. Try adding this to the beginning of one of the sam files:
                            "@HD VN:1.3 SO:unsorted"

                            However, that's not required, so I'm not sure what the problem is. It might help if you encapsulate what you pasted with [ code ] [ / code ] (without the spaces) to make the tabs come through:
                            Code:
                            hello	world
                            vs
                            Code:
                            hello world

                            Comment


                            • #15
                              And also, when I try to convert these BAM files into BED files, it generates a file, but the program (bedtools bamtobed)gives this error message:

                              terminate called after throwing an instance of 'std :: out_of_range'
                              what(): vector::_M_range_check
                              Aborted (core dumped)

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Best Practices for Single-Cell Sequencing Analysis
                                by seqadmin



                                While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
                                06-06-2024, 07:15 AM
                              • seqadmin
                                Latest Developments in Precision Medicine
                                by seqadmin



                                Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

                                Somatic Genomics
                                “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
                                05-24-2024, 01:16 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:58 AM
                              0 responses
                              13 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 06-06-2024, 08:18 AM
                              0 responses
                              20 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 06-06-2024, 08:04 AM
                              0 responses
                              18 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 06-03-2024, 06:55 AM
                              0 responses
                              13 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X