
  • #16
    Oh, wait, it says "truncated", so presumably the problem is at the end of the file. Can you run "tail" on the file and post the last two lines?



    • #17
      Originally posted by Brian Bushnell
      Oh, wait, it says "truncated", so presumably the problem is at the end of the file. Can you run "tail" on the file and post the last two lines?
      How do I run "tail"?
      Sorry, I'm a beginner...



      • #18
        "tail file.sam"

        That will print the last 10 lines to the console.
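To get exactly the last two lines that were asked for, "tail" accepts a line count with -n. A minimal sketch follows; demo.sam and its three records are made up here so the command can be tried without touching the real file:

```shell
# Build a tiny stand-in for file.sam (hypothetical content, tab-separated).
printf 'r1\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII\n' > demo.sam
printf 'r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII\n' >> demo.sam
printf 'r3\t16\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII\n' >> demo.sam

# Print only the last two lines instead of the default ten.
tail -n 2 demo.sam
```

On the real data, the equivalent would be "tail -n 2 file.sam".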



        • #19
          HISEQHI:525:HCYWJADXX:2:2213:8924:55099 256 * 942639 0 43M * 0 0 CAAAGGGCTGAGAAGCACTTGAAAAAATGTTCAACATCCTTAA CCCFFFFFHHHHHJJJJJJJJJJJJJJJJIIJJJJJJJJJJJJ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:43 YT:Z:UU NH:i:20 CC:Z:chrX CP:i:128687718 XS:A:+ HI:i:17
          HISEQLN:122:HCW3JADXX:2:2207:7052:25724 272 * 944767 0 43M * 0 0 TACTTACATATAATAAATAAATAAATAAATATTTTTTAAAAAA IFIIGJIJIIIGGIJIJIGFFCIHGIGIIHDHFFHFFDDF@@@ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:43 YT:Z:UU NH:i:11 CC:Z:chr6 CP:i:52981629 XS:A:- HI:i:9
          HISEQLN:121:HCYV3ADXX:1:1203:18633:64996 0 * 949324 043M * 0 0 CAGAACCCCTGAAATTGGCAAGATAGACGTCAGTGTTAGCAGA CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:5G37 YT:Z:UU NH:i:20 CC:Z:chr6 CP:i:6419658 XS:A:+ HI:i:12
          HISEQLN:122:HCW3JADXX:1:1112:13385:80114 272 * 949722 043M * 0 0 GGTGTCCGCTAGTGTCCTGAGGCCTGAGCGAGGGGCTCCTCTC ##A7'?DFD;BD:3GGDDDIHG@EFFEFADB?<7DD::@=1 AS:i:-2 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:11T31 YT:Z:UU NH:i:20 CC:Z:chr6 CP:i:71166409 XS:A:- HI:i:15
          These are the last few lines...



          • #20
            Assuming all of the things that look like spaces are actually tabs (sorry, tabs often get replaced by spaces on the console), I don't see anything wrong with the sam file and I don't know what the problem is. It may have something to do with a negative number being detected where a positive number is expected, but I'm just speculating.

            You could try Picard rather than Samtools, and see if you have better luck. Or, try the most recent version of Samtools, or else v0.1.19. Sometimes there's a problem with a specific version.
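Since tabs and spaces look the same on the console, one way to check the file itself is to let awk split on tabs and count the fields. The sketch below is only a rough sanity check, not a full validator: it flags records with fewer than the 11 mandatory SAM fields, or with SEQ and QUAL of different lengths. The helper name check_sam and the file check.sam are invented for the example; the second record is deliberately cut off to mimic a truncated file.

```shell
# Hypothetical two-record file; the second record is cut off mid-line.
printf 'r1\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII\n' > check.sam
printf 'r2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tAC' >> check.sam

# check_sam is a made-up helper name, not part of any tool.
check_sam() {
    awk -F'\t' '
        /^@/ { next }                                    # skip header lines
        NF < 11 { print "line " NR ": only " NF " fields"; next }
        $11 != "*" && length($10) != length($11) {       # SEQ is field 10, QUAL is field 11
            print "line " NR ": SEQ/QUAL length mismatch"
        }
    ' "$1"
}

check_sam check.sam
```

If this prints nothing for the real file, the fields are tab-separated and the lengths agree, so the problem probably lies elsewhere.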



            • #21
              OK, I'll give it a try. Thank you for all your help.



              • #22
                What version of samtools are you using?



                • #23
                  Hi,

                  I am using:
                  Version: 1.2 (using htslib 1.2.1)



                  • #24
                    Hi,

                    Sorry to revive this thread, but I have a similar desire to filter based on length and was excited to learn about reformat!

                    I've run into an issue, but I'm pretty dumb so I'm sure I've just confused something simple.

                    I've downloaded bbmap and have tried to get reformat to work but I'm not having any luck.

                    When I try the following:

                    sh ~/tools/bbmap/reformat.sh in=input.bam out=output.bam minlength=1 maxlength=100

                    I get the following error message:

                    Found samtools.
                    Input is being processed as unpaired
                    [samopen] SAM header is present: 84 sequences.
                    java.lang.AssertionError
                    at stream.SamLine.toShortMatch(SamLine.java:1257)
                    at stream.SamLine.toRead(SamLine.java:1879)
                    at stream.SamLine.toRead(SamLine.java:1749)
                    at stream.SamReadInputStream.toReadList(SamReadInputStream.java:119)
                    at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:90)
                    at stream.SamReadInputStream.nextList(SamReadInputStream.java:74)
                    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:656)
                    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
                    Input: 110600 reads 16384426 bases
                    Short Read Discards: 110034 reads (99.49%) 16340390 bases (99.73%)
                    Output: 566 reads (0.51%) 44036 bases (0.27%)

                    Time: 1.287 seconds.
                    Reads Processed: 110k 85.94k reads/sec
                    Bases Processed: 16384k 12.73m bases/sec
                    Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
                    at jgi.ReformatReads.process(ReformatReads.java:1098)
                    at jgi.ReformatReads.main(ReformatReads.java:43)


                    I'm still really excited by the potential of reformat, any advice would be greatly appreciated.



                    • #25
                      Do you still get an error if you remove the minlength=1 directive?



                      • #26
                        Wow! Thanks for the quick reply GenoMax!

                        Sadly that doesn't alleviate my issue:

                        Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
                        at jgi.ReformatReads.process(ReformatReads.java:1098)
                        at jgi.ReformatReads.main(ReformatReads.java:43)



                        • #27
                          It appears that there was some problem processing the line's MD tag. In this case, since you are just filtering based on length, that should not matter and you can just add the flag "-da" to ignore the error, which does not affect the output in this case. I added code to print out the problematic line when that happens in the future. If it's a very small bam file you could email it to me so I can see what the problem is.
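For reference, the command from post #24 with the "-da" flag appended (and minlength=1 dropped, as tried in post #26) would look roughly like the sketch below. The install path and file names are the ones from that post; -da is the standard Java switch for disabling assertions, and it is assumed here that the wrapper script passes it through. The guard is just so the snippet does nothing harmful on a machine without BBMap or the input file.

```shell
# Path and file names taken from post #24; adjust to your own setup.
REFORMAT="$HOME/tools/bbmap/reformat.sh"
ARGS="in=input.bam out=output.bam maxlength=100 -da"

if [ -f "$REFORMAT" ] && [ -f input.bam ]; then
    # ARGS is left unquoted on purpose so it splits into separate arguments.
    sh "$REFORMAT" $ARGS
else
    # Nothing to run here; just show the command that would be used.
    echo "would run: sh $REFORMAT $ARGS"
fi
```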



                          • #28
                            Brian,

                            Would it be possible to use reformat.sh to filter on the fragment length rather than the read length? I'm looking for a way to split paired-end ATAC-Seq .sam files into "nucleosome-free" and "nucleosome-bound" regions based on size of the fragment, and the proposed solutions I've found elsewhere have been a dead end. Thanks!
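Whether reformat.sh can do this is not answered in the thread. As a possible fallback that does not use reformat.sh at all, the observed template length is stored in column 9 (TLEN) of a SAM record, so a plain awk filter can split on its absolute value. This is only a sketch under assumptions: the function name split_by_fragment, the file names, and the 100 bp cutoff are all made up for illustration, and the right cutoff depends on the experiment.

```shell
# Hypothetical helper: split a SAM file by absolute TLEN (column 9).
# $1 = input SAM, $2 = cutoff in bp, $3 = output for short fragments, $4 = output for long ones
split_by_fragment() {
    : > "$3"; : > "$4"                     # start with empty output files
    awk -F'\t' -v cut="$2" -v sf="$3" -v lf="$4" '
        /^@/ { print > sf; print > lf; next }      # copy header lines to both outputs
        { len = ($9 < 0) ? -$9 : $9 }              # TLEN is negative for the rightmost mate
        len > 0 && len <= cut { print > sf }
        len > cut { print > lf }                   # records with TLEN 0 go to neither file
    ' "$1"
}

# Made-up demo data: one header line, one 80 bp pair, one 200 bp pair.
printf '@HD\tVN:1.0\n' > frag.sam
printf 'a\t99\tchr1\t100\t30\t4M\t=\t176\t80\tACGT\tIIII\n' >> frag.sam
printf 'a\t147\tchr1\t176\t30\t4M\t=\t100\t-80\tACGT\tIIII\n' >> frag.sam
printf 'b\t99\tchr1\t500\t30\t4M\t=\t696\t200\tACGT\tIIII\n' >> frag.sam
printf 'b\t147\tchr1\t696\t30\t4M\t=\t500\t-200\tACGT\tIIII\n' >> frag.sam

split_by_fragment frag.sam 100 short.sam long.sam
```

Because both mates of a pair carry the same TLEN with opposite signs, they end up in the same output file.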

