  • ruping
    Member
    • Jul 2010
    • 11

    Samtools view: fail to open file for reading.

    Hi, all

    Every now and then, when I try to convert a .sam file into a .bam file by calling
    Code:
    samtools view -bT hg.fa -o xxx.bam xxx.sam
    I get this kind of error:

    Code:
    [main_samview] fail to open file for reading.
    I'm pretty sure that the xxx.sam file is readable and in the working directory, and the header is like this:


    Code:
    @HD     VN:1.0  SO:sorted
    @PG     ID:TopHat       VN:1.0.13       CL:/scratch/ngsvin/ruping/CancerGenomics/tophat-1.0.13/bin/tophat -o /scratch/ngsvin/RNA-seq/MPI-NF/mimik_pairend/ --solexa1.3-quals -p 5 -r 46 --mate-std-dev 14 --segment-length 20 -G /scratch/ngsvin/RNA-seq/MPI-NF/Hs.genes.gff /scratch/ngsvin/ruping/CancerGenomics/bowtie-0.12.5/indexes/hg18 s_4_1fq.chopped s_4_2fq.chopped
    Run0009Lane4Tile57x3887y5410Multi0      65      chr1    461     255     36M     =       154912309       154911848       CTAACCCTGGCGGTACCCTCAGCCGGCCCGCCCGCC    GGAEGGGGGFGGFGDGGGGG?FFFFGFGGGFGGGFG    NM:i:1
    Run0009Lane4Tile28x19254y9909Multi0     73      chr1    537     0       36M     *       0       0       ACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCG    CGGDGGGFGGFGGGGGFGGGGGGFGGGGEGGGGGGG    NM:i:1
    Run0009Lane4Tile119x16602y20937Multi0   161     chr1    2792    255     36M     =       3160    403     CTACAAGCAGCAAACAGTCTGCATGGGTCATCCCCT    FEFFFFEFFFFFFFFCFDFFEFAFFFFEFFEDFFED    NM:i:0
    Run0009Lane4Tile48x11762y17580Multi0    147     chr1    3112    255     36M     =       3130    -17     TGCCAGCATAGTGCTCCTGGACCAGCGATACGCCCG    EGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG    NM:i:2
    Run0009Lane4Tile24x15875y8494Multi0     83      chr1    3113    255     36M     =       3120    -28     GCCAGCATAGTGCTCCTGGACCAGCGATACGCCCGG    3>:.@+,31@56/?50;>CBB0)6@766-67/6@77    NM:i:2

    In contrast, I have successfully converted some other .sam files into .bam files, and their headers look exactly the same as the one above. The only difference may be the file size: the .sam file above is very big (10 GB), although I have sufficient memory to load it (>250 GB). So it is quite confusing to me that I keep getting this error. I tried to understand the C code of sam.C, but I couldn't figure out what the problem is. Can anyone help me? Thanks a lot!

  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Have you tried taking just the start of this big SAM file (i.e. the header and, say, the first 20 reads)? This should tell you whether the header is the problem, rather than the file size.

    • shurjo
      Senior Member
      • Jan 2009
      • 132

      #3
      Try -bT <in.bam> -o <out.sam>

      • ruping
        Member
        • Jul 2010
        • 11

        #4
        Originally posted by maubp
        Have you tried taking just the start of this big SAM file (i.e. the header and, say, the first 20 reads)? This should tell you whether the header is the problem, rather than the file size.

        That's a good point. I tried it, and it works for the small chopped file:

        Code:
        head -100 xxx.sam >test.sam
        samtools view -bT hg.fa test.sam >test.bam
        [sam_header_read2] 25 sequences loaded.

        So that means I cannot convert large SAM files into BAM?

        • maubp
          Peter (Biopython etc)
          • Jul 2009
          • 1544

          #5
          So at least you know the header is OK. It could be that there is a corrupt or otherwise problematic read later in the SAM file. Can you break the SAM file into chunks to explore this possibility?

          I'd also suggest adding some debug statements to samtools, recompiling, and re-testing.
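
The chunking suggested above could be sketched in shell as follows. This is only a sketch: `split_sam`, the chunk size, and the output prefix are hypothetical names, and it assumes the header consists of exactly the lines that start with `@`. Each chunk gets its own copy of the header so that it can be converted independently.

```shell
#!/bin/sh
# Hypothetical helper: split a SAM file into chunks of N alignment lines,
# copying the header (lines starting with "@") into every chunk.
split_sam() {
    sam=$1      # input SAM file
    lines=$2    # alignment lines per chunk
    prefix=$3   # output prefix, e.g. "chunk_"

    # Save the header lines once.
    grep '^@' "$sam" > "${prefix}header.tmp"

    # Split only the alignment records; pieces are named ${prefix}body_aa, ...
    grep -v '^@' "$sam" | split -l "$lines" - "${prefix}body_"

    # Prepend the header to each piece and write ${prefix}aa.sam, ...
    for body in "${prefix}body_"*; do
        [ -e "$body" ] || continue   # no alignment lines at all
        suffix=${body#"${prefix}body_"}
        cat "${prefix}header.tmp" "$body" > "${prefix}${suffix}.sam"
        rm "$body"
    done
    rm "${prefix}header.tmp"
}
```

For example, `split_sam xxx.sam 1000000 chunk_` would write `chunk_aa.sam`, `chunk_ab.sam`, and so on. Converting each chunk separately should show which one, if any, triggers the error.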

          • ruping
            Member
            • Jul 2010
            • 11

            #6
            Originally posted by maubp
            So at least you know the header is OK. It could be that there is a corrupt or otherwise problematic read later in the SAM file. Can you break the SAM file into chunks to explore this possibility?

            I'd also suggest adding some debug statements to samtools, recompiling, and re-testing.

            Good suggestion, I'm doing it.

            • adamdeluca
              Member
              • Jul 2010
              • 95

              #7
              Code:
              samtools import hg.fa xxx.sam xxx.bam

              • ruping
                Member
                • Jul 2010
                • 11

                #8
                Originally posted by adamdeluca
                Code:
                samtools import hg.fa xxx.sam xxx.bam

                Thanks, but this doesn't work either.

                • nilshomer
                  Nils Homer
                  • Nov 2008
                  • 1283

                  #9
                  "samtools view -S" reads in a SAM file, "samtools view" (without the "-S") does not.

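A rough way to check what kind of input a file is before choosing the flags, sketched in shell. It relies on the fact that BAM files are BGZF-compressed and therefore begin with the gzip magic bytes 1f 8b, whereas SAM is plain text. `guess_format` is a hypothetical helper, and a gzip-compressed SAM file would also be reported as compressed, so treat the result as a hint only.

```shell
#!/bin/sh
# Hypothetical helper: guess whether a file is compressed (BAM/BGZF/gzip)
# or plain text (SAM) by looking at its first two bytes.
guess_format() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "compressed"   # gzip magic bytes: BAM, or gzip-compressed text
    else
        echo "text"         # assumed to be SAM, which needs -S in samtools 0.1.x
    fi
}
```

Newer samtools releases (1.0 and later) detect the input format automatically and ignore `-S`, so a check like this only matters for the 0.1.x versions discussed in this thread.
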

                  • ruping
                    Member
                    • Jul 2010
                    • 11

                    #10
                    Originally posted by nilshomer
                    "samtools view -S" reads in a SAM file, "samtools view" (without the "-S") does not.

                    I have tried with and without -S; the result is the same.

                    I "headed" different numbers of lines into a new file and tested whether the conversion works. I found:

                    Code:
                    head -13394305 xxx.sam >head.sam
                    samtools view -bST hg18.fa head.sam -o head.bam
                    [sam_header_read2] 25 sequences loaded.
                    
                    head -13394306 xxx.sam >head.sam
                    samtools view -bST hg18.fa head.sam -o head.bam
                    [main_samview] fail to open file for reading.
                    I checked line 13394306; there is nothing special about it.
                    Interestingly, compare the file sizes:
                    Code:
                    -rw------- 1 ruping xxx 2.0G Aug  4 17:42 head.sam  (for 13394305 lines)
                    -rw------- 1 ruping xxx 2.1G Aug  4 17:43 head.sam  (for 13394306 lines)
                    I think there might be a limit on the file size for the conversion, caused either by my machine or by samtools. However, the memory of my server is sufficient (>250 GB), and there is no problem when I load other large data into memory.

                    So, what do you think?
                    Last edited by ruping; 08-04-2010, 08:08 AM.
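
The two sizes above are consistent with the file crossing 2^31 bytes (2 GiB), the largest size that a signed 32-bit file offset can address. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the samtools binary in use was built without large-file support. The shell sketch below (all function names are hypothetical) reports which side of that boundary a file falls on.

```shell
#!/bin/sh
# Largest file size addressable with a signed 32-bit offset: 2^31 - 1 bytes.
MAX_32BIT_OFFSET=2147483647

# Hypothetical helper: succeeds (exit status 0) if the byte count exceeds the limit.
exceeds_32bit_offset() {
    [ "$1" -gt "$MAX_32BIT_OFFSET" ]
}

# Hypothetical helper: print the size of a file in bytes.
file_size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Hypothetical helper: report which side of the limit a file is on.
report_size() {
    size=$(file_size_bytes "$1")
    if exceeds_32bit_offset "$size"; then
        echo "$1: $size bytes, above the 2 GiB limit"
    else
        echo "$1: $size bytes, within the 2 GiB limit"
    fi
}
```

If every file above the boundary fails and every file below it converts, testing a samtools build compiled with large-file support (on glibc systems this is usually enabled with `-D_FILE_OFFSET_BITS=64`) would be a reasonable next step.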

                    • Lee Sam
                      Member
                      • Oct 2008
                      • 57

                      #11
                      I had a similar issue with tview, where it couldn't find the .bai index file. Running samtools index [whatever] fixed the issue.

                      • ruping
                        Member
                        • Jul 2010
                        • 11

                        #12
                        I should mention that the version of samtools I'm using is 0.1.8.

                        Something interesting happened: I tried another version of samtools (0.1.7-6 (r530)), and now it works! But this doesn't give me a scientific explanation...

                        Code:
                        /home/somebody/samtools/samtools view -bST hg18.fa head.sam -o head.bam
                        [sam_header_read2] 25 sequences loaded.

                        • wuhoucdc
                          Member
                          • Oct 2009
                          • 14

                          #13
                          Hi ruping,

                          Originally posted by ruping
                          So that means I cannot convert large SAM files into BAM?

                          I think you can convert SAM files to BAM however large they are. I have done it with a SAM file larger than 100 GB.

                          Wu
