Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • marco12345
    Member
    • Mar 2013
    • 19

    sam file integrity

    Hi,

    Is there away to check a sam file integrity? I generated one after mapping with bwa, but have problems on rebuilding a bam file from it.
    In fact, when I use
    $ samtools view -F12 -Sb /input_file.sam > new_bams/output_file.bam
    the bam created is empty (4 kb, coming from a 113 Mb sam).
    I tried many different ways, so I am wondering now if the problem is with the sam file.

    Info:
    -The starting bam file I used are paired-end ones of about 50 Mb and 500000 reads.
    -The creation of fastq file goes well, generating 2 files for the two reads, with the same number of reads of the bam, and size of 53 Mb.
    -The sai file generated from the mapping process with BWA are around 1 Mb.
    -The new sam file is generated through this command:
    $ bwa sampe /ref_genome.fa /bwa_read1.sai /bwa_read2.sai /read1.fastq /read2.fastq > /output.sam

    -The header and first lines of the sam are these:

    @SQ SN:gi|9633069|ref|NC_000898.1| LN:162114
    @SQ SN:gi|224020395|ref|NC_001664.2| LN:159322
    @SQ SN:gi|82503188|ref|NC_007605.1| LN:171823
    @SQ SN:gi|139424470|ref|NC_009334.1| LN:172764
    @PG ID:bwa PN:bwa VN:0.6.1-r104
    SRR360611.2626045 77 * 0 0 * * 0 0 GATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGAC 7:C;7@?9C7A@@C@D@A>/;CCGBBADBBBBBEAEE9GHEEFHGKGKJIJKHIIIJKIFJIIGJCC?JKKLKKKJKHJKKJCKCKKJKKKIGHG>EEDB
    SRR360611.2626045 141 * 0 0 * * 0 0 TGGTTCCTACTTCAGGGCCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATCACGATGGATCACAGGTCTATCACCCTATTAACCACTCA 8DDEEGHIGGGGHIIEIIJIHHJJJJJKIJJGDJGKKIFHCCIIJKKKKIJKADJIGJFKI>JIAJGGEDB&:0EB>@7C8B6=;.>FACACB:@E=G@:
    SRR360611.18087377 77 * 0 0 * * 0 0 ATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACG <<???@A@ACC?<<C<B??D?GFBDEBAECBEEFAIBEKDJJHJIJHGIJKGIFIJKIF@IIIHCJKIJKKLKKGIJJKKJDJCKIKLIJIHHH?FDDB;
    SRR360611.18087377 141 * 0 0 * * 0 0 GGAAGCTTTCTGTTGGCTCACATTTGGTTTATTGATGTAATGTATTGATGCTTCCCATAACGCCCTAAGTTCACACATCAACTGCAACTCCAAAGCCACC 8ECFGGFFFHIHF<ADIJIGDHHHBIIGFF@C?G@CE6BEGGEB>GDEEGCF>HK<F?<F&653&=2:?5?6/5=0B><:GCAGBEB+GF?D:@BB=B;@
    SRR360611.8887247 77 * 0 0 * * 0 0 ATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACG 5<;:=@A@ACA?=<C>FA:ABBFBDEB<BCBBECAIBHKGJHF?FJKGIJKIGIIJKIGIGIGHAF<IJKKLKKHIIGKKJDKCKKKLKKJHHH?FEDA;



    -I am working on a Mac Terminal (OSX 10.5, or OSX 10.6, or on Linux based cluster)

    Thanks in advance!
    Last edited by marco12345; 03-28-2013, 04:31 AM.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Try this: http://genome.sph.umich.edu/wiki/BamUtil:_validate

    Comment

    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #3
      Well, why don't you start by believing what your data tells you. You are filtering away unmapped reads, and almost nothing comes out at the end, maybe nothing mapped?

      77 = 64+8+4+1. That didn't map.

      141=128 + 8+4+1, also didn't map.

      It looks like you are aligning to herpes virus, but your reads seem to align to human mitochondria.

      Comment

      • marco12345
        Member
        • Mar 2013
        • 19

        #4
        I agree, believing the datas should be the first answer, but unfortunately I am quite sure the problem isn't that.
        I am using full adult human genomes, and aligning to 4 viruses, each of which is present in 98% of adult population. I tried this with almost 10 individuals and had the same result, so unless I found a really weird group of people, I guess I made a mistake somewhere, just can't understand where!

        Comment

        • swbarnes2
          Senior Member
          • May 2008
          • 910

          #5
          I am using full adult human genomes
          Are you 100% sure the mitochrondria are there? Because those reads look like mitochrondria, and they didn't map.

          Have you sat down and spot checked a bunch of reads, to see that they are about what you expect?

          Have you used grep to see if you can find some examples of the reads you are looking for?

          Your .bam is telling you that at most, a tiny % of your reads align to herpes virus. Do you have empirical evidence telling you this is not actually true?

          bwa and samtools didn't crash, and they are not returning errors. The fault is very unlikely to be in your files. Either your libraries are not what you believed them to be, or the alignment was not carried out the way you think it was.

          Comment

          Latest Articles

          Collapse

          • GATTACAT
            Reply to Nine Things a Sample Prep Scientist Thinks About Before Sequencing
            by GATTACAT
            Love this - good data definitely starts from good input, and poor input can only give relatively poor data. I particularly like the mention of Nanodrop/absorbance based methods for quantification. It's such a toss up if you'll get an accurate reading or what amounts to a randomly generated number, and a lot of library/sequencing related issues can be traced back to poor quant.
            07-01-2026, 11:43 AM
          • SEQadmin2
            Nine Things a Sample Prep Scientist Thinks About Before Sequencing
            by SEQadmin2


            I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

            Here are nine questions we think about, in roughly the order they matter, before...
            06-18-2026, 07:11 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by SEQadmin2, 07-02-2026, 11:08 AM
          0 responses
          18 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-30-2026, 05:37 AM
          0 responses
          19 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-26-2026, 11:10 AM
          0 responses
          21 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-17-2026, 06:09 AM
          0 responses
          54 views
          0 reactions
          Last Post SEQadmin2  
          Working...