Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • output of samtools

    hi, according to the Novoalign NGS Quick Start Tutorial, I ued the following command :
    ######################################
    #! /bin/sh
    if [ $# -lt 3 ]; then
    echo [ query.fa ][reference.fa] [head]
    exit
    fi
    /var/data/novocraft/novoindex $2.index $2
    /var/data/novocraft/novoalign -o SAM -f $1 -d $2.index > $3.aln.sam
    /var/data/samtools-0.1.7_x86_64-linux/samtools view -bS $3.aln.sam > $3.aln.bam
    /var/data/samtools-0.1.7_x86_64-linux/samtools sort $3.aln.bam $3.aln.sort
    /var/data/samtools-0.1.7_x86_64-linux/samtools rmdup -s $3.aln.sort.bam read1.rmpup.bam
    /var/data/samtools-0.1.7_x86_64-linux/samtools view -u -q 20 read1.rmpup.bam > read1.rmpup.q20.bam
    /var/data/samtools-0.1.7_x86_64-linux/samtools pileup -vcf $2 read1.rmpup.q20.bam > $3.raw.txt
    perl /var/data/samtools-0.1.7_x86_64-linux/samtools.pl varFilter -D 1000 $3.raw.txt > $3.flt.txt
    awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=30)' $3.flt.txt > $3.final.txt
    ##################################################
    something puzzled me is that read1.rmpup.q20.bam is bigger than read1.rmpup.bam, since -q 20 is a limitation.
    another interesting result is -D in varFilter has great effect on the flt.txt. when i use -D 1000, all of the result in flt.txt are indel. while -D 100, most of result are snp.
    any advise will be great appreciated.

  • #2
    1. You are telling samtools to uncompress the output (view -u). Try removing -u.

    2. What kind of genome are you sequencing and what is the expected sequencing coverage? Not sure how -D 100 -D 1000 can produce that effect.
    -drd

    Comment


    • #3
      Originally posted by drio View Post
      1. You are telling samtools to uncompress the output (view -u). Try removing -u.

      2. What kind of genome are you sequencing and what is the expected sequencing coverage? Not sure how -D 100 -D 1000 can produce that effect.
      thank for your replay. my genome is a bacteria having 3.9M genome and 192K plasmid. I estimate the sequencing coverage by N(number of reads)*L(read length)/genome size. samtools manual says set the -D to set a maximum read depth according to the average read depth. and I set -D as average depth. is there anything i understand wrong?

      Comment


      • #4
        How many reads do you have? With the size of your genome you probably have a very deep coverage that maybe causing those artifacts. Try reducing the amount of data you pass to samtools and also use tview to inspect some calls to see what the alignments tell you. Also run pileup (deprecated) and mpileup and compare results.
        -drd

        Comment


        • #5
          Originally posted by drio View Post
          How many reads do you have? With the size of your genome you probably have a very deep coverage that maybe causing those artifacts. Try reducing the amount of data you pass to samtools and also use tview to inspect some calls to see what the alignments tell you. Also run pileup (deprecated) and mpileup and compare results.
          thanks. we have sequenced two bacteria and one has 6,043,147 reads with 36bp SE, another has 29,552,268 reads , 76bp SE in qseq format, and i filter the qseq file, got 18,717,147 reads. I think the depth is high, but i do not know how it can effect the results.
          anther question is how should i reduce reads, random? is it possible bias may cause some question.
          I used mpileup option," samtools mpileup -ugf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
          bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf ". but the results only have several SNPs. may be I should try other parameters in bcftools.

          Comment


          • #6
            I don't see anything wrong on your method. What is the expecting mutation rate on those two bacterias? What makes you think the results are wrong?
            -drd

            Comment


            • #7
              Originally posted by drio View Post
              I don't see anything wrong on your method. What is the expecting mutation rate on those two bacterias? What makes you think the results are wrong?
              since one bacteria is Wild type and the other is mutated from WT, and the phenotype between them are different significant. i think the difference of genomes should not include only several snps.

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Current Approaches to Protein Sequencing
                by seqadmin


                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                04-04-2024, 04:25 PM
              • seqadmin
                Strategies for Sequencing Challenging Samples
                by seqadmin


                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                03-22-2024, 06:39 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 04-11-2024, 12:08 PM
              0 responses
              23 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 10:19 PM
              0 responses
              24 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 09:21 AM
              0 responses
              21 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-04-2024, 09:00 AM
              0 responses
              52 views
              0 likes
              Last Post seqadmin  
              Working...
              X