Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • kenietz
    Member
    • Nov 2011
    • 86

    consensus seq, samtools pileup, transcripts.gtf issues

    Hi guys,
    i have some issues with extracting consensus seq for transcripts from cufflinks from the bam file.

    Progs i used:
    samtools-0.1.16
    tophat-2.0.5
    cufflinks-2.0.2
    samtools-0.1.18

    Firstly, i tried using the mpileup pipeline provided by the samtools 0.1.18 man page. One may say it works. But for example if i use that line:

    samtools mpileup -r An01:1965037-1965142 -uf A_niger_refgen_fin.fa tophat_out_2/accepted_hits.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

    It produces output in cns.fq of the following format:

    @An01
    nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
    ...
    ...
    nnnnnnnnnnnnnnnnnnnnnnnMYSEQHERE

    Namely it starts from the beginning of the chromosome. That produces many Ns and my real seq is somewhere in the end in the best case. So if i want to get it i still have to parse the cns.fq file.

    Well then i moved to samtools 0.1.16 pileup -c option. It produces the pileup which i parse for every transcript from cufflinks' transrcipts.gtf file. This was working as i wish until i got a case where i have such a transcript:

    An01 Cufflinks transcript 1964561 1964665 1000 - . gene_id "gene681"; transcript_id "gene681"; FPKM "40.5022201188"; frac "1.000000"; conf_lo "0.000000"; conf_hi "180.806026"; cov "4.015327"; full_read_support "yes";
    An01 Cufflinks exon 1964561 1964665 1000 - . gene_id "gene681"; transcript_id "gene681"; exon_number "1"; FPKM "40.5022201188"; frac "1.000000"; conf_lo "0.000000"; conf_hi "180.806026"; cov "4.015327";

    According to the FPKM it looks good. But when i create the pileup file from the accepted.hits.bam and try to get the transcript i got and error in my script. It turned out that there is no such a region in the pileup file. I then dug around and figured that the reads failed QC even tho they are mapped as i see at IGV. Its a bit confusing.

    Questions:

    1. If the reads fail quality check why are they mapped at all then?
    2. How to switch of the quality check of samtools pileup -c? Should i fiddle with the source code?

    Any thoughts and ideas are welcomed
    Thank you

    PS: Some additional info on the reads from the BAM for gene681(not found in pileup) and gene682 found in pileup. I give them cos it occurred to me they might be the same reads mapped on multiple regions but differently. It is so but still cant figure out what all this mess means

    ---- from genes.fpkm_tracking

    gene681 - - gene681 tRNA-Ser%28GCT%29 - An01:1964560-1964665 - - 40.5022 0 180.806 OK
    gene682 - - gene682 tRNA-Ser%28GCT%29 - An01:1965037-1965142 - - 40.5022 0 180.806 OK
    -----------------------------------------

    ---- from samtools view -F 4 regions bam

    GENE 681

    UHDPH:1295:769 272 An01 1964625 0 13M2D26M * 0 0 TAGTAGCAGGCCTGCCTTAAACCACTCGGCCAAAGTGTC AAAB@?=?6@6=<@8A6>1>:5<<CAB9C9@4@?;7883 AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:13^GC26 YT:Z:UU XS:A:- NH:i:6 CC:Z:= CP:i:1965102 HI:i:1
    UHDPH:1945:2723 272 An01 1964625 0 37M1I4M * 0 0 TAGTAGCAGGCCTGCGCCTTAAACCACTCGGCCAAAGCTGTC 536<6??75:48-+-+(+(4,6;16454.+4(1*9=9<87B; AS:i:-8 XN:i:0 XM:i:0 XO:i:1 XG:i:1 NM:i:1 MD:Z:41 YT:Z:UU XS:A:- NH:i:6 CC:Z:= CP:i:1965102 HI:i:1

    ---------------------------------------------------
    GENE 682

    UHDPH:1295:769 16 An01 1965102 0 13M2D26M * 0 0 TAGTAGCAGGCCTGCCTTAAACCACTCGGCCAAAGTGTC AAAB@?=?6@6=<@8A6>1>:5<<CAB9C9@4@?;7883 AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:13^GC26 YT:Z:UU XS:A:- NH:i:6 CC:Z:= CP:i:1965675 HI:i:2
    UHDPH:1945:2723 272 An01 1965102 0 37M1I4M * 0 0 TAGTAGCAGGCCTGCGCCTTAAACCACTCGGCCAAAGCTGTC 536<6??75:48-+-+(+(4,6;16454.+4(1*9=9<87B; AS:i:-8 XN:i:0 XM:i:0 XO:i:1 XG:i:1 NM:i:1 MD:Z:41 YT:Z:UU XS:A:- NH:i:6 CC:Z:= CP:i:1965675 HI:i:2
    ---------------------------------------------------
  • TiborNagy
    Senior Member
    • Mar 2010
    • 329

    #2
    1. Usually reads were filtered before processing. I do not found any information about cufflinks's quality filter.
    2. samtools pileup has a -Q parameter, where you can set the base quality limit.

    Comment

    • kenietz
      Member
      • Nov 2011
      • 86

      #3
      Neither did i find any quality filter in cufflinks.
      The -Q option is: min base quality (possibly capped by BAQ) [13], but it seems that is not that the problem. In IGV i see for gene682 which is found in the pileup that some base qualities are 7 but most are above 20.
      Last edited by kenietz; 09-26-2012, 07:01 PM.

      Comment

      • kenietz
        Member
        • Nov 2011
        • 86

        #4
        Ok now i got it:
        so genes 681 and 682 have the same reads mapped onto them. So when the read has FLAG field = 272 which is 100010000, is filtered out cos is nor primary alignment(bit number 9).

        well but the regions are all well separated and different genes.

        Btw, do you know how to get the following pipeline work for a region without getting the annoying number of 'n'? I mean to get directly the sequence from that region and not the sequence from the start of the chromosome. I tried using the -r option as well and the -l option but result is all the same. Resulting CNS.FQ starts from the beginning of the chromosome.

        samtools mpileup -r An01:1965037-1965142 -uf A_niger_refgen_fin.fa tophat_out_2/accepted_hits.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

        for now i decided to use: samtools pileup -c -m 100000 BAM > pileup, and then parse the pileup. with -m 100000 i kind of disable the filtering by samtools. seems to be working even tho maybe not very correct to do
        Last edited by kenietz; 09-26-2012, 08:38 PM.

        Comment

        • TiborNagy
          Senior Member
          • Mar 2010
          • 329

          #5
          Trimseq from Emboss can remove the unwanted characters, but be careful! You can not use your annotations that based on the original sequence.

          Comment

          Latest Articles

          Collapse

          • SEQadmin2
            Nine Things a Sample Prep Scientist Thinks About Before Sequencing
            by SEQadmin2


            I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

            Here are nine questions we think about, in roughly the order they matter, before...
            06-18-2026, 07:11 AM
          • SEQadmin2
            From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
            by SEQadmin2


            Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


            The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
            ...
            06-02-2026, 10:05 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by SEQadmin2, Yesterday, 05:37 AM
          0 responses
          6 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-26-2026, 11:10 AM
          0 responses
          16 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-17-2026, 06:09 AM
          0 responses
          51 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-09-2026, 11:58 AM
          0 responses
          110 views
          0 reactions
          Last Post SEQadmin2  
          Working...