Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • TiborNagy
    replied
    Trimseq from Emboss can remove the unwanted characters, but be careful! You can not use your annotations that based on the original sequence.

    Leave a comment:


  • kenietz
    replied
    Ok now i got it:
    so genes 681 and 682 have the same reads mapped onto them. So when the read has FLAG field = 272 which is 100010000, is filtered out cos is nor primary alignment(bit number 9).

    well but the regions are all well separated and different genes.

    Btw, do you know how to get the following pipeline work for a region without getting the annoying number of 'n'? I mean to get directly the sequence from that region and not the sequence from the start of the chromosome. I tried using the -r option as well and the -l option but result is all the same. Resulting CNS.FQ starts from the beginning of the chromosome.

    samtools mpileup -r An01:1965037-1965142 -uf A_niger_refgen_fin.fa tophat_out_2/accepted_hits.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

    for now i decided to use: samtools pileup -c -m 100000 BAM > pileup, and then parse the pileup. with -m 100000 i kind of disable the filtering by samtools. seems to be working even tho maybe not very correct to do
    Last edited by kenietz; 09-26-2012, 08:38 PM.

    Leave a comment:


  • kenietz
    replied
    Neither did i find any quality filter in cufflinks.
    The -Q option is: min base quality (possibly capped by BAQ) [13], but it seems that is not that the problem. In IGV i see for gene682 which is found in the pileup that some base qualities are 7 but most are above 20.
    Last edited by kenietz; 09-26-2012, 07:01 PM.

    Leave a comment:


  • TiborNagy
    replied
    1. Usually reads were filtered before processing. I do not found any information about cufflinks's quality filter.
    2. samtools pileup has a -Q parameter, where you can set the base quality limit.

    Leave a comment:


  • consensus seq, samtools pileup, transcripts.gtf issues

    Hi guys,
    i have some issues with extracting consensus seq for transcripts from cufflinks from the bam file.

    Progs i used:
    samtools-0.1.16
    tophat-2.0.5
    cufflinks-2.0.2
    samtools-0.1.18

    Firstly, i tried using the mpileup pipeline provided by the samtools 0.1.18 man page. One may say it works. But for example if i use that line:

    samtools mpileup -r An01:1965037-1965142 -uf A_niger_refgen_fin.fa tophat_out_2/accepted_hits.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

    It produces output in cns.fq of the following format:

    @An01
    nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
    ...
    ...
    nnnnnnnnnnnnnnnnnnnnnnnMYSEQHERE

    Namely it starts from the beginning of the chromosome. That produces many Ns and my real seq is somewhere in the end in the best case. So if i want to get it i still have to parse the cns.fq file.

    Well then i moved to samtools 0.1.16 pileup -c option. It produces the pileup which i parse for every transcript from cufflinks' transrcipts.gtf file. This was working as i wish until i got a case where i have such a transcript:

    An01 Cufflinks transcript 1964561 1964665 1000 - . gene_id "gene681"; transcript_id "gene681"; FPKM "40.5022201188"; frac "1.000000"; conf_lo "0.000000"; conf_hi "180.806026"; cov "4.015327"; full_read_support "yes";
    An01 Cufflinks exon 1964561 1964665 1000 - . gene_id "gene681"; transcript_id "gene681"; exon_number "1"; FPKM "40.5022201188"; frac "1.000000"; conf_lo "0.000000"; conf_hi "180.806026"; cov "4.015327";

    According to the FPKM it looks good. But when i create the pileup file from the accepted.hits.bam and try to get the transcript i got and error in my script. It turned out that there is no such a region in the pileup file. I then dug around and figured that the reads failed QC even tho they are mapped as i see at IGV. Its a bit confusing.

    Questions:

    1. If the reads fail quality check why are they mapped at all then?
    2. How to switch of the quality check of samtools pileup -c? Should i fiddle with the source code?

    Any thoughts and ideas are welcomed
    Thank you

    PS: Some additional info on the reads from the BAM for gene681(not found in pileup) and gene682 found in pileup. I give them cos it occurred to me they might be the same reads mapped on multiple regions but differently. It is so but still cant figure out what all this mess means

    ---- from genes.fpkm_tracking

    gene681 - - gene681 tRNA-Ser%28GCT%29 - An01:1964560-1964665 - - 40.5022 0 180.806 OK
    gene682 - - gene682 tRNA-Ser%28GCT%29 - An01:1965037-1965142 - - 40.5022 0 180.806 OK
    -----------------------------------------

    ---- from samtools view -F 4 regions bam

    GENE 681

    UHDPH:1295:769 272 An01 1964625 0 13M2D26M * 0 0 TAGTAGCAGGCCTGCCTTAAACCACTCGGCCAAAGTGTC AAAB@?=?6@6=<@8A6>1>:5<<CAB9C9@4@?;7883 AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:13^GC26 YT:Z:UU XS:A:- NH:i:6 CC:Z:= CP:i:1965102 HI:i:1
    UHDPH:1945:2723 272 An01 1964625 0 37M1I4M * 0 0 TAGTAGCAGGCCTGCGCCTTAAACCACTCGGCCAAAGCTGTC 536<6??75:48-+-+(+(4,6;16454.+4(1*9=9<87B; AS:i:-8 XN:i:0 XM:i:0 XO:i:1 XG:i:1 NM:i:1 MD:Z:41 YT:Z:UU XS:A:- NH:i:6 CC:Z:= CP:i:1965102 HI:i:1

    ---------------------------------------------------
    GENE 682

    UHDPH:1295:769 16 An01 1965102 0 13M2D26M * 0 0 TAGTAGCAGGCCTGCCTTAAACCACTCGGCCAAAGTGTC AAAB@?=?6@6=<@8A6>1>:5<<CAB9C9@4@?;7883 AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:13^GC26 YT:Z:UU XS:A:- NH:i:6 CC:Z:= CP:i:1965675 HI:i:2
    UHDPH:1945:2723 272 An01 1965102 0 37M1I4M * 0 0 TAGTAGCAGGCCTGCGCCTTAAACCACTCGGCCAAAGCTGTC 536<6??75:48-+-+(+(4,6;16454.+4(1*9=9<87B; AS:i:-8 XN:i:0 XM:i:0 XO:i:1 XG:i:1 NM:i:1 MD:Z:41 YT:Z:UU XS:A:- NH:i:6 CC:Z:= CP:i:1965675 HI:i:2
    ---------------------------------------------------

Latest Articles

Collapse

  • seqadmin
    Exploring the Dynamics of the Tumor Microenvironment
    by seqadmin




    The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
    07-08-2024, 03:19 PM
  • seqadmin
    Exploring Human Diversity Through Large-Scale Omics
    by seqadmin


    In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
    06-25-2024, 06:43 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 07-16-2024, 05:49 AM
0 responses
19 views
0 likes
Last Post seqadmin  
Started by seqadmin, 07-15-2024, 06:53 AM
0 responses
28 views
0 likes
Last Post seqadmin  
Started by seqadmin, 07-10-2024, 07:30 AM
0 responses
40 views
0 likes
Last Post seqadmin  
Started by seqadmin, 07-03-2024, 09:45 AM
0 responses
205 views
0 likes
Last Post seqadmin  
Working...
X