  • figo1019
    replied
    Hi newbietonextgen

    I am also facing a similar problem. Have you sorted it out?

    Regards


  • newbietonextgen
    replied
    I have also tried other aligners such as SpliceMap, and even their SAM output, once converted to BAM, does not pass GATK's BAM check. So strange. The SHRiMP alignment works fine... why is there so much difference between the SAM files these tools produce?


  • newbietonextgen
    replied
    It's Sanger quality encoding, i.e. Phred+33 (Illumina 1.9 pipeline).


  • newbietonextgen
    replied
    samtools view file.bam | grep "HWI-ST913:105:C0EYJACXX:5:1304:11235:16705" > bad_read.sam

    Here is the output. It looks like there is a * in place of the quality string. Now I have to check the FASTQ file...

    HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 153 1 134 3 100M * 0 0 GCCTTCAGATCCTTCTCTCCGGACCGTATGCTGACGGACTTCCCTGGCCCTGCTACCTGAGACCTGCTGCTTCCTCCCTGACTTACTCTGCGGCTTCTTC * AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:2 CC:Z:= CP:i:34437630 HI:i:0
    HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 393 1 34437630 3 100M * 0 0 GAAGAAGCCGCAGAGTAAGTCAGGGAGGAAGCAGCAGGTCTCAGGTAGCAGGGCCAGGGAAGTCCGTCAGCATACGGTCCGGAGAGAAGGATCTGAAGGC * AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:2 HI:i:1
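
To see whether this is one odd read or a pattern, the same check could be run over every record. This is a minimal sketch, assuming SAM text arrives on stdin; `missing_quals` is a hypothetical helper name, not part of samtools. It lists the name, flag, and base count of each record that has bases in column 10 but `*` in column 11.

```shell
# Hypothetical helper (not a samtools command): list SAM records that carry
# bases in column 10 (SEQ) but only "*" in column 11 (QUAL).
# Reads SAM text on stdin and prints: read name, flag, number of bases.
missing_quals() {
    awk -F'\t' '
        /^@/ { next }                        # skip header lines, if any
        $10 != "*" && $11 == "*" {           # bases present, qualities absent
            print $1 "\t" $2 "\t" length($10)
        }'
}
```

Something like `samtools view file.bam | missing_quals | head` would feed it. Both records above carry NH:i:2, so it may be worth checking whether the hits are mostly multi-mapped reads.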


  • newbietonextgen
    replied
    Tophat: 2.0.0.4

    The run command I used:

    ./tophat -p 4 -G /home/sudeep/work/6-20-12/Gallus_gallus.WASHUC2.67.gtf -o /home/sudeep/work/6-20-12/layers/Infected/1_4I /home/sudeep/programs/bowtie2-2.0.0-beta6/index/chicken_order /home/sudeep/work/6-20-12/layers/Infected/1_4I_R1.fastq.gz /home/sudeep/work/6-20-12/layers/Infected/1_4I_R2.fastq.gz


  • jstjohn
    replied
    Also you might want to see if you can find that read in your fastq file and double check that it has quality values there. Sometimes fastq files can become screwed up by various processing steps. Some programs that do mapping and other downstream stuff treat a bad fastq record differently so it could be that one program is dropping that read since it has no quality scores, and the other is including it? I don't know, I am just guessing at possibilities now.
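
One way to do that lookup, as a rough sketch: it assumes a plain 4-line-per-record FASTQ on stdin (no wrapped sequences), and `fastq_check_read` is a hypothetical helper name. It prints the read name, the sequence length, and the quality length, so a record with missing qualities shows up as a 0 in the last column.

```shell
# Hypothetical helper: look up one read in a 4-line-per-record FASTQ stream
# and print: read name, sequence length, quality-string length.
# Exits non-zero if the read is not found.
fastq_check_read() {
    awk -v id="$1" '
        NR % 4 == 1 {
            name = substr($0, 2)             # drop the leading "@"
            sub(/[ \t].*$/, "", name)        # drop any comment after a blank
            sub(/\/[12]$/, "", name)         # drop an optional /1 or /2 suffix
            hit = (name == id)
        }
        NR % 4 == 2 && hit { seqlen = length($0) }
        NR % 4 == 0 && hit {
            print name "\t" seqlen "\t" length($0)
            found = 1
        }
        END { if (!found) exit 1 }'
}
```

For gzipped input, something like `gzip -dc reads.fastq.gz | fastq_check_read "HWI-ST913:105:C0EYJACXX:5:1304:11235:16705"` should work (the file name here is a placeholder). The offending SAM record has flag 153, which includes 0x80, so the read should be the second mate and is probably in the R2 file.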


  • jstjohn
    replied
    Also what tophat command did you use to do the mapping? Could always be a good-ol phred+33 vs phred+64 issue.
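
If the encoding is in doubt, a quick look at the range of quality characters usually settles it. The sketch below is only a heuristic, assuming a 4-line-per-record FASTQ on stdin; `guess_phred_offset` is a hypothetical helper name and the cut-offs (below ASCII 59 means Phred+33, above ASCII 74 means a +64 encoding) are rule-of-thumb assumptions, not a formal test.

```shell
# Hypothetical helper: report the lowest and highest ASCII codes found on the
# quality lines of a 4-line-per-record FASTQ stream, plus a rough guess.
guess_phred_offset() {
    awk '
        BEGIN {
            # awk has no ord(); build a lookup for printable ASCII
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127; max = 0
        }
        NR % 4 == 0 {                        # every 4th line is a quality line
            n = length($0)
            for (j = 1; j <= n; j++) {
                c = ord[substr($0, j, 1)]
                if (c < min) min = c
                if (c > max) max = c
            }
        }
        END {
            if (max == 0)      guess = "no-qualities"
            else if (min < 59) guess = "phred+33"    # too low for any +64 scheme
            else if (max > 74) guess = "phred+64"    # above "J", unusual for +33
            else               guess = "ambiguous"
            print min "\t" max "\t" guess
        }'
}
```

Piping the first few thousand records through it is normally enough, e.g. `gzip -dc reads.fastq.gz | head -n 40000 | guess_phred_offset` (placeholder file name).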


  • jstjohn
    replied
    Maybe a tophat bug? I think this is the important line of that error:

    12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam} is malformed: BAM file has a read with mismatching number of bases and base qualities. Offender: HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 [100 bases] [0 quals]

    It is saying that read has no quality values attached. Here is something you could do: run samtools view and grab that specific sequence, then see if indeed the sam line has no quality score information attached, or if it looks weird in some other way. If that is the case then maybe there is some bug with whatever version of tophat you are using?


    Here is one way to get that sequence:

    samtools view file.bam | grep "HWI-ST913:105:C0EYJACXX:5:1304:11235:16705" > bad_read.sam

    then you can look at bad_read.sam and see what's up.


  • newbietonextgen
    replied
    I did find something.

    This problem occurs only with TopHat-based BAM files. I have a SHRiMP-based BAM alignment and GATK works like a charm on it. Can anyone shed some light on why?


  • newbietonextgen
    started a topic GATK BAM error

    GATK BAM error

    Hi

    I am having trouble getting GATK to read my BAM files. The BAMs were created using tophat 2.0.0.4, and I used AddOrReplaceReadGroups from Picard tools to add the read groups. The command used was:

    java -Xmx1g -jar ~/programs/picard-tools-1.47/AddOrReplaceReadGroups.jar I=/home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits.bam O=/home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG.bam SORT_ORDER=coordinate RGLB=Infected RGPL=illumina RGPU=HSWI72892 RGSM=1_4I.

    I did use VALIDATION_STRINGENCY=LENIENT, but to no effect. I do index the BAM files. I even tried SortSam to see if I had a sorting problem. I looked at another thread posted here, but it did not help...


    The GATK command and its output are below:

    java -Xmx4g -jar GenomeAnalysisTK.jar -R chicken_order.fa --default_platform illumina --knownSites:variant,vcf ./trial_middle.vcf -I /home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam -T CountCovariates -cov ReadGroupcovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile /home/sudeep/work/6-20-12/layer/all_infected_bams/1_4I_recaldata.csv
    INFO 02:04:00,711 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 02:04:00,714 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.6-11-g3b2fab9, Compiled 2012/06/20 13:28:25
    INFO 02:04:00,714 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 02:04:00,714 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 02:04:00,715 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 02:04:00,715 HelpFormatter - Program Args: -R chicken_order.fa --default_platform illumina --knownSites:variant,vcf ./trial_middle.vcf -I /home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam -T CountCovariates -cov ReadGroupcovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile /home/sudeep/work/6-20-12/layer/all_infected_bams/1_4I_recaldata.csv
    INFO 02:04:00,716 HelpFormatter - Date/Time: 2012/06/29 02:04:00
    INFO 02:04:00,716 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 02:04:00,716 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 02:04:00,737 GenomeAnalysisEngine - Strictness is SILENT
    INFO 02:04:00,822 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 02:04:00,851 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
    INFO 02:04:00,867 RMDTrackBuilder - Loading Tribble index from disk for file ./trial_middle.vcf
    INFO 02:04:01,774 CountCovariatesWalker - The covariates being used here:
    INFO 02:04:01,774 CountCovariatesWalker - ReadGroupCovariate
    INFO 02:04:01,774 CountCovariatesWalker - QualityScoreCovariate
    INFO 02:04:01,775 CountCovariatesWalker - CycleCovariate
    INFO 02:04:01,775 CountCovariatesWalker - DinucCovariate
    INFO 02:04:01,854 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
    INFO 02:04:01,855 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
    INFO 02:04:03,192 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.6-11-g3b2fab9):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam} is malformed: BAM file has a read with mismatching number of bases and base qualities. Offender: HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 [100 bases] [0 quals]
    ##### ERROR -------------------------------------


    Please help. It could be something very simple.
