  • GATK - GenomeAnalysisTK - SomaticIndelDetector output

    Hello,

    After several days of trial runs, I managed to run the SomaticIndelDetector analysis from GATK. My input files should be correct, but the output file looks "incomplete":
    ##fileformat=VCFv4.1
    ##FORMAT=<ID=AD,Number=2,Type=Integer,Description="# of reads supporting consensus indel/reference at the site">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Total coverage at the site">
    ##FORMAT=<ID=MM,Number=2,Type=Float,Description="Average # of mismatches per consensus indel-supporting read/per reference-supporting read">
    ##FORMAT=<ID=MQS,Number=2,Type=Float,Description="Average mapping qualities of consensus indel-supporting reads/reference-supporting reads">
    ##FORMAT=<ID=NQSBQ,Number=2,Type=Float,Description="Within NQS window: average quality of bases from consensus indel-supporting reads/from reference-supporting reads">
    ##FORMAT=<ID=NQSMM,Number=2,Type=Float,Description="Within NQS window: fraction of mismatching bases in consensus indel-supporting reads/in reference-supporting reads">
    ##FORMAT=<ID=REnd,Number=2,Type=Integer,Description="Median/mad of indel offsets from the ends of the reads">
    ##FORMAT=<ID=RStart,Number=2,Type=Integer,Description="Median/mad of indel offsets from the starts of the reads">
    ##FORMAT=<ID=SC,Number=4,Type=Integer,Description="Strandness: counts of forward-/reverse-aligned indel-supporting reads / forward-/reverse-aligned reference supporting reads">
    ##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic event">
    ##SID_bam_file_used=/data/patient1/picard_s_garma-296_converted_sorted.bam
    ##SID_bam_file_used=/data/patient1/picard_s_garma-fibros_converted_sorted.bam
    ##SomaticIndelDetector="analysis_type=SomaticIndelDetector input_file=[/data/patient1/picard_s_garma-fibros_converted_sorted.bam, /data/patient1/picard_s_garma-296_converted_sorted.bam] read_buffer_size=null phone_home=STANDARD read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL reference_sequence=/home/merlevede/fasta/hg19.fasta rodBind=[] nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false logging_level=INFO log_to_file=null help=false out=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub outputFile=null metrics_file=null genotype_intervals=null unpaired=false verboseOutput=indels.txt bedOutput=null minCoverage=10 minNormalCoverage=4 minFraction=0.3 minConsensusFraction=0.7 minIndelCount=0 refseq=null indel_debug=false window_size=200 maxNumberOfReads=10000 filter_mismatching_base_and_quals=false"
    ##contig=<ID=chr1,length=249250621,assembly=hg19>
    ##contig=<ID=chr10,length=135534747,assembly=hg19>
    ##contig=<ID=chr11,length=135006516,assembly=hg19>
    ##contig=<ID=chr12,length=133851895,assembly=hg19>
    ##contig=<ID=chr13,length=115169878,assembly=hg19>
    ##contig=<ID=chr14,length=107349540,assembly=hg19>
    ##contig=<ID=chr15,length=102531392,assembly=hg19>
    ##contig=<ID=chr16,length=90354753,assembly=hg19>
    ##contig=<ID=chr17,length=81195210,assembly=hg19>
    ##contig=<ID=chr18,length=78077248,assembly=hg19>
    ##contig=<ID=chr19,length=59128983,assembly=hg19>
    ##contig=<ID=chr2,length=243199373,assembly=hg19>
    ##contig=<ID=chr20,length=63025520,assembly=hg19>
    ##contig=<ID=chr21,length=48129895,assembly=hg19>
    ##contig=<ID=chr22,length=51304566,assembly=hg19>
    ##contig=<ID=chr3,length=198022430,assembly=hg19>
    ##contig=<ID=chr4,length=191154276,assembly=hg19>
    ##contig=<ID=chr5,length=180915260,assembly=hg19>
    ##contig=<ID=chr6,length=171115067,assembly=hg19>
    ##contig=<ID=chr7,length=159138663,assembly=hg19>
    ##contig=<ID=chr8,length=146364022,assembly=hg19>
    ##contig=<ID=chr9,length=141213431,assembly=hg19>
    ##contig=<ID=chrMt,length=16571,assembly=hg19>
    ##contig=<ID=chrX,length=155270560,assembly=hg19>
    ##contig=<ID=chrY,length=59373566,assembly=hg19>
    ##reference=file:///home/merlevede/fasta/hg19.fasta
    ##reference=hg19.fasta
    ##source=SomaticIndelDetector
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT garma
    chr1 948846 . T TA . . . GT:AD:DP:MM:MQS:NQSBQ:NQSMM:REnd:RStart:SC 0/1:11,11:18:2.2727273,9.142858:254.0,254.0:29.745455,25.60606:0.0,0.1969697:39,13:36,13:11,0,7,0
    chr1 1276973 . G GACAC . . . GT:AD:DP:MM:MQS:NQSBQ:NQSMM:REnd:RStart:SC 0/1:16,16:26:0.125,7.0:254.0,254.0:38.78125,36.494625:0.0,0.1827957:43,7:28,7:14,2,9,1
    chr1 1289367 . CTG C . . . GT:AD:DP:MM:MQS:NQSBQ:NQSMM:REnd:RStart:SC 0/1:13,13:16:0.15384616,2.3333333:254.0,254.0:35.161537,25.958334:0.0,0.29166666:37,15:39,15:12,1,2,1
    The ID, QUAL (Phred quality) and FILTER columns are never filled in.
    I have checked the documentation, and as far as I can tell I should not need any extra options to get them.
    Can I get the number of reads supporting the reference and the alternate allele for each variant in this file?
    Do you know what could cause this information to be missing?
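For the read-count part of the question, here is a minimal sketch (not part of GATK) of pulling the per-sample counts out of one record. It assumes the FORMAT column reads GT:AD:DP:... as the ##FORMAT header lines declare, and it relies only on the header's own descriptions of AD, DP and SC. The function name `parse_indel_record` is hypothetical.

```python
# Hypothetical helper, not a GATK API: reads the per-sample fields of one VCF data line.
# Field meanings are taken from the ##FORMAT header lines in the post:
#   AD = reads supporting consensus indel / reference
#   DP = total coverage at the site
#   SC = forward/reverse indel-supporting reads, forward/reverse reference-supporting reads

def parse_indel_record(line):
    """Split one VCF data line and return its position plus the AD, DP and SC fields."""
    # A real VCF is tab-delimited; the lines pasted in the post use spaces.
    # Splitting on any whitespace handles both, since no column here contains spaces.
    cols = line.split()
    if len(cols) < 10:
        raise ValueError("expected 8 fixed columns, FORMAT and at least one sample")

    keys = cols[8].split(":")      # FORMAT column, e.g. GT:AD:DP:...
    values = cols[9].split(":")    # first sample column
    if len(keys) != len(values):
        raise ValueError("FORMAT keys and sample values differ in length")
    sample = dict(zip(keys, values))

    return {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "ref": cols[3],
        "alt": cols[4],
        "AD": [int(x) for x in sample["AD"].split(",")],
        "DP": int(sample["DP"]),
        "SC": [int(x) for x in sample["SC"].split(",")],
    }


# First record from the post, with the FORMAT column assumed to be GT:AD:DP:...
record = (
    "chr1 948846 . T TA . . . "
    "GT:AD:DP:MM:MQS:NQSBQ:NQSMM:REnd:RStart:SC "
    "0/1:11,11:18:2.2727273,9.142858:254.0,254.0:29.745455,25.60606:"
    "0.0,0.1969697:39,13:36,13:11,0,7,0"
)

if __name__ == "__main__":
    print(parse_indel_record(record))
```

The length check on FORMAT keys against sample values is there so that a mangled FORMAT column raises an error instead of silently shifting every field by one.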

    Thank you for your help,
    Jane


    Here is the output of the run:
    [merlevede@U1009-PCJane GenomeAnalysisTK-1.4-15-gcd43f01]$ /opt/jdk1.7.0_02/bin/java -Xmx10g -jar GenomeAnalysisTK.jar -R ~/fasta/hg19.fasta -T SomaticIndelDetector --minCoverage 10 -o /data/patient1/garma_indels.vcf -verbose indels.txt -I:normal /data/patient1/picard_s_garma-fibros_converted_sorted.bam -I:tumor /data/patient1/picard_s_garma-296_converted_sorted.bam
    INFO 12:57:04,098 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 12:57:04,100 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-15-gcd43f01, Compiled 2012/01/12 16:14:10
    INFO 12:57:04,100 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 12:57:04,100 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 12:57:04,101 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 12:57:04,101 HelpFormatter - Program Args: -R /home/merlevede/fasta/hg19.fasta -T SomaticIndelDetector --minCoverage 10 -o /data/patient1/garma_indels.vcf -verbose indels.txt -I:normal /data/patient1/picard_s_garma-fibros_converted_sorted.bam -I:tumor /data/patient1/picard_s_garma-296_converted_sorted.bam
    INFO 12:57:04,101 HelpFormatter - Date/Time: 2012/01/20 12:57:04
    INFO 12:57:04,102 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 12:57:04,102 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 12:57:04,115 GenomeAnalysisEngine - Strictness is SILENT
    INFO 12:57:04,159 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 12:57:04,183 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
    INFO 12:57:04,292 SomaticIndelDetectorWalker - No gene annotations available
    INFO 12:57:11,668 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
    INFO 12:57:11,668 TraversalEngine - Location processed.reads runtime per.1M.reads completed total.runtime remaining
    INFO 12:57:34,657 TraversalEngine - chr1:19448276 1.73e+06 30.0 s 17.4 s 0.6% 79.6 m 79.1 m
    INFO 12:58:04,660 TraversalEngine - chr1:45997070 4.88e+06 60.0 s 12.3 s 1.5% 67.3 m 66.3 m
    INFO 12:58:34,664 TraversalEngine - chr1:92754531 8.36e+06 90.0 s 10.8 s 3.0% 50.1 m 48.6 m
    INFO 12:59:04,674 TraversalEngine - chr1:151139734 1.19e+07 2.0 m 10.1 s 4.9% 41.0 m 39.0 m
    INFO 12:59:34,681 TraversalEngine - chr1:174979193 1.53e+07 2.5 m 9.8 s 5.7% 44.2 m 41.7 m
    INFO 13:00:04,685 TraversalEngine - chr1:214802387 1.87e+07 3.0 m 9.6 s 6.9% 43.2 m 40.2 m
    INFO 13:00:34,696 TraversalEngine - chr2:1334685 2.16e+07 3.5 m 9.7 s 8.1% 43.2 m 39.7 m
    INFO 13:01:04,708 TraversalEngine - chr2:48035231 2.48e+07 4.0 m 9.7 s 9.6% 41.7 m 37.7 m
    INFO 13:01:34,710 TraversalEngine - chr2:100916302 2.82e+07 4.5 m 9.6 s 11.3% 39.8 m 35.3 m
    INFO 13:02:04,713 TraversalEngine - chr2:160287594 3.14e+07 5.0 m 9.5 s 13.2% 37.8 m 32.8 m
    INFO 13:02:34,726 TraversalEngine - chr2:194754423 3.49e+07 5.5 m 9.5 s 14.3% 38.4 m 32.9 m
    INFO 13:03:04,728 TraversalEngine - chr2:234449507 3.82e+07 6.0 m 9.4 s 15.6% 38.4 m 32.4 m
    INFO 13:03:34,731 TraversalEngine - chr3:38888511 4.11e+07 6.5 m 9.5 s 17.2% 37.9 m 31.4 m
    INFO 13:04:04,740 TraversalEngine - chr3:77366142 4.44e+07 7.0 m 9.5 s 18.4% 38.0 m 31.0 m
    INFO 13:04:34,749 TraversalEngine - chr3:131245414 4.76e+07 7.5 m 9.5 s 20.1% 37.2 m 29.7 m
    INFO 13:05:04,756 TraversalEngine - chr3:183504076 5.08e+07 8.0 m 9.4 s 21.8% 36.6 m 28.6 m
    INFO 13:05:34,772 TraversalEngine - chr4:35024457 5.37e+07 8.5 m 9.5 s 23.4% 36.3 m 27.8 m
    INFO 13:06:04,774 TraversalEngine - chr4:84502632 5.70e+07 9.0 m 9.5 s 25.0% 36.0 m 27.0 m
    INFO 13:06:34,780 TraversalEngine - chr4:144134792 6.02e+07 9.5 m 9.5 s 27.0% 35.2 m 25.7 m
    INFO 13:07:04,788 TraversalEngine - chr5:10288623 6.31e+07 10.0 m 9.5 s 28.8% 34.7 m 24.7 m
    INFO 13:07:34,794 TraversalEngine - chr5:71496034 6.63e+07 10.5 m 9.5 s 30.8% 34.1 m 23.6 m
    INFO 13:08:04,803 TraversalEngine - chr5:130925113 6.95e+07 11.0 m 9.5 s 32.7% 33.6 m 22.6 m
    INFO 13:08:34,809 TraversalEngine - chr5:169567708 7.28e+07 11.5 m 9.5 s 34.0% 33.9 m 22.4 m
    INFO 13:09:04,820 TraversalEngine - chr6:30071922 7.56e+07 12.0 m 9.5 s 35.3% 34.0 m 22.0 m
    INFO 13:09:34,826 TraversalEngine - chr6:64394608 7.90e+07 12.5 m 9.5 s 36.4% 34.3 m 21.8 m
    INFO 13:10:04,834 TraversalEngine - chr6:119324086 8.23e+07 13.0 m 9.5 s 38.2% 34.1 m 21.1 m
    INFO 13:10:34,842 TraversalEngine - chr6:167930950 8.55e+07 13.5 m 9.5 s 39.7% 34.0 m 20.5 m
    INFO 13:11:04,852 TraversalEngine - chr7:47892708 8.83e+07 14.0 m 9.5 s 41.4% 33.8 m 19.8 m
    INFO 13:11:34,855 TraversalEngine - chr7:100555036 9.16e+07 14.5 m 9.5 s 43.1% 33.7 m 19.1 m
    INFO 13:12:04,857 TraversalEngine - chr7:143792436 9.49e+07 15.0 m 9.5 s 44.5% 33.7 m 18.7 m
    INFO 13:12:34,863 TraversalEngine - chr8:30697407 9.76e+07 15.5 m 9.5 s 46.0% 33.7 m 18.2 m
    INFO 13:13:04,870 TraversalEngine - chr8:95423368 1.01e+08 16.0 m 9.5 s 48.1% 33.3 m 17.3 m
    INFO 13:13:34,996 TraversalEngine - chr9:13191 1.04e+08 16.5 m 9.6 s 49.7% 33.2 m 16.7 m
    INFO 13:14:05,001 TraversalEngine - chr9:78601146 1.07e+08 17.0 m 9.6 s 52.3% 32.5 m 15.5 m
    INFO 13:14:35,012 TraversalEngine - chr9:123550151 1.10e+08 17.5 m 9.6 s 53.7% 32.6 m 15.1 m
    INFO 13:15:05,022 TraversalEngine - chr10:16975064 1.13e+08 18.0 m 9.6 s 54.8% 32.8 m 14.8 m
    INFO 13:15:35,030 TraversalEngine - chr10:73105203 1.16e+08 18.5 m 9.6 s 56.6% 32.7 m 14.2 m
    INFO 13:16:05,051 TraversalEngine - chr10:110227756 1.19e+08 19.0 m 9.6 s 57.8% 32.9 m 13.9 m
    INFO 13:16:35,056 TraversalEngine - chr11:8974882 1.22e+08 19.5 m 9.6 s 58.9% 33.1 m 13.6 m
    INFO 13:17:05,064 TraversalEngine - chr11:57182416 1.26e+08 20.0 m 9.6 s 60.5% 33.1 m 13.1 m
    INFO 13:17:35,066 TraversalEngine - chr11:93470323 1.29e+08 20.5 m 9.5 s 61.7% 33.2 m 12.7 m
    INFO 13:18:05,072 TraversalEngine - chr11:134129660 1.32e+08 21.0 m 9.5 s 63.0% 33.3 m 12.3 m
    INFO 13:18:35,080 TraversalEngine - chr12:31288878 1.35e+08 21.5 m 9.5 s 64.0% 33.6 m 12.1 m
    INFO 13:19:05,088 TraversalEngine - chr12:65081974 1.39e+08 22.0 m 9.5 s 65.1% 33.8 m 11.8 m
    INFO 13:19:35,099 TraversalEngine - chr12:110641546 1.42e+08 22.5 m 9.5 s 66.6% 33.8 m 11.3 m
    INFO 13:20:05,101 TraversalEngine - chr13:30177879 1.45e+08 23.0 m 9.5 s 68.3% 33.7 m 10.7 m
    INFO 13:20:35,114 TraversalEngine - chr13:99076753 1.48e+08 23.5 m 9.5 s 70.5% 33.3 m 9.8 m
    INFO 13:21:05,119 TraversalEngine - chr14:45645045 1.51e+08 24.0 m 9.5 s 72.5% 33.1 m 9.1 m
    INFO 13:21:35,123 TraversalEngine - chr14:89087638 1.54e+08 24.5 m 9.5 s 73.9% 33.1 m 8.6 m
    INFO 13:22:05,126 TraversalEngine - chr15:40031853 1.57e+08 25.0 m 9.5 s 75.8% 33.0 m 8.0 m
    INFO 13:22:35,135 TraversalEngine - chr15:65982877 1.61e+08 25.5 m 9.5 s 76.7% 33.3 m 7.8 m
    INFO 13:23:05,136 TraversalEngine - chr16:1884297 1.64e+08 26.0 m 9.5 s 77.9% 33.4 m 7.4 m
    INFO 13:23:35,139 TraversalEngine - chr16:50384007 1.67e+08 26.5 m 9.5 s 79.5% 33.4 m 6.8 m
    INFO 13:24:05,156 TraversalEngine - chr17:1105782 1.70e+08 27.0 m 9.5 s 80.8% 33.4 m 6.4 m
    INFO 13:24:35,164 TraversalEngine - chr17:28885240 1.74e+08 27.5 m 9.5 s 81.7% 33.7 m 6.2 m
    INFO 13:25:05,168 TraversalEngine - chr17:56232579 1.77e+08 28.0 m 9.5 s 82.6% 33.9 m 5.9 m
    INFO 13:25:35,172 TraversalEngine - chr18:5478271 1.80e+08 28.5 m 9.5 s 83.6% 34.1 m 5.6 m
    INFO 13:26:05,187 TraversalEngine - chr18:63637819 1.83e+08 29.0 m 9.5 s 85.4% 34.0 m 4.9 m
    INFO 13:26:35,199 TraversalEngine - chr19:21326354 1.86e+08 29.5 m 9.5 s 86.6% 34.1 m 4.6 m
    INFO 13:27:05,204 TraversalEngine - chr19:53336824 1.90e+08 30.0 m 9.5 s 87.6% 34.2 m 4.2 m
    INFO 13:27:35,205 TraversalEngine - chr20:31424446 1.93e+08 30.5 m 9.5 s 88.8% 34.3 m 3.8 m
    INFO 13:28:05,210 TraversalEngine - chr21:27039041 1.96e+08 31.0 m 9.5 s 90.7% 34.2 m 3.2 m
    INFO 13:28:35,216 TraversalEngine - chr22:30735397 1.99e+08 31.5 m 9.5 s 92.4% 34.1 m 2.6 m
    INFO 13:29:05,219 TraversalEngine - chrX:36205009 2.02e+08 32.0 m 9.5 s 94.2% 34.0 m 117.5 s
    INFO 13:29:35,220 TraversalEngine - chrX:134993596 2.05e+08 32.5 m 9.5 s 97.4% 33.4 m 51.5 s
    INFO 13:30:29,678 TraversalEngine - chrMt:14137 2.06e+08 33.4 m 9.7 s 100.0% 33.4 m 0.0 s
    INFO 13:31:12,821 TraversalEngine - chrMt:15767 2.06e+08 34.1 m 9.9 s 100.0% 34.1 m 0.0 s
    WARNING: Reads aligned past contig length on chrMt; all such reads will be skipped
    INFO 13:31:12,825 Walker - [REDUCE RESULT] Traversal result is: 206276647
    INFO 13:31:12,825 TraversalEngine - Total runtime 2048.18 secs, 34.14 min, 0.57 hours
    INFO 13:31:12,870 TraversalEngine - 15004461 reads were filtered out during traversal out of 221384221 total (6.78%)
    INFO 13:31:12,871 TraversalEngine - -> 15004461 reads (6.78% of total) failing MappingQualityZeroFilter
    INFO 13:31:14,461 GATKRunReport - Uploaded run statistics report to AWS S3

  • #2
    I got this message:
    SomaticIndelDetector - No gene annotations available

    Any idea?


    • #3
      Originally posted by jp.
      I got this message:
      SomaticIndelDetector - No gene annotations available

      Any idea?
      I have this too but I don't think it affects the run.
