Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • prs321
    replied
    Thank you, I will try it out. Appreciate it.

    Leave a comment:


  • lethalfang
    replied
    You may have mostly positions that are only called in one single replicates.

    Try this as a test, just to extract only those positions that exist in all 5 replicates, to see if things are working as intended.

    Code:
    java -Xmx2g -jar GenomeAnalysisTK.jar \
       -R ref.fasta \
       -T CombineVariants \
       -minN 5 \
       --variant rep1.vcf \
       --variant rep2.vcf \
       --variant rep3.vcf \
       --variant rep4.vcf \
       --variant rep5.vcf \
       -o combined.output.vcf

    Leave a comment:


  • prs321
    replied
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT cs1 cs2 cs3 cs4 cs5
    NODE_4_length_282_cov_38.510639 66 . G A 169.04 . AC=1;AF=0.500;AN=2;CIGAR=1X;DP=66;DPRA=0;GTI=0;LEN=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;TYPE=snp;set=Intersection GT:AOP:PL:QA:QR:RO 0/1:9:19:100,0
    ,100:271:306:10 . . . .
    NODE_4_length_282_cov_38.510639 99 . C T 214.21 . AC=1;AF=0.500;AN=2;CIGAR=1X;DP=92;DPRA=0;GTI=0;LEN=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;TYPE=snp;set=Intersection GT:AOP:PL:QA:QR:RO 0/1:12:24:100,
    0,100:319:364:11 . . . .
    NODE_4_length_282_cov_38.510639 114 . A G 346.87 . AC=1;AF=0.500;AN=2;CIGAR=1X;DP=102;DPRA=0;GTI=0;LEN=1;MEANALT=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;TYPE=snp;set=Intersection GT:AOP:PL:QA:QR:RO 0/1:14
    :26:100,0,100:475:413:12 . . . .
    NODE_4_length_282_cov_38.510639 154 . G A 452.29 . AC=1;AF=0.500;AN=2;CIGAR=1X;DP=123;DPRA=0;GTI=0;LEN=1;MEANALT=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;TYPE=snp;set=Intersection GT:AOP:PL:QA:QR:RO 0/1:18
    :31:100,0,100:578:467:13 . . . .
    NODE_4_length_282_cov_38.510639 247 . T G 355.03 . AC=1;AF=0.500;AN=2;CIGAR=1X;DP=131;DPRA=0;GTI=0;LEN=1;MEANALT=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;TYPE=snp;set=Intersection GT:AOP:PL:QA:QR:RO
    0/1:14:32:100,0,100:484:576:18 . . . .
    NODE_9_length_16257_cov_15.910501 5514 . C A 248.89 . AB=0;ABP=0;AC=2;AF=1.00;AN=2;CIGAR=1X;DP=68;DPRA=0;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=70;MQMR=0;NS=1;NUMALT=1;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QR=0;RO=0;RPPR=0;RUN=1;SRF=0;SR
    P=0;SRR=0;TYPE=snp;set=Intersection GT:AOP:PL:QA:QR:RO 1/1:11:11:100,33,0:341:0:0 . . . .
    NODE_9_length_16257_cov_15.910501 9753 . T G 0 . AB=0;ABP=0;AC=0;AF=0.00;AN=2;CIGAR=1X;DP=45;DPRA=0;GTI=0;LEN=1;MEANALT=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;SAR=0;TYPE=snp;set=cs1-cs4
    GT:AOP:PL:QA:QR:RO 0/0:6:28:7,0,100:39:587:22 . . . .
    NODE_9_length_16257_cov_15.910501 12869 . T G 0 . AB=0;ABP=0;AC=0;AF=0.00;AN=2;CIGAR=1X;DP=85;DPRA=0;GTI=0;LEN=1;MEANALT=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;TYPE=snp;set=cs1-cs2-cs3 GT:AO:
    DP:PL:QA:QR:RO 0/0:8:28:34,0,100:59:485:20 . . . .
    NODE_9_length_16257_cov_15.910501 14192 . T G 0 . AB=0;ABP=0;AC=0;AF=0.00;AN=2;CIGAR=1X;DP=37;DPRA=0;GTI=0;LEN=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;SAR=0;TYPE=snp;set=cs2-cs4 GT:AOP:PL:QA
    :QR:RO . 0/0:4:19:1,0,100:24:385:15 . . .
    NODE_9_length_16257_cov_15.910501 14201 . A G 0 . AB=0;ABP=0;AC=0;AF=0.00;AN=2;CIGAR=1X;DP=33;DPRA=0;GTI=0;LEN=1;MEANALT=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;SAR=0;TYPE=snp;set=cs2-cs3
    GT:AOP:PL:QA:QR:RO . 0/0:4:17:5,0,100:24:385:13 . . .
    NODE_9_length_16257_cov_15.910501 14240 . T G 0 . AB=0;ABP=0;AC=0;AF=0.00;AN=2;AO=4;CIGAR=1X;DP=36;DPB=18;DPRA=0;EPP=11.6962;EPPR=3.63072;GTI=0;LEN=1;MEANALT=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;
    RO=14;RPP=11.6962;RUN=1;SAF=4;SAP=11.6962;SAR=0;TYPE=snp;set=cs2-cs4 GT:AOP:PL:QA:QR:RO . 0/0:4:18:3,0,100:24:383:14 . . .
    NODE_9_length_16257_cov_15.910501 14261 . T G 0 . AB=0;ABP=0;AC=0;AF=0.00;AN=2;CIGAR=1X;DP=36;DPRA=0;GTI=0;LEN=1;MEANALT=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=44;RUN=1;SAR=1;TYPE=snp;set=cs1-cs
    3 GT:AOP:PL:QA:QR:RO 0/0:6:21:24,0,100:44:299:15 . . . .
    NODE_9_length_16257_cov_15.910501 14263 . A G 0.01 . AB=0;ABP=0;AC=0;AF=0.00;AN=2;CIGAR=1X;DP=27;DPRA=0;GTI=0;LEN=1;MEANALT=1;MQM=70;MQMR=70;NS=1;NUMALT=1;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;SAF=1;TYPE=snp;set=cs3-cs4
    GT:AOP:PL:QA:QR:RO . . 0/0:5:15:22,0,61:35:78:10 . .

    Leave a comment:


  • lethalfang
    replied
    Originally posted by prs321 View Post
    There are actually 5 columns, but only the first column is listed with the genotype. The other 4 have a '.' under them.
    Can you show me a few rows of the output?

    Leave a comment:


  • prs321
    replied
    There are actually 5 columns, but only the first column is listed with the genotype. The other 4 have a '.' under them.

    Leave a comment:


  • lethalfang
    replied
    There is only one such column? If you combine 5 samples (replicates), there should be 5 such columns.
    If not, make sure the header for the sample info are different in each of the 5 vcf files. Otherwise GATK thinks it's the same sample (i.e., same replicate in this case).
    0/0 means reference calls in both copies.
    Last edited by lethalfang; 05-22-2014, 07:33 AM.

    Leave a comment:


  • prs321
    replied
    0/0:6:21:24,0,100:44:299:15 . . . .

    Leave a comment:


  • lethalfang
    replied
    Originally posted by prs321 View Post
    Yes so for example I have 5 alignment files: rep1.bam, rep2.bam, rep3.bam, rep4.bam, rep5.bam

    All of them have been processed (markduplicates, addreadgroups etc).

    Then I called snps on each alignment to produce the following: rep1.vcf, rep2.vcf, rep3.vcf, rep4.vcf, rep5.vcf

    What will this give you?

    Code:
    java -Xmx2g -jar GenomeAnalysisTK.jar \
       -R ref.fasta \
       -T CombineVariants \
       -minN 2 \
       --variant rep1.vcf \
       --variant rep2.vcf \
       --variant rep3.vcf \
       --variant rep4.vcf \
       --variant rep5.vcf \
       -o combined.output.vcf

    Leave a comment:


  • prs321
    replied
    Yes so for example I have 5 alignment files: rep1.bam, rep2.bam, rep3.bam, rep4.bam, rep5.bam

    All of them have been processed (markduplicates, addreadgroups etc).

    Then I called snps on each alignment to produce the following: rep1.vcf, rep2.vcf, rep3.vcf, rep4.vcf, rep5.vcf

    Leave a comment:


  • lethalfang
    replied
    Originally posted by prs321 View Post
    The issue seems to be that when I combine all 5 vcf files for the replicates, it shows a replicate at 1 site, but shows a '.' for the other 4.

    I'm not really sure how to go about this.
    Does that mean only 1 replicate has that called variant?

    Leave a comment:


  • prs321
    replied
    The issue seems to be that when I combine all 5 vcf files for the replicates, it shows a replicate at 1 site, but shows a '.' for the other 4.

    I'm not really sure how to go about this.

    Leave a comment:


  • lethalfang
    replied


    If you want to only show variants in 2+ duplicates, use "--minimumN 2" option.

    Leave a comment:


  • prs321
    started a topic Combining VCF files using GATK?

    Combining VCF files using GATK?

    Here is my task. I have 4 treatment groups containing 5 replicates in each group.

    For each treatment group, I am looking for at least 2 replicates that have the same base substitution.

    So far I have generated a vcf file for each replicate in each treatment group.

    1. What is the best way to combined these files so that I have 4 vcf files, 1 for each treatment group that contains 5 replicates?

    2. What is the best way to keep track of the position of sites where at least 2 replicates showed the same base substitution so that I can produce results once the gene annotation is complete?

Latest Articles

Collapse

  • seqadmin
    Recent Advances in Sequencing Technologies
    by seqadmin







    Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

    Long-Read Sequencing
    Long-read sequencing has...
    12-02-2024, 01:49 PM
  • seqadmin
    Genetic Variation in Immunogenetics and Antibody Diversity
    by seqadmin



    The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
    11-06-2024, 07:24 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 12-02-2024, 09:29 AM
0 responses
137 views
0 likes
Last Post seqadmin  
Started by seqadmin, 12-02-2024, 09:06 AM
0 responses
48 views
0 likes
Last Post seqadmin  
Started by seqadmin, 12-02-2024, 08:03 AM
0 responses
38 views
0 likes
Last Post seqadmin  
Started by seqadmin, 11-22-2024, 07:36 AM
0 responses
69 views
0 likes
Last Post seqadmin  
Working...
X