PCR duplicates, RNA-Seq and GATK
-
I have a data set containing both exome-seq and RNA-seq. It will be interesting to see how they correlate in terms of variant calling. Specifically, it will be interesting to see how duplicate marking affects the concordance between the RNA samples and the DNA samples. However, I do have a question: is it valid to use tools like GATK to perform SNP calling on RNA samples? To my knowledge, GATK might contain certain priors specifically designed for exome sequencing (or whole genome sequencing). Wouldn't that also affect the call concordance in the final data?
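For anyone wanting to put a number on that concordance, here is a rough sketch that compares two call sets by position only. It assumes uncompressed VCF files; the function names and file names are made up for illustration, and it ignores alleles and genotypes, so treat it as a first look and not a proper comparison.

```shell
#!/bin/sh
# vcf_sites FILE: print sorted, unique CHROM:POS keys from an uncompressed VCF.
# Header lines (starting with '#') are skipped.
vcf_sites() {
    awk -F '\t' '!/^#/ { print $1 ":" $2 }' "$1" | LC_ALL=C sort -u
}

# site_concordance FIRST.vcf SECOND.vcf: count positions shared by both
# call sets and positions unique to each one.
site_concordance() {
    a=$(mktemp)
    b=$(mktemp)
    vcf_sites "$1" > "$a"
    vcf_sites "$2" > "$b"
    shared=$(LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' ')
    only_a=$(LC_ALL=C comm -23 "$a" "$b" | wc -l | tr -d ' ')
    only_b=$(LC_ALL=C comm -13 "$a" "$b" | wc -l | tr -d ' ')
    rm -f "$a" "$b"
    echo "shared=$shared only_first=$only_a only_second=$only_b"
}

# Example call (hypothetical file names):
# site_concordance exome_calls.vcf rnaseq_calls.vcf
```

Running it once on calls made with duplicates marked and once without would show how much duplicate marking shifts the overlap. A dedicated tool such as `bcftools isec` is probably the better choice once alleles matter.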
-
Good thread, this has got me thinking. Just wondering where you get this from:
Originally posted by kmcarr: ...They [PCR duplicate removal software] have the built in assumption that any duplicate found is the result of PCR duplication. This is a reasonable assumption if your reads are from a genomic DNA library. It is NOT a valid assumption for RNA-Seq data. For RNA-Seq it is more likely that observed duplicates are from independent cDNAs from highly abundant transcripts. You still do not want to remove duplicates from RNA-Seq data even if you are doing SNP analysis with it.
In a nutshell: for DE analysis the transcripts are 'functional' and so we should retain the duplicates; for SNP analysis we want a consensus of the structure of our sequence.
-
OK, I did mark duplicates and my results look good, but maybe I'll run everything one more time without marking duplicates to check whether they get even better.
Have you read this? It's what I based my approach on:
Next-generation RNA sequencing (RNA-seq) maps and analyzes transcriptomes and generates data on sequence variation in expressed genes. There are few reported studies on analysis strategies to maximize the yield of quality RNA-seq SNP data. We evaluated the performance of different SNP-calling methods following alignment to both genome and transcriptome by applying them to RNA-seq data from a HapMap lymphoblastoid cell line sample and comparing results with sequence variation data from 1000 Genomes. We determined that the best method to achieve high specificity and sensitivity, and greatest number of SNP calls, is to remove duplicate sequence reads after alignment to the genome and to call SNPs using SAMtools. The accuracy of SNP calls is dependent on sequence coverage available. In terms of specificity, 89% of RNA-seq SNPs calls were true variants where coverage is >10X. In terms of sensitivity, at >10X coverage 92% of all expected SNPs in expressed exons could be detected. Overall, the results indicate that RNA-seq SNP data are a very useful by-product of sequence-based transcriptome analysis. If RNA-seq is applied to disease tissue samples and assuming that genes carrying mutations relevant to disease biology are being expressed, a very high proportion of these mutations can be detected.
-
sindrle, no I didn't. This thread satisfied me that marking dupes is not the correct thing to do for RNA-Seq.
-
Hi, sorry vishnuamaram,
I posted a reply to your question (or thought I did), but my browser crashed while posting it.
Did you fix this? It just sounds like your path was incorrect, yes?
-
Hi!
Did you finally mark duplicates (with Picard)?
I'm doing the same thing as you, but I marked my duplicates.
I will maybe run it all again to compare the results.
-
Usually you would use RealignerTargetCreator to create an interval file for use by IndelRealigner.
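A minimal sketch of that two-step sequence, assuming the GATK 2.x-style command line used elsewhere in this thread. The jar path, reference and BAM names are placeholders, and the `run` helper with its `DRY_RUN` switch is purely illustrative: by default it only prints the commands so you can check them before running anything.

```shell
#!/bin/sh
# Placeholder paths (assumptions): adjust to your own installation and data.
GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"
REF="${REF:-hg19_karyotypic.fa}"

# run: print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: RealignerTargetCreator writes the interval file.
# Arguments: input BAM, output interval file.
make_intervals() {
    run java -jar "$GATK_JAR" -T RealignerTargetCreator \
        -R "$REF" -I "$1" -o "$2"
}

# Step 2: IndelRealigner reads that same interval file via -targetIntervals.
# Arguments: input BAM, interval file, output BAM.
realign() {
    run java -jar "$GATK_JAR" -T IndelRealigner \
        -R "$REF" -I "$1" -targetIntervals "$2" -o "$3"
}

make_intervals input.bam intervalListFromRTC.intervals
realign input.bam intervalListFromRTC.intervals realigned.bam
```

Set `DRY_RUN=0` to actually execute. The interval file has to exist at the path given to `-targetIntervals` before the second step starts, which is what the "interval file does not exist" error is complaining about.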
-
Hey whatabambam,
this is the command I used:
java -Xmx30g -jar GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar -T IndelRealigner \
  -R REF_GENOME_hg19/hg19_karyotypic.fa -I WG_PBMC_SAM9_CORDSORT_RMDUp_KARYTP.BAM \
  -targetIntervals intervalListFromRTC.intervals \
  -o WG_PBMC_SAM9_indelrealigned.bam
Here is the error:
##### ERROR MESSAGE: Couldn't read file /data/odity/Project_Blood-GNPC-464/Sample_WG-9/RAWFASTQ_CAT_FILES/intervalListFromRTC.intervals because The interval file does not exist
How exactly should we generate this interval file? I truly have no idea. Let me know.
Thanking you in tons,
Vishnu.
-
What are the errors you're getting? Possibly this is the sort of stuff I was talking about: it's fussy about input files. I've only used UnifiedGenotyper, but I think a lot of the requirements apply to all GATK walkers.
To be fair, though, it gives quite informative error messages. If you're just using it wrong, it's probably pretty easy to figure out what's wrong.
-
Hey guys (Kennels, whatabambam),
Could either of you please respond and let me know
what the proper commands are for the GATK indel realigner, BQSR and variant calling?
Whenever I run a GATK command, it throws an error.
Kindly help me.
Thank you,
Vishnu.
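For reference, the BQSR and calling steps in a GATK 2.x-era pipeline usually look something like the sketch below, run after indel realignment. This is an outline under assumptions and not a tested recipe: the jar, reference, `dbsnp.vcf` known-sites file and the `.realigned.bam` / `.recal.grp` naming are all placeholders, and the function only prints the commands.

```shell
#!/bin/sh
# Placeholders (assumptions): adjust to your own installation and data.
GATK="java -jar GenomeAnalysisTK.jar"
REF="hg19_karyotypic.fa"
KNOWN="dbsnp.vcf"   # hypothetical known-sites file; BaseRecalibrator needs one

# print_pipeline SAMPLE: print (do not run) the three commands for SAMPLE,
# starting from SAMPLE.realigned.bam.
print_pipeline() {
    s="$1"
    cat <<EOF
$GATK -T BaseRecalibrator -R $REF -I $s.realigned.bam -knownSites $KNOWN -o $s.recal.grp
$GATK -T PrintReads -R $REF -I $s.realigned.bam -BQSR $s.recal.grp -o $s.recal.bam
$GATK -T UnifiedGenotyper -R $REF -I $s.recal.bam -o $s.vcf
EOF
}

print_pipeline sample
```

BaseRecalibrator builds the recalibration table, PrintReads applies it with `-BQSR`, and UnifiedGenotyper calls variants on the recalibrated BAM. Check the documentation for your exact GATK version, since options have changed between releases.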
-
Originally posted by Kennels:
Hi
I am doing a similar analysis to you, and have run BWA-MEM aligned files, and GATK v2.2-3.
Are you sure that GATK Unified Genotyper tool doesn't run when duplicates are not marked? I could run it even when I do and don't use Picard MarkDuplicates on my input .bam file (i.e. duplicates marked or not).
I'm quoting this from memory of my previous project..
- It could have been Picard tools that was refusing to process without dupe marking and I just remember wrong (as that was part of my pipeline)
or
- They could have changed the program since then. I know they are updating that thing all the time. I asked about the ploidy setting over on the GATK forum and that had been added since the publication of the GATK paper
if it's working then it's working.. great
I haven't tried it myself yet this time around. Haven't got as far as SNP calling yet.. far too many other issues to worry about on this project for now!
-
Hi
I am doing a similar analysis to you, and have run BWA-MEM aligned files, and GATK v2.2-3.
Are you sure that GATK Unified Genotyper tool doesn't run when duplicates are not marked? I could run it even when I do and don't use Picard MarkDuplicates on my input .bam file (i.e. duplicates marked or not).
-
Oh sorry - the last bit of your question
Yeah, there would be evidence, because every program in the pipeline that works on a BAM adds an @PG line to the header... or in any case mostly they do (I believe they 'should' do / are supposed to).
So downstream programs know what upstream programs have touched the BAM, even if they do nothing but add that line.
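One way to look for that evidence by hand is sketched below. It assumes the duplicate-marking step wrote an @PG header line mentioning "MarkDuplicates" (not guaranteed for every tool or version), and it uses the standard SAM duplicate flag bit 0x400 (1024). The two function names are made up for this example; both read SAM text on stdin.

```shell
#!/bin/sh
# has_markdup_pg: succeed if any @PG header line mentions MarkDuplicates.
has_markdup_pg() {
    grep -q '^@PG.*MarkDuplicates'
}

# count_dup_flagged: count alignment records whose FLAG (column 2) has the
# duplicate bit 0x400 (1024) set. Header lines (starting with '@') are skipped.
count_dup_flagged() {
    awk -F '\t' '!/^@/ && int($2 / 1024) % 2 == 1 { n++ } END { print n + 0 }'
}

# Example usage with samtools (sample.bam is a placeholder):
# samtools view -H sample.bam | has_markdup_pg && echo "MarkDuplicates is in the header"
# samtools view sample.bam | count_dup_flagged
```

In practice `samtools view -c -f 1024 sample.bam` gives the same count directly. In the thought experiment of a BAM with no duplicates at all, the count would be zero and only the header line would show that the tool had been run.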
-
Hi kmcarr
Originally posted by kmcarr: Duplicate removal is valid when removing PCR duplicates; you do not want to remove duplicate reads which arose from independent fragments. Duplicate marking/removal programs can not distinguish between these two types of duplicates. They have the built in assumption that any duplicate found is the result of PCR duplication. This is a reasonable assumption if your reads are from a genomic DNA library. It is NOT a valid assumption for RNA-Seq data. For RNA-Seq it is more likely that observed duplicates are from independent cDNAs from highly abundant transcripts.
Originally posted by kmcarr: You still do not want to remove duplicates from RNA-Seq data even if you are doing SNP analysis with it.
Originally posted by kmcarr: How does the UnifiedGenotyper "know" whether or not the input BAM file has had MarkDuplicates run on it? (I'm asking, I honestly don't know.) Unless it rechecks the input for duplicates the only way it would know is by finding reads with the duplicate flag set. Let's imagine you run MarkDuplicates on a BAM file which had no duplicates at all in it (this is thought experiment, just go with it). There would be no evidence recorded in the file that MarkDuplicates was run on that particular BAM. Would UnifiedGenotyper refuse to accept this file?
Yeah, it's just a bit annoying they have that limitation, because messing around with the BAMs like that isn't very publishable. For example, for some previous work I changed some stuff in the BAM to make it accept alignments from BFAST (it wants you to use BWA). Fine for my own purposes but not very reportable. Still, it's information I wouldn't have had otherwise. I only want this consensus SNP set for doing some parameter setting anyway. I search out values for some parameters by looking for when the SNP sets are at maximum convergence. Any comments on whether that is a crazy fool method are welcome; I'm experimenting here.
-
Duplicate removal is valid when removing PCR duplicates; you do not want to remove duplicate reads which arose from independent fragments. Duplicate marking/removal programs can not distinguish between these two types of duplicates. They have the built in assumption that any duplicate found is the result of PCR duplication. This is a reasonable assumption if your reads are from a genomic DNA library. It is NOT a valid assumption for RNA-Seq data. For RNA-Seq it is more likely that observed duplicates are from independent cDNAs from highly abundant transcripts. You still do not want to remove duplicates from RNA-Seq data even if you are doing SNP analysis with it.
I do have a question about this comment:
I know from past experience that GATK's UnifiedGenotyper won't actually allow you to run it on Bam files which have not had PCR duplicates marked.