Hi Simon,
that's the way I was thinking too.
It's nice to hear this from you. I'm not too off
Thanks again for help
Federico
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Overlapping genes on the same strand cannot be distinguished in principle -- at least not by reads that map to the overlapping part. So, if the overlap is small, these will only be a few read, and you could ignore the issue. If, however, the overlap is big, you should consider the two genes as different isoforms of the same gene, and hence give them the same name.
So, I would somehow edit the GTF file such that whenever there are two genes A and B with large overlap, you replace all occurances of the gene IDs "A" and "B" with something like "A+B".
Leave a comment:
-
Hi Simon,
sorry for that. I'm still a beginner in this field and i'm still learning how all this works.
The thread on bioconductor list is more focused on DESeq2 analysis than htseq-count but I was indeed going to provide the link to this thread since the discussion was shifting towards htseq-count again.
Did you have a chance to read my previous post?
I think the issue I'm having (other than having the correct design for DESeq2) is that there are particular loci in the genome that have overlapping genes on the same strand. Though, these genes have a different ID so that's way I get a count of 0 .
As explaind in htseq manual If a read maps to an exon shared by several genes, this will appear to htseq-count as ambigous.
So at the moment I am having zero counts for all these particular loci, which sometimes contain also really highly expressed genes that I'd like to account for in the subsequent DGE analysis.
Do you have any idea on how I could account for this?
Thanks
Leave a comment:
-
Crossposting a question on two forums is never a good idea. At least the link to the other thread should be provided: https://stat.ethz.ch/pipermail/bioco...ry/057986.html
Leave a comment:
-
I am now indexing my reference with your suggestions above. Let me know if I'm not clear
Leave a comment:
-
Hi Simon,
I might have figured this out.
I think the issue is that I have particular loci in the genome that have overlapping genes on the same strand. Though, these genes have a different ID so that's way I get a count of 0 .
If a read maps to an exon shared by several genes, this will appear to htseq-count as ambigous.
So at the moment I am having zero counts for all these particular loci, which sometimes contain also really highly expressed genes.
Do you have any idea on how I could account for this?
Let me know if I'm not clear
Thanks
Federico
Leave a comment:
-
Write down the names of a few such reads, then use the '-o' option to get an output SAM file, grep in it for the read names and find out this way what htseq-count has done with these reads.
Leave a comment:
-
yes my gtf file contains all exon lines for antisense transcripts as well.
this explains the discrepancy since I used -s yes for the stranded libraries.
But why sometimes the counts are zero when the read maps to a position which is covered by a regular gene on one strand and a overlapping antisense gene on the opposite strand?
selecting -stranded yes should account for this,shouldn't it?
thanks
Leave a comment:
-
Does your GTF file contain "exon" lines for all the anti-sense transcripts? If so, this would explain the discrepancies.
htseq-count considers a read as "ambiguous" if it maps to more than one feature (gene). So, if a read maps to a position which is covered by a regular gene on one strand and by an overlapping antisense transcript on the other strand, then this read will be counted as ambiguous if you have set "stranded" to "no", because there is no information to decide whether the read originated from the sense of from the antisense transcript. For "stranded=yes", however, the read will be counted for the feature that is on the same strand as the read.
Leave a comment:
-
Hi Simon,
I have had a look at the alignments.
When the counts between stranded and unstraded librarires agree reasonably well, it looks alright. I can see that when they differ strongly it is because the locus is particulary complex, with a lot of antisense transcription and overlapping genes, which are pickep up only by the stranded libraries.
I've also noticed that in a lot of cases, genes that are well supported by alignment reads, have a count of 0. This happens more often for unstranded libraries, I guess particularly when there is antisense transcription. But it also happens for stranded libraries. Highly expressed genes with good alignment support have a count 0.
I wonder if running htseq-count with-m union
For each developmental stage, around 15-20 million counts are ambigous.
Would -m intersection_nonempty more appropriate in your opinion?
Thanks for help
Federico
Leave a comment:
-
Hi Simon,
Good idea!
I am sorting my bam files at the moment. I will write back once I had a look at the alignment
In the meantime thanks for the suggestion
Leave a comment:
-
Have you had a look at the alignments yet? Pick a two or three genes for which the counts between stranded and unstranded libraries differ strongly and two or three genes for which they agree reasonably well, then look at the eads at these loci with a genome browser (e.g., IGV). Do you see antisense transcripts which should not be there? Do you see many reads with low alignment quality? Any other peculiarity?
Leave a comment:
-
Hi Blanco,
to be sure I did a quick test using infer_experiment.py,it is indeed forward-reverse. I also did a blat to the genome and I got the same result.
This is PairEnd Data
Fraction of reads explained by "1++,1--,2+-,2-+": 0.9189
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0811
Fraction of reads explained by other combinations: 0.0000
1++,1–,2+-,2-+
read1 mapped to ‘+’ strand indicates parental gene on ‘+’ strand
read1 mapped to ‘-‘ strand indicates parental gene on ‘-‘ strand
read2 mapped to ‘+’ strand indicates parental gene on ‘-‘ strand
read2 mapped to ‘-‘ strand indicates parental gene on ‘+’ strand
Leave a comment:
-
Hi Federico,
have you confirmed that the data you have is indeed strand specific?
You can use the infer_experiment.py script from the RSeQC package to check it out - https://code.google.com/p/rseqc/
Depending on the library preparation protocol you either use the '-s reverse' or '-s yes' option. Do you know the specifics of the library prep? In any case, you will find out if you manage to run 'infer_experiment.py'.
/blanco
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.
Nobel Prize for MicroRNA Discovery
This week,...-
Channel: Articles
10-07-2024, 08:07 AM -
-
by seqadmin
Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...-
Channel: Articles
09-23-2024, 06:35 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 10-02-2024, 04:51 AM
|
0 responses
104 views
0 likes
|
Last Post
by seqadmin
10-02-2024, 04:51 AM
|
||
Started by seqadmin, 10-01-2024, 07:10 AM
|
0 responses
112 views
0 likes
|
Last Post
by seqadmin
10-01-2024, 07:10 AM
|
||
Started by seqadmin, 09-30-2024, 08:33 AM
|
1 response
115 views
0 likes
|
Last Post
by EmiTom
10-07-2024, 06:46 AM
|
||
Started by seqadmin, 09-26-2024, 12:57 PM
|
0 responses
21 views
0 likes
|
Last Post
by seqadmin
09-26-2024, 12:57 PM
|
Leave a comment: