
To the right of the dotted line are the reads mapping to a known Arabidopsis gene. To the left of the dotted line are reads mapping to another known Arabidopsis gene. You'll notice that some of the reads bridge the two genes, causing cufflinks to merge them into one giant transcriptional unit (blue bar at the bottom).
I think what I will end up doing is discarding these merged loci from further analysis as the data may not be reliable.
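For what it's worth, here is a rough sketch of how that filtering could be done: flag any assembled locus whose span overlaps two or more annotated genes, then drop it. This is not something cufflinks does for you. The function name, the tuple layout, and every ID and coordinate below are made up for illustration, and I'm assuming 1-based inclusive coordinates as in a GTF file.

```python
from collections import defaultdict


def find_merged_loci(loci, genes):
    """Return {locus_id: [gene ids]} for loci overlapping two or more genes.

    Both arguments are lists of (id, chrom, start, end) tuples with
    1-based inclusive coordinates (an assumption, matching GTF).
    """
    # Group the annotated genes by chromosome so each locus is only
    # compared against genes on its own chromosome.
    genes_by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        genes_by_chrom[chrom].append((gene_id, start, end))

    merged = {}
    for locus_id, chrom, start, end in loci:
        # Two closed intervals overlap when each starts before the other ends.
        hits = sorted(
            gene_id
            for gene_id, g_start, g_end in genes_by_chrom.get(chrom, [])
            if g_start <= end and start <= g_end
        )
        if len(hits) >= 2:
            merged[locus_id] = hits
    return merged


# Hypothetical annotation and assembled loci, for illustration only.
genes = [
    ("geneA", "Chr1", 1000, 2000),
    ("geneB", "Chr1", 2500, 4000),
    ("geneC", "Chr1", 6000, 7000),
]
loci = [
    ("locus1", "Chr1", 900, 4100),   # spans geneA and geneB
    ("locus2", "Chr1", 5900, 7100),  # spans geneC only
]

merged = find_merged_loci(loci, genes)
kept = [locus for locus in loci if locus[0] not in merged]
```

One caveat: genes that genuinely overlap in the annotation (on opposite strands, say) would get flagged too, so the flagged list is worth a look before anything is thrown away.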
Quoting Devon Ryan from Biostars (original post here: https://www.biostars.org/p/104551/):
The only real answer would be to look through the cufflinks source code, since this isn't documented anywhere. I would guess that these are merged into a single locus for processing because the annotation file you gave to cufflinks, likely combined with the modifications it made to the annotated transcripts given your alignments, produced possibly overlapping features (genes in this case) that might need to be processed as a single unit. If you used an unstranded library where WASH7P was expressed, then cufflinks might have just merged that, DDX11L1, and MIR1302-10 into a single transcript, in which case treating the whole region as a single locus would make more sense. I suspect that cufflinks pre-bins the genome according to possible cases like this and then processes them separately, often producing multiple final loci. That's a slightly educated guess, at least.
Welcome to the wonderful world of completely undocumented features :P