Hello,
I am using TopHat and Cufflinks for de novo assembly of transcripts as input to an annotation pipeline. I have about 26 million paired-end (PE) 76-mer reads and about 14 million single-end (SE) 76-mer reads.
So far I have run with the default parameters, except that I lowered the intron length limits to suit the microorganism I'm working with (its introns are much smaller, under 1000 bp). I seem to be having two problems. I'm attaching some PDF snapshots.
1) Cufflinks extends transcripts/UTRs too far and/or merges adjacent exons beyond what seems reasonable.
2) A very large number of exons are joined by what look, in my genome annotation viewer (Apollo), like connecting introns, but they are HUGE and well above the limits I supplied to both TopHat and Cufflinks. This only seems to happen when I include the PE run (see run parameters below). Maybe this is just an artifact of the viewer.
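One way to tell whether the long connectors in problem 2 come from the alignments or from the viewer is to look at the spliced gaps recorded in the SAM file itself. The following is only a minimal sketch, not part of TopHat or Cufflinks: it assumes a plain-text SAM file with the CIGAR string in column 6, and the helper names (`intron_lengths`, `summarize`) and the demo records are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_lengths(sam_lines):
    """Yield the length of every N (skipped reference region) CIGAR operation."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        cigar = fields[5]
        if cigar == "*":                  # no alignment information
            continue
        for length, op in CIGAR_OP.findall(cigar):
            if op == "N":
                yield int(length)


def summarize(sam_lines, max_intron=500):
    """Count spliced gaps and how many exceed the intron limit given to the aligner."""
    lengths = list(intron_lengths(sam_lines))
    return {
        "junction_gaps": len(lengths),
        "over_limit": sum(1 for n in lengths if n > max_intron),
        "longest": max(lengths, default=0),
    }


# Hypothetical records for illustration only (not real TopHat output).
DEMO_SAM = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t255\t30M120N46M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t500\t255\t76M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t900\t255\t40M2000N36M\t*\t0\t0\t*\t*",
]

if __name__ == "__main__":
    print(summarize(DEMO_SAM, max_intron=500))
    # For a real run, pass an open file handle instead, e.g.:
    # with open("accepted_hits.sam") as fh:
    #     print(summarize(fh, max_intron=500))
```

If `over_limit` comes back as 0 on the real file, that would suggest no spliced alignment exceeds the `-I` limit, and the long connectors are more likely the viewer drawing the gap between the two mates of a pair as if it were an intron.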
I know the first of these is a known issue with TopHat/Cufflinks; I'm just trying to figure out which parameters would be most important to adjust to fix it. I also got the following error from Cufflinks, but it looks like it was able to read the input as a SAM file and complete the run.
tophat:
tophat -i 15 -I 500 -r 425 --mate-std-dev 150 --solexa1.3-quals assembly_index.fa pe1.fastq pe2.fastq
cufflinks:
cufflinks -I 500 /pseudospace2/bushleyk/RNA_Alignment/Tophat/TiRNAALL/76mer/NA3/fastx100/run2_I15_500/pairs/tophat_out/accepted_hits.sam
error:
/local/cluster/bin/cufflinks: /usr/lib64/libz.so.1: no version information available (required by /local/cluster/bin/cufflinks)
[bam_header_read] EOF marker is absent.
File /pseudospace2/bushleyk/RNA_Alignment/Tophat/TiRNAALL/76mer/NA3/fastx100/run2_I15_500/pairs/tophat_out/accepted_hits.sam doesn't appear to be a valid BAM file, trying SAM...
[04:27:19] Inspecting reads and determining fragment length distribution.
I'm assuming that for TopHat the relevant parameters are -F, -a, -g and --coverage-search (this is automatically turned off for reads >75 bp and apparently affects sensitivity; does anyone know if it's a good idea to turn it on?), and for Cufflinks -F/--min-isoform-fraction <0.0-1.0>, -j (pre-mRNA fraction), -A/--small-anchor-fraction <0.0-1.0>, and --min-frags-per-transfrag.
Let me know if anyone has had similar problems or has suggestions for which of these might be most important in affecting the results I see.
Thanks,
Kate