
  • CompBio
    replied
    You might try a tool such as MapSplice instead. It uses statistics based on the alignments themselves (such as distribution of reads across a junction) rather than any sequence-related measures to validate spliced alignments. It's not hard to use, but can be finicky and can consume its fair share of disk space.
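
    As a rough illustration of what an alignment-based junction statistic can look like, here is a minimal sketch that scores one junction by how varied the crossing positions of its supporting reads are. This is not MapSplice's actual implementation; the function name `junction_offset_entropy` and the choice of Shannon entropy over left-anchor lengths are assumptions made for the example.

```python
import math
from collections import Counter


def junction_offset_entropy(left_anchor_lengths):
    """Shannon entropy (in bits) of the left-anchor lengths of the reads
    that span a single junction.

    A junction supported only by reads that all cross at the same offset
    (for example, every read having a 1-base anchor) scores 0. A junction
    crossed at many different offsets scores higher.
    """
    counts = Counter(left_anchor_lengths)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical data: four reads that all cross with a 1-base anchor,
# versus four reads that cross at four different offsets.
suspicious = junction_offset_entropy([1, 1, 1, 1])
plausible = junction_offset_entropy([10, 20, 30, 40])
```

    A junction scoring near zero would be a candidate for closer inspection, though any cutoff would have to be chosen for the data at hand.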


  • drdna
    replied
    Originally posted by gringer
    Using tophat for a viral genome seems a little odd -- I wasn't aware that viruses had introns.
    Viruses undergo RNA recombination which produces molecules that are identical in structure to spliced transcripts. The only difference is that the border sequences don't match consensus splice junctions. Therefore, I was looking for reads that mapped to different regions of the viral genome in the same manner as would an intron-spanning read.
    Last edited by drdna; 10-07-2013, 12:42 AM.


  • gringer
    replied
    Using tophat for a viral genome seems a little odd -- I wasn't aware that viruses had introns.


  • dpryan
    replied
    Ah, well, those really are junk then. Predicting splicing based on a single base without a reference annotation seems like a bad idea! Thanks for the heads-up, and you might consider filing a bug report for tophat.


  • drdna
    replied
    Tophat2 is stupid

    Originally posted by dpryan
    To be fair, did you give tophat a reference GTF/GFF to use? If so, you're asking it to align against the transcriptome first and then convert those coordinates back to the genome. You're rather likely to get results like this by doing that (yes, it's probably better to simply soft-clip that one base, but you didn't use local-alignment and, anyway, it then matched the transcriptome).
    Nope, no reference provided. These were reads mapped to a viral genome. When I imported the .bam file into IGV, it showed a fraction of reads that were predicted to be spliced. When I followed up on these, they turned out to be bogus because all of the reads contained a single nucleotide on one side of the predicted splice site. It turns out that Tophat2 looks for putative intron splice sites and automatically assumes that the introns are valid as long as it can align at least ONE nucleotide on the other side of the splice junction. How stupid is that?
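
    One way to screen out such alignments after the fact is to drop any spliced record whose shortest anchor falls below a threshold. The following is a minimal sketch that works on SAM text lines (for example, the output of `samtools view -h`). The function names and the default threshold of 8 bases are assumptions for illustration, not anything Tophat itself provides.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def min_junction_anchor(cigar):
    """Return the smallest number of aligned read bases (M, = or X) found
    on either side of any N operation, or None if the CIGAR has no N."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not any(op == "N" for _, op in ops):
        return None
    segments = [0]  # aligned bases in each stretch between N operations
    for n, op in ops:
        if op == "N":
            segments.append(0)
        elif op in "M=X":
            segments[-1] += n
    return min(segments)


def drop_short_anchor_alignments(sam_lines, min_anchor=8):
    """Yield header lines and alignments unchanged, skipping spliced
    alignments whose shortest anchor is below min_anchor (assumed value)."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        cigar = line.split("\t")[5]  # CIGAR is the 6th SAM column
        anchor = min_junction_anchor(cigar)
        if anchor is None or anchor >= min_anchor:
            yield line


# Hypothetical two-record example with made-up read names.
example = [
    "@HD\tVN:1.0",
    "read1\t256\tref\t4239\t3\t1M184N110M\t*\t0\t0\t*\t*",
    "read2\t0\tref\t4423\t3\t111M\t*\t0\t0\t*\t*",
]
kept = list(drop_short_anchor_alignments(example))
```

    Here `kept` retains the header and the unspliced `111M` record, while the `1M184N110M` record is dropped.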


  • dpryan
    replied
    To be fair, did you give tophat a reference GTF/GFF to use? If so, you're asking it to align against the transcriptome first and then convert those coordinates back to the genome. You're rather likely to get results like this by doing that (yes, it's probably better to simply soft-clip that one base, but you didn't use local-alignment and, anyway, it then matched the transcriptome).


  • drdna
    started a topic Tophat2 produces thousands of invalid alignments

    Tophat2 produces thousands of invalid alignments

    Users beware. Tophat2 produces thousands of CIGAR alignments that are just plain wrong. As an example, see the following .bam file line:

    M01478:14:000000000-A5C8R:1:1106:17706:4022 256 CYDV_S1_L001_R1_trimmed_(paired)_contigs_50/104 4239 3 1M184N110M * CTGCTCTGCCCTATGCGATCTGTCCGATCGATCCTTCCAGACCATTGTGGAGGACGAAGATGTTGTTGATACCCCGAACGGACCGTGGCTCCCTGTGCAGGATGATGGTGT DEEEEFFFFFCFGGGGGGGGGGHHHGGGGGGHHGHHHHHHHGHHHHHHHGHHGGHGGGGGHHHHHHHHHHHHHHHGGGGGGGGGGGHGGHHGGHHHHHHHHGHHHHHHGHG AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:111 YT:Z:UU XS:A:- NH:i:2 CC:Z:= CP:i:4423 HI:i:0

    The CIGAR indicates a single match/mismatch, then 184 nucleotides that are missing from the read relative to the reference and then 110 match/mismatches. Obviously, this has to be wrong. There is no way you can predict that a single nucleotide matches a position 184 nucleotides upstream of the rest of an alignment, when there are about 40 intervening base positions that are equally valid matches. Furthermore, in this particular example, there is no mismatch. The first nucleotide precisely matches the nucleotide upstream of the 110 aligned nucleotides. In other words, Tophat is just "making stuff up"! Note: no paired end reads were used in this Tophat run.
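
    To make the CIGAR reading above concrete, here is a minimal sketch that converts a start position and CIGAR string into the reference intervals each aligned block covers, plus a helper that counts how often a given base occurs in a window upstream of a block. The function names are hypothetical, and the short reference string is a toy stand-in, not the viral contig from the alignment.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference intervals for each
    aligned block, splitting wherever the CIGAR has an N (skipped region)."""
    blocks = []
    ref = pos      # next reference position to be consumed
    start = None   # start of the block currently being built
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":
            if start is None:
                start = ref
            ref += length
        elif op == "D":
            ref += length  # deletion consumes reference inside a block
        elif op == "N":
            if start is not None:
                blocks.append((start, ref - 1))
                start = None
            ref += length
        # I, S, H and P do not consume reference positions
    if start is not None:
        blocks.append((start, ref - 1))
    return blocks


def count_base_upstream(reference, base, block_start, window):
    """Count occurrences of `base` in the `window` reference positions
    immediately upstream of the 1-based position `block_start`."""
    hi = block_start - 1          # 0-based index of block_start
    lo = max(0, hi - window)
    return reference[lo:hi].count(base)


blocks = aligned_blocks(4239, "1M184N110M")
print(blocks)  # [(4239, 4239), (4424, 4533)]

# Toy reference (hypothetical): how many places could a lone "C" go?
toy_reference = "ACGTACGTAC"
print(count_base_upstream(toy_reference, "C", block_start=9, window=8))  # 2
```

    The first block is a single reference position, which is the one-base anchor described above. Every occurrence of that base in the upstream window is an equally good placement for it.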
