  • laxman
    Junior Member
    • Sep 2010
    • 5

    Understanding tophat intermediate logs

    Hi
    I got only about 6M out of 35M reads mapped with TopHat for my sample. On doing QC I found that there was some problem in the first 8 bases, so I trimmed them and am rerunning TopHat (with default parameters, except allowing 2 mismatches). Looking at the intermediate logs, I still see a log in which 85% of my reads fail to align.
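Trimming a fixed number of leading bases, as described above, amounts to slicing the same number of characters off both the sequence and the quality string of every FASTQ record. A minimal Python sketch, assuming uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); the function names are hypothetical and not part of TopHat or any trimming tool:

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, "+" line, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group FASTQ lines into 4-line records.

    Assumes unwrapped FASTQ (exactly four lines per record); an incomplete
    trailing record is silently dropped by zip().
    """
    it = (line.rstrip("\n") for line in lines)
    # zip() over the same iterator four times yields consecutive 4-line groups
    for header, seq, plus, qual in zip(it, it, it, it):
        yield header, seq, plus, qual


def crop_head(records: Iterable[Record], n: int = 8) -> Iterator[Record]:
    """Drop the first n bases, and the matching quality characters, from each read."""
    for header, seq, plus, qual in records:
        yield header, seq[n:], plus, qual[n:]


def write_fastq(records: Iterable[Record]) -> Iterator[str]:
    """Turn records back into FASTQ text lines."""
    for record in records:
        for field in record:
            yield field + "\n"
```

For an open input file `src` and output file `dst`, `dst.writelines(write_fastq(crop_head(read_fastq(src), n=8)))` writes the cropped reads. Keeping everything as generators means the 35M-read file is streamed rather than loaded into memory.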
    I have two questions:
    1. Can someone tell us a bit more about the intermediate files and logs generated?

    2. How much does a problem in %GC content affect the alignment?

    I am attaching some of the log files below:
    -thanks
    -LAx
    ______________________________________________
    [liyer01@h01 logs]$ more file2vqTeB.log
    # reads processed: 38843979
    # reads with at least one reported alignment: 5520299 (14.21%)
    # reads that failed to align: 33230242 (85.55%)
    # reads with alignments suppressed due to -m: 93438 (0.24%)
    Reported 8463229 alignments to 1 output stream(s)
    [liyer01@h01 logs]$ more long_spanning_reads.log
    long_spanning_reads v1.1.0 (1606)
    --------------------------------------------
    Opening S6_tophat_out/left_kept_reads.fq for reading
    Opening /dev/null for reading
    Opening S6_tophat_out/tmp/left_kept_reads.bwtout for reading
    Loading spliced hits...done
    Loading junctions...done
    [liyer01@h01 logs]$ more file8W6m7g.log
    # reads processed: 33230242
    # reads with at least one reported alignment: 26854884 (80.81%)
    # reads that failed to align: 3283599 (9.88%)
    # reads with alignments suppressed due to -m: 3091759 (9.30%)
    Reported 77514776 alignments to 1 output stream(s)
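The summary lines in these logs follow a fixed pattern, so they can be compared programmatically. In both logs the "aligned", "failed" and "suppressed due to -m" counts add up exactly to the reads processed, and the number of reads that failed in the first log (33,230,242) equals the number processed in the third. A minimal Python sketch, assuming the log text looks exactly like the excerpts above; the function names are hypothetical:

```python
import re
from typing import Dict

# Regexes for the summary lines shown in the bowtie logs above.
SUMMARY_PATTERNS = {
    "processed": r"# reads processed: (\d+)",
    "aligned": r"# reads with at least one reported alignment: (\d+)",
    "failed": r"# reads that failed to align: (\d+)",
    "suppressed": r"# reads with alignments suppressed due to -m: (\d+)",
    "alignments": r"Reported (\d+) alignments",
}


def parse_bowtie_summary(text: str) -> Dict[str, int]:
    """Extract the read and alignment counts from one log's text."""
    counts: Dict[str, int] = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counts[key] = int(match.group(1))
    return counts


def percent(part: int, whole: int) -> float:
    """Percentage helper that tolerates an empty log."""
    return 100.0 * part / whole if whole else 0.0


if __name__ == "__main__":
    third_log = """# reads processed: 33230242
# reads with at least one reported alignment: 26854884 (80.81%)
# reads that failed to align: 3283599 (9.88%)
# reads with alignments suppressed due to -m: 3091759 (9.30%)
Reported 77514776 alignments to 1 output stream(s)"""
    c = parse_bowtie_summary(third_log)
    # The three read categories are disjoint and add up to the reads processed.
    assert c["aligned"] + c["failed"] + c["suppressed"] == c["processed"]
    print(f"aligned: {percent(c['aligned'], c['processed']):.2f}%")
```

Running it prints `aligned: 80.81%`, matching the percentage bowtie itself reported.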
  • ian.d.reid
    Junior Member
    • May 2009
    • 3

    #2
    LAx,
    1. TopHat uses several smaller programs to do its work. One of these programs is long_spanning_reads; another is bowtie. If you look in logs/run.log you will find the command lines that TopHat issues to its subsidiary programs, and you can see where the intermediate data files in the tmp directory (here S6_tophat_out/tmp) and the logs with the cryptic names come from.
    The first and third log files that you attached are from bowtie. The first is probably from mapping the whole reads, and the third, judging from the number of reads processed, is probably from mapping segments of the initially unmapped reads in order to find splice junctions. The good news is that >80% of the segments mapped; the bad news is that the roughly 26.9 million segments that aligned generated 77.5 million alignments, so many of the read segments aligned in more than one place.
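The scale of the multi-mapping can be put as a single number by dividing the reported alignments by the segments that aligned. A small worked calculation using only the counts from the third log; the variable names are illustrative:

```python
# Counts copied from the third bowtie log quoted in the first post.
segments_processed = 33_230_242
segments_aligned = 26_854_884      # "reads with at least one reported alignment"
alignments_reported = 77_514_776   # "Reported ... alignments"

# Segments suppressed by -m have no alignments reported, so every reported
# alignment belongs to one of the aligned segments.
aligned_fraction = segments_aligned / segments_processed
per_aligned_segment = alignments_reported / segments_aligned

print(f"{100 * aligned_fraction:.2f}% of segments aligned")
print(f"{per_aligned_segment:.2f} alignments per aligned segment")
```

This prints `80.81% of segments aligned` and `2.89 alignments per aligned segment`, i.e. an aligned segment was reported at almost three locations on average.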
    The second log file is from long_spanning_reads (obviously) and just shows that the program ran without any problems.

    2. The question is unclear. What kind of problem in GC content?


    • laxman
      Junior Member
      • Sep 2010
      • 5

      #3
      Hi Ian
      Thanks. I am trying to figure out how to deal with the sequences that align at multiple locations, and also what caused them. Is it something to do with the sequencing, i.e. specific artifacts in the reads produced? I did find some issues in the "per base sequence content" plot, which plots %G, %T, %C and %A at each base position: there were fluctuations in the first 8 bases and divergence again after about 40 bases. I am still thinking about it. One thing I considered was filtering the reads with the FASTX-Toolkit tools to keep only reads with a quality greater than 30 and a minimum length of 25. The hope is that the multiple hits in the alignment are caused by low-quality portions of the reads and that this filtering would remedy them.
      Any suggestions?
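Both checks mentioned in this post can be sketched in a few lines: per-position base composition (what a "per base sequence content" plot shows) and trimming low-quality bases from the 3' end, then discarding reads that end up shorter than a minimum length. This is a minimal Python sketch under stated assumptions, not the implementation used by FastQC or the FASTX-Toolkit; the function names are hypothetical, and `offset=33` assumes Phred+33 quality encoding (older Illumina pipelines used Phred+64):

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def base_composition(seqs: Iterable[str]) -> List[Dict[str, float]]:
    """%A, %C, %G, %T at each read position (N and other symbols are ignored)."""
    per_position: List[Counter] = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(per_position):  # first read reaching this position
                per_position.append(Counter())
            per_position[i][base] += 1
    composition = []
    for counts in per_position:
        total = sum(counts[b] for b in "ACGT")
        composition.append(
            {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return composition


def quality_trim(seq: str, qual: str, min_q: int = 30, min_len: int = 25,
                 offset: int = 33) -> Optional[Tuple[str, str]]:
    """Trim bases with quality below min_q from the 3' end.

    Returns None when fewer than min_len bases remain.
    """
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]
```

Plotting `base_composition(...)` position by position shows where the four percentages stop running roughly parallel, e.g. the first 8 bases or beyond base 40 described above. Whether quality trimming reduces multi-mapping is something the rerun itself has to show; the sketch only applies the filter.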
