  • rammohanshukla
    Junior Member
    • Mar 2013
    • 4

    where are my reads going: millions of fragments that uniquely aligned to the genes

    Hi experts,

    I am new to sequencing and the related bioinformatics, so to skip generating the BAM files myself I do the following:

    I first generate a TopHat alignment in Illumina BaseSpace.
    The alignment and its summary look reasonable. For example, for one sample from a rapid run of 90 samples, BaseSpace reports:

    Number of Reads: Read1 4,709,154; Read2 4,709,154
    Total Aligned Reads (% Reads): Read1 81.68%; Read2 85.20%
    Abundant Reads (% Reads): Read1 18.52%; Read2 19.53%
    Unaligned Reads (% Reads): Read1 18.32%; Read2 14.80%

    I then take the BAM files [*.alignment.bam] and align them to the human genes .gtf file using the Bioconductor RNA-seq workflow pipeline. I downloaded the human genes .gtf file from iGenomes.

    When I check the millions of fragments that uniquely aligned to the genes, using the " round( colSums(assay(se)) / 1e6, 1 ) " command in R, I get only 0.5 million reads aligned to genes.

    Given the TopHat data from BaseSpace (shown above), even if I take 81% of 4,709,154 and then subtract the percentages for unaligned reads and abundant reads, I should still get around 2 million reads for this sample.
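The estimate above can be written out as a quick calculation. This is only a sketch in Python that follows the same deliberately conservative subtraction described in the post, using the Read1 figures quoted from BaseSpace; the variable names are illustrative.

```python
# Read1 figures quoted from the BaseSpace summary above.
total_reads = 4_709_154
aligned_pct = 81.68
abundant_pct = 18.52
unaligned_pct = 18.32

# Conservative estimate as described in the post: start from the
# aligned percentage, then subtract both the unaligned and the
# abundant percentages.
usable_pct = aligned_pct - unaligned_pct - abundant_pct
expected_reads = total_reads * usable_pct / 100

# Express the result in millions, rounded to one decimal place,
# like the R expression round(x / 1e6, 1).
expected_millions = round(expected_reads / 1e6, 1)
print(f"{usable_pct:.2f}% of {total_reads:,} reads is about {expected_millions} million")
```

Even on this pessimistic reading the sample should yield roughly four times the 0.5 million reported by the R command.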

    Where are my reads going when I align them to the .gtf file?

    Thanks in advance for your valuable time.

    Ram
  • wdecoster
    Member
    • Oct 2015
    • 97

    #2
    The first thing to check is always whether your FASTA and GTF use the same chromosome notation.
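One way to run that check is to compare the sequence names in the BAM header with the first column of the GTF (for example "chr1" versus "1"). The following is a minimal sketch, assuming the header text has been exported with `samtools view -H` and the GTF has been read into a string; the function names and the tiny sample inputs are hypothetical and only illustrate a mismatch.

```python
def bam_header_chroms(sam_header_text):
    """Collect sequence names from the SN tag of @SQ header lines."""
    names = set()
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_chroms(gtf_text):
    """Collect sequence names from the first column of GTF records."""
    names = set()
    for line in gtf_text.splitlines():
        if line and not line.startswith("#"):
            names.add(line.split("\t")[0])
    return names


# Hypothetical inputs: a UCSC-style header and an Ensembl-style GTF.
header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chrM\tLN:16571\n"
)
gtf = (
    '1\texample\texon\t11869\t12227\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'MT\texample\texon\t3307\t4262\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n'
)

bam_names = bam_header_chroms(header)
gtf_names = gtf_chroms(gtf)
shared = bam_names & gtf_names

print("only in BAM:", sorted(bam_names - gtf_names))
print("only in GTF:", sorted(gtf_names - bam_names))
print("shared:", sorted(shared))
```

An empty or very short shared list would suggest that most reads cannot be matched to any annotated gene.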


    • rammohanshukla
      Junior Member
      • Mar 2013
      • 4

      #3
      Thanks,
      But I would guess that if the chromosome notations were different, the command would not run at all. If I am getting some results, shouldn't that mean the notations are the same?
      Besides, I am using the GTF file from Illumina iGenomes, and the BAM files were also generated by Illumina.

      In your answer, you mean the BAM file and the GTF file, right?

      Thanks


      • mastal
        Senior Member
        • Mar 2009
        • 666

        #4
        The chromosome names in the BAM file and its header will be the same as those in the reference FASTA file that the reads were aligned to.

        Have you checked whether the BaseSpace alignment and the GTF file use the same version of the human genome?
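A rough way to see which genome version a BAM file was aligned to is to look at the chromosome 1 length recorded in its header. The sketch below assumes the header text comes from `samtools view -H` and only distinguishes two common human builds by their chr1 lengths; the function name `guess_build` is hypothetical. The GTF itself carries no sequence lengths, so its build has to be taken from the download source.

```python
# Length of chromosome 1 in two common human reference builds.
CHR1_LENGTHS = {
    249250621: "GRCh37/hg19",
    248956422: "GRCh38/hg38",
}


def guess_build(sam_header_text):
    """Guess the human build from the chr1 length in @SQ header lines."""
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Turn "SN:chr1", "LN:249250621" into {"SN": "chr1", "LN": "249250621"}.
        tags = dict(
            field.split(":", 1)
            for field in line.split("\t")[1:]
            if ":" in field
        )
        if tags.get("SN") in ("chr1", "1") and "LN" in tags:
            return CHR1_LENGTHS.get(int(tags["LN"]), "unknown")
    return "unknown"


# Hypothetical header line for illustration.
example_header = "@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:249250621\n"
print(guess_build(example_header))
```

If the build guessed from the BAM differs from the build of the GTF, gene coordinates will not line up even when the chromosome names match.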
        Last edited by mastal; 04-04-2017, 06:34 AM.
