  • where are my reads going: millions of fragments that uniquely aligned to the genes

    Hi experts,

    I am new to sequencing and the associated bioinformatics, so to skip generating the BAM files myself I do the following:

    I first generate a TopHat alignment in Illumina BaseSpace.
    The alignment and its summary look reasonable. For example, for one sample from a rapid run of 90 samples, I get the following data in BaseSpace:

    Number of Reads: Read 1 = 4,709,154, Read 2 = 4,709,154
    Total Aligned Reads (% Reads): Read 1 = 81.68%, Read 2 = 85.20%
    Abundant Reads (% Reads): Read 1 = 18.52%, Read 2 = 19.53%
    Unaligned Reads (% Reads): Read 1 = 18.32%, Read 2 = 14.80%

    I then take the BAM files [*.alignment.bam] and use them to count reads against the human genes .gtf file using the Bioconductor RNA-seq workflow. I downloaded the human genes .gtf file from Illumina iGenomes.

    When I check the millions of fragments that uniquely aligned to the genes, using the "round( colSums(assay(se)) / 1e6, 1 )" command in R, I get only 0.5 million reads aligned to genes.

    Given the TopHat data from BaseSpace (shown above), even if I take 81% of 4,709,154 and then subtract the percentages for unaligned reads and abundant reads, I should still get around 2 million reads for the sample.
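The estimate above can be written out explicitly. This is a minimal sketch in Python that uses only the Read 1 numbers quoted in the post; the variable names are illustrative, and subtracting both the unaligned and the abundant percentages from the aligned percentage follows the post's conservative reasoning rather than any BaseSpace definition.

```python
# Numbers copied from the BaseSpace summary for Read 1 in the post.
total_reads = 4_709_154
aligned_pct = 81.68
abundant_pct = 18.52
unaligned_pct = 18.32

# Conservative estimate as reasoned in the post: start from the aligned
# percentage, then subtract the unaligned and abundant percentages.
usable_pct = aligned_pct - unaligned_pct - abundant_pct
expected_reads = total_reads * usable_pct / 100

# Value reported by round(colSums(assay(se)) / 1e6, 1) in R, in millions.
observed_millions = 0.5

expected_millions = round(expected_reads / 1e6, 1)
shortfall_millions = round(expected_millions - observed_millions, 1)

print(f"expected ~{expected_millions} million, observed {observed_millions} million")
print(f"unaccounted for: ~{shortfall_millions} million")
```

With these inputs the expected figure comes out at roughly 2.1 million, so about 1.6 million reads are unaccounted for after counting.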

    Where are my reads going when I count them against the .gtf file?

    Thanks in advance for your valuable time.

    Ram

  • #2
    The first thing to check is always whether your FASTA and GTF use the same chromosome notation (for example, "chr1" versus "1").
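One way to run this check is to compare the sequence names in the BAM header with the first column of the GTF. The sketch below is a minimal Python illustration under stated assumptions: the header lines are in SAM text form (as printed by `samtools view -H`), the function names `sam_header_names` and `gtf_names` are hypothetical helpers, and the toy inputs are made up rather than taken from the poster's files.

```python
def sam_header_names(header_lines):
    """Collect sequence names from the SN: field of @SQ header lines."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gtf_names(gtf_lines):
    """Collect sequence names from the first column of GTF records."""
    names = set()
    for line in gtf_lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Hypothetical toy inputs, not taken from the poster's files.
header = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:800\n",
]
gtf = [
    '1\tsrc\texon\t10\t50\t.\t+\t.\tgene_id "g1";\n',
    '2\tsrc\texon\t20\t90\t.\t-\t.\tgene_id "g2";\n',
]

shared = sam_header_names(header) & gtf_names(gtf)
print("shared sequence names:", sorted(shared))
```

In this toy case the two sets share no names, which is the situation where no read could overlap any annotated feature. On real data the lines would be read from the header dump and the .gtf file instead of the lists above.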


    • #3
      Thanks,
      But I would guess that if the chromosome notations were different, the command would not run at all. If I am getting some results, shouldn't that mean the notations are the same?
      Besides, I am using the GTF file from Illumina iGenomes, and the BAM files were also generated by Illumina.

      In your answer you mean the BAM file and the GTF file, right?

      Thanks


      • #4
        The chromosome names in the BAM file and its header will be the same as those in the reference FASTA file that the reads were aligned to.

        Have you checked whether the BaseSpace alignment and the GTF file use the same version of the human genome?
        Last edited by mastal; 04-04-2017, 06:34 AM.
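A rough sanity check for a genome-version mismatch is to compare the contig lengths in the BAM header with the coordinates in the GTF. The Python sketch below is only a heuristic under stated assumptions: it reads the `SN` and `LN` fields of `@SQ` header lines in SAM text form, the helper names `contig_lengths` and `features_past_contig_end` are hypothetical, and the toy inputs are invented. A GTF feature that ends beyond the contig length points to a different build, but finding none does not prove the builds match.

```python
def contig_lengths(header_lines):
    """Map each SN: name to its LN: length from @SQ header lines."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        fields = line.rstrip("\n").split("\t")[1:]
        tags = dict(field.split(":", 1) for field in fields)
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def features_past_contig_end(gtf_lines, lengths):
    """Return (name, end) for GTF records that end beyond the contig length."""
    offenders = []
    for line in gtf_lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split("\t")
        name, end = cols[0], int(cols[4])  # column 5 of a GTF record is the end
        if name in lengths and end > lengths[name]:
            offenders.append((name, end))
    return offenders


# Hypothetical toy inputs, not taken from the poster's files.
header = ["@SQ\tSN:chr1\tLN:1000\n"]
gtf = [
    'chr1\tsrc\texon\t10\t50\t.\t+\t.\tgene_id "g1";\n',
    'chr1\tsrc\texon\t900\t1200\t.\t+\t.\tgene_id "g2";\n',
]

offenders = features_past_contig_end(gtf, contig_lengths(header))
print("features past contig end:", offenders)
```

Here the second toy feature ends at 1200 on a contig of length 1000, so it is reported. The genome build named in the iGenomes download and in the BaseSpace app settings remains the more direct thing to compare.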
