  • ocs
    Member
    • May 2011
    • 27

    Exclude chrM, chrUn* from reference // htseq-count warning on chrM

    Hello,

    I'm currently doing some RNA-seq analysis with hg19 as the reference genome. Is it better to exclude chrM and the chrUn_* sequences from the reference, or would that introduce bias?

    I'm asking because I'm using htseq-count to get the coverage, and the program gives me many warnings like this:
    Code:
    because chromosome 'chrM', to which it has been aligned, did not appear in the GFF file
    (almost always for chrM, and occasionally for some chrUn_* sequences).

    But I used the same GFF file for TopHat and for htseq-count, so I assumed that if a read is aligned to chrM, chrM must be contained in the GFF file. I don't understand why htseq-count complains about this.

    Any hints on that?

    Thanks in advance,
    Oliver
  • kmcarr
    Senior Member
    • May 2008
    • 1181

    #2
    Oliver,

    TopHat (like any alignment program) aligns reads to a reference sequence, typically provided as a FASTA file and indexed by the aligner. The GFF file is a feature annotation file that describes features (e.g. genes) relative to the reference sequence. TopHat aligned your reads to the reference genome in the FASTA file, which includes the sequences for chrM and chrUn_*. However, the GFF file associated with this genome contains no feature annotations for those sequences, so htseq-count warns about reads aligned to them. This is fairly typical with TopHat/htseq-count, and it is safe to ignore these warnings.
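To see which sequences are affected, the sequence names in the FASTA can be compared against the first column of the annotation. This is only a minimal sketch: it assumes a plain-text FASTA and a tab-delimited GFF/GTF, and the function name `unannotated_chroms` is a placeholder, not part of TopHat or htseq-count.

```shell
# List FASTA sequence names that have no record in the annotation file.
# $1 = reference FASTA, $2 = GFF/GTF annotation (tab-delimited)
unannotated_chroms() {
    awk -F '\t' '
        # First file (the annotation): remember every sequence name in column 1.
        FILENAME == ARGV[1] {
            if ($0 !~ /^#/ && NF > 1) annotated[$1] = 1
            next
        }
        # Second file (the FASTA): the name is the header text up to the first blank.
        /^>/ {
            split(substr($0, 2), parts, /[ \t]/)
            if (!(parts[1] in annotated)) print parts[1]
        }
    ' "$2" "$1"
}
```

Calling `unannotated_chroms genome.fa genes.gtf` (file names are examples) prints one name per line. Reads aligned to any sequence listed there are the likely source of the warning.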

    • ocs
      Member
      • May 2011
      • 27

      #3
      Ok, thanks, that makes sense. I had misunderstood something there.

      • swbarnes2
        Senior Member
        • May 2008
        • 910

        #4
        If your sample really contains mitochondrial sequences, you ought to keep chrM in the FASTA so those reads can align where they really belong. If you leave it out, the aligner might wrongly place those reads somewhere else, which will hurt the accuracy of your alignment.

        Most likely, the software is just warning you about something it thinks is strange: your GFF might be broken, because it has no gene annotation for one of your chromosomes. But if that's how it's supposed to be, just carry on, because you know better than the computer.

        If you don't like that warning, you could always filter the BAM so that the chrM reads are removed. If you have no GFF annotation for that chromosome, you don't need them for anything.
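The filtering can be sketched on the SAM text stream. This assumes the standard tab-delimited SAM layout (reference name in column 3, header lines starting with `@`), and `drop_chrom` is a hypothetical helper name, not a samtools command.

```shell
# Drop alignments to one reference sequence from a SAM stream.
# Reads SAM on stdin, writes SAM on stdout. $1 = sequence name, e.g. chrM.
drop_chrom() {
    awk -F '\t' -v chrom="$1" '
        /^@/        { print; next }   # keep every header line unchanged
        $3 != chrom { print }         # keep alignments to other sequences
    '
}
```

With a BAM file this would sit between two samtools calls, for example `samtools view -h in.bam | drop_chrom chrM | samtools view -b -o out.bam -` (file names are placeholders). The `@SQ` header line for chrM is kept, and a read whose mate aligned to chrM is kept as well, since only the read's own reference name is checked.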

        • NextGenSeq
          Senior Member
          • Apr 2009
          • 482

          #5
          Actually, the mitochondrial genome is very important if you are doing human mutation screening. There are many hereditary mitochondrial diseases, and mutations in mitochondrial DNA have also been reported in cancer.

          • ocs
            Member
            • May 2011
            • 27

            #6
            For some reason I thought that when I provide an annotation file to TopHat, it aligns only to the annotated genes (and I overlooked the fact that it only cares about the splice junctions). That makes no sense, I got mixed up there ;-)
            Thanks to all for your answers!

            • arrchi
              Member
              • Mar 2011
              • 46

              #7
              Hi ocs,

              Did you use the latest version of Cufflinks, and do you still see genes on chrM? I used the latest version and I want to have the genes on chrM, but they are all gone, and I don't know why.

              • ocs
                Member
                • May 2011
                • 27

                #8
                I have been using Cufflinks version 1.1.0 and TopHat version 1.3.3. I have reads aligned to chrM, but there are no annotated genes. Which reference file did you use?
                Last edited by ocs; 11-02-2011, 10:22 AM. Reason: Had to correct myself, I see no genes on chrM.

                • arrchi
                  Member
                  • Mar 2011
                  • 46

                  #9
                  Thanks for your reply. I used the human genome reference. When I repeated the same experiment with an older version of Cufflinks, I did get genes on chrM.

                  • arrchi
                    Member
                    • Mar 2011
                    • 46

                    #10
                    Are you using the human genome reference too? How many genes on chrM did you get?

                    • ocs
                      Member
                      • May 2011
                      • 27

                      #11
                      Originally posted by arrchi
                      Thanks for your reply. I used the human genome reference. When I repeated the same experiment with an older version of Cufflinks, I did get genes on chrM.
                      Hello arrchi,

                      First, you have to be more specific. There are several reference genome files for the human genome (Ensembl, NCBI, UCSC: http://tophat.cbcb.umd.edu/igenomes.html). I used hg19 (UCSC), as mentioned in my very first post, together with the annotation file from the iGenomes package (genes.gtf). Which reference and which annotation did you use?

                      Second, I don't understand what you actually want: you want to have genes on chrM, and then you do have genes on chrM?

                      If you look at the annotation file (in my case genes.gtf from the iGenomes hg19 package), you will find no annotated genes on chrM:
                      Code:
                      $ grep chrM genes.gtf | wc -l
                      0
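Note that a plain `grep chrM` matches the string anywhere in a line, including the attribute fields. A slightly stricter sketch checks only the sequence-name column. It assumes a tab-delimited GTF/GFF, and `count_features_on` is a made-up helper name.

```shell
# Count annotation records whose first column equals the given sequence name.
# $1 = sequence name (e.g. chrM), $2 = GTF/GFF file
count_features_on() {
    awk -F '\t' -v chrom="$1" '
        /^#/        { next }          # skip comment lines
        $1 == chrom { n++ }           # exact match on the sequence-name column
        END         { print n + 0 }   # prints 0 when nothing matched
    ' "$2"
}
```

`count_features_on chrM genes.gtf` prints the record count. With an Ensembl-style annotation the mitochondrial sequence is named `MT` rather than `chrM`, so it is worth checking which naming convention the file uses before concluding that the mitochondrial genes are missing.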
