  • Exclude chrM, chrUn* from reference // htseq-count warning on chrM

    Hello,

    I'm currently doing some RNA-seq analysis using hg19 as the reference genome. I wondered whether it is better to exclude chrM and the chrUn* sequences from the reference, or whether doing so would cause bias.

    I'm asking because I'm using htseq-count to get the coverage, and the program gives me many instances of this warning:
    Code:
    because chromosome 'chrM', to which it has been aligned, did not appear in the GFF file
    (only related to chrM and rarely on some chrUn_*).

    But I used the same GFF file for TopHat and for htseq-count. So if a read is aligned to chrM, chrM must be contained in the GFF file. I don't know why htseq-count complains about this.

    Any hints on that?

    Thanks in advance,
    Oliver

  • #2
    Oliver,

    TopHat (or any alignment program) aligns reads to a reference sequence, typically provided as a FASTA file and indexed by the alignment program. The GFF file is a feature annotation file that describes features (e.g. genes) relative to the reference sequence. TopHat aligned your reads to the reference genome provided by the FASTA file, which included the sequences for chrM and chrUn_*. However, the GFF file associated with this genome contains no feature annotations for these sequences. This is a fairly typical occurrence with TopHat/htseq-count, and it is safe to ignore these warnings.
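
To see which reference sequences have no annotation at all, the sequence names in the FASTA can be compared with column 1 of the GFF/GTF. This is only a minimal sketch: the function names are made up for illustration, and it assumes a plain-text FASTA and a tab-separated annotation file.

```shell
# Hypothetical helpers (not part of TopHat or htseq-count).

# List sequence names in a FASTA file (text after '>' up to the first whitespace).
fasta_seqnames() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u
}

# List sequence names used in column 1 of a GFF/GTF file (comment lines skipped).
gff_seqnames() {
    awk -F'\t' '!/^#/ && NF > 1 {print $1}' "$1" | sort -u
}

# Print sequences present in the FASTA but absent from the annotation.
unannotated_seqnames() {
    fasta_tmp=$(mktemp)
    gff_tmp=$(mktemp)
    fasta_seqnames "$1" > "$fasta_tmp"
    gff_seqnames "$2" > "$gff_tmp"
    # comm -23 keeps lines that occur only in the first (sorted) file.
    comm -23 "$fasta_tmp" "$gff_tmp"
    rm -f "$fasta_tmp" "$gff_tmp"
}
```

Called as `unannotated_seqnames genome.fa genes.gtf` (file names assumed), it prints every sequence the aligner can place reads on but for which htseq-count has no features.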


    • #3
      OK, thanks, that makes sense. I had misunderstood something there ...


      • #4
        If your DNA sample really contains DNA from the mitochondria, you ought to put that in the FASTA, so those reads can align to where they really belong. If you leave it out, the aligner might wrongly place those reads somewhere else, which will hurt the accuracy of your alignment.

        Probably, the software is just trying to warn you about something it thinks is strange. It's trying to warn you that you might have a broken GFF, because it has no gene annotation for one of your chromosomes. But if that's how it's supposed to be, then you should just carry on, because you know better than the computer.

        If you don't like that warning, you could always filter the BAM so that the chrM reads are removed. If you have no GFF annotation for that chromosome, you don't need them for anything.
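
One way to do that filtering, sketched on SAM text with awk. The function name is hypothetical, and it assumes the reference name sits in column 3, as in the SAM format.

```shell
# Hypothetical helper: drop alignments to one reference sequence from SAM text on stdin.
# Header lines (starting with '@') are passed through unchanged.
drop_seq_from_sam() {
    awk -F'\t' -v seq="$1" '/^@/ || $3 != seq'
}
```

For a BAM file this could be wrapped as `samtools view -h in.bam | drop_seq_from_sam chrM | samtools view -b -o out.bam -` (file names assumed). Note that the @SQ header line for chrM is kept; only the alignment records are removed.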


        • #5
          Actually, the mitochondrial genome is very important if you are doing human mutation screening. There are a lot of hereditary mitochondrial diseases. Mutations in mitochondrial DNA have also been reported in cancer.


          • #6
            For some reason I thought that when I provide an annotation file to TopHat, it only aligns to the annotated genes (and I overlooked the fact that it only cares about the splice junctions). Actually, that makes no sense; I mixed something up there ;-)
            Thanks to all for your answers!


            • #7
              Hi ocs,

              Did you use the latest version of Cufflinks, and do you still see genes on chrM? I used the latest version of Cufflinks and want to have the genes on chrM, but they are all gone. I don't know why.


              • #8
                I have been using Cufflinks version 1.1.0 and TopHat version 1.3.3. I have reads aligned to chrM, but there are no annotated genes. Which reference file did you use?
                Last edited by ocs; 11-02-2011, 10:22 AM. Reason: Had to correct myself, I see no genes on chrM.


                • #9
                  Thanks for your reply. I used the human genome reference. I repeated the same experiment with an older version of Cufflinks, and there I have genes on chrM.


                  • #10
                    Are you using the human genome reference too? How many genes on chrM did you get?


                    • #11
                      Originally posted by arrchi View Post
                      Thanks for your reply. I used human genome reference. I repeated the same experiment but using an older version of cufflinks, I have genes on chrM.
                      Hello arrchi,

                      First, you have to be more specific. There are several reference genome files for the human genome (Ensembl, NCBI, UCSC: http://tophat.cbcb.umd.edu/igenomes.html). I used hg19 (UCSC), which I mentioned in my very first post. I also used the annotation file from the iGenome (genes.gtf). Which reference and which annotation did you use?

                      Second, I don't know what you actually want - you want to have genes on chrM and then you have genes on chrM?

                      If you have a look at the annotation file (in my case genes.gtf from the iGenome hg19 package) you will find no annotated genes on chrM:
                      Code:
                      $ grep chrM genes.gtf | wc -l
                      0
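
Note that `grep chrM` matches the string anywhere in a line, including the attribute column. A stricter check counts features per sequence name using column 1 only. This is a rough sketch with a made-up function name, assuming a tab-separated GTF/GFF:

```shell
# Hypothetical helper: count annotation lines per sequence name (column 1),
# skipping comment lines. Output is "name<TAB>count", sorted by name.
count_features_per_seq() {
    awk -F'\t' '!/^#/ && NF > 1 {n[$1]++} END {for (s in n) print s "\t" n[s]}' "$1" | sort
}
```

Running `count_features_per_seq genes.gtf` lists only sequences that have at least one feature, so a chromosome missing from the output (such as chrM in the case above) has no annotation in that file.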
