  • iGenomes reference genome not accurate?

    I grabbed the mm9 reference genome from iGenomes. When running Cuffcompare, I noticed that genes.gtf refers to chr#_random sequences, whereas the Chromosomes folder only contained chr# files and nothing for the _random contigs. I fixed this by downloading the chr#_random files from the UCSC mm9 build, so now every sequence name in genes.gtf has a matching FASTA file.

    However, the mm9 build came with a pre-built Bowtie2 index for aligning to the genome (I used TopHat). My concern is that this Bowtie2 index was built without the chr#_random sequences. Is there a way to check this?
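    One way to check is to ask the index which sequences it contains. Bowtie2 ships with `bowtie2-inspect`, and its `-n` option prints the reference sequence names stored in an index. The sketch below assumes `bowtie2-inspect` is on your PATH and that you pass your own index prefix (for example the `genome` prefix inside the iGenomes Bowtie2Index folder; the exact path is an assumption). It lists any `*_random` names it finds.

```python
"""Sketch: list the *_random sequences stored in a Bowtie2 index.

Assumes bowtie2-inspect is on PATH; the index prefix is supplied by you.
"""
import subprocess
import sys


def index_sequence_names(inspect_output: str) -> list[str]:
    """Parse `bowtie2-inspect -n` output: one reference name per line.

    Only the first whitespace-separated token is kept, in case a FASTA
    description was stored along with the name.
    """
    return [line.split()[0] for line in inspect_output.splitlines() if line.strip()]


def random_contigs(names: list[str]) -> list[str]:
    """Return the names that look like UCSC unplaced contigs (chr*_random)."""
    return [name for name in names if name.endswith("_random")]


def main(index_base: str) -> None:
    # -n makes bowtie2-inspect print only the reference sequence names.
    result = subprocess.run(
        ["bowtie2-inspect", "-n", index_base],
        capture_output=True, text=True, check=True,
    )
    names = index_sequence_names(result.stdout)
    randoms = random_contigs(names)
    print(f"{len(names)} sequences in index, {len(randoms)} named *_random")
    for name in randoms:
        print("  " + name)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python inspect_index.py <index prefix>
    main(sys.argv[1])
```

    If the count of `*_random` names is zero, the index was built from the chr# files only.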

    Does anyone know of a better place to get an accurate mm9 build with a Bowtie2 index?

    If you think the Bowtie2 index might be questionable, how can I index all the chromosomes at once to create a version I trust?
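    `bowtie2-build` accepts a comma-separated list of FASTA files as its reference argument, so all chromosomes (including the chr*_random files added from UCSC) can go into one index in a single run. A minimal sketch, assuming the FASTA files all sit in one directory and are named `chr*.fa`; the directory and index prefix are placeholders you supply. Files are ordered by plain file name, which is an arbitrary choice.

```python
"""Sketch: build one Bowtie2 index over every chr*.fa file in a directory."""
import subprocess
import sys
from pathlib import Path


def chromosome_fastas(paths) -> list[Path]:
    """Keep chr*.fa files (chr1.fa ... chrY.fa plus chr*_random.fa), sorted by name."""
    picked = [p for p in paths if p.name.startswith("chr") and p.suffix == ".fa"]
    return sorted(picked, key=lambda p: p.name)


def bowtie2_build_command(fasta_files: list[Path], index_base: str) -> list[str]:
    """Return argv for bowtie2-build; the reference argument is comma-separated."""
    if not fasta_files:
        raise ValueError("no chromosome FASTA files found")
    return ["bowtie2-build", ",".join(str(p) for p in fasta_files), index_base]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python build_index.py <Chromosomes dir> <index prefix>
    fastas = chromosome_fastas(Path(sys.argv[1]).iterdir())
    subprocess.run(bowtie2_build_command(fastas, sys.argv[2]), check=True)
```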

    Have you guys run into similar issues?

    Thanks a lot for your help.

  • #2
    Did you get mm9 from the Cufflinks iGenomes site?

    Try the iGenomes mm9 directly from Illumina: http://support.illumina.com/sequenci...e/igenome.html
    Last edited by GenoMax; 10-26-2014, 05:30 PM.

    • #3
      I am 90% sure that's where I grabbed it from but I'll download it again just to verify.

      • #4
        Downloaded it again:

        Here's the chromosome list:
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chrY.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr5.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr3.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr2.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr6.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr16.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr15.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr12.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chrM.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr1.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr4.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr9.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr18.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr10.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr14.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chrX.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr11.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr13.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr19.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr8.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr17.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr7.fa

        It's missing the chr#_random.fa files, even though genes.gtf references them...
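        A quick way to see exactly which GTF sequence names lack a FASTA file is to compare column 1 of the GTF (the seqname) against the `.fa` file names in the Chromosomes folder. This is a sketch assuming one `<seqname>.fa` file per sequence, as in the iGenomes layout listed above; the paths are supplied on the command line.

```python
"""Sketch: list GTF sequence names that have no matching <name>.fa file."""
import sys
from pathlib import Path


def gtf_seqnames(lines) -> set[str]:
    """Collect column 1 (seqname) from GTF lines, skipping comments and blanks."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_fastas(seqnames: set[str], fasta_names) -> list[str]:
    """Seqnames without a '<seqname>.fa' file, e.g. chr1_random when only chr1.fa exists."""
    available = {name[:-3] for name in fasta_names if name.endswith(".fa")}
    return sorted(seqnames - available)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_gtf.py <genes.gtf> <Sequence/Chromosomes dir>
    with open(sys.argv[1]) as gtf:
        names = gtf_seqnames(gtf)
    fasta_files = [p.name for p in Path(sys.argv[2]).iterdir()]
    for name in missing_fastas(names, fasta_files):
        print(name)
```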

        • #5
          If you are interested in the "chr*_random" sequences, which are not uniquely placed on a chromosome, then you will need to build the genome FASTA file and index yourself.
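          Building "a genome file" usually just means concatenating the per-chromosome FASTA files into one multi-FASTA. A minimal sketch, assuming plain (uncompressed) FASTA inputs; it makes sure each line ends with a newline so records from adjacent files do not run together, and it stops if a sequence name repeats, since duplicate names would make the reference ambiguous. The output and input file names are supplied by you.

```python
"""Sketch: concatenate per-chromosome FASTA files into a single genome FASTA."""
import sys
from contextlib import ExitStack


def merged_fasta_lines(sources):
    """Yield FASTA lines from several sources, each guaranteed to end in a newline.

    Raises ValueError if a sequence name (first token after '>') repeats.
    """
    seen = set()
    for lines in sources:
        for line in lines:
            if line.startswith(">"):
                parts = line[1:].split()
                name = parts[0] if parts else ""
                if name in seen:
                    raise ValueError(f"duplicate sequence name: {name}")
                seen.add(name)
            yield line if line.endswith("\n") else line + "\n"


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python concat_fasta.py genome.fa chr1.fa chr1_random.fa ...
    out_path, in_paths = sys.argv[1], sys.argv[2:]
    with ExitStack() as stack, open(out_path, "w") as out:
        sources = [stack.enter_context(open(p)) for p in in_paths]
        out.writelines(merged_fasta_lines(sources))
```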

          • #6
            The *_random sequences may be unique sequences in heterochromatin, or pieces of a large segmental duplication whose flanking sequence cannot be localized or placed. We include these sequences when mapping to reduce mapping artifacts, not really because we are interested in them. I have always been curious why Illumina excludes them.

            • #7
              I downloaded the _random chromosomes from the UCSC website and built a new Bowtie2 index that includes them. I then ran TopHat with the -G genes.gtf option and the new index.

              However....
              Here are the warnings I got from Cufflinks:
              GFF warning: merging adjacent/overlapping segments (many of these)
              Kept 32976 ref transcripts out of 33802
              826 duplicate reference transcripts discarded.

              And here are similar messages from Cuffcompare:
              Kept 33035 transfrags out of 33262
              227 redundant cufflinks transfrags discarded.

              Does this mean my GTF file isn't accurate? I got it from the Illumina iGenomes mm9 build. I'm starting to think downloading anything from Illumina is more trouble than it's worth.
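              To get a feel for how many reference transcripts are structurally redundant, you can group the GTF's exon lines by transcript and look for transcripts that share the same chromosome, strand, and exon coordinates. This is only a rough cross-check under a stated assumption: it treats "duplicate" as "identical exon chain", which may not match the criterion Cufflinks actually uses to discard reference transcripts. Transcripts are keyed by (transcript_id, seqname, strand) so that an ID reused on different chromosomes is not merged into one chain.

```python
"""Sketch: find GTF transcripts with identical seqname, strand and exon chain.

Assumption: this approximates, and does not reproduce, Cufflinks' own logic
for discarding duplicate reference transcripts.
"""
import re
import sys
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exon_chains(lines):
    """Map (transcript_id, seqname, strand) -> sorted tuple of (start, end) exons."""
    chains = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        key = (match.group(1), fields[0], fields[6])
        chains[key].append((int(fields[3]), int(fields[4])))
    return {key: tuple(sorted(exons)) for key, exons in chains.items()}


def duplicate_structures(chains):
    """Group transcript IDs whose seqname, strand and exon chain are identical."""
    groups = defaultdict(list)
    for (tid, seqname, strand), exons in chains.items():
        groups[(seqname, strand, exons)].append(tid)
    return [sorted(tids) for tids in groups.values() if len(tids) > 1]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python gtf_dups.py genes.gtf
    with open(sys.argv[1]) as gtf:
        dups = duplicate_structures(exon_chains(gtf))
    print(f"{len(dups)} groups of structurally identical transcripts")
    for tids in dups[:20]:
        print("  " + ", ".join(tids))
```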

              If I were to download the unmasked genome, build an index, and get an accurate GTF file for that genome, where would be the best place to go? Or is there a way to verify my GTF file against my index?

              What has worked for you in this situation?
