  • Where to obtain HumanNCBI37_UCSC reference sequence?

    Hi.

    Please can somebody let me know where I can obtain the HumanNCBI37_UCSC reference sequence? It is hg19 but standard hg19 reference files have incompatible dictionaries compared with BAM files aligned with this reference. Please tell me where I can download this reference file from. I've looked everywhere (including both NCBI and UCSC) and can't find it. Thanks for your help.

    Regards

    - Dave Curtis

  • #2
    Originally posted by davecurtis
    Hi.

    Please can somebody let me know where I can obtain the HumanNCBI37_UCSC reference sequence? It is hg19 but standard hg19 reference files have incompatible dictionaries compared with BAM files aligned with this reference. Please tell me where I can download this reference file from. I've looked everywhere (including both NCBI and UCSC) and can't find it. Thanks for your help.

    Regards

    - Dave Curtis
    What does that (bold above) mean? All hg19 files are in this directory at UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/



    • #3
      Thanks. I don't see the file I am looking for in that folder.

      I have a set of BAM files which have been aligned using this file:
      samtoolsRefFile=/illumina/scratch/services/Genomes/FASTA_UCSC/HumanNCBI37_UCSC/HumanNCBI37_UCSC_XX.fa

      I have a reference file called hg19_UCSC.fa, and for most chromosomes HaplotypeCaller runs fine using this reference sequence. However, for chromosomes 19, 21 and 22 HaplotypeCaller gives this error message:
      WARN 08:38:22,963 SequenceDictionaryUtils - Input files reads and reference have incompatible contigs: The following contigs included in the intervals to process have different indices in the sequence dictionaries for the reads vs. the reference: [chr22]. As a result, the GATK engine will not correctly process reads from these contigs. You should either fix the sequence dictionaries for your reads so that these contigs have the same indices as in the sequence dictionary for your reference, or exclude these contigs from your intervals. This error can be disabled via -U ALLOW_SEQ_DICT_INCOMPATIBILITY, however this is not recommended as the GATK engine will not behave correctly..

      In fact, even if I set ALLOW_SEQ_DICT_INCOMPATIBILITY I still get the error and I don't get any calls for these chromosomes.

      It seems that there is some incompatibility in the dictionaries of the BAM and reference files which I have not been able to fix.
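One way to see which contigs GATK is complaining about is to compare the order of contig names in the BAM header with the order in the reference index. The following is only a minimal sketch under stated assumptions: the header text is what `samtools view -H` prints, the index text is the content of a `.fai` file, and the function names and the miniature inputs are made up for illustration. A Picard-style `.dict` file is itself a SAM header, so the first function should work on that as well.

```python
def contigs_from_sam_header(header_text):
    """Return contig names in @SQ order from SAM header text."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # Fields are tab-separated TAG:VALUE pairs; SN holds the contig name.
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def contigs_from_fai(fai_text):
    """Return contig names in file order from a FASTA index (first column)."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def index_mismatches(bam_contigs, ref_contigs):
    """Map each shared contig to (index in BAM, index in reference) if they differ."""
    ref_index = {name: i for i, name in enumerate(ref_contigs)}
    return {
        name: (i, ref_index[name])
        for i, name in enumerate(bam_contigs)
        if name in ref_index and ref_index[name] != i
    }


# Made-up miniature inputs, for illustration only (not real hg19 contigs).
bam_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chrA\tLN:100\n@SQ\tSN:chrC\tLN:300\n"
ref_fai = "chrA\t100\t6\t60\t61\nchrB\t200\t115\t60\t61\nchrC\t300\t325\t60\t61\n"

mismatches = index_mismatches(
    contigs_from_sam_header(bam_header), contigs_from_fai(ref_fai)
)
print(mismatches)
```

Any contig that shows up in the result has a different position in the two dictionaries, which is the condition the warning above describes.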

      Searching Google, I have seen other people refer to the HumanNCBI37_UCSC reference sequence, so I assume it is a standard reference for hg19, presumably with a slightly different dictionary from the file called hg19_UCSC.fa.



      • #4
        Perhaps someone else will have a better answer ...

        You may have to ask whoever aligned those files in the first place as to where they got their reference from. With patches/releases it may be difficult to nail down an exact provenance for a file that claims to be HumanNCBI37_UCSC reference unless you know that it was obtained from the directory I posted above at UCSC.



        • #5
          Thanks. I think I've worked it out. The BAM files I have were prepared with two different references, one with the Y chromosome and one without, and this threw off the indexing for the chromosomes listed after it.
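The effect described in this post can be shown with a toy example. This is only a sketch: the contig orders below are hypothetical and are not the real layout of either reference file. It shows that dropping one contig leaves earlier contigs at the same index and shifts every later one.

```python
# Hypothetical contig orders for illustration, not the real hg19 layout.
with_y = ["chr1", "chrX", "chrY", "chr2", "chr3"]
without_y = [name for name in with_y if name != "chrY"]


def shifted(a, b):
    """Return contigs present in both lists whose position differs."""
    shared = [name for name in a if name in b]
    return [name for name in shared if a.index(name) != b.index(name)]


print(shifted(with_y, without_y))
```

Only the contigs listed after the missing one are reported, which would fit with errors appearing for some chromosomes and not others.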
