
  • We encountered a non-standard non-IUPAC base in the provided reference '88'

    Hi,
    I am currently working with bowtie2 and GATK to call SNPs on sugarcane. As the reference I am using Sbicolor_v2.1_255.fa, the sorghum genome downloaded from Phytozome.
    java -Xms128m -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R Sbicolor_v2.1_255.fa -I input.sorted.bam -ploidy 8 -o input_gatk.vcf
    I ran into the following error:
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 3.1-1-g07a4bf8):
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '88'
    ##### ERROR ------------------------------------------------------------------------------------------

    I hope someone can help me with this issue.
    Thanks in advance.

  • #2
    See : http://gatkforums.broadinstitute.org...d-reference-13

    The file from ftp://ftp.jgi-psf.org/pub/compgen/ph...v2.1_255.fa.gz

    appears to contain "X" characters:

    grep -n "X" Sbicolor_v2.1_255.fa
    1653513:AGCCTCCCGTCCCCGATGTCTCCGACCACTGGTTAATATTCTTTXXXXXXXXXXXXXXXXXXXXXXXXXXCGGCGCTGGG
    5189698:TCAATAATTTATCACTAAATCCTTTXXXXXXXXXXXXXXXXXXXXXAAATGTTATCATGTCTCAAAACGTCTGATGAATG
    8205175:CTGTGTAGATAAGAXXXXXXXXXXXXXXXXXXXXXXXXXXCAAACAGGAAAGAGCGGCAGTGATTCAAGCAAAAACAAGT

    IUPAC codes are here: https://en.wikipedia.org/wiki/Nucleic_acid_notation

    X is not among them.
    Last edited by Richard Finney; 08-27-2014, 02:52 PM.
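For anyone who wants to check a reference for other unexpected characters, not just X, the grep above can be generalized. The following is a minimal Python sketch, not part of the original post: the function name `find_non_iupac` and the demo sequences are made up for illustration, and the allowed set is simply the nucleotide letters from the Wikipedia table linked above (GATK's own accepted set may differ slightly). It also shows why the error message says '88': that is the decimal ASCII code of "X".

```python
from collections import Counter

# IUPAC nucleotide letters as listed on the Wikipedia page linked above.
# Assumption: this is the set we want to allow; GATK's own list may differ.
IUPAC = set("ACGTURYSWKMBDHVN")


def find_non_iupac(lines):
    """Count characters in FASTA sequence lines that are not IUPAC codes."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        # Upper-case first so soft-masked (lower-case) bases are accepted.
        for ch in line.upper():
            if ch not in IUPAC:
                counts[ch] += 1
    return counts


# Hypothetical demo input; for a real file, pass an open file handle instead,
# e.g. find_non_iupac(open("Sbicolor_v2.1_255.fa")).
demo = [">chr_demo", "ACGTXXXXACGT", "acgtnnnn", "TTXA"]
result = find_non_iupac(demo)

# Report each offending character with its count and ASCII code.
print({ch: (n, ord(ch)) for ch, n in result.items()})
```

Because the function only iterates over lines, it can be run on a whole genome file without loading it into memory.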

    • #3
      Apparently, the X means the sequence is masked for some reason, such as it was identified to be artificial (like vector sequence) and should not be in the assembly. I suggest you replace them with N.
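One way to do the replacement suggested above is sketched here in Python. This is an illustration under stated assumptions, not a tested recipe from the thread: the function name `mask_x_with_n` and the output file name in the comment are hypothetical. Header lines (starting with ">") are left untouched so that sequence names containing an X are not altered, and lower-case x is mapped to lower-case n to preserve any soft-masking.

```python
def mask_x_with_n(lines):
    """Yield FASTA lines with X/x replaced by N/n in sequence lines only."""
    for line in lines:
        if line.startswith(">"):
            # Header line: keep as-is, even if the name contains an X.
            yield line
        else:
            yield line.replace("X", "N").replace("x", "n")


# Hypothetical demo input.
demo = [">seqX\n", "ACXXGT\n", "acxxgt\n"]
out = list(mask_x_with_n(demo))
print("".join(out), end="")

# For a real file (the output name is an assumption), something like:
# with open("Sbicolor_v2.1_255.fa") as src, \
#         open("Sbicolor_v2.1_255.masked.fa", "w") as dst:
#     dst.writelines(mask_x_with_n(src))
```

Since X and N are both single characters, the replacement does not shift any coordinates. GATK expects a .fai index and a .dict sequence dictionary next to the reference, so those would presumably need to be regenerated for the new file.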

      • #4
        Thank you for the reply. It is quite helpful.
