
  • samtools pileup error @ [fai_build_core]

    Hello,

    I am trying to use samtools pileup for SNP detection.

    I am getting this error with samtools pileup:

    [fai_build_core] line length exceeds 65535 in sequence tucosnp

    'tucosnp' is the reference sequence I used to build a bowtie index, to which I aligned ~100 million Illumina reads. 'tucosnp' is not a whole genome; it is the concatenation of all the contigs generated by a de novo ABySS assembly. The reference is one big FASTA file of about 150 million bases.

    I'm not sure what I should do here. Splitting the reference sequence into files of 65,000 bases seems absurd. Maybe I am missing something silly.

    In the command "samtools pileup -f ref.fasta aln.sorted.bam", I assume ref.fasta refers to the reference sequence (reference genome), but maybe I am wrong.

    Any help appreciated!

  • #2
    It doesn't look like samtools is complaining that the sequence is too long; it says that the line is too long. When you concatenated your contigs into your FASTA file, did you wrap the sequence into multiple lines or write it as a single, ginormous line? It could also be a line-break issue if you created the FASTA file on one type of system (Mac, Linux, Windows) and are running samtools on a different type.
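To check the line-break theory above, here is a minimal Python sketch that counts which line endings a file uses. The function name `count_line_endings` and the file name in the usage comment are made up for illustration. If the file turns out to contain only bare CR characters (old Mac style), a Unix tool would likely see the whole file as one very long line.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF), Unix (LF) and old-Mac (bare CR) line endings."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # Every CRLF pair also contains one LF and one CR, so subtract the pairs.
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }

# Hypothetical usage (reads the whole file into memory):
#   with open("ref.fasta", "rb") as fh:
#       print(count_line_endings(fh.read()))
```

A file with a non-zero "cr" or "crlf" count was probably written on a different system from the one running samtools.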


    • #3
      Hi peromhc,
      I am getting the same error with the samtools faidx command ([fai_build_core] line length exceeds 65535 in sequence). I looked at all the scaffolds in my FASTA file and nothing seems out of place. Could you please tell me how you resolved your issue?
      Thanks,


      • #4
        Originally posted by Mansequencer
        Hi peromhc,
        I am getting the same error with the samtools faidx command ([fai_build_core] line length exceeds 65535 in sequence). I looked at all the scaffolds in my FASTA file and nothing seems out of place. Could you please tell me how you resolved your issue?
        Thanks,

        What is the length of your longest line?
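One way to answer that question is a small Python sketch that reports the longest sequence line for each record. The function name `longest_line_per_sequence` is invented here, and the record name is assumed to be the first word after ">".

```python
def longest_line_per_sequence(lines):
    """Map each record name to the length of its longest sequence line."""
    longest = {}
    name = None
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop the line ending, whatever its style
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            longest.setdefault(name, 0)
        elif name is not None:
            longest[name] = max(longest[name], len(line))
    return longest

# Hypothetical usage:
#   with open("ref.fasta") as fh:
#       for name, n in longest_line_per_sequence(fh).items():
#           if n > 65535:
#               print(name, n)
```

Any record reported with a value above 65535 would be a candidate for the error quoted above.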


        • #5
          If you're on Linux, a quick fix is:
          fold some.fasta > some.folded.fasta
          which wraps any line longer than 80 characters (you can specify a different width with -w).
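Note that fold treats every line the same way, so a header longer than the width gets wrapped too. As an alternative, here is a rough Python sketch that rewraps only the sequence lines and leaves headers alone. The name `wrap_fasta` and the default width of 60 are arbitrary choices for this example, and it holds one whole sequence in memory at a time.

```python
def wrap_fasta(lines, width=60):
    """Yield FASTA lines with each record's sequence rewrapped to `width`."""
    if width < 1:
        raise ValueError("width must be at least 1")
    buffer = []  # pieces of the sequence belonging to the current record

    def flush():
        # Join the buffered pieces and emit them in chunks of `width`.
        seq = "".join(buffer)
        buffer.clear()
        for i in range(0, len(seq), width):
            yield seq[i:i + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            yield from flush()  # finish the previous record first
            yield line          # header lines are passed through unwrapped
        else:
            buffer.append(line)
    yield from flush()

# Hypothetical usage:
#   with open("some.fasta") as src, open("some.wrapped.fasta", "w") as dst:
#       for out in wrap_fasta(src):
#           dst.write(out + "\n")
```

Because each record is rejoined before wrapping, every line of a sequence except possibly its last comes out at the same length.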


          • #6
            Hi Brentp,
            Thanks for your advice. It worked.


            • #7
              I ran into the same problem with "samtools faidx":
              [fai_build_core] line length exceeds 65535 in sequence 'Chromosome1'.
              I realised that all the bases (~4 MB) were on one line. BWA indexes this without any problem. I then used the Linux "fold" command, after which samtools gave me the following error:
              [fai_build_core] different line length in sequence 'Chromosome1'.
              In fact, I had a reference containing two chromosomes, and I wanted to concatenate the two before indexing. Ultimately I solved the issue as follows:
              1. Removed the FASTA header of the second chromosome (file1)
              2. seqret file1 file2
              The resulting file2 is handled properly by samtools faidx.
              PS: For faidx to work, every line of a sequence in the FASTA file should be equal in length (only the last line of a sequence may be shorter), AND no single line may be longer than 65535 characters.
              Curious to hear others' comments.
              Thanks
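To find which records would trip the "different line length" check, here is a simplified Python sketch. It is not a reimplementation of faidx; the function names are made up, and blank lines are simply counted as lines of length 0. One way uneven lines can arise is joining two already wrapped sequences after deleting the second header, which leaves the short final line of the first sequence in the middle of the merged record.

```python
def _is_uneven(lengths):
    """True if a non-final line differs from the first, or the last is longer."""
    if not lengths:
        return False
    first = lengths[0]
    return any(n != first for n in lengths[:-1]) or lengths[-1] > first

def records_with_uneven_lines(lines):
    """Return names of records whose sequence lines are not uniformly wrapped."""
    bad = []
    name, lengths = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None and _is_uneven(lengths):
                bad.append(name)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths = []  # start collecting line lengths for the new record
        elif name is not None:
            lengths.append(len(line))
    if name is not None and _is_uneven(lengths):
        bad.append(name)  # do not forget the final record
    return bad

# Hypothetical usage:
#   with open("ref.fasta") as fh:
#       print(records_with_uneven_lines(fh))
```

Records flagged here can be rejoined and rewrapped to a single width before indexing.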
