Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • jmarshall
    replied
    Originally posted by reubennowell View Post
    My input fasta file was line wrapped, and there was a blank line between sequences. The problem sequence was where coincidentally the seq ended exactly at the line wrap position
    Thanks for reporting this. I've now fixed it and the fix will appear in samtools (and htslib) 1.3.

    Leave a comment:


  • reubennowell
    replied
    I found another case where this was happening I thought I'd just add to the list for future reference.

    My input fasta file was line wrapped, and there was a blank line between sequences. The problem sequence was where coincidentally the seq ended exactly at the line wrap position, ie:

    Code:
    >not_a_problem_seq
    AATCGACGTACGTAGCTGATC
    ATCGATGCTAGCTATAATCGT
    ACTAGCTACTGG
    
    >problem_seq
    ACTGCTAGCTGATCGATCGTAG
    ACGTACGTAGCTAGCTGACTGA
    ACTGATCGTAGCTAGCTGATCG <- here!
    
    >next_seq
    ATCGACGTAGCTAGCTAGCTAT
    ACTGATGCTACTAGCTGATGCT
    ACTGA
    Issue was resolved by actually adding an extra blank line between the end of problem_seq and the header for next_seq.

    Leave a comment:


  • saad0105050
    replied
    In my case, the actual problem was different (I found out later).

    I extracted the genomic sequences from UCSC table browser in fasta format and saved it as my reference file. Invoking samtools faidx was giving the error 'different line length...'. However, fixing empty lines did not stop the error, and I discovered that my fasta file was truncated due to timeout while downloading from UCSC; the error message was the last lines of my fasta file and it was causing samtools emit this message.

    Leave a comment:


  • jdrum00
    replied
    That works too. But there are canned answers to this sort of question nowadays, fortunately, that didn't exist in 2009 when I originally posted. Picard, for example, has a FASTA-fixer:



    I suspect BioPerl and similar packages were doing this even in 2009, but I was pretty new to the field then and didn't know about them.

    Leave a comment:


  • saad0105050
    replied
    Thanks.

    I did this:
    Code:
    awk '(NF!=0){print}' sample.fa>tmp.fa
    awk '(NF==0){print "\blank"}' tmp.fa # check
    mv tmp.fa sample.fa

    Leave a comment:


  • jdrum00
    replied
    Turns out that in my case it's actually a blank line that's causing the problem. I started wondering why the issue was coming up at all, given that awk says the longest line in the file is 70 characters long:

    awk '{ if (x < length()) x = length() } END { print x }' hsap_36.1_hg18.fa

    I thought surely it couldn't be complaining about short lines...so I looked for sub-70-character lines:

    awk 'length($0) < 70' hsap_36.1_hg18.fa

    I got the expected set of header lines, and one trailing incomplete line per header block. But the last line of the chr13 block was blank due to the actual final line of sequence being exactly 70 characters.

    So the brute-force solution for me is just to eliminate blank lines from the file:

    grep . hsap_36.1_hg18.fa > tmp; mv tmp hsap_36.1_hg18.fa

    This still leaves the questions of (1) why samtools can't handle a simple silly blank line and (2) how to fold a FASTA file that does have anomalous line lengths...but my immediate crisis is over. Thanks in advance, though, for any insights anyone has on either of those questions!

    Leave a comment:


  • jdrum00
    started a topic Samtools fasta line length segfault, redux

    Samtools fasta line length segfault, redux

    There's been at least one thread on this already, so I'm wondering if anyone has a handy shell script they could share for fixing .fa files with bad line lengths. In my particular case, I get this error from samtools faidx:

    [fai_build_core] different line length in sequence 'chr13'.

    ...when using our standard reference file, hsap_36.1_hg18.fa.

    I want to reformat this file rather than downloading some known-correct one, since I don't know exactly where our file came from and what differences there might be between it and another.

    The thread linked above has a snippet of BioPerl, but alas, I don't have that available on the server where I'm working. The last poster outlines an approach, but doesn't provide actual code.

    Hopefully I'm not being too lazy here by asking if someone's got such a script in their back pocket...I'm just a bit overwhelmed at the moment. I would love not to have to solve this particular problem from scratch, and I'm sure a solution posted here would be useful to others in the future. Thanks!

Latest Articles

Collapse

  • seqadmin
    New Genomics Tools and Methods Shared at AGBT 2025
    by seqadmin


    This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

    The Headliner
    The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
    03-03-2025, 01:39 PM
  • seqadmin
    Investigating the Gut Microbiome Through Diet and Spatial Biology
    by seqadmin




    The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
    02-24-2025, 06:31 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 03-03-2025, 01:15 PM
0 responses
179 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-28-2025, 12:58 PM
0 responses
272 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-24-2025, 02:48 PM
0 responses
656 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-21-2025, 02:46 PM
0 responses
267 views
0 likes
Last Post seqadmin  
Working...
X