
  • nupurgupta
    replied
    Same problem

    Did you find a solution to the null problem please?

    Originally posted by ericpante View Post
    Hi everybody,

    I am having similar problems with samtools 0.1.18. I would like to have reference characters listed in a pileup file, but I have problems with headers.

    samtools faidx AGSbrut.fasta
    samtools view -q 20 -buh -t AGSbrut.fasta.fai A.sam | samtools sort - A
    samtools view -q 20 -buh -t AGSbrut.fasta.fai S.sam | samtools sort - S
    samtools mpileup -B -f AGSbrut_index.fai A.bam S.bam > AS.mpileup

    [fai_build_core] different line length in sequence 'null'.
    Segmentation fault

    I hypothesized that this 'null' sequence may be a blank line; so I looked for it manually and with sed, with no luck. I also looked for other potential problems based on what was previously reported (no extra spaces, characters, etc in reference sequence names in fai and sam files). I also tried to re-head the file, with no success:

    samtools view -HS -t AGSbrut.fasta.fai A.sam > Aheader.sam
    samtools reheader Aheader.sam A.bam > Aheaded.bam

    [bam_header_read] EOF marker is absent. The input is probably truncated.

    All insights are welcome!
    thank you, eric


  • jgibbons1
    replied
    Using pileup with the -f argument allows you to supply the faidx-indexed reference sequence file. I used this option and it fixed my problem.


  • jgibbons1
    replied
    Hey folks,

    I have been struggling to figure out why I am getting N's for my pileup reference sequence. I found hope when I discovered this thread, but I have followed all the suggestions to no avail. I've tried this with different versions of samtools, different data sets, and different reference files, and I have simplified ID names, rebuilt the faidx index, etc.

    Still can't figure out what's going on here. Has anyone found any other solutions?

    Thanks


  • adowney
    replied
    Originally posted by colindaven View Post
    Here's another possible solution - the headers are not consistent between SAM/BAM and the original fasta:

    Even though the reference file was the same one in both cases, sometimes aligners just write a substring out into the SAM file. Samtools seems to take the full header.

    For example the first contiguous part of my genome header is
    gi|110645304|ref|NC_002516.2|

    However in my SAM file the aligner has only written
    NC_002516.2

    Samtools has written the full header to the .fa.fai index
    gi|110645304|ref|NC_002516.2|

    .. and this does not match.

    Solution:

    Try correcting the original header on the reference fasta to just the substring which the aligner uses,
    e.g.
    gi|110645304|ref|NC_002516.2|
    to
    NC_002516.2
    The above suggestion fixed the problem when I got this error.


  • ericpante
    replied
    Hi everybody,

    I am having similar problems with samtools 0.1.18. I would like to have reference characters listed in a pileup file, but I have problems with headers.

    samtools faidx AGSbrut.fasta
    samtools view -q 20 -buh -t AGSbrut.fasta.fai A.sam | samtools sort - A
    samtools view -q 20 -buh -t AGSbrut.fasta.fai S.sam | samtools sort - S
    samtools mpileup -B -f AGSbrut_index.fai A.bam S.bam > AS.mpileup

    [fai_build_core] different line length in sequence 'null'.
    Segmentation fault

    I hypothesized that this 'null' sequence may be a blank line; so I looked for it manually and with sed, with no luck. I also looked for other potential problems based on what was previously reported (no extra spaces, characters, etc in reference sequence names in fai and sam files). I also tried to re-head the file, with no success:

    samtools view -HS -t AGSbrut.fasta.fai A.sam > Aheader.sam
    samtools reheader Aheader.sam A.bam > Aheaded.bam

    [bam_header_read] EOF marker is absent. The input is probably truncated.

    All insights are welcome!
    thank you, eric
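
One thing that may be worth ruling out for the commands above: `-f` in `samtools mpileup` is normally given the FASTA file itself (samtools then looks for the `.fai` next to it), whereas the last command passes `AGSbrut_index.fai`. Whether that explains the 'null' sequence here is an assumption, not something confirmed in this thread. The Python sketch below is a hypothetical pre-flight check (the name `looks_like_fasta` is made up for illustration and is not part of samtools) that only tests whether the first non-blank line starts with `>`.

```python
def looks_like_fasta(text):
    """Return True if the first non-blank line starts with '>' (a FASTA header)."""
    for line in text.splitlines():
        if line.strip():
            return line.startswith(">")
    return False  # empty input is not a usable reference


if __name__ == "__main__":
    # A FASTA record passes; a tab-separated .fai line does not.
    print(looks_like_fasta(">chr2L\nACGT\n"))
    print(looks_like_fasta("chr2L\t49364325\t7\t60\t61\n"))
```

In practice the text would be the first few lines of whatever file is passed to `-f`.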


  • smehr12
    replied
    Originally posted by SMHfrog View Post
    I had this same problem, and after seeing no solution here did some more digging, and have a possible solution for you.

    I noticed that the ref.fa.fai file for my whole genome was 0 kb. The .fai is used by samtools when building the pileup. When I ran the command to re-build the .fai:

    samtools faidx reference.fa

    I got the following error message:

    [fai_build_core] different line length in sequence 'scaffold_14'.
    Segmentation fault

    No doubt this same message occurred the first time I ran the pileup command (which also builds the .fai if it doesn't exist), but I apparently didn't pay attention. After that first time, the .fai file EXISTED so no errors were subsequently reported when I ran pileup again.

    In my case, there was an extra line after scaffold_14. I removed this, and re-built the .fai using the samtools faidx command and then re-ran the pileup command. My pileup then contained the reference base as intended!

    Hope this helps y'all find the solution to your problem.
    Best,
    Shannon
    University of Texas at Austin

    Hi all,
    I have the same error.
    samtools faidx bwa.ref/ref.fasta ref.fa

    ERROR:
    different line length in sequence 'scaffold_67'.
    Segmentation fault
    NOTE: I see NNNN in that scaffold. Does anyone have a suggestion?


  • bgibb
    replied
    I noticed the same problem when running pileup under SAMtools-0.1.15. However, the problem does not seem to occur when running pileup under SAMtools-0.1.4 (using the same reference file, same BAM file and same command line options).

    samtools-0.1.4/samtools pileup -s -f reference.fa sorted.bam > pileup.out


  • colindaven
    replied
    Here's another possible solution - the headers are not consistent between SAM/BAM and the original fasta:

    Even though the reference file was the same one in both cases, sometimes aligners just write a substring out into the SAM file. Samtools seems to take the full header.

    For example the first contiguous part of my genome header is
    gi|110645304|ref|NC_002516.2|

    However in my SAM file the aligner has only written
    NC_002516.2

    Samtools has written the full header to the .fa.fai index
    gi|110645304|ref|NC_002516.2|

    .. and this does not match.

    Solution:

    Try correcting the original header on the reference fasta to just the substring which the aligner uses,
    e.g.
    gi|110645304|ref|NC_002516.2|
    to
    NC_002516.2
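
The mismatch described in this post can be checked mechanically before running pileup. The following is a minimal Python sketch, assuming the `.fai` text and the SAM header text (for example the output of `samtools view -H`) have already been read into strings. The function names are hypothetical helpers, not part of samtools.

```python
def fai_names(fai_text):
    """Sequence names from a .fai index (first whitespace-separated column)."""
    return [line.split()[0] for line in fai_text.splitlines() if line.strip()]


def sam_sq_names(sam_header_text):
    """Sequence names taken from the SN: field of @SQ header lines."""
    names = []
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def names_missing_from_index(fai_text, sam_header_text):
    """Return @SQ names that have no identical entry in the .fai."""
    indexed = set(fai_names(fai_text))
    return [name for name in sam_sq_names(sam_header_text) if name not in indexed]
```

Any name this returns is one the alignment file uses but the index does not contain under exactly that spelling, which is the situation described above.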


  • smol
    replied
    Hi
    I'm having the same problem with Ns in my pileup file and have tried everything mentioned above (thanks for suggestions!). I am using:

    ./samtools pileup data.sorted.bam -f reference.fasta > data.pileup

    My reference .fai file looks like this:

    chr2L 49364325 7 60 61
    chr2R 61545105 50187078 60 61
    chr3L 41963435 112757942 60 61
    chr3R 53200684 155420775 60 61
    chrUNKN 42389979 209508147 60 61
    chrX 24393108 252604632 60 61
    chrY 237045 277404298 60 61

    Any ideas?
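
The five `.fai` columns are name, sequence length, byte offset of the first base, bases per line, and bytes per line. A rough Python sketch of an internal consistency check follows. It assumes each FASTA header is just `>name` followed by a single newline byte and that every sequence ends with a newline, which may not hold for every file. The function names are hypothetical.

```python
SAMPLE_FAI = """\
chr2L 49364325 7 60 61
chr2R 61545105 50187078 60 61
chr3L 41963435 112757942 60 61
chr3R 53200684 155420775 60 61
chrUNKN 42389979 209508147 60 61
chrX 24393108 252604632 60 61
chrY 237045 277404298 60 61
"""


def parse_fai(text):
    """Parse .fai lines into (name, length, offset, linebases, linewidth) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            name, length, offset, linebases, linewidth = line.split()
            entries.append((name, int(length), int(offset), int(linebases), int(linewidth)))
    return entries


def end_offset(entry):
    """Byte offset just past the last line of a sequence, newline included."""
    _, length, offset, linebases, linewidth = entry
    full_lines, rest = divmod(length, linebases)
    # A partial last line still carries the line terminator bytes.
    tail = rest + (linewidth - linebases) if rest else 0
    return offset + full_lines * linewidth + tail


def inconsistent_entries(entries):
    """Names whose offset does not follow directly from the previous entry."""
    bad = []
    for prev, cur in zip(entries, entries[1:]):
        expected = end_offset(prev) + len(">" + cur[0] + "\n")
        if cur[2] != expected:
            bad.append(cur[0])
    return bad


if __name__ == "__main__":
    print(inconsistent_entries(parse_fai(SAMPLE_FAI)))
```

Under those assumptions the index posted above is internally consistent (the check returns an empty list), so the index layout itself does not look like the culprit in this case.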


  • brutus
    replied
    I also had this experience. In my case, the problem disappeared when I removed spaces in the reference sequence name.


  • mmartin
    replied
    I had the same problem. In my case, I had colons in the reference sequence names, something like "Region1:1-100". When I removed them, samtools pileup worked as expected.
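
Reference names containing spaces or colons, as reported in this thread, can be flagged ahead of time. A minimal Python sketch follows, assuming the FASTA text is already in memory. `flag_reference_names` is a hypothetical helper. It is deliberately conservative: it also flags headers that merely carry a description after the name. A colon is suspicious because samtools region strings use the `name:start-end` form.

```python
import re


def flag_reference_names(fasta_text):
    """Return FASTA header lines (without '>') that contain whitespace or a colon."""
    flagged = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].rstrip()
            if re.search(r"[\s:]", header):
                flagged.append(header)
    return flagged
```

Headers returned by this function are candidates for renaming before the reference is indexed and aligned against.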


  • hollandorange
    replied
    I got the same problem.
    chr17 418628 N 54
    chr17 418629 N 58
    chr17 418630 N 57


  • skingan
    replied
    It turned out to be a similar problem to the one SMHfrog had. In my reference file, each chromosome sequence was on a single line, so when samtools built the .fai file there was a segmentation fault because of the length of the sequence. I used a different reference with line breaks and it worked. I used the same reference file for the Mosaik run and the pileup build.
    Sarah
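
If a reference with line breaks is not at hand, a single-line FASTA can be re-wrapped. This is a minimal Python sketch, assuming the file fits in memory. The 60-column default is only a common convention, and `wrap_fasta` is a hypothetical helper rather than a samtools feature.

```python
def wrap_fasta(fasta_text, width=60):
    """Re-wrap every sequence so that all lines but the last have `width` bases."""
    out, seq = [], []

    def flush():
        # Join the pieces collected for the current record and slice them evenly.
        joined = "".join(seq)
        out.extend(joined[i:i + width] for i in range(0, len(joined), width))
        seq.clear()

    for line in fasta_text.splitlines():
        if line.startswith(">"):
            flush()
            out.append(line)
        else:
            seq.append(line.strip())
    flush()
    return "\n".join(out) + "\n"
```

Because blank lines contribute nothing to the joined sequence, the same pass also drops stray empty lines inside a record. The index should be rebuilt with `samtools faidx` afterwards, and any alignments made against the old file should be treated with care if names changed.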


  • thaley
    replied
    Ran into the same problem. It may be worth someone adding a note to the faidx documentation about null strings in the reference, or making the error message more descriptive.


  • SMHfrog
    replied
    I had this same problem, and after seeing no solution here did some more digging, and have a possible solution for you.

    I noticed that the ref.fa.fai file for my whole genome was 0 kb. The .fai is used by samtools when building the pileup. When I ran the command to re-build the .fai:

    samtools faidx reference.fa

    I got the following error message:

    [fai_build_core] different line length in sequence 'scaffold_14'.
    Segmentation fault

    No doubt this same message occurred the first time I ran the pileup command (which also builds the .fai if it doesn't exist), but I apparently didn't pay attention. After that first time, the .fai file EXISTED so no errors were subsequently reported when I ran pileup again.

    In my case, there was an extra line after scaffold_14. I removed this, and re-built the .fai using the samtools faidx command and then re-ran the pileup command. My pileup then contained the reference base as intended!

    Hope this helps y'all find the solution to your problem.
    Best,
    Shannon
    University of Texas at Austin
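
To find which record has the stray line without waiting for faidx to fail, something like the Python sketch below could be used. It applies a simplified rule (every line of a record except the last must be as long as the first) and is not samtools' exact logic. `records_with_uneven_lines` is a hypothetical name.

```python
def records_with_uneven_lines(fasta_text):
    """Return names of records where a non-final line differs in length from the first."""
    bad = []
    name, lengths = None, []

    def flush():
        # Only the last line of a record is allowed to be shorter (or blank).
        if name is not None and any(n != lengths[0] for n in lengths[:-1]):
            bad.append(name)

    for line in fasta_text.splitlines():
        if line.startswith(">"):
            flush()
            words = line[1:].split()
            name, lengths = (words[0] if words else ""), []
        elif name is not None:
            lengths.append(len(line.rstrip()))
    flush()
    return bad
```

A blank line in the middle of a record counts as a line of length zero, so it is reported, while a blank line at the very end of a record is tolerated by this check.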
