  • axiom7
    Member
    • Aug 2009
    • 14

    Samtools "is recognized as '*'" "truncated file" error

    Hi,

    I posted this last week to the samtools thread, but did not receive a reply, so I'm taking another stab at it:

    samtools-0.1.6_x86_64-linux; precompiled version downloaded today

    $ bowtie --version
    bowtie version 0.11.3
    64-bit
    Built on myserver
    Fri Oct 23 13:27:05 MDT 2009
    Compiler: gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)
    Options: -O3
    Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

    $ samtools faidx hs_ref_chr10.fa
    $ cat hs_ref_chr10.fa.fai
    gi|89161187|ref|NC_000010.9|NC_000010 135374737 105 70 71

    Created sam format with bowtie in two ways (behavior described below is the same for both methods):
    1. using -S in bowtie command
    2. using samtools bowtie2sam.pl on a bowtie --refout map file

    sam file from method 2 (chromosome 10 only):

    $ cat ref00000.map.sam
    @0-0-3-9833 0 gi|89161187|ref|NC_000010.9|NC_000010 62380535 0 15M * 0 0 GCAAAGGNNATCATT IIIIIIIIIIIIIII NM:i:1 X1:i:5 MD:Z:3A3N0N6
    @0-0-3-9833 16 gi|89161187|ref|NC_000010.9|NC_000010 62382480 0 15M * 0 0 GGGCTANNGCTCATC IIIIIIIIIIIIIII NM:i:1 X1:i:5 MD:Z:7N0N6
    @0-0-6-12817 0 gi|89161187|ref|NC_000010.9|NC_000010 6095909 0 15M * 0 0 TACCACCNNGCCCTT IIIIIIIIIIIIIII NM:i:1 X1:i:265 MD:Z:1A5N0N6
    @0-0-6-12817 16 gi|89161187|ref|NC_000010.9|NC_000010 6097174 0 15M * 0 0 GCATCANNCTCCCGA IIIIIIIIIIIIIII NM:i:1 X1:i:265 MD:Z:7N0N6


    $ samtools view -bt ~/work/hs_ref_chr/hs_ref_chr10.fa.fai -o out.bam ref00000.map.sam
    [sam_header_read2] 1 sequences loaded.
    [sam_read1] reference '16 gi|89161187|ref|NC_000010.9|NC_000010 6097174 0 15M * 0 0 GCATCANNCTCCCGA IIIIIIIIIIIIIII NM:i:1 X1:i:265MD:Z:7N0N6

    ' is recognized as '*'.
    [main_samview] truncated file.

    out.bam is created, but I cannot do anything further with it.
  • mdjones66
    Junior Member
    • Jul 2008
    • 3

    #2
    Yeah, this is driving me nuts too. It seems Google only knows that the question has been asked, but it doesn't know the answer.

    • thh32
      Member
      • Feb 2014
      • 60

      #3
      Same issue here. Did you ever find a solution?

      • A.N.Other
        Member
        • Feb 2012
        • 26

        #4
        Pretty sure the OP has got past this by now, but for thh32, it looks to me like it's an issue caused by a strange read ID that contains the '@' symbol at the start.

        '@' at the start of a line indicates a comment (header) line in SAM, so it's interpreting your actual reads as part of the header, which then falls over because they aren't in the right format.

        The header should look something like ...

        Code:
        @HD	VN:1.0	SO:unsorted
        @SQ	SN:chr1	LN:195471971
        @SQ	SN:chr1_GL456210_random	LN:169725
        ...etc...
        ... for example, and the reads should then NOT start with an @ ...

        Code:
        HWI-ST539:109:D14VPACXX:1:1303:19984:9383	65	chr4	147868767	60	99M	=	147868771	0	GTTGGTCAGTAGTACTCGGTTACGCAATTTCCGGATGTAAAGTCTCTAATGGCAGTGGATAGGTGGGGCTAGAGACTCCGGCAACTTTGACCTTTTCAC	??CCC@44((3@8+88BCB@?;8>>6;='/86?EEE@HEGGHC@D@)A>GIGGFGHEGF@GHGDIGCHFF9G>BFIIIHCGF<@GEHFA<HDDDDD@@@	AS:i:495	NM:i:0	XI:f:1	X0:i:1	X1:i:0	XE:i:29	XR:i:99	MD:Z:99
        What do the read IDs look like in the fastq file you're aligning?
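
If the read IDs do turn out to start with '@', one possible workaround is to strip that character from the alignment lines before running samtools view. The following is only a rough sketch, not something tested against samtools 0.1.6. It assumes the SAM file is tab-separated and that genuine header lines use one of the standard record types (@HD, @SQ, @RG, @PG, @CO) followed by a tab. The function name and the sample lines are made up for illustration.

```python
import re

# Assumption: real header lines start with one of the standard SAM
# record types followed by a tab. Anything else beginning with '@'
# is treated as an alignment line whose read name starts with '@'.
HEADER_RE = re.compile(r"^@(HD|SQ|RG|PG|CO)\t")


def strip_at_from_read_names(lines):
    """Yield SAM lines, removing one leading '@' from read names only."""
    for line in lines:
        if HEADER_RE.match(line) or not line.startswith("@"):
            # Genuine header line, or an alignment line that is already fine.
            yield line
        else:
            # Alignment line whose read name begins with '@': drop that character.
            yield line[1:]


# Hypothetical sample data modelled on the lines in this thread.
sample = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "@SQ\tSN:chr10\tLN:135374737\n",
    "@0-0-3-9833\t0\tchr10\t62380535\t0\t15M\t*\t0\t0\tGCAAAGGNNATCATT\tIIIIIIIIIIIIIII\n",
    "read2\t16\tchr10\t62382480\t0\t15M\t*\t0\t0\tGGGCTANNGCTCATC\tIIIIIIIIIIIIIII\n",
]

cleaned = list(strip_at_from_read_names(sample))
for line in cleaned:
    print(line, end="")
```

To use it on a real file, pass an open file handle to the function in place of the sample list and write the result to a new file. Renaming the reads in the fastq before aligning is probably the cleaner fix, since it avoids touching the SAM at all.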
