  • GSNAP gives Bus Error: 10

    Hello everypony!

    I am using GSNAP to map my paired-end RNA-seq reads to a reference genome. It ran normally a few months ago, but when I needed to remap some of the data using the exact same command line as before, GSNAP stopped working.
    It starts the alignment normally and then after a short while fails with Bus error: 10.
    This is what it looks like:

    Code:
    gsnap -d oregonR_reference --quality-protocol illumina -N 1 -s /Volumes/Temp/Anna/reference/oregonR_reference/oregonR_reference.maps/dmel-all-transcript-r5.49-Parent1.iit -t 20 -A sam --split-output dmel_oregonR_t13_rep1 /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_1  /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_2 1 > log13r1.txt 2 > err13r1.txt
    GSNAP version 2014-02-28 called with args: gsnap -D /Volumes/Temp/Anna/reference/ -d oregonR_reference --quality-protocol illumina -N 1 -s /Volumes/Temp/Anna/reference/oregonR_reference/oregonR_reference.maps/dmel-all-transcript-r5.49-Parent1.iit -t 20 -A sam --split-output dmel_oregonR_t13_rep1 /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_1 /Volumes/Temp/Anna/reads/trimmed_reads/oregon/dmel_oregonR_t13_rep1_2 1 2
    Checking compiler assumptions for popcnt: 000041A7 clz=17 clz=0 popcount=7 
    Checking compiler assumptions for SSE2: 000041A7 10D63AF1 xor=10D67B56
    Checking compiler assumptions for SSE4.1: -89 -15 max=241
    Novel splicing (-N) and known splicing (-s) both turned on => assume reads are RNA-Seq
    Note: >1 sequence detected, so index files are being memory mapped.
      GSNAP can run slowly at first while the computer starts to accumulate
      pages from the hard disk into its cache.  To copy index files into RAM
      instead of memory mapping, use -B 3, -B 4, or -B 5, if you have enough RAM.
      For more speed, also try multiple threads (-t <int>), if you have multiple processors or cores.
    Pre-loading compressed genome (oligos).....,...,...,...,...,...,...,...,..done (63,276,204 bytes, 15449 pages, 0.17 sec)
    Pre-loading compressed genome (bits).....,...,...,...,...,...,...,...,..done (63,276,204 bytes, 15449 pages, 0.16 sec)
    Pre-loading suffix array...............................................................................................................................,............................................................................................................................................done (674,946,152 bytes)
    Looking for index files in directory /Volumes/Temp/Anna/reference//oregonR_reference
      Pointers file is oregonR_reference.ref12153bitpackptrs
      Offsets file is oregonR_reference.ref12153bitpackcomp
      Positions file is oregonR_reference.ref153positions
    Offsets compression type: bitpack
    Allocating memory for ref offset pointers, kmer 15, interval 3...done (134,217,736 bytes, 1.45 sec)
    Allocating memory for ref offsets, kmer 15, interval 3...done (226,957,088 bytes, 2.48 sec)
    Pre-loading ref positions, kmer 15, interval 3........................................................................................done (215,791,212 bytes, 52684 pages, 0.60 sec)
    Reading splicing file /Volumes/Temp/Anna/reference/oregonR_reference/oregonR_reference.maps/dmel-all-transcript-r5.49-Parent1.iit locally...found donor and acceptor tags, so treating as splicesites file
    splice distances present...37770 unique splicesites...
    Non-standard nucleotide N near splice site YHet_Parent1:291284.  Discarding...
    37769 splicesites are valid...splicetrie_obs has 37773 entries...splicetrie_max has 3858412 entries...done
    GMAP modes: pairsearch, indel_knownsplice, terminal, improvement
    Starting alignment
    Bus error: 10
    Does anybody know what could be wrong this time and how to fix it?

    Thanks in advance!

    Ana Marija

  • #2
    For anyone who comes across this very uninformative error message, here is how I found the cause of it:

    All of my fastq files but one mapped without any error messages, so I assumed the problem was in that one file and not in the mapper. The standard fastq checks gave no clue about what was wrong, so I did a binary search through the problematic fastq file.
    The way I did this "binary search": I split my file(s) in half and reran the mapping on both halves. Whichever half gave the error, I split again, and I repeated the procedure until only two reads were left in the final fastq file.
    After 24 iterations, I had a tiny fastq file (still giving the Bus error: 10) containing 2 reads: one looked normal, and the other looked like a microsatellite read.
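For anyone who would rather not do the halving by hand, here is a rough sketch of the same loop as a shell function. The names `bisect_fastq` and `check` are made up for illustration. `check` is assumed to be any command that takes the two mate files as arguments and returns a non-zero exit status when the mapping fails; the files are assumed to be plain 4-line-per-read fastq with the mates in the same order.

```shell
# bisect_fastq MATE1 MATE2 CHECK
# Repeatedly halve a pair of FASTQ files, keeping whichever half still makes
# CHECK fail. Prints the working directory holding the result (cur_1, cur_2).
bisect_fastq() {
    m1=$1; m2=$2; check=$3
    work=$(mktemp -d)
    cp "$m1" "$work/cur_1"
    cp "$m2" "$work/cur_2"
    while :; do
        records=$(( $(wc -l < "$work/cur_1") / 4 ))
        [ "$records" -le 1 ] && break
        # First half, rounded up to whole 4-line records.
        n=$(( (records + 1) / 2 * 4 ))
        head -n "$n" "$work/cur_1" > "$work/a_1"
        head -n "$n" "$work/cur_2" > "$work/a_2"
        tail -n +"$(( n + 1 ))" "$work/cur_1" > "$work/b_1"
        tail -n +"$(( n + 1 ))" "$work/cur_2" > "$work/b_2"
        if ! "$check" "$work/a_1" "$work/a_2"; then
            mv "$work/a_1" "$work/cur_1"; mv "$work/a_2" "$work/cur_2"
        elif ! "$check" "$work/b_1" "$work/b_2"; then
            mv "$work/b_1" "$work/cur_1"; mv "$work/b_2" "$work/cur_2"
        else
            break   # neither half fails on its own: stop and keep this chunk
        fi
    done
    echo "$work"
}

# Hypothetical check for this thread (assumes a crash gives a non-zero status):
# gsnap_ok() {
#     gsnap -D /Volumes/Temp/Anna/reference/ -d oregonR_reference -N 1 -A sam \
#         "$1" "$2" > /dev/null 2>&1
# }
# bisect_fastq dmel_oregonR_t13_rep1_1 dmel_oregonR_t13_rep1_2 gsnap_ok
```

The `else` branch matters: if an error only shows up when certain reads are mapped together, neither half fails alone and the loop stops with the smallest chunk that still fails.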
    So I took the microsat read, remapped it by itself, and this time it gave a different error:
    Code:
    Paired-end accessions FCD20FCACXX:2:1302:15509:87068#ATCACGAT/2 and FCD20FCACXX:2:1302:15509:87068#ATCACGAT/1 do not match
    When I remapped the other "normal" read, it mapped normally, with no errors.

    So obviously, the microsat read was the one causing the problem.
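Since the second error message is about the accessions of the two mates, one quick independent check is whether the read names pair up record by record in both files. Below is a minimal sketch of such a check. The function name `check_pair_names` is made up, it assumes names that end in `/1` and `/2` like the ones above, and it does not reproduce whatever test GSNAP runs internally.

```shell
# check_pair_names MATE1 MATE2
# Compare the header lines of two FASTQ files record by record, ignoring a
# trailing /1 or /2. Prints each mismatch and returns non-zero if any is found.
check_pair_names() {
    paste "$1" "$2" | awk -F '\t' '
        NR % 4 == 1 {
            a = $1; b = $2
            sub(/\/[12]$/, "", a)
            sub(/\/[12]$/, "", b)
            if (a != b) {
                rec = (NR + 3) / 4
                print "record " rec ": " $1 " vs " $2
                bad = 1
            }
        }
        END { exit bad + 0 }'
}
```

A clean result here only says the names line up; it says nothing about the sequences themselves.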
    I then tried remapping it after removing the first nucleotide (and its quality value) from one of the two reads in the pair, which made both read sequences complementary again. After doing this, the mapping worked perfectly, with no errors.

    So there is a weird issue in GSNAP version 2014-02-28 with the complementarity of microsat paired reads.

    I have no idea what the reason for it is, or why GSNAP gives two different error messages depending on whether the read is mapped together with other reads or by itself.
    But at least this could be a hint for someone else out there who has the same problem I had.

    To halve my fastq files I just used
    Code:
    split -l n dmel_oregonR_t13_rep1_1 splitrep1_1 
    split -l n dmel_oregonR_t13_rep1_2 splitrep1_2
    #the output is 2 files with aa and ab extension: splitrep1_1aa & splitrep1_1ab
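    where n is the number of lines in the fastq file divided by 2. Note that n has to be a multiple of 4 and the same for both files, otherwise a 4-line read gets cut in two or the mates go out of sync.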
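Picking n by hand is easy to get wrong when the file holds an odd number of reads, because half the line count is then not a whole number of 4-line records. A small sketch that computes a safe n and splits both mate files with it is below. The function names `half_lines` and `split_pair` are made up for illustration, and plain 4-line-per-read fastq is assumed.

```shell
# half_lines FILE
# Print roughly half the line count of FILE, rounded up to a multiple of 4
# so that no 4-line FASTQ record is cut in two.
half_lines() {
    total=$(wc -l < "$1")
    records=$(( total / 4 ))
    echo $(( (records + 1) / 2 * 4 ))
}

# split_pair MATE1 MATE2 PREFIX1 PREFIX2
# Split both mate files with the same n so the pairs stay in sync.
split_pair() {
    n=$(half_lines "$1")
    split -l "$n" "$1" "$3"
    split -l "$n" "$2" "$4"
}

# Example with the files from this thread:
# split_pair dmel_oregonR_t13_rep1_1 dmel_oregonR_t13_rep1_2 splitrep1_1 splitrep1_2
```

With a single read left, only the `aa` file is produced, which is the natural stopping point.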

    And that's it!

    Cheers,

    Ana Marija
