  • Gigiux
    started a topic Failure in running BWA on human fusion genome

    Failure in running BWA on human fusion genome

    Dear all,
    I have prepared a fusion genome by concatenating the human genome available here:

    and a virus genome obtained by merging several individual virus genomes. The headers of the NCBI genome look like this:

    Code:
    >chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38
    >chr2  AC:CM000664.2  gi:568336022  LN:242193529  rl:Chromosome  M5:f98db672eb0993dcfdabafe2a882905c  AS:GRCh38
    >chr3  AC:CM000665.2  gi:568336021  LN:198295559  rl:Chromosome  M5:76635a41ea913a405ded820447d067b0  AS:GRCh38
    [...]
    
    >chrUn_GL000218v1  AC:GL000218.1  gi:224183305  LN:161147  rl:unplaced  M5:1d708b54644c26c7e01c2dad5426d38c  AS:GRCh38
    >chrEBV  AC:AJ507799.2  gi:86261677  LN:171823  rl:decoy  M5:6743bd63b3ff2b5b8985d8933c53290a  SP:Human_herpesvirus_4  tp:circular
    Thus I prepared the header of my sequence as:

    Code:
    >chrV  AC:XXXXXXXX.1  gi:00000000  LN:370064105  rl:Chromosome   M5:5aa5be7025d7baa666a8651e0909e4ce  AS:1  SP:All_viruses  tp:linear
    before concatenating it with the NCBI genome.
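A minimal sketch of such a concatenation, not necessarily the exact procedure used here. The file names and the two tiny stand-in FASTA files are hypothetical placeholders. The sketch guards against one common pitfall: if an input file does not end with a newline, the next `>` header is glued onto the last sequence line and is no longer seen as a header.

```shell
# Hypothetical working directory and file names; substitute real paths.
workdir=$(mktemp -d)
human="$workdir/human.fa"
virus="$workdir/virus.fa"
fusion="$workdir/fusion.fa"

# Tiny stand-in inputs so the sketch runs on its own.
printf '>chr1\nACGT' > "$human"      # deliberately lacks a trailing newline
printf '>chrV\nTTGA\n' > "$virus"

# Append each input, adding a newline only when the file does not end with one.
: > "$fusion"
for f in "$human" "$virus"; do
    cat "$f" >> "$fusion"
    # tail -c 1 yields the last byte; the substitution is empty if it is a newline.
    [ -z "$(tail -c 1 "$f")" ] || printf '\n' >> "$fusion"
done

# Every record should start on its own line, so this counts the sequences.
n_headers=$(grep -c '^>' "$fusion")
echo "$n_headers"   # prints 2
```

With a plain `cat human.fa virus.fa`, the same inputs would yield a single header line, because `>chrV` would end up in the middle of the `chr1` sequence.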

    I prepared the indices with
    Code:
    bwa index fusion.fa
    and then ran the alignment with
    Code:
    bwa mem -t 8 -R $rd_grp $Fusion $R1 $R2 | samtools sort -o ${out}SRT.sam
    where $rd_grp is the read group header,
    $Fusion is the path to the fusion.fa reference,
    $R1/2 are the fastq files of interest and
    ${out}SRT.sam is the alignment output.

    However, I got this error:
    Code:
    [bns_restore_core] Parse error reading /.../fusion.fa.amb
    On the internet I found that the error might be due to spaces in the sequence, so I applied:

    Code:
    sed -i 's/\s*$//g' fusion.fa
    sed -i 's/^[^>]\s*$//g' fusion.fa
    and also
    Code:
    printf "chrV \tAC:XXXXXXXX.1\tgi:00000000\tLN:370064105\trl:Chromosome   M5:5aa5be7025d7baa666a8651e0909e4ce\tAS:1\tSP:All_viruses\ttp:linear" > fusion.fa
    on the idea that the fields could be tab-delimited. The error persisted with each version.
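Since the message names the `.amb` file, one cheap first check is whether the index on disk is complete. The sketch below assumes only that `bwa index` writes five files next to the reference (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). The function name `check_bwa_index` is hypothetical, and the check does not establish the cause of the parse error; it only rules out a missing or empty index file, for example from an interrupted indexing run.

```shell
# check_bwa_index PREFIX
# Hypothetical helper: returns 0 if all five index files exist and are
# non-empty, otherwise reports the offending files on stderr and returns 1.
check_bwa_index() {
    prefix=$1
    status=0
    for ext in amb ann bwt pac sa; do
        # -s is true only for files that exist and have a size greater than zero.
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            status=1
        fi
    done
    return "$status"
}
```

It could be called as `check_bwa_index fusion.fa` before the alignment. If the reference was edited after indexing, as with the `sed -i` commands above, the index would also need to be rebuilt with `bwa index` so that it matches the FASTA file.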

    Do you have any hints about what causes this error and how to resolve it?
    Thank you.
