  • Gigiux
    started a topic Failure in running BWA on human fusion genome

    Failure in running BWA on human fusion genome

    Dear all,
    I have prepared a fusion genome by concatenating the human genome available here:

    and a virus genome obtained by merging several individual virus genomes together. The headers of the NCBI genome are like this:

    Code:
    >chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38
    >chr2  AC:CM000664.2  gi:568336022  LN:242193529  rl:Chromosome  M5:f98db672eb0993dcfdabafe2a882905c  AS:GRCh38
    >chr3  AC:CM000665.2  gi:568336021  LN:198295559  rl:Chromosome  M5:76635a41ea913a405ded820447d067b0  AS:GRCh38
    [...]
    
    >chrUn_GL000218v1  AC:GL000218.1  gi:224183305  LN:161147  rl:unplaced  M5:1d708b54644c26c7e01c2dad5426d38c  AS:GRCh38
    >chrEBV  AC:AJ507799.2  gi:86261677  LN:171823  rl:decoy  M5:6743bd63b3ff2b5b8985d8933c53290a  SP:Human_herpesvirus_4  tp:circular
    Thus I prepared the header of my sequence as:

    Code:
    >chrV  AC:XXXXXXXX.1  gi:00000000  LN:370064105  rl:Chromosome   M5:5aa5be7025d7baa666a8651e0909e4ce  AS:1  SP:All_viruses  tp:linear
    before concatenating it with the NCBI genome.
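One way to sanity-check a hand-written header of this kind is to parse its `TAG:value` fields and compare the declared `LN` with the number of bases actually present. The following is only a minimal sketch: the function names `parse_fasta_header` and `check_declared_lengths` are hypothetical, and it assumes the fields are separated by runs of spaces or tabs, as in the headers shown above.

```python
def parse_fasta_header(line):
    """Split a '>name  TAG:value ...' header into (name, {tag: value})."""
    if not line.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    fields = line[1:].split()  # splits on any run of spaces or tabs
    if not fields:
        raise ValueError("header has no sequence name")
    name, tags = fields[0], {}
    for field in fields[1:]:
        key, sep, value = field.partition(":")
        if not sep:
            raise ValueError(f"field without ':' separator: {field!r}")
        tags[key] = value
    return name, tags


def check_declared_lengths(lines):
    """Return (name, declared LN, actual length) for records where they differ."""
    mismatches = []
    name, declared, actual = None, None, 0

    def flush():
        # Record a mismatch for the record that has just ended, if any.
        if name is not None and declared is not None and declared != actual:
            mismatches.append((name, declared, actual))

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            flush()
            name, tags = parse_fasta_header(line)
            declared = int(tags["LN"]) if "LN" in tags else None
            actual = 0
        else:
            actual += len(line.strip())
    flush()
    return mismatches
```

Passing an open file handle for the reference (for example `check_declared_lengths(open("fusion.fa"))`) would list every record whose `LN` tag disagrees with its sequence. A mismatch is not necessarily what BWA objects to, but it is cheap to rule out.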

    I prepared the indices with
    Code:
    bwa index fusion.fa
    and then ran the alignment with
    Code:
    bwa mem -t 8 -R $rd_grp $Fusion $R1 $R2 | samtools sort -o ${out}SRT.sam
    where $rd_grp is the read group header,
    $Fusion is the path to the fusion.fa reference,
    $R1 and $R2 are the FASTQ files of interest, and
    ${out}SRT.sam is the alignment output.

    However, I got this error:
    Code:
    [bns_restore_core] Parse error reading /.../fusion.fa.amb
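Since the message names the `.amb` index file rather than the FASTA itself, it can help to look at that file directly. The sketch below is a hedged diagnostic, not BWA's own parser. It assumes the `.amb` file is plain text whose first line holds three integers (total length, number of sequences, number of ambiguous-base runs), followed by one `offset length character` line per run. That layout is an assumption and should be confirmed against the BWA version in use. The function name `check_amb` is hypothetical.

```python
def _is_int(text):
    """True if text parses as a (possibly negative) integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def check_amb(lines):
    """Return (line_number, problem) pairs for an iterable of .amb lines."""
    lines = [line.rstrip("\r\n") for line in lines]
    if not lines:
        return [(1, "file is empty")]

    problems = []
    header = lines[0].split()
    if len(header) == 3 and all(_is_int(x) for x in header):
        n_holes = int(header[2])
    else:
        problems.append((1, "expected three integers on the first line"))
        n_holes = None

    # Every remaining non-blank line is assumed to describe one run.
    records = [(i, l) for i, l in enumerate(lines[1:], start=2) if l.strip()]
    for lineno, line in records:
        fields = line.split()
        if len(fields) != 3 or not _is_int(fields[0]) or not _is_int(fields[1]):
            problems.append((lineno, "expected: offset length character"))

    if n_holes is not None and n_holes != len(records):
        problems.append(
            (1, f"header declares {n_holes} holes, found {len(records)}")
        )
    return problems
```

Running `check_amb(open("fusion.fa.amb"))` would return an empty list if the file matches the assumed layout. Any reported line is a candidate for what the parser trips over, and an empty or truncated file would suggest that `bwa index` did not finish.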
    On the internet I found that the error might be caused by spaces in the sequence, so I ran:

    Code:
    sed -i 's/\s*$//g' fusion.fa
    sed -i 's/^[^>]\s*$//g' fusion.fa
    and also
    Code:
    printf "chrV \tAC:XXXXXXXX.1\tgi:00000000\tLN:370064105\trl:Chromosome   M5:5aa5be7025d7baa666a8651e0909e4ce\tAS:1\tSP:All_viruses\ttp:linear" > fusion.fa
    on the assumption that the fields could be tab-delimited. The error persisted with each version.
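Before editing the file in place with `sed`, it may be safer to locate the offending lines first. The sketch below scans a FASTA file and reports blank lines, Windows line endings, duplicate or missing sequence names, and characters outside the IUPAC nucleotide alphabet (which includes stray spaces). The function name `scan_fasta` is hypothetical, and the list of checks is an assumption about what could matter here, not a statement of what BWA requires.

```python
# IUPAC nucleotide codes, upper and lower case
VALID = set("ACGTUNRYSWKMBDHVacgtunryswkmbdhv")


def scan_fasta(lines):
    """Return (line_number, problem) pairs for suspicious FASTA lines."""
    problems = []
    seen = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.endswith("\r"):
            problems.append((lineno, "Windows line ending (CR)"))
            line = line[:-1]
        if line.startswith(">"):
            rest = line[1:]
            # The name is the text between '>' and the first whitespace.
            name = rest.split()[0] if rest and not rest[0].isspace() else ""
            if not name:
                problems.append((lineno, "header without a sequence name"))
            elif name in seen:
                problems.append((lineno, f"duplicate sequence name {name!r}"))
            seen.add(name)
        elif not line.strip():
            problems.append((lineno, "blank line"))
        else:
            bad = sorted(set(line) - VALID)
            if bad:
                problems.append((lineno, f"unexpected characters {bad!r}"))
    return problems
```

Because it streams line by line, `scan_fasta(open("fusion.fa"))` should cope with a genome-sized file. Reporting line numbers instead of rewriting the file leaves the original reference untouched while the cause is still unknown.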

    Do you have any hints about what causes this error and how to fix it?
    Thank you.
