I have prepared a fusion genome by concatenating the human genome available here:
and a virus genome obtained by merging several individual virus genomes together. The headers of the NCBI genome are like this:
Code:
>chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38
>chr2 AC:CM000664.2 gi:568336022 LN:242193529 rl:Chromosome M5:f98db672eb0993dcfdabafe2a882905c AS:GRCh38
>chr3 AC:CM000665.2 gi:568336021 LN:198295559 rl:Chromosome M5:76635a41ea913a405ded820447d067b0 AS:GRCh38
[...]
>chrUn_GL000218v1 AC:GL000218.1 gi:224183305 LN:161147 rl:unplaced M5:1d708b54644c26c7e01c2dad5426d38c AS:GRCh38
>chrEBV AC:AJ507799.2 gi:86261677 LN:171823 rl:decoy M5:6743bd63b3ff2b5b8985d8933c53290a SP:Human_herpesvirus_4 tp:circular
The header of the merged virus genome is:
Code:
>chrV AC:XXXXXXXX.1 gi:00000000 LN:370064105 rl:Chromosome M5:5aa5be7025d7baa666a8651e0909e4ce AS:1 SP:All_viruses tp:linear
I prepared the indices with
Code:
bwa index fusion.fa
and ran the alignment with:
Code:
bwa mem -t 8 -R $rd_grp $Fusion $R1 $R2 | samtools sort -o ${out}SRT.sam
where $Fusion is the path to the fusion.fa reference, $R1 and $R2 are the FASTQ files of interest, and ${out}SRT.sam is the alignment output.
However, I got this error:
Code:
[bns_restore_core] Parse error reading /.../fusion.fa.amb
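In case it helps with the diagnosis, here is a rough sketch of how the .amb file could be inspected. It assumes the .amb file is plain text with three whitespace-separated fields on every line, which is my understanding of the format rather than something I have confirmed. `check_amb` is a hypothetical helper name, not part of bwa or samtools.

```shell
# Hypothetical helper: report lines of a bwa .amb file that do not have
# exactly three whitespace-separated fields (assumed layout, see above).
check_amb() {
    awk '
        NF != 3 {
            bad++
            printf "line %d has %d fields: %s\n", NR, NF, $0
        }
        END {
            # Uninitialised counters print as 0 with %d.
            printf "lines=%d malformed=%d\n", NR, bad
        }
    ' "$1"
}
```

It would be called with the path to the index file, for example `check_amb fusion.fa.amb` from the directory that holds the index.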
I tried cleaning up whitespace in the file with sed:
Code:
sed -i 's/\s*$//g' fusion.fa
sed -i 's/^[^>]\s*$//g' fusion.fa
I also tried writing the virus header with printf:
Code:
printf "chrV \tAC:XXXXXXXX.1\tgi:00000000\tLN:370064105\trl:Chromosome M5:5aa5be7025d7baa666a8651e0909e4ce\tAS:1\tSP:All_viruses\ttp:linear" > fusion.fa
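Since the attempts above all concern whitespace in the header lines, a quick way to summarise the state of the headers might look like the sketch below. `check_fasta_headers` is a hypothetical helper name, and I am only assuming that tabs, trailing whitespace, or duplicated sequence names could matter here; I do not know that any of them causes the parse error.

```shell
# Hypothetical helper: summarise possible problems in FASTA header lines.
check_fasta_headers() {
    awk '
        /^>/ {
            headers++
            # Header contains a tab character anywhere.
            if (index($0, "\t") > 0) tabs++
            # Header ends in a space, tab, or carriage return.
            if ($0 ~ /[ \t\r]$/) trailing++
            # First word is the sequence name; count names seen more than once.
            if (seen[$1]++ == 1) dups++
        }
        END {
            printf "headers=%d tabs=%d trailing=%d duplicates=%d\n", headers, tabs, trailing, dups
        }
    ' "$1"
}
```

Running `check_fasta_headers fusion.fa` prints a single summary line, so it is easy to compare the file before and after the sed and printf steps.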
Do you have any hints on what causes this error and how to fix it?
Thank you.