Failure in running BWA on human fusion genome

    Dear all,
    I have prepared a fusion genome by concatenating the human genome available here:

    and a virus genome obtained by merging several individual virus genomes. The headers of the NCBI genome look like this:

    Code:
    >chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38
    >chr2  AC:CM000664.2  gi:568336022  LN:242193529  rl:Chromosome  M5:f98db672eb0993dcfdabafe2a882905c  AS:GRCh38
    >chr3  AC:CM000665.2  gi:568336021  LN:198295559  rl:Chromosome  M5:76635a41ea913a405ded820447d067b0  AS:GRCh38
    [...]
    
    >chrUn_GL000218v1  AC:GL000218.1  gi:224183305  LN:161147  rl:unplaced  M5:1d708b54644c26c7e01c2dad5426d38c  AS:GRCh38
    >chrEBV  AC:AJ507799.2  gi:86261677  LN:171823  rl:decoy  M5:6743bd63b3ff2b5b8985d8933c53290a  SP:Human_herpesvirus_4  tp:circular
    Thus I prepared the header of my sequence as:

    Code:
    >chrV  AC:XXXXXXXX.1  gi:00000000  LN:370064105  rl:Chromosome   M5:5aa5be7025d7baa666a8651e0909e4ce  AS:1  SP:All_viruses  tp:linear
    before concatenating it with the NCBI genome.
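A minimal sketch of the merge-and-concatenate step described above, assuming each virus genome is stored in its own FASTA file. The function name merge_virus_fasta and every file name are hypothetical placeholders, not part of the original workflow. The sketch writes a single header line, drops the per-virus headers, and removes spaces, tabs and carriage returns from the sequence lines as it goes.

```shell
# Hypothetical helper: merge several single-virus FASTA files into one record.
# Usage: merge_virus_fasta HEADER_LINE in1.fa [in2.fa ...] > virus_merged.fa
merge_virus_fasta() {
    header="$1"
    shift
    # Print the single header line for the merged record.
    printf '%s\n' "$header"
    # Skip every input header line, delete spaces, tabs and carriage returns
    # from the sequence lines, and drop lines that end up empty.
    awk '/^>/ { next }
         { gsub(/[ \t\r]/, ""); if (length($0) > 0) print }' "$@"
}

# Example (hypothetical file names, abbreviated header):
# merge_virus_fasta '>chrV  SP:All_viruses  tp:linear' virusA.fa virusB.fa > virus_merged.fa
# cat assumes human.fa ends with a newline; otherwise the two files fuse on one line.
# cat human.fa virus_merged.fa > fusion.fa
```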

    I prepared the indices with
    Code:
    bwa index fusion.fa
    and then ran the alignment with
    Code:
    bwa mem -t 8 -R "$rd_grp" "$Fusion" "$R1" "$R2" | samtools sort -o "${out}SRT.sam"
    where $rd_grp is the read group header,
    $Fusion is the path to the fusion.fa reference,
    $R1/$R2 are the FASTQ files of interest, and
    ${out}SRT.sam is the alignment output.
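A minimal sketch of how the variables in this command could be set. The helper make_read_group, the read-group values (ID, SM, PL) and all paths are hypothetical; the only assumption about bwa itself is that -R takes a complete @RG header line in which each literal \t is turned into a TAB, as described in the bwa manual.

```shell
# Hypothetical helper: build a read-group header line for bwa mem -R.
# The backslash-t sequences are kept literal; bwa mem turns them into TABs.
make_read_group() {
    id="$1"
    sample="$2"
    platform="${3:-ILLUMINA}"   # default platform is an assumption
    printf '@RG\\tID:%s\\tSM:%s\\tPL:%s' "$id" "$sample" "$platform"
}

# Example values (all hypothetical):
# rd_grp=$(make_read_group run1 sample1)
# Fusion=/path/to/fusion.fa
# R1=sample1_R1.fastq.gz; R2=sample1_R2.fastq.gz; out=sample1_
# bwa mem -t 8 -R "$rd_grp" "$Fusion" "$R1" "$R2" | samtools sort -o "${out}SRT.sam"
```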

    However, I got this error:
    Code:
    [bns_restore_core] Parse error reading /.../fusion.fa.amb
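A hedged diagnostic sketch for this error. It assumes the .amb layout written by bwa index (a first line holding l_pac, n_seqs and n_holes, then one "offset length character" line per run of ambiguous bases) and the usual five index files (.amb, .ann, .bwt, .pac, .sa). The function name check_bwa_index is hypothetical. It reports missing or empty index files, any .amb line that does not split into exactly three fields (which could happen if the recorded character were itself whitespace), and a mismatch between the number of hole lines and n_holes.

```shell
# Hypothetical diagnostic: sanity-check a BWA index prefix (e.g. fusion.fa).
# Assumes .amb = "l_pac n_seqs n_holes" followed by n_holes lines of
# "offset length char".
check_bwa_index() {
    prefix="$1"
    status=0
    # bwa index is expected to leave these five files next to the reference.
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            status=1
        fi
    done
    [ -s "$prefix.amb" ] || return 1
    # Flag .amb lines without exactly three whitespace-separated fields and
    # compare the number of hole lines with the count on the first line.
    awk 'NR == 1 { if (NF != 3) { print "bad header line 1"; bad = 1 }
                   n = $3 + 0; next }
         NF != 3 { print "bad line " NR ": [" $0 "]"; bad = 1 }
         END { if ((NR - 1) != n) { print "expected " n " holes, found " (NR - 1); bad = 1 }
               exit bad }' "$prefix.amb" >&2 || status=1
    return "$status"
}

# Example: check_bwa_index fusion.fa && echo "index looks structurally complete"
```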
    Online, I found that the error might be caused by whitespace in the sequence, so I ran:

    Code:
    sed -i 's/\s*$//g' fusion.fa
    sed -i 's/^[^>]\s*$//g' fusion.fa
    and also
    Code:
    printf ">chrV\tAC:XXXXXXXX.1\tgi:00000000\tLN:370064105\trl:Chromosome\tM5:5aa5be7025d7baa666a8651e0909e4ce\tAS:1\tSP:All_viruses\ttp:linear\n" > fusion.fa
    on the assumption that the fields should be tab-delimited. The error persisted with every version.
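Note that s/\s*$// only removes whitespace at the end of a line. A minimal sketch for finding sequence lines that still contain a space, tab or carriage return anywhere, assuming plain (uncompressed) FASTA input; the function name find_fasta_whitespace is hypothetical. Because the index files are generated from the FASTA when bwa index runs, any cleanup of the FASTA only takes effect after the index is rebuilt.

```shell
# Hypothetical diagnostic: list sequence lines (headers are skipped) that contain
# a space, tab or carriage return anywhere, not only at the end of the line.
# Exit status is 1 if any such line is found, 0 otherwise.
find_fasta_whitespace() {
    awk '/^>/ { next }
         /[ \t\r]/ { print "line " NR; found = 1 }
         END { exit found }' "$1"
}

# Example: find_fasta_whitespace fusion.fa || echo "whitespace left in sequence lines"
# After cleaning the FASTA, rebuild the index so the .amb/.ann files match it:
# bwa index fusion.fa
```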

    Does anyone have a hint about what causes this error and how to fix it?
    Thank you.
