  • Failure in running BWA on human fusion genome

    Dear all,

    I am using the GRCh38 FASTA genome provided by NCBI. The headers in this file look like this:

    >chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38
    >chr2 AC:CM000664.2 gi:568336022 LN:242193529 rl:Chromosome M5:f98db672eb0993dcfdabafe2a882905c AS:GRCh38
    >chr3 AC:CM000665.2 gi:568336021 LN:198295559 rl:Chromosome M5:76635a41ea913a405ded820447d067b0 AS:GRCh38
    >chrUn_GL000218v1 AC:GL000218.1 gi:224183305 LN:161147 rl:unplaced M5:1d708b54644c26c7e01c2dad5426d38c AS:GRCh38
    >chrEBV AC:AJ507799.2 gi:86261677 LN:171823 rl:decoy M5:6743bd63b3ff2b5b8985d8933c53290a SP:Human_herpesvirus_4 tp:circular

    I need to use a fusion genome built by concatenating this human genome with one assembled from selected virus sequences. The virus genome consists of a single header followed by one long stretch of nucleotides made by joining the individual virus sequences.

    I prepared the header for the virus genome as follows:

    >chrV AC:XXXXXXXX.1 gi:00000000 LN:370064105 rl:Chromosome M5:5aa5be7025d7baa666a8651e0909e4ce AS:1 SP:All_viruses tp:linear

    I set the accession number (AC) to the made-up value XXXXXXXX.1 because there is no real entry for my artificial genome in GenBank/NCBI. Since the accessions in the human genome are 8 characters long, I used a fake 8-letter entry, plus ".1" because this is the first version of this genome (maybe I should have used two letters followed by six digits?).

    The same goes for the GI number: the made-up genome is not recorded in GenBank, so I simply gave it a fake 8-digit number.

    LN is the length of the genome, which I treated as a real chromosome, and M5 comes from the md5sum I ran on the FASTA file. I assumed AS and SP are free-text fields, and the genome is linear.
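    For reference, here is a minimal sketch of how LN and M5 could be derived from the sequence itself rather than from the file. It assumes M5 follows the SAM specification's definition (MD5 of the upper-cased sequence with all whitespace removed), which generally gives a different value from md5sum run on the whole FASTA file, since the file also contains the header and newlines. The helper name `sequence_stats` is hypothetical, and nothing here implies that bwa itself checks these fields.

```python
import hashlib


def sequence_stats(fasta_lines):
    """Return (length, md5_hex) for the first record in an iterable of FASTA lines.

    Assumption: the checksum is taken over the upper-cased sequence with
    whitespace removed, as the SAM specification describes for the M5 tag.
    """
    md5 = hashlib.md5()
    length = 0
    seen_header = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # a second header starts the next record; stop here
            seen_header = True
            continue
        if not line:
            continue  # ignore blank lines
        chunk = "".join(line.split()).upper()
        md5.update(chunk.encode("ascii"))
        length += len(chunk)
    return length, md5.hexdigest()
```

    Called on an open file handle for the virus FASTA, for example `sequence_stats(open("virus.fa"))` with a placeholder file name, it reads line by line, so a sequence of several hundred megabases does not need to be held in memory.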

    I separated the fields with two spaces.

    I concatenated the human genome and the made-up virus genome with `cat <human.fa> <virus.fa> > <fusion.fa>`, then built the index for this genome and aligned the samples with:

    bwa index -a bwtsw <fusion.fa>
    bwa mem -t 8 -R <read_group> <fusion.fa> <R1.fq.gz> <R2.fq.gz> | \
    samtools sort -o <file_ALN-SRT.sam>

    However, I got this error message:

    [bns_restore_core] Parse error reading <fusion.fa>.amb

    and the SAM file is virtually empty:

    cat <file_ALN-SRT.sam>
    @HD VN:1.3 SO:coordinate

    May I ask what I got wrong? When I align against either genome separately, the alignment works, so I guess the problem must be in the headers.

    I tried both sed -i 's/\s*$//g' (for trailing spaces in the sequence) and sed -i 's/^[^>]\s*$//g' (for spaces in the header), each followed by bwa index, but the result was always the same.
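    As a complement to the sed clean-up, below is a rough sketch of a check that could be run on the concatenated file before indexing. It only flags generic formatting problems (trailing whitespace or Windows line endings, blank lines, duplicate sequence names, and a header glued onto a sequence line, which can happen when the first file given to cat lacks a final newline). The function name `check_fasta` is hypothetical, and the check does not diagnose the bwa parse error itself.

```python
def check_fasta(lines):
    """Return a list of (line_number, message) for formatting problems in FASTA lines."""
    problems = []
    names = set()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line != line.rstrip():
            # Also catches a stray carriage return from Windows line endings.
            problems.append((number, "trailing whitespace"))
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if not name:
                problems.append((number, "empty sequence name"))
            elif name in names:
                problems.append((number, "duplicate sequence name " + name))
            names.add(name)
        elif ">" in line:
            # Typical when the previous file had no newline at its end.
            problems.append((number, "header glued to sequence line"))
    return problems
```

    Running it as `check_fasta(open("fusion.fa"))` (placeholder file name) returns an empty list when none of these problems is found.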

    Any clues?

    Thank you
