  • bbmap inflating read count and not finding one sequence after header

    I'm using bbmap to map transcriptome reads to a set of target loci. I'm working with 12 samples with paired-end reads and 1 sample with single-end reads, all from NCBI's SRA. bbmap reads the paired-end data and completes those analyses correctly. It's the one sample with single-end reads that's causing two issues:

    1. The first issue is that the input fasta file only has 288915 reads. I have confirmed this with grep ">" file.fasta | wc -l. However, bbmap reports "Reads used: 308655". I have no idea why the read count is inflated; again, this is not an issue with the paired-end data.

    2. bbmap fails to recognize a sequence immediately after the fasta header: "Warning: A fasta header with no sequence was encountered: SRR768524.9631". The sequence in question is formatted exactly like all the others in the file, and I have checked the line endings, which are fine. Below is a snippet of the fasta file; the 3rd sequence is the one bbmap complains about.

    >SRR768524.9629
    CTATCAAAGGGAAATCCCGCTGGCCTGCTATCATACAGTCTTGAACCTCCACATGCAATATCAGCTGAATCTCCAACGTGGCCGCTGACAAATGGAGTAACTACGACTGCCAAAACGAAAGCGCGACCTCCTTTCCATCCCATGGGTAATTGGAGTCTTTGAGGAAATCCACATCGGCCAGCCTCTGAATAATGGAATGGTTCCTTCTGGTTGAGTGCACTGTTTATTGAAGTGTAAAGAGACCTGAATCCTTCTTGGTCATGGATAAAGAGGGGAGAATCTGTTGAACTTCTTTCCCACACATTAACACCCTCAGTCAATTTTACTGGGAAACGGTCAATTTCAAATAAGCTAGTTTTGCTTGGTCATAAGTGAGGAAATGTTCATCAGAATCATATTTCGGTCCAATGAATACTCTCACAATGGCATCATCAGCTTTT
    >SRR768524.9630
    ATAATGCAATTATAGATTGTTGGAGTGCAGGTAAAGCTACCACTGTTATGATTAAAGATAATCCAAAGGTTGAAATTCTTGATGTAGAAGATGTTAAGGTTGGAAAGATAAGACAATTTTGTGAGTTGGACTTGGCATTGAACATGGCCTTACGAAAGTATTTTGGTAGTGTGTTTGATAAAATGGCAGTTACATCTAATGAAACGCCGTGGAAAGTTGCTTGGAATCCATATTTTATGCCTCATCACATCGTGGCGATAGAGAACGACAAGTACGATGTCTTTTGTATAGATGTGAAAAGAATGGATAAGAATTTACCAGTCCAATTCACTGAGATATTGTGT
    >SRR768524.9631
    TGTTACTGGGTAGGGCTGTGGCACTGGGACCTTGACTGGATAGGGTACATGCTTCTCGACTGGTACTGGGTATGGCTTGGGAATGTGGACAGGGTATGGCCTATCGACTGGGACCTTCACGGGATAGGGAACTTTCTTCTCGATATGGACTGGATAAGGTACTGGTACCTCCTTGTGGATGGTGATAGCTTTAATGTGGCCGTGTTCCTCATGTCCTCCGAGTTCATAGCCACCTCCATATCCGTATCCTCCTAAGTCATGTCCACCTCCATATCCTCCATGTCCACCGCCGTATCCTCCAAGCTCATAACCACCTCCATATCCTAAGCCAAGAAGTCCTCGTTTCTCCTGCTTCTTGTCATCTGTTGGTGCCGCTGCTTTGTCGGTCTTGGATTCGGCCTTCTTCTCTTCAGCTGATGCTGTGGCAAGCAGTGCCAACAGCCCTACCCACAGTACCTTGGATTGCATTGTTGAGTCGTGGTGTGGTCGGCGTCTCCCAA
    >SRR768524.9632
    AATTCCCAACGACCAAGTATCTGAACATGAGTGGATCAATGCTGAGATCCTGCCTGCTACTGCTATTCCTAGCTTACTGTGTGTCCTGCTATAGAAGTCATGTTCCTAGAGGCGGGAGTTACTCTCTACCGCCTGGAGTTAATCCAACATTCCCAGGAAGGAACCAAGGACTGCCTCCGGCTTATCATGGAAAATTCAAGAGATCACTGGAAGGAGGTTTAGAACCTGAAGATGGTGGTGTCCTTGCAGTTGATGAACCTGCTGATTATCTGAAAGTCAAAAGGTCAGTGGAAGATGTTGAAGGTGAATTCCTTGTGAACGAAGAACCTCAAGAATTTGAGACACTGAGAGCGCGCCGTGACGTCAGAATAATTCATCCAACT
    >SRR768524.9633
    TTCCAAACTGTCGATTCATGATGTACACAATACCAAAAAAGGCAAATAAGAAATAAAAGT
    I'm at a loss as to how to resolve these issues. I would appreciate any help getting bbmap to run properly on this last fasta file.
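A quick independent check of the fasta file can help narrow both issues down. The sketch below is not part of bbmap; the function names and the `max_len` threshold are hypothetical and chosen only for illustration. It counts records, lists headers with no sequence, flags duplicate read names, and reports reads longer than a chosen length. The length check is an assumption worth testing rather than a diagnosis: if a mapper splits or truncates very long reads, the reported read count could differ from the number of headers.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()  # also drops a stray \r from Windows line endings
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta(lines: Iterable[str], max_len: int = 500) -> Dict[str, object]:
    """Summarize a fasta file. max_len is an arbitrary illustrative threshold."""
    records = 0
    empty = []      # read names whose header has no sequence
    too_long = []   # (read name, length) for reads longer than max_len
    names = Counter()
    for header, seq in iter_fasta(lines):
        records += 1
        parts = header.split()
        name = parts[0] if parts else ""
        names[name] += 1
        if not seq:
            empty.append(name)
        if len(seq) > max_len:
            too_long.append((name, len(seq)))
    duplicates = sorted(n for n, c in names.items() if c > 1)
    return {
        "records": records,
        "empty": empty,
        "too_long": too_long,
        "duplicates": duplicates,
    }
```

To run it on a real file, open the file and pass the handle, for example `summarize_fasta(open("file.fasta"))`. If `records` matches the grep count and `empty` is empty, the file itself is probably well formed and the discrepancy is more likely to come from how the mapper handles the reads.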

  • #2
    Is there a specific reason you converted these reads to fasta format? If this is data from SRA then you should be able to map the fastq reads directly.

    You may be getting secondary alignments, which could explain why your read count seems inflated.
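One way to test the secondary-alignment idea is to tally the records in the SAM output by flag. The sketch below assumes the mapper wrote an uncompressed SAM file; the function name is hypothetical. The flag bits themselves come from the SAM specification: 0x4 is unmapped, 0x100 is a secondary alignment, and 0x800 is a supplementary alignment.

```python
from typing import Dict, Iterable

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_sam_records(lines: Iterable[str]) -> Dict[str, int]:
    """Tally SAM alignment records by type and count distinct read names."""
    counts = {
        "records": 0,
        "primary_mapped": 0,
        "secondary": 0,
        "supplementary": 0,
        "unmapped": 0,
    }
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        counts["records"] += 1
        names.add(fields[0])
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        elif flag & UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["primary_mapped"] += 1
    counts["unique_read_names"] = len(names)
    return counts
```

For single-end data, `unique_read_names` should equal the number of input reads. If `records` is larger and the difference is accounted for by `secondary` or `supplementary`, extra alignments explain the surplus; if `unique_read_names` itself exceeds the fasta header count, something else is creating additional reads.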
