

Too many mismatches?




  • Too many mismatches?

    Hello guys,

    I've just been hit with my first SOLiD data...
    Reading the posts here, I already feel better to see that other people are struggling as well

    I'm trying to map the reads (75bp) to prokaryotic reference genomes and detect SNPs. Because I couldn't get any color-space aligners to work I've converted to base-space and used Bowtie2 for alignment. I'm getting on average about 10 mismatches per read. Some have as low as 2 mismatches, but others have above 20. Because I was using ECC chemistry I did not think this would turn out so bad...
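To put a number on "about 10 mismatches per read", one option is to tally the per-read mismatch tag straight from the SAM output. The snippet below is only a rough sketch: it assumes the aligner writes an `XM:i:` optional field (Bowtie2 reports the number of mismatches in the alignment there), and the function name and toy records are invented for illustration.

```python
from statistics import mean

def mismatch_counts(sam_lines, tag="XM"):
    """Collect the integer value of one optional tag from every aligned SAM record."""
    prefix = tag + ":i:"
    counts = []
    for line in sam_lines:
        if line.startswith("@"):           # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:               # not a complete SAM record
            continue
        if int(fields[1]) & 0x4:           # FLAG bit 0x4: read is unmapped
            continue
        for field in fields[11:]:          # optional fields start at column 12
            if field.startswith(prefix):
                counts.append(int(field[len(prefix):]))
                break
    return counts

# Toy records, invented for illustration only.
toy_sam = [
    "@HD\tVN:1.0",
    "\t".join(["r1", "0", "ref", "1", "42", "5M", "*", "0", "0", "ACGTA", "IIIII", "XM:i:2"]),
    "\t".join(["r2", "4", "*", "0", "0", "*", "*", "0", "0", "ACGTA", "IIIII"]),
    "\t".join(["r3", "16", "ref", "9", "1", "5M", "*", "0", "0", "ACGTA", "IIIII", "XM:i:20"]),
]
counts = mismatch_counts(toy_sam)
print(counts, mean(counts))
```

Looking at the whole distribution rather than just the mean should show whether a subset of reads is dragging the average up.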

    My question is this: does it even make sense to try and detect SNPs if I have that many mismatches in my reads? Should I rather focus on getting the alignment to work in color-space?

    thanks for your help

  • #2
    I would focus on getting the color aligners to work. I would just use lifescope because it knows what to do with the ECC data and I think it has a few scripts for converting reference genomes to color space.

    If you just convert the reads from color-space to base-space, the ECC is useless.


    • #3
      Originally posted by BambooGarden
      Because I couldn't get any color-space aligners to work I've converted to base-space and used Bowtie2 for alignment.
      A very hesitant +1 for lifescope, because they know the most about colour-space.

      Are you aware that Bowtie (v1) can do colour-space alignment and has very similar input/output parameters to Bowtie2?

      How are you converting to base-space? If you're doing a naive conversion in the absence of a reference sequence (e.g. G1122330 = GTGAGCGG, regardless of error), then you're going to end up with plenty of rubbish sequence every time there's a colour error. At the risk of repeating myself too much, colour-space is not an intuitive way of representing sequence, and you'll save yourself a lot of pain and time by shifting to a different sequencing platform.
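To make the naive-conversion problem concrete, here is a minimal Python sketch of reference-free decoding. It assumes the standard SOLiD two-base encoding (colour 0 = same base, 1 = A<->C / G<->T, 2 = A<->G / C<->T, 3 = A<->T / C<->G); the function names are made up for illustration and this is not the code of any particular converter.

```python
# For each current base, the next base for colours 0, 1, 2, 3 (in that order).
TRANSITION = {
    "A": "ACGT",
    "C": "CATG",
    "G": "GTAC",
    "T": "TGCA",
}

def decode(colour_read):
    """Naively translate a colour-space read (leading base + digits) into bases."""
    base = colour_read[0]
    bases = [base]
    for colour in colour_read[1:]:
        base = TRANSITION[base][int(colour)]   # each base depends on the previous one
        bases.append(base)
    return "".join(bases)

def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

clean = decode("G1122330")    # the example read from the post
broken = decode("G1102330")   # same read with one miscalled colour (third colour 2 -> 0)
print(clean, broken, mismatches(clean, broken))
```

Because every base is derived from the one before it, the single miscalled colour changes every base from that point to the end of the read (5 of the 8 bases here). A handful of colour errors in a 75bp read could therefore account for the 10 to 20 mismatches seen after a naive conversion.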


      • #4
        I am using NGS plumbing to convert to base-space. I guess this is what you call a naive conversion because I'm not putting any reference sequence in at that point. What would be a software to convert with taking a reference sequence into account?

        Yeah, I agree. Definitely next time another platform. But for now I'll have to make do with this data somehow.

        Thanks for the help.


        • #5
          Originally posted by BambooGarden
          What would be a software to convert with taking a reference sequence into account?
          Bowtie can do this, you just have to map the reads to your reference first (which is a bit of a chicken/egg thing). The base-space sequence reported by bowtie is corrected to match the reference sequence (but including any discovered SNPs).
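As a rough illustration of why the reference helps, the sketch below compares a read's colours against the colour-encoded reference. It relies on a general property of two-base encoding: an isolated SNP changes two adjacent colours in a consistent way, while a lone colour mismatch is most likely a measurement error. This is not Bowtie's actual algorithm; the function names and labels are hypothetical, and it assumes an ungapped alignment at a known position.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(bases):
    """Colour between two adjacent bases is the XOR of their 2-bit codes."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(bases, bases[1:])]

def classify(ref_bases, read_colours):
    """Label each run of adjacent colour mismatches against the reference."""
    ref_colours = encode(ref_bases)
    mismatched = [i for i, (r, q) in enumerate(zip(ref_colours, read_colours)) if r != q]
    calls = []
    i = 0
    while i < len(mismatched):
        j = i
        # extend the run while mismatch positions are consecutive
        while j + 1 < len(mismatched) and mismatched[j + 1] == mismatched[j] + 1:
            j += 1
        run = mismatched[i:j + 1]
        if len(run) == 1:
            calls.append((run[0], "colour error"))
        elif (len(run) == 2 and
              ref_colours[run[0]] ^ ref_colours[run[1]] ==
              read_colours[run[0]] ^ read_colours[run[1]]):
            # the two colours still combine to the same net transition:
            # consistent with a single changed base between them
            calls.append((run[0], "candidate SNP"))
        else:
            calls.append((run[0], "unresolved"))
        i = j + 1
    return calls

ref = "GTGAGCGG"
print(classify(ref, encode("GTGCGCGG")))       # read carrying a real A->C change
print(classify(ref, [1, 1, 0, 2, 3, 3, 0]))    # read with one miscalled colour
```

A colour-aware aligner can use this distinction to discard the lone colour errors and keep the SNPs, which is information a naive conversion throws away before alignment.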


          • #6
            Older versions of BWA worked with "SOLiD".
            Colorspace was disabled in 0.6.1; I don't know if it was re-enabled.
            As I remember, it required using the solid2fastq.pl program.
            The "bioscope" aligner was too aggressive in aligning reads; BWA did a better job of dropping and clipping reads that had mis-transitions in the middle of the reads.

            The newer "lifescope" (?) software may have improved the situation.

            I'd recommend getting an old copy of BWA and using it.


            • #7
              We did some testing comparing bioscope, lifescope, bwa, bowtie and SHRiMP for color-space alignments and found that SHRiMP2 worked the best. All of these tests were done before my arrival, so I don't have all of the details, but generally SHRiMP2 seems to work well and maps in color space.

