  • .csfasta - Bowtie - trimming

    I have a question about trimming the primer base and the first color.

    Here is an example of the reads (.csfasta):
    >186_2041_1641_F3
    T122233110.3012011122133012030.1110.31220022220.120
    >186_2041_1706_F3
    T11132121312201321220103230123.2113.31201112230.031
    >186_2041_1709_F3
    T2103022220322301123212223030330323320201102233.123

    According to the file description from ABI (http://www3.appliedbiosystems.com/cm...cms_058717.pdf), the file contains "UNprocessed color space data": every read starts with T, so the first character is the primer base, although some reads still contain '.' calls. When I align the reads with Bowtie, the primer base 'T' and the first color are trimmed away, leaving a final length of 49 (one less than the 50 colors in the read). But is it really necessary to trim the color as well? As far as I understand the file description, it is not: in the ABI pipeline, the processed color data would contain the first real base plus 49 colors and should therefore give a final length of 50. Any ideas?
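A minimal Python sketch of the trimming described in this post, assuming it amounts to dropping the primer base and the color next to it. The helper names `parse_csfasta` and `trim_primer_and_first_color` are made up for illustration and are not part of Bowtie or the ABI tools.

```python
def parse_csfasta(text):
    """Yield (name, sequence) pairs from .csfasta text, skipping '#' comment lines."""
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def trim_primer_and_first_color(seq):
    """Drop the primer base and the color adjacent to it (assumed behaviour)."""
    if seq and seq[0] in "ACGT":
        return seq[2:]
    return seq


# First read from the post above.
example = """>186_2041_1641_F3
T122233110.3012011122133012030.1110.31220022220.120
"""

for name, seq in parse_csfasta(example):
    trimmed = trim_primer_and_first_color(seq)
    # Colors before trimming exclude the leading primer base.
    print(name, len(seq) - 1, "colors ->", len(trimmed), "colors")
```

The '.' calls are left untouched here; how an aligner treats them is a separate question.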

  • #2
    Remember that each color-space call encodes information about two adjacent sequence-space bases. So the first color of each read is "contaminated" by the key base immediately preceding it, which probably makes it a little tricky to align.
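To illustrate the point about the first color, here is a small sketch of the standard two-base color code (A=0, C=1, G=2, T=3, color = XOR of the two neighbouring bases). The function name `encode` and the toy sequence are assumptions for the example only.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(primer_base, bases):
    """Encode a base-space sequence as primer base + colors.

    Each color is the XOR of the 2-bit codes of two adjacent bases,
    so the first color depends on the primer base as well as on the
    first real base of the read.
    """
    prev = primer_base
    colors = []
    for base in bases:
        colors.append(str(CODE[prev] ^ CODE[base]))
        prev = base
    return primer_base + "".join(colors)


# Same insert sequence, two different preceding bases:
print(encode("T", "ACGT"))  # T3131
print(encode("A", "ACGT"))  # A0131
```

Only the first color changes with the preceding base; every later color depends on the insert sequence alone.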

    --
    Phillip

    • #3
      Hm, I thought about this too. I guess the 49 colors can be mapped directly to the reference, whereas the first one, which is removed, would have to be 'partly mapped'. Anyway, it doesn't matter too much, but thanks for the answer.

      • #4
        By the way: the reads that align are 48 nucleotides long (I was working with this the whole time, but a quick look at a non-aligned sequence confused me ^^). That makes sense:
        50 colors, the first is chopped off, which leaves 49 colors that represent four possible nucleotide strings of length 50. Then you leave out the two nucleotides that are covered by only one color (the first and the last), ending up with 48 nucleotides.
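The length arithmetic in this post can be written out as a short sketch. The function name and its two switches are hypothetical and only mirror the steps described in the thread; they are not Bowtie options.

```python
def aligned_nucleotides(n_colors, trim_first_color=True, trim_end_nucleotides=True):
    """Number of nucleotides reported for a read with n_colors colors.

    Follows the reasoning in the thread: optionally drop the first color,
    note that k colors span k + 1 nucleotides, then optionally drop the
    two end nucleotides that are covered by only one color.
    """
    colors = n_colors - 1 if trim_first_color else n_colors
    nucleotides = colors + 1
    if trim_end_nucleotides:
        nucleotides -= 2
    return nucleotides


print(aligned_nucleotides(50))  # 48
print(aligned_nucleotides(50, trim_end_nucleotides=False))  # 50
```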

        • #5
          Bowtie accepts color-space reads and takes care of the last base of the primer, so there is no need for any trimming. See the description on the Bowtie site:



          "Here, T is the primer base. bowtie detects and handles primer bases properly (i.e., the primer base and the adjacent color are both trimmed away prior to alignment) as long as the rest of the read is encoded as numbers"

          • #6
            I'm sorry, but I don't agree. Bowtie is a great program that I've used extensively in base space. However, it does trim an extra base at the 5' end of the read in color space. If you convert the csfasta to base space with corona and then map with Bowtie, the reads are not trimmed this way. Other software (SHRIMP, for instance) does not remove this extra nucleotide.

            If you're mapping overlapping fragments of transcripts or DNA, there is no problem, and the conservative approach of Bowtie seems appropriate. However, if you're mapping processed transcripts such as microRNAs, you're missing the real 5' end of the sequence. I proved this by mapping short RNA libraries to miRBase, and observing that the first nucleotide of the read was missing.
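A toy sketch of why the 5' nucleotide goes missing, assuming the standard two-base color code and the end trimming described earlier in the thread. The `decode` helper and the eight-base example read are invented for illustration; this is not how corona or Bowtie is implemented.

```python
BASES = "ACGT"  # index = 2-bit code: A=0, C=1, G=2, T=3


def decode(primer_base, colors):
    """Translate colors to bases, starting from the known primer base.

    Assumes the read contains no '.' calls. The primer base itself is
    not part of the returned sequence.
    """
    prev = BASES.index(primer_base)
    out = []
    for c in colors:
        prev ^= int(c)  # next base code = previous code XOR color
        out.append(BASES[prev])
    return "".join(out)


# Hypothetical read: primer base T followed by 8 colors.
read = "T01220132"
full = decode(read[0], read[1:])

# After dropping the first color, the first and last nucleotides are each
# covered by a single color; leaving them out loses the true 5' base.
reported = full[1:-1]

print(full)      # TGAGGTAG
print(reported)  # GAGGTA
```

Decoding from the primer base recovers every base of the insert, which is consistent with the base-space conversion keeping the real 5' end.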

            Any help, suggestions, or ideas would be useful, since I'd like to use Bowtie instead of other programs because of its great versatility.

            Cheers,
            T.

            • #7
              New script to process color-space reads to detect microRNAs

              Hi,

              It's me again. After a while, we had a paper accepted on color-space and microRNA detection. We discussed some aspects of trimming of the first reads and how to use Bowtie for efficient mapping of short reads. An advance access version of the paper is available here:



              Best,
              Toni
