  • .csfasta - Bowtie - trimming

    I have a question concerning the trimming of the primer base and the first color.

    an example of the reads (.csfasta):
    >186_2041_1641_F3
    T122233110.3012011122133012030.1110.31220022220.120
    >186_2041_1706_F3
    T11132121312201321220103230123.2113.31201112230.031
    >186_2041_1709_F3
    T2103022220322301123212223030330323320201102233.123

    According to the file description from ABI (http://www3.appliedbiosystems.com/cm...cms_058717.pdf), the file contains "UNprocessed color space data": every read starts with T, which is therefore the primer base, although some reads still contain '.' characters. When I align the reads with Bowtie, the primer base 'T' and the first color are trimmed away, leaving a final length of 49 (one less than the 50 colors in the file). But is it really necessary to trim the first color as well? As far as I understand the file description, it is not: in the ABI pipeline, the processed color data would contain the first real base plus 49 colors and should therefore give a final length of 50. Any ideas?
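
To make the trimming concrete, here is a minimal Python sketch of what "primer base and first color trimmed away" means for the first read above. The function name is hypothetical and this is only an illustration of the described behaviour, not Bowtie's own code.

```python
def trim_primer_and_first_color(read: str) -> str:
    """Drop the primer base and the color adjacent to it.

    Hypothetical helper: it mimics the trimming described in this
    thread, it is not taken from Bowtie.
    """
    if not read or not read[0].isalpha():
        raise ValueError("expected a leading primer base such as 'T'")
    # read[0] is the primer base, read[1] is the first color
    return read[2:]


read = "T122233110.3012011122133012030.1110.31220022220.120"
trimmed = trim_primer_and_first_color(read)

print(len(read) - 1)  # colors in the file: 50
print(len(trimmed))   # colors left after trimming: 49
```

The '.' characters are counted as color positions here, since they occupy a position in the read.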

  • #2
    Remember that each color in color space encodes information about 2 adjacent sequence-space bases. So the first color of each read is "contaminated" by the key base immediately preceding it. That probably makes it a little tricky to align.
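
A small sketch of why the first color is "contaminated": in the standard SOLiD two-base encoding, a color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The function names and the example sequence below are made up for illustration.

```python
# 2-bit codes for the standard two-base color encoding
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def color(prev_base: str, next_base: str) -> int:
    """Color for one dinucleotide: XOR of the two base codes."""
    return CODE[prev_base] ^ CODE[next_base]


def encode(primer: str, sequence: str) -> str:
    """Encode a base-space sequence as primer base + colors."""
    bases = primer + sequence
    colors = [color(a, b) for a, b in zip(bases, bases[1:])]
    return primer + "".join(str(c) for c in colors)


# Same (made-up) sequence behind two different key bases:
print(encode("T", "GATTACA"))  # T1230311
print(encode("A", "GATTACA"))  # A2230311
```

Only the first color differs between the two outputs: it is the one color that depends on the key base rather than on two bases of the actual read.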

    --
    Phillip


    • #3
      Hm, I thought about this as well. I guess the 49 colors can be mapped directly to the reference, whereas the first one, which is removed, would have to be 'partly mapped'. Anyway, it does not matter too much, but thanks for the answer.


      • #4
        By the way: the reads that align are 48 nucleotides long (I was working with this the whole time, but a quick look at a non-aligned sequence twisted my mind ^^). It makes sense:
        50 colors, the first is chopped off, which leaves 49 colors. These 49 colors represent four possible nucleotide strings of length 50. Then you leave out the two nucleotides that are covered by only one color (first and last), ending up with 48 nucleotides.
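
The arithmetic above can be checked with a short sketch. The 49 colors below are made up, and the decoder assumes the standard two-base encoding (A=0, C=1, G=2, T=3, color = XOR of adjacent base codes); it only illustrates the counting, not how Bowtie decodes alignments internally.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def decode(first_base: str, colors: str) -> str:
    """Decode colors into bases, given an assumed first base."""
    seq = [first_base]
    for c in colors:
        # next base code = previous base code XOR color
        seq.append(BASES[CODE[seq[-1]] ^ int(c)])
    return "".join(seq)


colors = "0123" * 12 + "0"  # made-up read: 49 colors left after trimming
candidates = [decode(b, colors) for b in BASES]

print(len(colors))             # 49 colors
print(len(candidates[0]))      # each candidate string has 50 nucleotides
print(len(set(candidates)))    # 4 distinct candidate strings
print(len(candidates[0]) - 2)  # 48 after dropping first and last nucleotide
```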


        • #5
          Bowtie accepts color space reads and takes care of the last base of the primer, so there is no need for any trimming. See the description on the Bowtie site:



          "Here, T is the primer base. bowtie detects and handles primer bases properly (i.e., the primer base and the adjacent color are both trimmed away prior to alignment) as long as the rest of the read is encoded as numbers"


          • #6
            I'm sorry, but I disagree. Bowtie is a great program that I've been using extensively in base space. However, it does trim an extra base at the 5' end of the read in color space. If you convert the csfasta to base space with corona and then map with Bowtie, the reads are not wrongly trimmed. Other software (SHRIMP, for instance) does not remove this extra nucleotide.

            If you're mapping overlapping fragments of transcripts or DNA, there is no problem, and the conservative approach of Bowtie seems appropriate. However, if you're mapping processed transcripts such as microRNAs, you're missing the real 5' end of the sequence. I proved this by mapping short RNA libraries to miRBase, and observing that the first nucleotide of the read was missing.
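
As a rough illustration of what is lost at the 5' end: under the standard two-base encoding (A=0, C=1, G=2, T=3, color = XOR of adjacent base codes), the first real base of a read follows from the primer base and the first color. The helper below is hypothetical and is not what corona or Bowtie actually run; note also that a sequencing error in the first color would give a wrong base.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def first_read_base(csfasta_read: str) -> str:
    """Infer the first real base from the primer base and the first color."""
    primer, first_color = csfasta_read[0], csfasta_read[1]
    if first_color not in "0123":
        raise ValueError("first color is missing ('.'), base cannot be inferred")
    return BASES[CODE[primer] ^ int(first_color)]


# First and third reads from the opening post
print(first_read_base("T122233110.3012011122133012030.1110.31220022220.120"))  # G
print(first_read_base("T2103022220322301123212223030330323320201102233.123"))  # C
```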

            Any help, suggestions, or ideas would be useful, since I'd like to use Bowtie instead of other programs because of its great versatility.

            Cheers,
            T.


            • #7
              New script to process color-space reads to detect microRNAs

              Hi,

              It's me again. After a while, we got a paper accepted on color space and microRNA detection. We discuss some aspects of trimming the first bases of reads and how to use Bowtie for efficient mapping of short reads. An advance access version of the paper is available here:



              Best,
              Toni
