  • shis
    Member
    • Apr 2014
    • 16

    Trim barcode off

    Hi,
    I have a per-sample file, split by barcode, as shown below, and I would like to trim the barcode off the reads in this file. The sequences were generated on Illumina using the ddRADseq method.

    @FCC1LPDACXX:1:1101:1478:2239#GTNNNNTT/1
    TGACGCCATGCAGGCGATGAATGTGGAATATGATGAATCTTTCCTGGAGTGGCTTGAAATAATATTGCAGAATGCCTCTGAATACTGGCCTGCTCTTATTCATACGCGCGGTTTTTCCCGTACAACCCTATGGCAGTGCAACCAGCAGTGCAATCATGTCATTAGCTCATCAGTTTAGAATAGATGTCCAAAAAGGATAT
    +
    bbbeeeeegggegiiifiihiiiggihdffhidfihhh[cgffhfghfheghhhhYG__\bdd_\db`ggd_^_VZ]_bYZ``]_Z`caaY[TY]KYTZ`^a_eeeeegOO[bfhhhgihefhiighhihihiiihgihiiiiiggfgeeeeeeddddcdddddcccccccccccddccbbcbcdcc`bcccaccccbcb


    Can anyone suggest how I can trim the barcode off? Thanks.
  • blancha
    Senior Member
    • May 2013
    • 367

    #2
    Trimmomatic?


    • Michael.Ante
      Senior Member
      • Oct 2011
      • 127

      #3
      It seems you used an external barcode, so it is not part of your sequence but part of the ID (GTNNNNTT).
      AFAIK you won't run into any trouble by having the barcode in the read ID.
      If you still want to get rid of it, use awk to edit the header line of each record, i.e. the first of every four lines.
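For readers who prefer a script over awk, here is a minimal Python sketch of the same idea. It assumes headers look like the one in the question (`...#BARCODE/1`) and that every record spans exactly four lines; the function names are made up for illustration.

```python
import re

# Matches an index suffix such as "#GTNNNNTT" that is followed either by
# the end of the header or by a "/1" or "/2" read-number suffix.
BARCODE_RE = re.compile(r"#[ACGTN]+(?=/[12]$|$)")


def strip_barcode(header: str) -> str:
    """Remove the '#BARCODE' part from a single FASTQ header line."""
    return BARCODE_RE.sub("", header)


def strip_barcodes_from_fastq(lines):
    """Yield FASTQ lines with the barcode removed from each header.

    Only the first line of every four-line record is edited, so a '#'
    inside a quality string is left alone.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield strip_barcode(line) if i % 4 == 0 else line


print(strip_barcode("@FCC1LPDACXX:1:1101:1478:2239#GTNNNNTT/1"))
# @FCC1LPDACXX:1:1101:1478:2239/1
```

To process a file, pass an open file handle to `strip_barcodes_from_fastq` and write each yielded line to a new file.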


      • blancha
        Senior Member
        • May 2013
        • 367

        #4
        Yes, Michael.Ante is right. My answer is wrong.
        Trimmomatic is generally used to remove the adapter sequences within the read sequence.
        In this case, the barcode was sequenced separately, and appears in the ID.
        I can't think of a good reason to want to remove it from the ID, but awk could be used to remove it, as Michael.Ante suggested.
        Last edited by blancha; 04-30-2014, 09:00 AM.


        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          I hope that barcode (GTNNNNTT) represents some form of masking because if it truly has 4 N's then the sequence must look pretty ugly.


          • blancha
            Senior Member
            • May 2013
            • 367

            #6
            I was thinking the same thing.
            It is very odd. Half the bases in the barcode are Ns, yet there are no Ns in the sequence read below.
            It could be a form of masking, as you said, but I don't know what would be the point of the masking.


            • Brian Bushnell
              Super Moderator
              • Jan 2014
              • 2709

              #7
              I assume 'N' in the bar code indicates a wildcard; in other words, all barcodes that start with GT and end with TT would be grouped together.
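If that reading is right, the grouping could be sketched roughly as below. This is only an illustration of the wildcard idea, not how any particular demultiplexer does it; `matches_wildcard` is a hypothetical helper.

```python
def matches_wildcard(pattern: str, barcode: str) -> bool:
    """Return True if barcode fits pattern, treating 'N' in pattern as any base."""
    if len(pattern) != len(barcode):
        return False
    return all(p == "N" or p == b for p, b in zip(pattern, barcode))


print(matches_wildcard("GTNNNNTT", "GTACGTTT"))  # True
print(matches_wildcard("GTNNNNTT", "GAACGTTT"))  # False
```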


              • shis
                Member
                • Apr 2014
                • 16

                #8
                Originally posted by Michael.Ante:
                It seems you used an external barcode, so it is not part of your sequence but part of the ID (GTNNNNTT).
                AFAIK you won't run into any trouble by having the barcode in the read ID.
                If you still want to get rid of it, use awk to edit the header line of each record, i.e. the first of every four lines.

                Thanks, Michael.Ante. But how do I know that the barcode is not present in the sequence?


                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  Originally posted by shis:
                  Thanks Michael.Ante. But, how do I know that the barcode is not present in the sequence?

                  I suppose you are referring to adapters (and not barcodes)? In standard Illumina index sequencing, the barcode/tag is read in a separate index read and is not part of the actual sequence read.


                  • mastal
                    Senior Member
                    • Mar 2009
                    • 666

                    #10
                    If it's RAD-Seq data, it could well have both adapters and barcodes in the reads.

                    Has the data already been demultiplexed to separate the reads into different files by barcode?

                    A barcode with several Ns in it suggests that the Illumina index read did not go very well, and you can't really assign that read to a particular barcode.

                    I'm not sure about ddRAD-Seq, but in RAD-Seq data you expect to see an MID (multiplex identifier) and the restriction enzyme site at the start of the read.

                    At the 5' end of the read, bases 9-14, 'TGCAGG', could be what remains of the restriction site for SbfI (CCTGCAGG), one of the enzymes often used in RAD-Seq.
                    Last edited by mastal; 04-30-2014, 12:22 PM.
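A quick way to check the position of that motif is sketched below, using the first 20 bases of the read from the original post. `find_site` is a hypothetical helper, and the default motif is just the one discussed above.

```python
def find_site(read: str, site: str = "TGCAGG"):
    """Return the 1-based (start, end) of the first occurrence of site, or None."""
    start = read.find(site)
    if start == -1:
        return None
    return (start + 1, start + len(site))


read_start = "TGACGCCATGCAGGCGATGA"  # first 20 bases of the example read
print(find_site(read_start))  # (9, 14)
```

If most reads in a file show the motif at the same offset, the bases before it are a candidate inline barcode (MID).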


                    • shis
                      Member
                      • Apr 2014
                      • 16

                      #11
                      Originally posted by mastal:
                      If it's RAD-Seq data, it could well have both adapters and barcodes in the reads.

                      Has the data already been demultiplexed to separate the reads into different files by barcode?

                      A barcode with several Ns in it suggests that the Illumina index read did not go very well, and you can't really assign that read to a particular barcode.

                      I'm not sure about ddRAD-Seq, but in RAD-Seq data you expect to see an MID (multiplex identifier) and the restriction enzyme site at the start of the read.

                      At the 5' end of the read, bases 9-14, 'TGCAGG', could be the restriction site for SbfI, one of the enzymes often used in RAD-Seq.

                      Yes, the data have already been demultiplexed into per-sample files based on barcode.


                      • Michael.Ante
                        Senior Member
                        • Oct 2011
                        • 127

                        #12
                        Originally posted by shis:
                        Yes, the data has already been demultiplexed into samples ID files based on barcode.

                        In that case, the barcode almost never appears in the read sequence.

                        Originally posted by shis:
                        Thanks Michael.Ante. But, how do I know that the barcode is not present in the sequence?

                        Just make a FastQC report from the demultiplexed libraries. There you can check the "per base sequence content" plot. If a barcode (e.g. GTNNNNTT) were still present, you would observe it at the start of the reads: a 'G' at position 1 and a 'T' at positions 2, 7 and 8.
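The same check can be approximated without FastQC. The sketch below counts bases at the first few read positions and reports any position dominated by a single base. The function names and the 0.9 threshold are arbitrary choices for illustration, and the three reads are made-up toy data.

```python
from collections import Counter


def per_base_content(reads, n_positions=8):
    """Count the bases seen at each of the first n_positions of the reads."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counts[i][base] += 1
    return counts


def dominant_bases(counts, threshold=0.9):
    """For each position, return the base whose fraction is >= threshold, else None."""
    result = []
    for c in counts:
        total = sum(c.values())
        if total == 0:
            result.append(None)
            continue
        base, n = c.most_common(1)[0]
        result.append(base if n / total >= threshold else None)
    return result


# Toy reads that all carry a GTNNNNTT-style barcode at the start.
toy_reads = ["GTACGTTTAAA", "GTCCAATTCCC", "GTGGTCTTGGG"]
print(dominant_bases(per_base_content(toy_reads)))
# ['G', 'T', None, None, None, None, 'T', 'T']
```

On real data, a fixed base at positions 1, 2, 7 and 8 would point to the barcode still being in the reads. A restriction-site remnant gives a similar fixed-base signal, so look at which positions are affected.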
