Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to remove first 15 bases of a sequence using unix or qiime

    So I'm pretty sure all of my sequences have a 15 base pair adaptor sequence at the beginning, and I'd like to remove it, so then my downstream analysis will play nicely within QIIME.

    I've tried changing my mapping file by adding 15bp to each barcode of (NNNNNNNNNNNNNNN), but QIIME says thats not a cool thing to do.

    I'm interested in other ways to do this, and again for the .qual file.

  • #2
    Hi,

    I am not entirely sure what you want to achieve. But is sounds like BBduk or BB reformat could easily do it (http://seqanswers.com/forums/showthread.php?t=58221 ) as could do some simple scripts.

    Comment


    • #3
      Can you post an example 2-3 sequences from the fasta and qual files? BBduk from BBMap can manage the fasta file trim (forcetrimleft=15) for sure but I am not certain about the qual file. You can try it yourself.

      We can always do an awk/perl solution if this does not work.
      Last edited by GenoMax; 05-13-2015, 02:37 PM. Reason: Changed parameter based on Brian's explanation

      Comment


      • #4
        You can do that with BBDuk:

        bbduk.sh in=reads.fq out=trimmed.fq ftl=15

        If you have fasta + qual files:

        bbduk.sh in=reads.fa qfin=reads.qual out=trimmed.fa qfout=trimmed.qual ftl=15

        If not all of the reads contain that, and you only want to trim the reads that have a specific sequence, then use the flags "ktrim=l literal=X k=15" instead, where X is the sequence you want to trim. If you have lots of sequences, you can separate them with commas.

        Edit - double-ninja'd!

        @GenoMax. Sorry the numbering is a little confusing for force-trim; these are the definitions:
        Code:
        	/** Trim left bases of the read to this position (exclusive, 0-based) */
        	private final int forceTrimLeft;
        	/** Trim right bases of the read after this position (exclusive, 0-based) */
        	private final int forceTrimRight;
        	/** Trim this many rightmost bases of the read */
        	private final int forceTrimRight2;
        So to make a 100bp read 85 bp long by trimming the right end, you would say "ftr=84" or "ftr2=15"; to do so by trimming the left side, it would be "ftl=15".
        Last edited by Brian Bushnell; 05-13-2015, 02:34 PM.

        Comment


        • #5
          Brian: Shouldn't that be ftl=14 (0-based correct)?

          Comment


          • #6
            Hi Geno-

            Here are some sample lines of the .fna file, its pretty standard.

            >SRR1502445.1
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCAACACAGGGGGATAGGNNNNNNNNNNNNNNNNNNNNNNN
            >SRR1502445.2
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
            >SRR1502445.3
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
            >SRR1502445.4
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNNNNNNNNNN

            Also- thanks for the BBDuk suggestions, however it'd be great to do this with just the base terminal commands rather than installing a new bioinformatics package.

            Comment


            • #7
              No installation is required for BBMap. Download, uncompress and start using (I assume you have Java installed).

              Comment


              • #8
                Originally posted by GenoMax View Post
                No installation is required for BBMap. Download, uncompress and start using (I assume you have Java installed).
                Those are conflicting statements, you can't accomplish not installing anything by installing something . I think I know what you mean though, no external dependencies. We don't really need anything complicated for this task, perl or awk should do fine as mentioned above.

                Here is an example using Perl:

                Code:
                perl -0076 -e 'while (<>) { chomp; my ($name, @parts) = split /\n/; my $seq = join "", @parts; next unless defined $name && defined $seq; substr($seq, 0, 15, ""); print join "\n", ">".$name, "$seq\n" }' seqs.fas
                You would repeat this for seqs.qual.
                Last edited by SES; 05-14-2015, 11:47 AM.

                Comment


                • #9
                  Originally posted by colinn View Post
                  So I'm pretty sure all of my sequences have a 15 base pair adaptor sequence at the beginning
                  Rather than clipping the first 15bp no matter what is in there I would explicitly look for the adapter sequence and trim that. There are tools for that, I use cutadapt, but I'm sure the BBtools can do it as well.

                  Comment


                  • #10
                    Originally posted by dariober View Post
                    Rather than clipping the first 15bp no matter what is in there I would explicitly look for the adapter sequence and trim that. There are tools for that, I use cutadapt, but I'm sure the BBtools can do it as well.
                    @dario: See this thread for background.

                    Comment


                    • #11
                      Originally posted by SES View Post

                      Here is an example using Perl:

                      Code:
                      perl -0076 -e 'while (<>) { chomp; my ($name, @parts) = split /\n/; my $seq = join "", @parts; next unless defined $name && defined $seq; substr($seq, 0, 15, ""); print join "\n", ">".$name, "$seq\n" }' seqs.fas
                      You would repeat this for seqs.qual.
                      Are we golfing?

                      perl -i'.bak' -pe 's/(>.*?\n).{15}/$1/' seqs.fas

                      perl -i'.bak' -pe 's/(>.*?\n)(\s*\d\s+){15}/$1/' seqs.qual

                      Warning: I did not test these.

                      --
                      Phillip

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Essential Discoveries and Tools in Epitranscriptomics
                        by seqadmin




                        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                        04-22-2024, 07:01 AM
                      • seqadmin
                        Current Approaches to Protein Sequencing
                        by seqadmin


                        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                        04-04-2024, 04:25 PM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 04-11-2024, 12:08 PM
                      0 responses
                      59 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 10:19 PM
                      0 responses
                      57 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 09:21 AM
                      0 responses
                      51 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-04-2024, 09:00 AM
                      0 responses
                      55 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X