Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to remove first 15 bases of a sequence using unix or qiime

    So I'm pretty sure all of my sequences have a 15 base pair adaptor sequence at the beginning, and I'd like to remove it, so then my downstream analysis will play nicely within QIIME.

    I've tried changing my mapping file by adding 15bp to each barcode of (NNNNNNNNNNNNNNN), but QIIME says thats not a cool thing to do.

    I'm interested in other ways to do this, and again for the .qual file.

  • #2
    Hi,

    I am not entirely sure what you want to achieve. But is sounds like BBduk or BB reformat could easily do it (http://seqanswers.com/forums/showthread.php?t=58221 ) as could do some simple scripts.

    Comment


    • #3
      Can you post an example 2-3 sequences from the fasta and qual files? BBduk from BBMap can manage the fasta file trim (forcetrimleft=15) for sure but I am not certain about the qual file. You can try it yourself.

      We can always do an awk/perl solution if this does not work.
      Last edited by GenoMax; 05-13-2015, 02:37 PM. Reason: Changed parameter based on Brian's explanation

      Comment


      • #4
        You can do that with BBDuk:

        bbduk.sh in=reads.fq out=trimmed.fq ftl=15

        If you have fasta + qual files:

        bbduk.sh in=reads.fa qfin=reads.qual out=trimmed.fa qfout=trimmed.qual ftl=15

        If not all of the reads contain that, and you only want to trim the reads that have a specific sequence, then use the flags "ktrim=l literal=X k=15" instead, where X is the sequence you want to trim. If you have lots of sequences, you can separate them with commas.

        Edit - double-ninja'd!

        @GenoMax. Sorry the numbering is a little confusing for force-trim; these are the definitions:
        Code:
        	/** Trim left bases of the read to this position (exclusive, 0-based) */
        	private final int forceTrimLeft;
        	/** Trim right bases of the read after this position (exclusive, 0-based) */
        	private final int forceTrimRight;
        	/** Trim this many rightmost bases of the read */
        	private final int forceTrimRight2;
        So to make a 100bp read 85 bp long by trimming the right end, you would say "ftr=84" or "ftr2=15"; to do so by trimming the left side, it would be "ftl=15".
        Last edited by Brian Bushnell; 05-13-2015, 02:34 PM.

        Comment


        • #5
          Brian: Shouldn't that be ftl=14 (0-based correct)?

          Comment


          • #6
            Hi Geno-

            Here are some sample lines of the .fna file, its pretty standard.

            >SRR1502445.1
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCAACACAGGGGGATAGGNNNNNNNNNNNNNNNNNNNNNNN
            >SRR1502445.2
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
            >SRR1502445.3
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
            >SRR1502445.4
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNNNNNNNNNN

            Also- thanks for the BBDuk suggestions, however it'd be great to do this with just the base terminal commands rather than installing a new bioinformatics package.

            Comment


            • #7
              No installation is required for BBMap. Download, uncompress and start using (I assume you have Java installed).

              Comment


              • #8
                Originally posted by GenoMax View Post
                No installation is required for BBMap. Download, uncompress and start using (I assume you have Java installed).
                Those are conflicting statements, you can't accomplish not installing anything by installing something . I think I know what you mean though, no external dependencies. We don't really need anything complicated for this task, perl or awk should do fine as mentioned above.

                Here is an example using Perl:

                Code:
                perl -0076 -e 'while (<>) { chomp; my ($name, @parts) = split /\n/; my $seq = join "", @parts; next unless defined $name && defined $seq; substr($seq, 0, 15, ""); print join "\n", ">".$name, "$seq\n" }' seqs.fas
                You would repeat this for seqs.qual.
                Last edited by SES; 05-14-2015, 11:47 AM.

                Comment


                • #9
                  Originally posted by colinn View Post
                  So I'm pretty sure all of my sequences have a 15 base pair adaptor sequence at the beginning
                  Rather than clipping the first 15bp no matter what is in there I would explicitly look for the adapter sequence and trim that. There are tools for that, I use cutadapt, but I'm sure the BBtools can do it as well.

                  Comment


                  • #10
                    Originally posted by dariober View Post
                    Rather than clipping the first 15bp no matter what is in there I would explicitly look for the adapter sequence and trim that. There are tools for that, I use cutadapt, but I'm sure the BBtools can do it as well.
                    @dario: See this thread for background.

                    Comment


                    • #11
                      Originally posted by SES View Post

                      Here is an example using Perl:

                      Code:
                      perl -0076 -e 'while (<>) { chomp; my ($name, @parts) = split /\n/; my $seq = join "", @parts; next unless defined $name && defined $seq; substr($seq, 0, 15, ""); print join "\n", ">".$name, "$seq\n" }' seqs.fas
                      You would repeat this for seqs.qual.
                      Are we golfing?

                      perl -i'.bak' -pe 's/(>.*?\n).{15}/$1/' seqs.fas

                      perl -i'.bak' -pe 's/(>.*?\n)(\s*\d\s+){15}/$1/' seqs.qual

                      Warning: I did not test these.

                      --
                      Phillip

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Non-Coding RNA Research and Technologies
                        by seqadmin




                        Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                        Nobel Prize for MicroRNA Discovery
                        This week,...
                        10-07-2024, 08:07 AM
                      • seqadmin
                        Recent Developments in Metagenomics
                        by seqadmin





                        Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
                        09-23-2024, 06:35 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 10-11-2024, 06:55 AM
                      0 responses
                      11 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 10-02-2024, 04:51 AM
                      0 responses
                      110 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 10-01-2024, 07:10 AM
                      0 responses
                      114 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 09-30-2024, 08:33 AM
                      1 response
                      120 views
                      0 likes
                      Last Post EmiTom
                      by EmiTom
                       
                      Working...
                      X