
  • #61
    Quoting from the first lines that appear when you run "cutadapt --help":
    Usage: cutadapt [options] <FASTA/FASTQ FILE> [<QUALITY FILE>]

    Reads a FASTA or FASTQ file, finds and removes adapters,
    and writes the changed sequence to standard output.
    When finished, statistics are printed to standard error.

    Use a dash "-" as file name to read from standard input
    (FASTA/FASTQ is autodetected).

    Comment


    • #62
      cutadapt for solid reads

      Hi,

      I'm using cutadapt tool v1.1 to trim adapters from my SOLiD colorspace reads. The tool does trim the adapters out, however, I haven't been able to get my reads back in colorspace. cutadapt has been converting them to basespace by default. Wonder if I'm missing something? Have you seen this before?

      My command line is as follows:
      cutadapt-1.1/bin/cutadapt --colorspace -e 0.12 -a 2130003001020221302222 -a 201122001 -a 2132113130301020331 -x out_ --bwa -o out.fastq --untrimmed-output=out.fastq.untrimmed --double-encode filename.csfasta filename.qual
      I have tried changing output to --maq but still no effect.

      I was also wondering if the --double-encode option is required. I'm an absolute beginner to SOLiD reads (mostly I do Illumina), so I ask - aren't all recent SOLiD reads double-encoded? I may be wrong about this, though.

      Other than this, I have found this tool perfect for my purposes!
      Thanks!

      Comment


      • #63
        Hello,

        I’ll answer your questions below, but I have also updated the cutadapt README section on colorspace reads. Perhaps that also helps a bit.

        Originally posted by flobpf
        Hi,
        I'm using cutadapt tool v1.1 to trim adapters from my SOLiD colorspace reads. The tool does trim the adapters out, however, I haven't been able to get my reads back in colorspace.
        I'm not sure what kind of output you would like to have. If you want a pair of csfasta/qual files, then this has been answered a few messages ago (in short: it’s currently not supported). If you want FASTQ files that contain colorspace reads in which the colors are encoded as numbers 0, 1, 2, 3, then this is possible, simply don’t use --maq, --bwa or --double-encode (see also the README file I linked to above).

        cutadapt has been converting them to basespace by default. Wonder if I'm missing something? Have you seen this before?
        Cutadapt never converts reads to basespace since that should be done after read mapping.

        My command line is as follows:
        cutadapt-1.1/bin/cutadapt --colorspace -e 0.12 -a 2130003001020221302222 -a 201122001 -a 2132113130301020331 -x out_ --bwa -o out.fastq --untrimmed-output=out.fastq.untrimmed --double-encode filename.csfasta filename.qual
        I have tried changing output to --maq but still no effect.
        Please read through the descriptions of the --maq, --bwa and --double-encode options that are shown when running cutadapt -h. In summary: --maq is the same as --bwa. Both options imply --colorspace and --double-encode, so you can leave those two options out whenever you use --bwa or --maq.

        I was also wondering if the --double-encode option is required. I'm an absolute beginner to SOLiD reads (mostly I do Illumina), so I ask - aren't all recent SOLiD reads double-encoded? I may be wrong about this, though.
        Hm, I haven’t seen recent SOLiD files for a few months, but I guess they are not double-encoded. I guess the term is confusing. I’ll just copy the text from the README section I have just written. I hope that helps.

        Double-encoding, BWA and MAQ

        The read mappers MAQ and BWA (and possibly others) need their colorspace input reads to be in a so-called "double encoding". This simply means that they cannot deal with the characters 0, 1, 2, 3 in the reads, but require that the letters A, C, G, T be used for colors. For example, the colorspace sequence 0011321 would be AACCTGC in double-encoded form. This is not the same as conversion to basespace! The read is still in colorspace, only letters are used instead of digits. If that sounds confusing, that is because it is.
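To make the digit-to-letter mapping concrete, here is a minimal Python sketch of double-encoding as described above. The helper name `double_encode` is made up for illustration and is not part of cutadapt. Characters other than 0, 1, 2, 3 are left unchanged in this sketch.

```python
# Illustrative helper (hypothetical name, not cutadapt's own code):
# replace the color digits 0, 1, 2, 3 with the letters A, C, G, T.
DOUBLE_ENCODING = str.maketrans("0123", "ACGT")


def double_encode(colors: str) -> str:
    """Return the double-encoded form of a colorspace sequence.

    The result is still a colorspace read; only the alphabet changes.
    """
    return colors.translate(DOUBLE_ENCODING)


print(double_encode("0011321"))  # AACCTGC
```

Note that nothing is decoded here: each color is replaced one-for-one, which is why the output is not a basespace read.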

        Note that MAQ is unmaintained and should not be used in new projects.

        BWA’s colorspace support was dropped in versions more recent than 0.5.9, but that version works well.

        When you want to trim reads that will be mapped with BWA or MAQ, you can use the --bwa option, which enables colorspace mode (-c), double-encoding (-d) and primer trimming (-t), all of which are required for BWA, in addition to some other useful options.

        There is also the --maq option, which is simply another name for the --bwa option.

        Comment


        • #64
          If you want FASTQ files that contain colorspace reads in which the colors are encoded as numbers 0, 1, 2, 3, then this is possible, simply don’t use --maq, --bwa or --double-encode
          That helps. Thanks

          For example, the colorspace sequence 0011321 would be AACCTGC in double-encoded form. This is not the same as conversion to basespace! The read is still in colorspace, only letters are used instead of digits.
          OK. That's better. I had misunderstood what double-encoding is.

          Thanks for your help!

          Comment


          • #65
            Hi Marcel,

            As I said above, cutadapt worked well to trim the adapters. I used the following command line
            Code:
            cutadapt-1.1/bin/cutadapt --colorspace --trim-primer -e 0.12 -a 2130003001020221302222 -a 201122001 -a 2132113130301020331 -g 2130003001020221302222 -g 201122001 -g 2132113130301020331 -m 20 -q 20 -O 5 -x proc_ -o INPFILE.trim.fastq --untrimmed-output=INPFILE.untrim.fastq INPFILE.csfasta INPFILE.qual > INPFILE.stats
            Cutadapt produced the two expected output files. However, when I run BFAST using these files, it gives me the following errors:
            With trimmed file
            *** glibc detected *** bfast: malloc(): memory corruption: 0x00000000023f7e80 **
            With untrimmed file
            bfast: ../bfast/RGMatch.c:154: RGMatchPrint: Assertion `m->qualLength > 0' failed.
            BFAST ran fine with the original, "un-cutadapted" FASTQ file created by merging CSFASTA+QUAL using solid2fastq.pl in BFAST. No errors there.

            My BFAST command is
            Code:
            bfast easyalign -f ../_MOUSEGENOME/Mus_musculus.GRCm38.68.dna_rm.toplevel.fa -r INPFILE.trim.fastq -A 1 -n 4 > INPFILE.trim.fastq.easyalign
            Have you seen this before? Anything wrong with my cutadapt command line? I'd appreciate any suggestions on how to fix this problem.

            Thanks

            Comment


            • #66
              I haven't used BFAST in a while, but I think it requires that the primer base is still in the read. Could you try leaving out the --trim-primer option?

              Comment


              • #67
                Originally posted by flobpf
                Hi Marcel,

                As I said above, cutadapt worked well to trim the adapters. I used the following command line
                Code:
                cutadapt-1.1/bin/cutadapt --colorspace --trim-primer -e 0.12 -a 2130003001020221302222 -a 201122001 -a 2132113130301020331 -g 2130003001020221302222 -g 201122001 -g 2132113130301020331 -m 20 -q 20 -O 5 -x proc_ -o INPFILE.trim.fastq --untrimmed-output=INPFILE.untrim.fastq INPFILE.csfasta INPFILE.qual > INPFILE.stats
                Cutadapt produced the two expected output files. However, when I run BFAST using these files, it gives me the following errors:
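                With trimmed file
                *** glibc detected *** bfast: malloc(): memory corruption: 0x00000000023f7e80 **
                With untrimmed file
                bfast: ../bfast/RGMatch.c:154: RGMatchPrint: Assertion `m->qualLength > 0' failed.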
                BFAST ran fine with the original, "un-cutadapted" FASTQ file created by merging CSFASTA+QUAL using solid2fastq.pl in BFAST. No errors there.

                My BFAST command is
                Code:
                bfast easyalign -f ../_MOUSEGENOME/Mus_musculus.GRCm38.68.dna_rm.toplevel.fa -r INPFILE.trim.fastq -A 1 -n 4 > INPFILE.trim.fastq.easyalign
                Have you seen this before? Anything wrong with my cutadapt command line? I'd appreciate any suggestions on how to fix this problem.

                Thanks
                Hi Flobpf,

                I'm having trouble determining which adapter sequences to use, as the SOLiD preparation guide is not clear on this. Furthermore, those oligos that appear are in basespace.

                Can you please inform me how you obtained the oligo sequences, and how you managed to get them in colourspace?

                I have SOLiD 4 data, but I was not involved in preparing or sequencing the data.

                Regards,
                Craig

                Comment


                • #68
                  Originally posted by Braganca
                  Can you please inform me how you obtained the oligo sequences, and how you managed to get them in colourspace?
                  I don't know the oligo sequences, but if you want to use them with cutadapt, then there is no need to convert them to colorspace: Since version 1.1, you can give the adapter in basespace and cutadapt converts it for you.
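For anyone curious what such a conversion involves, here is a minimal Python sketch of the standard SOLiD dinucleotide color code. It is only an illustration of the encoding and is not taken from cutadapt's source. The function name `basespace_to_colorspace` is hypothetical. Each color describes a pair of adjacent bases, so an adapter of n bases yields n-1 colors.

```python
# Sketch of the standard SOLiD two-base color code (assumed encoding,
# not cutadapt's implementation). With A=0, C=1, G=2, T=3, the color of
# a dinucleotide is the XOR of the two base values.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def basespace_to_colorspace(bases: str) -> str:
    """Convert a basespace sequence to its n-1 colors (digits 0-3)."""
    bits = [BASE_TO_BITS[base] for base in bases.upper()]
    # Pair every base with its successor and emit one color per pair.
    return "".join(str(x ^ y) for x, y in zip(bits, bits[1:]))


print(basespace_to_colorspace("ACGT"))  # 131
```

The color between the last read base and the first adapter base depends on the read itself, so it cannot be computed from the adapter sequence alone.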

                  Comment


                  • #69
                    Originally posted by mmartin
                    I don't know the oligo sequences, but if you want to use them with cutadapt, then there is no need to convert them to colorspace: Since version 1.1, you can give the adapter in basespace and cutadapt converts it for you.

                    Thanks Martin,

                    That helped, I managed to locate the oligo sequences

                    Regards,
                    Craig

                    Comment


                    • #70
                      Filtering out reads lacking adapters

                      I'm trying to use cutadapt to remove an adapter sequence from my reads but I'd also like to discard any sequences that do not have an adapter (the adapter was added in an enrichment step prior to library construction, so reads lacking the adapter could be artifacts or contaminants). The --discard option seems to do the opposite of that. Would that be easy to change in cutadapt, is there a different tool that does something like this, or should I use a more roundabout option (based on identifying the reads not discarded by not having an adapter)?

                      Thanks,
                      Erick

                      Comment


                      • #71
                        There was a patch by James Casbon, which implements such an option for cutadapt. I have now integrated his work into cutadapt. That is, the most recent version of cutadapt, which you can get from https://github.com/marcelm/cutadapt , has a "--discard-untrimmed" option.

                        Comment


                        • #72
                          Thanks! I downloaded and installed v1.2 and get an error when I try --discard-untrimmed. It doesn't seem to recognize that option, but I do see "--untrimmed-output=FILE" which accomplishes the same goal (and is actually even better).
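The idea of routing reads by whether an adapter was found can be sketched in a few lines of Python. This is a deliberately simplified illustration that uses exact string matching, whereas cutadapt itself tolerates errors (see the -e option). The function name `split_by_adapter` is made up for this example.

```python
def split_by_adapter(reads, adapter):
    """Split reads into (trimmed, untrimmed) lists.

    Simplification: only exact occurrences of the 3' adapter count.
    """
    trimmed, untrimmed = [], []
    for read in reads:
        pos = read.find(adapter)
        if pos >= 0:
            # Adapter found: keep only the part in front of it.
            trimmed.append(read[:pos])
        else:
            # No adapter: route the read to the "untrimmed" output.
            untrimmed.append(read)
    return trimmed, untrimmed


trimmed, untrimmed = split_by_adapter(["ACGTTTAGGC", "GGGGCCCC"], "TTAGGC")
print(trimmed)    # ['ACGT']
print(untrimmed)  # ['GGGGCCCC']
```

Writing the untrimmed reads to their own file, rather than dropping them, keeps them available for inspection later.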

                          Comment


                          • #73
                            Great that it worked for you! Those who really want the --discard-untrimmed option need to get the version from GitHub (directly from version control). cutadapt 1.2rc2, available on Google Code, does not have the option. I'll make a release soon to remedy this.

                            Comment


                            • #74
                              Hello, I've just released cutadapt 1.2. As always, get it from http://code.google.com/p/cutadapt/ or simply via "easy_install cutadapt". This is a copy of the list of changes:

                              • At least 25% faster processing of .csfasta/.qual files due to faster parser.
                              • Between 10% and 30% faster writing of gzip-compressed output files.
                              • Support 5' adapters in color space, even when no primer trimming is requested.
                              • The "--info-file" option has been added. Use this to write further information about the found adapters in each read to a separate file.
                              • Named adapters are now possible. Use "-a My_Adapter=ACCGTA" to assign the name "My_Adapter" to an adapter.
                              • Improved the alignment algorithm for better poly-A trimming when there are sequencing errors. Previously, the longest possible poly-A tail was not always trimmed.
                              • James Casbon contributed the --discard-untrimmed option.

                              Comment


                              • #75
                                Hello,

                                Does cutadapt have an option to simply trim a user-specified number of bases from the 5' or 3' end?

                                I do not wish to remove adaptors, but to remove bases from the reads due to quality concerns. Is there any tool, if cutadapt is not suitable, that will do it for both .csfasta and .qual files?

                                Thanks a lot,

                                carmen

                                Comment
