  • Illumina 1.9 read lengths and trimming

    Hi all.

    I have had the genome of a bacterium I am working with sequenced by my university's sequencing facility. It was sequenced on a MiSeq and I have paired-end reads. They sent me the raw FASTQ files and files that they have trimmed using sickle and scythe.

    I have run all the files through FastQC, which reports the following read lengths.

    Untrimmed_1 = 236 - 251
    Untrimmed_2 = 35 - 251
    Trimmed_1 = 20 - 251
    Trimmed_2 = 20 - 251

    It is my understanding that the reads, at least the untrimmed ones, should all be the same length?
    I also received a few warnings on all files for Kmer content, but I think this might be due to my organism having a low GC content (~25%).

    I would like to know if these read lengths are acceptable. Should I look at trimming them to the same length? Is there any particularly good software for this?

    To be honest, I have very little idea what I need to do. If anyone has any good links to papers or other information about trimming files etc., I would really appreciate that.

    Thanks.

  • #2
    Do you know whether the data were processed (before the trimming) on the MiSeq itself or in BaseSpace, or after the run using bcl2fastq/CASAVA?



    • #3
      Unfortunately not. All I received was an email with the files that basically said thanks for your custom.



      • #4
        I have a feeling that some trimming occurred during pre-processing of the data (on either the MiSeq or BaseSpace) and that what you received were not the original full-length reads. It probably does not really matter, since you would have removed those bases yourself during post-processing. One thing that can produce shorter reads is inserts that were shorter than you thought they were.

        If you are looking to assemble the data then SPAdes is a good option.
        Last edited by GenoMax; 09-29-2014, 07:44 AM.
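To see how many reads sit at each length, rather than only the minimum and maximum that FastQC summarizes, the lengths can be tallied directly from the FASTQ file. The following is a minimal sketch, assuming standard four-line FASTQ records (as Illumina software writes them); the function names `read_length_counts` and `summarize` are made up for this illustration.

```python
import gzip
from collections import Counter


def read_length_counts(path):
    """Count reads per length in a FASTQ file (plain text or .gz).

    Assumes four-line records, i.e. sequences are not wrapped.
    """
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


def summarize(counts):
    """Return (shortest length, longest length, total number of reads)."""
    return min(counts), max(counts), sum(counts.values())
```

Calling `read_length_counts("Untrimmed_1.fastq")` (the file name here is only a placeholder) and printing the sorted items shows whether the short reads are a small tail or a large fraction of the data.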



        • #5
          There are plenty of threads for various trimming programs. BBDuk is the simplest option.
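For intuition about why trimmed files end up with variable read lengths, here is a rough sketch of 3' quality trimming on a read pair. It is not how BBDuk, sickle, or scythe work internally; it is a deliberately simple illustration, assuming Phred+33 quality encoding, and the names `trim_tail` and `trim_pair` and the thresholds are hypothetical.

```python
def trim_tail(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_q.

    Assumes Phred+33 encoded quality strings (an assumption of this sketch).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_pair(r1, r2, min_q=20, min_len=50):
    """Trim both mates of a pair, each given as a (sequence, quality) tuple.

    Returns None if either mate ends up shorter than min_len, so that the
    two output files never fall out of sync.
    """
    t1 = trim_tail(r1[0], r1[1], min_q=min_q)
    t2 = trim_tail(r2[0], r2[1], min_q=min_q)
    if len(t1[0]) < min_len or len(t2[0]) < min_len:
        return None
    return t1, t2
```

Each read is cut wherever its own quality drops, which is why a trimmed file reports a range such as 20 - 251 rather than a single length. Dropping both mates together is the important design point for paired-end data, since most assemblers expect the two files to list reads in the same order.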



          • #6
            Thanks for your help.



            • #7
              Replying to the original post by jellybaby83:

              If you have raw fastq files where all the reads are of the same length, you may use skewer for adapter trimming.

              See http://www.biomedcentral.com/1471-2105/15/182/ for your reference.
