
  • trimming in tophat

    Hi all,

    I am trying to analyse my PE Illumina data using tophat.

    First I ran FastQC. Checking the raw data, I found that the beginnings (and presumably the ends) of my reads contain some contamination from the sequencing adapters.
    I then ran bowtie on both the full-length and the trimmed sequences and got better results with the trimmed ones.

    Do I need to trim the data before running tophat?

    Does anyone know how to do it? Do I need to convert my trimmed SAM files (bowtie output) back into FASTQ files?

    Thanks for any help
    Assa
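
A minimal sketch of what trimming before alignment amounts to, assuming a fixed number of bases is cut from each end of every read in a well-formed four-line FASTQ file. The function name `trim_fastq` and its `head`/`tail` parameters are made up for illustration and are not part of tophat, bowtie, or FastQC. Because no reads are dropped, the two files of a paired-end run stay in sync as long as each one is processed in the same way.

```python
def trim_fastq(lines, head=0, tail=0):
    """Yield (header, sequence, separator, quality) tuples with `head` bases
    removed from the start and `tail` bases removed from the end of each read.

    Assumes well-formed 4-line FASTQ records; an incomplete trailing record
    is silently ignored.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Passing the same generator four times makes zip consume 4 lines per record.
    for header, seq, plus, qual in zip(stripped, stripped, stripped, stripped):
        end = len(seq) - tail
        # Sequence and quality strings must be cut identically.
        yield header, seq[head:end], plus, qual[head:end]


example = ["@read1\n", "ACGTACGTAC\n", "+\n", "IIIIIHHHHH\n"]
for record in trim_fastq(example, head=2, tail=3):
    print("\n".join(record))
# prints:
# @read1
# GTACG
# +
# IIIHH
```

The trimmed records can be written back out as FASTQ and aligned directly, so there is no need to go through SAM first.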

  • #2
    Hi, I found this useful page about this issue.



    HTH

    Dave



    • #3
      Thanks for the tip.
      It is a good page that summarizes the different FastQC plots, something a lot of people were looking for in a different thread.

      BTW, can anyone tell me a good way to remove the duplicate reads from the equation?
      When I run FastQC I get a lot of duplicated reads (see attachment).

      As I am looking for differentially regulated genes, I am not sure whether I should exclude the duplicated reads or not, but I would like to try it and see what I get.

      Q: Can anyone tell me how to filter duplicated reads from the SAM files, or from the FASTQ files before the bowtie run?

      Q: When going for differential expression, is it right to exclude the duplicates, or do I need to keep them?

      Thanks

      Assa


      • #4
        Did you get an answer?
        If so, would you share it here?
        Thank you.

        Originally posted by frymor
        Thanks for the tip.
        It is a good page that summarizes the different FastQC plots, something a lot of people were looking for in a different thread.

        BTW, can anyone tell me a good way to remove the duplicate reads from the equation?
        When I run FastQC I get a lot of duplicated reads (see attachment).

        As I am looking for differentially regulated genes, I am not sure whether I should exclude the duplicated reads or not, but I would like to try it and see what I get.

        Q: Can anyone tell me how to filter duplicated reads from the SAM files, or from the FASTQ files before the bowtie run?

        Q: When going for differential expression, is it right to exclude the duplicates, or do I need to keep them?

        Thanks

        Assa



        • #5
          No, I didn't get any response to the questions I posted.

          I am not sure, though, how important the duplication rate is at this step. I'm using tophat2 with the option to exclude all duplicated reads, so I am not worried about the duplication in the original FASTQ file.

          I hope I am thinking in the right direction.



          • #6
            Sangenix

            SangeniX: A comprehensive, automated, scalable and user friendly NGS data analysis suite

            Sangenix has a module for duplicate removal.

            Give it a try: http://www.sangenix.com/



            • #7
              Let me know when it is freeware.



              • #8
                Sangenix

                A beta version is available. You can contact us via the contact page at http://www.sangenix.com/contactus.aspx



                • #9
                  Duplicates can be removed with the samtools rmdup command (alternatively, you could use MarkDuplicates from Picard). This is generally not needed for RNA-seq, since a certain amount of duplication is both expected and desired for highly expressed genes (i.e., many or most of these probably aren't PCR duplicates).
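
To make the idea of position-based duplicate removal concrete, here is a simplified sketch in Python. It is not how samtools or Picard are implemented: those tools work on sorted BAM files and use more careful rules for clipping and mate pairing. The function name `drop_position_duplicates` is hypothetical. The sketch treats mapped reads as duplicates when they share reference name, start position, strand, and mate position, and keeps the one with the highest mapping quality.

```python
def drop_position_duplicates(sam_lines):
    """Return SAM lines with position-based duplicates collapsed.

    Simplifying assumptions (illustration only): duplicates are reads with the
    same RNAME, POS, strand, RNEXT and PNEXT; the read with the highest MAPQ
    wins; unmapped reads and header lines are passed through unchanged.
    """
    headers = []
    kept = {}  # key -> (mapq, line); dicts preserve first-insertion order
    for index, raw in enumerate(sam_lines):
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            headers.append(line)
            continue
        fields = line.split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x4:
            # Unmapped read: give it a unique key so it is never collapsed.
            key = ("unmapped", index)
        else:
            reverse_strand = bool(flag & 0x10)
            key = (fields[2], int(fields[3]), reverse_strand, fields[6], int(fields[7]))
        if key not in kept or mapq > kept[key][0]:
            kept[key] = (mapq, line)
    return headers + [line for _, line in kept.values()]


def sam_record(name, flag, pos, mapq):
    # Helper for the toy example below; builds a minimal 11-column SAM line.
    return "\t".join([name, str(flag), "chr1", str(pos), str(mapq),
                      "4M", "*", "0", "0", "ACGT", "IIII"])


toy = [
    "@HD\tVN:1.0\tSO:coordinate",
    sam_record("r1", 0, 100, 50),   # same position and strand as r2
    sam_record("r2", 0, 100, 60),   # higher MAPQ, so this one is kept
    sam_record("r3", 16, 100, 40),  # reverse strand, not a duplicate of r1/r2
    sam_record("r4", 0, 200, 30),
]
result = drop_position_duplicates(toy)
print([line.split("\t")[0] for line in result if not line.startswith("@")])
# prints: ['r2', 'r3', 'r4']
```

Note that highly expressed genes will naturally produce many reads at identical positions, which is why collapsing them this way can distort expression estimates in RNA-seq.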
