  • trimming in tophat

    Hi all,

    I am trying to analyse my PE Illumina data using tophat.

    First I ran FastQC. Checking the raw data, I discovered that the beginnings (and presumably the ends) of my reads contain some contamination from the sequencing adapters.
    I ran bowtie on both the full-length and the trimmed sequences and got better results with the trimmed sequences.

    Do I need to trim the data before running tophat?

    Does anyone know how to do it? Do I need to convert my trimmed SAM files (bowtie output) back into FASTQ files?

    Thanks for any help
    Assa
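TopHat reads FASTQ input, so one option is to trim the FASTQ files directly before running it, rather than converting bowtie's SAM output back. The sketch below crops a fixed number of bases from the 5' end and then clips a known adapter (or a partial adapter running off the 3' end), cutting the quality string in step. The adapter sequence, crop length and minimum overlap are assumptions to be replaced with what FastQC reports for your own libraries; a dedicated trimming tool is the more robust choice for real data.

```python
from typing import Iterator, TextIO, Tuple

# Assumed values: adjust to what FastQC reports for your own libraries.
ADAPTER = "AGATCGGAAGAGC"  # assumption: common Illumina adapter prefix
HEAD_CROP = 10             # assumption: contaminated bases at the 5' end
MIN_OVERLAP = 5            # shortest adapter prefix clipped at the 3' end


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def adapter_start(seq: str, adapter: str, min_overlap: int) -> int:
    """Index where the adapter starts, or len(seq) if no adapter is found."""
    full = seq.find(adapter)
    if full != -1:
        return full
    # A read may end partway into the adapter, so test its prefixes, longest first.
    longest = min(len(adapter) - 1, len(seq))
    for length in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:length]):
            return len(seq) - length
    return len(seq)


def trim(seq: str, qual: str, head_crop: int, adapter: str,
         min_overlap: int) -> Tuple[str, str]:
    """Crop the 5' end, then clip any 3' adapter; quality is cut in step."""
    seq, qual = seq[head_crop:], qual[head_crop:]
    cut = adapter_start(seq, adapter, min_overlap)
    return seq[:cut], qual[:cut]


def trim_fastq(src: TextIO, dst: TextIO) -> None:
    """Trim every record; no reads are dropped, so mate files stay in sync."""
    for header, seq, qual in read_fastq(src):
        seq, qual = trim(seq, qual, HEAD_CROP, ADAPTER, MIN_OVERLAP)
        dst.write(f"{header}\n{seq}\n+\n{qual}\n")
```

For paired-end data, run trim_fastq once per mate file, e.g. with open("reads_1.fastq") as src, open("reads_1.trimmed.fastq", "w") as dst: trim_fastq(src, dst) (file names hypothetical). Because no record is discarded, read 1 and read 2 stay paired by position; a fuller pipeline would also drop reads that become too short, removing both mates together.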

  • #2
    Hi, I found this useful page about this issue.

    http://bioinfo-core.org/index.php/9t...8_October_2010

    HTH

    Dave



    • #3
      Thanks for the tip.
      It is a good page with summaries of the different FastQC plots, something a lot of people were asking for in another thread.

      BTW, can anyone suggest a good way to remove duplicate reads from the analysis?
      Running FastQC, I see a lot of duplicated reads (see attachment).

      As I am looking for differentially regulated genes, I am not sure whether I should exclude the duplicated reads or not, but I would like to try it and see what I get.

      Q: Can anyone tell me how to filter duplicated reads from the SAM files, or from the FASTQ files before the bowtie run?

      Q: For differential expression, is it correct to exclude the duplicates, or do I need to keep them?

      Thanks

      Assa
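For filtering at the FASTQ stage, before bowtie, a minimal sketch is to drop exact duplicates: a read pair is removed when the sequences of both mates match a pair seen earlier. This is an assumption about what "duplicate" should mean here; it ignores quality scores, keeps the first occurrence rather than the best one, and holds every distinct sequence pair in memory. It also assumes both mate files list their records in the same order.

```python
from typing import Iterable, Iterator, Set, TextIO, Tuple

# One FASTQ record: (header, sequence, '+' line, quality).
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield four-line FASTQ records until the file ends."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return
        yield (lines[0], lines[1], lines[2], lines[3])


def dedup_pairs(mates1: Iterable[Record],
                mates2: Iterable[Record]) -> Iterator[Tuple[Record, Record]]:
    """Keep the first pair for each identical (read 1, read 2) sequence combination.

    Assumes both mate files list their records in the same order.
    """
    seen: Set[Tuple[str, str]] = set()
    for r1, r2 in zip(mates1, mates2):
        key = (r1[1], r2[1])  # compare sequences only, not names or qualities
        if key in seen:
            continue  # exact duplicate pair: drop both mates together
        seen.add(key)
        yield r1, r2


def write_record(handle: TextIO, record: Record) -> None:
    """Write one record back out in four-line FASTQ form."""
    handle.write("\n".join(record) + "\n")
```

In use, you would open both input files, iterate over dedup_pairs(read_fastq(f1), read_fastq(f2)), and call write_record for each surviving mate into two output files. Dropping both mates together keeps the output files correctly paired for bowtie or TopHat.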



      • #4
        Did you get an answer?
        If so, please share it here.
        Thank you.

        Originally posted by frymor (post #3 above)



        • #5
          No, I didn't get any responses to the questions I posted.

          I am not sure, though, how important the duplication rate is at this step. I'm using tophat2 with the option to exclude all duplicated reads, so I am not worried about duplication in the original FASTQ file.

          I hope I am thinking in the right direction.



          • #6
            Sangenix

            SangeniX: A comprehensive, automated, scalable and user friendly NGS data analysis suite

            Sangenix has a module for duplicate removal.

            Give it a try : http://www.sangenix.com/



            • #7
              Let me know when it is freeware.


              • #8
                Sangenix

                A beta version is available. You can contact us via the contact page at http://www.sangenix.com/contactus.aspx


                • #9
                  Removing the duplicates could be done with the samtools rmdup command (alternatively, you could use MarkDuplicates from Picard). This is generally not needed for RNA-seq, since a certain amount of duplication is both expected and desired for highly expressed genes (i.e., many or most of these reads are probably not PCR duplicates).
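To make the coordinate-based idea concrete, here is a simplified sketch for single-end alignments in SAM format. It roughly follows the rule the samtools documentation describes for rmdup: reads that share a reference, strand and 5' position are treated as duplicates, and only the one with the highest mapping quality is kept. This is not samtools' or Picard's actual code; it ignores soft clipping and mate information, and it illustrates the caveat above: in a highly expressed gene, many genuine reads can share a start position and would be removed.

```python
import re
from typing import Dict, Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_length(cigar: str) -> int:
    """Number of reference bases spanned by an alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def five_prime_key(fields: List[str]) -> Tuple[str, str, int]:
    """(reference, strand, 5' position); soft clips are ignored in this sketch."""
    flag = int(fields[1])
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    if flag & 0x10:  # reverse strand: the 5' end is the rightmost aligned base
        return rname, "-", pos + reference_length(cigar) - 1
    return rname, "+", pos


def remove_duplicates(sam_lines: Iterable[str]) -> List[str]:
    """Keep one read per 5' key, preferring the highest MAPQ."""
    headers: List[str] = []
    unmapped: List[str] = []
    best: Dict[Tuple[str, str, int], Tuple[int, str]] = {}
    for line in sam_lines:
        if line.startswith("@"):
            headers.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:  # unmapped reads are passed through untouched
            unmapped.append(line)
            continue
        key = five_prime_key(fields)
        mapq = int(fields[4])
        if key not in best or mapq > best[key][0]:
            best[key] = (mapq, line)
    return headers + [line for _, line in best.values()] + unmapped
```

Because ties keep the earlier read and dictionary order follows first appearance, a coordinate-sorted input stays in coordinate order apart from unmapped reads, which are moved to the end.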
