  • Some questions about running tophat & cufflinks



    After reading the manual, I still have some questions about running tophat & cufflinks, as follows:

    Question 1:
    I am a bit confused by the options "--min-anchor-length" and "--segment-length" when running tophat.

    As I understand it, the default "--segment-length 25" means each segment is no shorter than 25 bp, and the default "--min-anchor-length 8" means a cut of 8 bp. Since 8 is smaller than 25, what is the actual length of the final read piece: 25 or 8? What exactly is "--min-anchor-length" used for?

    To make the question concrete, here is how I picture the process for 75-bp reads mapped to the genome with tophat:
    First, tophat maps the initial 75-bp reads to the genome.
    Second, tophat splits all IUM (initially unmapped) reads into smaller segments, each at least 25 bp long, and maps them again.
    Finally, reads that are still unmapped are mapped against tophat's database of possible splice junctions, and a mapped read longer than the anchor length confirms a junction. Am I right?
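    The segment step described above can be modelled with a short Python sketch. It is not TopHat's code: cutting consecutive, non-overlapping pieces, folding leftover bases into the last segment so each piece is at least --segment-length long, and leaving reads shorter than two segments whole are all simplifying assumptions.

```python
def split_into_segments(read: str, segment_length: int = 25) -> list[str]:
    """Cut a read into consecutive, non-overlapping segments.

    Simplified model (an assumption, not TopHat's implementation):
    reads shorter than two segments are returned whole, and leftover
    bases are folded into the last segment so that every segment is
    at least `segment_length` long.
    """
    n_segments = len(read) // segment_length
    if n_segments < 2:
        return [read]
    segments = [read[i * segment_length:(i + 1) * segment_length]
                for i in range(n_segments - 1)]
    # The final segment absorbs the remainder (e.g. 80 bp -> 25, 25, 30).
    segments.append(read[(n_segments - 1) * segment_length:])
    return segments


read_75 = "ACGT" * 18 + "ACG"  # a made-up 75-bp read
print([len(s) for s in split_into_segments(read_75)])  # [25, 25, 25]
```

    With the default of 25, a 75-bp read gives three 25-bp segments and an 80-bp read gives 25, 25 and 30 bp, which matches the "at least 25" reading of --segment-length.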

    Question 2:
    If the annotation file is downloaded from a public database such as Ensembl, does it make a difference to the output when running tophat? In other words, do the results differ when tophat is run with or without the annotation file?

    Question 3:
    The file “tophat/logs/prep_reads” says “6975 out of 28036024 reads have been filtered out”. Why are these reads filtered? Is it because the read’s quality is too low, or because the read can’t be mapped to the genome?

    Question 4:
    When cuffdiff finishes, it produces *.diff files. If I set a minimum FPKM value, such as 0.1, can I also keep the number of false positives/negatives in the *.diff files low?
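    For the FPKM cut-off in question 4, a minimal Python sketch of post-filtering a *.diff table might look like this. It assumes a tab-separated file with a header row and FPKM columns named value_1 and value_2, as in a typical gene_exp.diff; check the header of your own cuffdiff version. The example rows are made up, and keeping a row when either sample reaches the cut-off is an arbitrary choice.

```python
import csv
import io


def filter_by_min_fpkm(diff_text: str, min_fpkm: float = 0.1) -> list[dict]:
    """Keep rows of a cuffdiff-style *.diff table where at least one
    sample's FPKM (assumed columns value_1 / value_2) reaches min_fpkm."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    kept = []
    for row in reader:
        fpkm_1 = float(row["value_1"])
        fpkm_2 = float(row["value_2"])
        if max(fpkm_1, fpkm_2) >= min_fpkm:
            kept.append(row)
    return kept


# Made-up rows with a reduced set of columns, for illustration only.
example = (
    "test_id\tvalue_1\tvalue_2\tsignificant\n"
    "geneA\t0.05\t0.02\tno\n"
    "geneB\t0.0\t3.4\tyes\n"
    "geneC\t12.1\t11.8\tno\n"
)
print([row["test_id"] for row in filter_by_min_fpkm(example)])  # ['geneB', 'geneC']
```

    This only drops low-expression rows after the fact; on its own it does not measure or control the false-positive rate.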

    Thank you very much.
    Best regards,
    song

  • #2
    can anybody help..?

    T_T



    • #3
      please ... ...



      • #4
        Hi, I am also a beginner struggling with TopHat & Cufflinks. I tried to answer your questions, but I am not quite sure...
        For question 1, I agree with your reading of the segment length. However, I used a segment length of 17 in my last mapping, so I am not sure whether 25 is a minimum. The anchor length refers to the number of bases a read must have on each side of a splice junction. If the anchor length is 8 and a read has 7 bp on one exon and the remaining 18 bp on the other, it might be discarded...
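        The 7 + 18 bp example can be written as a toy Python check, assuming the manual's definition of the anchor as at least this many bases on each side of the junction. This is an illustration, not TopHat's implementation.

```python
def passes_anchor_filter(left_bases: int, right_bases: int,
                         min_anchor_length: int = 8) -> bool:
    """Return True if a junction-spanning read has at least
    min_anchor_length bases on each side of the junction."""
    return min(left_bases, right_bases) >= min_anchor_length


print(passes_anchor_filter(7, 18))  # False: only 7 bp on the first exon
print(passes_anchor_filter(8, 17))  # True: 8 bp meets the default anchor
```

        As I read the manual, the threshold governs which junctions are reported, so a read like the 7 + 18 bp one may still be aligned across a junction that other reads support with full anchors.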

        For question 2, I think it would be better to use other software, such as Velvet, for a genome-independent reconstruction based on your sequencing results.

        Q3: have you run FastQC or a similar tool to quickly check your read quality? Sequencing usually introduces some mismatches or low-quality data, and if you set a low tolerance, reads with mismatches will be discarded.
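        If FastQC is not at hand, a rough per-read quality check can be sketched in Python. The Phred offset of 33 (Sanger / Illumina 1.8+ encoding) and the mean-quality threshold of 20 are assumptions; older Illumina pipelines used an offset of 64, and tophat's own filtering rules are not modelled here.

```python
def mean_phred_quality(quality: str, offset: int = 33) -> float:
    """Average Phred score of one FASTQ quality string.

    offset=33 assumes Sanger / Illumina 1.8+ encoding.
    """
    scores = [ord(ch) - offset for ch in quality]
    return sum(scores) / len(scores) if scores else 0.0


def low_quality_read_ids(fastq_lines, min_mean_q: float = 20.0,
                         offset: int = 33) -> list[str]:
    """Return IDs of reads whose mean quality is below min_mean_q
    (an illustrative threshold, not a tophat setting)."""
    ids = []
    # Group the input into 4-line FASTQ records: header, seq, '+', quality.
    records = zip(*[iter(fastq_lines)] * 4)
    for header, _seq, _plus, qual in records:
        if mean_phred_quality(qual.rstrip("\n"), offset) < min_mean_q:
            ids.append(header.rstrip("\n")[1:])  # drop the leading '@'
    return ids


fastq = ["@read1", "ACGTACGTAC", "+", "IIIIIIIIII",   # 'I' = Q40
         "@read2", "ACGTACGTAC", "+", "##########"]   # '#' = Q2
print(low_quality_read_ids(fastq))  # ['read2']
```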

        I am not clear about Q4...



        • #5
          [QUOTE=stanwish;54637]
          For question 2, I think it would be better to use other software, such as Velvet, for a genome-independent reconstruction based on your sequencing results.

          Q3: have you run FastQC or a similar tool to quickly check your read quality? Sequencing usually introduces some mismatches or low-quality data, and if you set a low tolerance, reads with mismatches will be discarded.
          [/QUOTE]

          Q2: No matter how many times I ran it, using Tophat with the annotation gave more results than without it, with all other parameters the same.

          Q3: The number of reads filtered out does not seem to depend on the parameters chosen, so I think it is because the reads' quality is too low.



          • #6
            Question 3:
            The file “tophat/logs/prep_reads” says “6975 out of 28036024 reads have been filtered out”. Why are these reads filtered? Is it because the read’s quality is too low, or because the read can’t be mapped to the genome?
            I also have this issue: why is tophat filtering reads, and based on what criteria?
            And I'm assuming this is happening before the mapping?

            What I've done:
            I've used SolexaQA to trim my reads to decent quality, although this leaves some reads shorter than others. Median read lengths are 75-100 bp across all of my datasets: 98-100 bp for the first mate and 75-85 bp for its pair. Mean read lengths are 83-84 bp and 68-71 bp. However, when I look at the log file, I find that 1-2% of my reads end up being discarded by tophat...

            While it could be that some of my reads are just very short and hence discarded by tophat, I'd like to understand a bit better what exactly is going on here...
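            One way to test the short-read explanation is to count how many trimmed reads fall below a chosen length and compare that with the 1-2% reported in the log. This Python sketch streams a FASTQ file four lines at a time; min_length is a value you choose (for example, your --segment-length), not a documented tophat cut-off, and the file name in the usage comment is hypothetical.

```python
def short_read_fraction(fastq_lines, min_length: int) -> tuple[int, int, float]:
    """Count FASTQ reads shorter than min_length.

    Returns (number_too_short, total_reads, fraction_too_short).
    """
    total = too_short = 0
    # Group the input into 4-line FASTQ records: header, seq, '+', quality.
    records = zip(*[iter(fastq_lines)] * 4)
    for _header, seq, _plus, _qual in records:
        total += 1
        if len(seq.rstrip("\n")) < min_length:
            too_short += 1
    fraction = too_short / total if total else 0.0
    return too_short, total, fraction


# Usage on a real file (the path is hypothetical):
# with open("reads_1.fastq") as handle:
#     n_short, n_total, frac = short_read_fraction(handle, min_length=25)
#     print(f"{n_short}/{n_total} reads ({frac:.2%}) are shorter than 25 bp")
```

            If the fraction of short reads is close to what the log reports, length is a plausible explanation; if it is far off, some other filter is likely at work.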

            Thanks in advance!



            • #7
              Tophat kept versus discarded reads

              I am also curious about how Tophat decides (during the run) to keep or discard reads. For example, I am using Tophat to analyze ~80 million reads, and during the job, I see that Tophat has kept 80 million reads, while discarding about 10-500K reads. Why is this?



              • #8
                Could it be your multi-mapping parameter?



                • #9
                  I have started a new thread that recapitulates this kept/discarded question.

                  About the multi-mapping parameter, I don't think that is it: the kept/discarded counts are calculated very early in the run, within a few minutes, which is much faster than the time it takes to map all the reads. My guess is that it is a fast QC-related filter.
