  • #16
    htseq-count counts each read pair for the gene it maps to as long as it maps unambiguously to one gene. If one of the two mates could not be aligned, htseq-count uses the information from the aligned mate only.

    What do you mean by "including single reads"? Do you have a sample with a mixture of paired-end and single-end reads?
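A simplified sketch of the pair-counting rule described above, assuming a "union"-style mode and ignoring quality filters, strandedness and multi-mappers. Each argument is the set of genes one mate overlaps, or `None` if that mate is unaligned; the function name is hypothetical, while the special counter names follow htseq-count's `__` convention.

```python
from typing import Optional, Set


def classify_pair(mate1_genes: Optional[Set[str]],
                  mate2_genes: Optional[Set[str]]) -> str:
    """Assign a read pair to one gene, or to a special counter.

    Simplified illustration only: an unaligned mate (None) contributes
    nothing, so the pair is counted from the aligned mate alone.
    """
    aligned = [genes for genes in (mate1_genes, mate2_genes) if genes is not None]
    if not aligned:
        return "__not_aligned"
    # Union of the gene sets hit by all aligned mates.
    genes = set().union(*aligned)
    if not genes:
        return "__no_feature"
    if len(genes) > 1:
        return "__ambiguous"  # overlaps more than one gene
    return next(iter(genes))
```

For example, `classify_pair({"geneA"}, None)` counts the pair for `geneA` even though the second mate did not align.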



    • #17
      Hi Simon, thanks for the reply.

      What we generate are paired-end reads. We then run the Q20 dynamic trimming of the SolexaQA package to filter out low-quality reads. The dynamic trimming gives two outputs: one with single reads and one with paired reads. We used both the single reads and the paired reads.
      So what is your suggestion? Should we remove the single reads before mapping?



      • #18
        Why should you? I'm still not sure I understand what your problem is.

        You will need to align the paired-end and the single-end FASTQ files in separate runs of the aligner anyway, because the aligner won't like it if you pass a mixture of both. You then get two SAM files and run htseq-count on each file. If you like, you can then add up the counts from the two files.

        As an aside: all modern alignment tools take the base-call qualities into account and will leave a bad-quality read unaligned. I believe there is no need to filter out low-quality reads beforehand. Better leave this to your aligner; it knows better when a read is too bad to be mappable.
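Adding up the counts from the paired-end and single-end runs, as suggested above, can be sketched as follows. This assumes the usual two-column htseq-count output (`feature<TAB>count`, with the `__` special counters at the end); the function names are hypothetical. `Counter.update` is used rather than `+` because `+` drops features whose total is zero.

```python
from collections import Counter
from typing import Iterable


def parse_counts(lines: Iterable[str]) -> Counter:
    """Parse htseq-count-style output: one 'feature<TAB>count' per line."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        feature, value = line.split("\t")
        counts[feature] += int(value)
    return counts


def add_counts(*tables: Counter) -> Counter:
    """Sum several count tables, keeping features with a total of 0."""
    total: Counter = Counter()
    for table in tables:
        total.update(table)  # update() adds counts and keeps zero entries
    return total
```

With real files (names hypothetical) this would be `add_counts(parse_counts(open("paired_counts.txt")), parse_counts(open("single_counts.txt")))`.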



        • #19
          Thanks, Simon.
          I went through the TopHat manual again and it says the same as you suggest.
          So it seems right to align the FASTQ reads directly.
          One more problem:
          we consistently get warnings saying 'no mate found. Is SAM properly sorted?' We checked the sorted SAM and it seems fine. After ignoring these warnings we got 0 counts for many GTF transcripts (from Cufflinks) with HTSeq. We are not sure whether this is caused by the missing mates.
          Please advise.



          • #20
            Is your file sorted by read name (not by position)?
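When the input is name-sorted, htseq-count expects the records of a read and its mate to sit next to each other. A minimal sketch, assuming a plain-text SAM file, that reports read names whose records do not form one contiguous run; the function name is hypothetical.

```python
from typing import Iterable, List, Optional, Set


def find_noncontiguous_names(sam_lines: Iterable[str]) -> List[str]:
    """Return read names whose alignment records are not all adjacent.

    Header lines (starting with '@') and blank lines are skipped.
    """
    finished: Set[str] = set()  # names whose run of records has already ended
    bad: Set[str] = set()       # names that reappear after their run ended
    current: Optional[str] = None
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        qname = line.split("\t", 1)[0]
        if qname == current:
            continue  # still inside the same run of records
        if current is not None:
            finished.add(current)
        if qname in finished:
            bad.add(qname)
        current = qname
    return sorted(bad)
```

Running it over `open("Sorted.SAM")` and getting an empty list would mean every read's records are grouped together; it streams the file, so memory use grows with the number of distinct read names rather than the file size.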



            • #21
              If you're trying to count by transcript rather than by gene and there are multiple overlapping transcripts (quite common), then it'd be unsurprising to have 0 counts for most things, because reads that overlap several transcripts are counted as ambiguous.

              BTW, the warning message is probably due to the odd TopHat behaviour of leaving out the unmapped mate when it can't map both as a pair and only one maps as a singleton.
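To see how many reads are affected by the TopHat behaviour described above, one could list paired reads (FLAG bit 0x1 set) that have only one primary record in the file. A minimal sketch, assuming a plain-text SAM file; secondary (0x100) and supplementary (0x800) records are skipped, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List


def find_orphan_mates(sam_lines: Iterable[str]) -> List[str]:
    """Names of paired reads with only one primary record in the file."""
    records: DefaultDict[str, int] = defaultdict(int)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x1:
            continue  # single-end record, no mate expected
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        records[qname] += 1
    return sorted(name for name, n in records.items() if n == 1)
```

If this list is long, the "no mate found" warnings are likely explained by missing mate records rather than by a sorting problem.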



              • #22
                Spotted with a keen eye by Devon. Counting reads per transcript is rarely useful (see my several earlier posts on this). What did you intend to do with these counts?



                • #23
                  We sorted the SAM by read name with the command 'sort -s -k 1,1 A.SAM > Sorted.SAM'.
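For a small test file, the same idea can be sketched in memory: keep the `@` header lines at the top and stably sort the alignment records by read name, which mirrors the `-s` (stable) flag of `sort`. This is only an illustration, assuming the whole file fits in memory, so it is not a replacement for an external sort of a ~120 GB file; the function name is hypothetical.

```python
from typing import List


def name_sort_sam(sam_lines: List[str]) -> List[str]:
    """Return header lines first, then records stably sorted by QNAME."""
    header = [line for line in sam_lines if line.startswith("@")]
    body = [line for line in sam_lines if line.strip() and not line.startswith("@")]
    # list.sort() is stable: records sharing a QNAME keep their original order.
    body.sort(key=lambda line: line.split("\t", 1)[0])
    return header + body
```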



                  • #24
                    We have RNA-Seq data for six different conditions. We mapped the filtered reads of each condition to the reference genome independently, got the SAM files, and ran Cufflinks on this mapping information. After Cufflinks, the redundancy was removed with cuffmerge.
                    We are counting the GFF transcripts from cuffmerge. I am not sure whether we should call them genes or transcripts. We intend to use these counts as input for DESeq.
                    However, we are stuck at this counting step.



                    • #25
                      If a gene has multiple transcripts then that won't work. If you ended up squishing all of the transcripts of a gene together into a single "union gene model" then that likely will work. When in doubt, (1) look at the data visually with IGV/a text editor and (2) run htseq-count with the -o option so you can see what happens to some reads that you think (from looking at things in IGV or a similar tool) should increase the count of a gene/transcript.
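A "union gene model" as mentioned above can be sketched as merging the overlapping exon intervals of all transcripts that share a gene ID. This assumes the exons have already been read from the GTF as `(gene_id, chrom, start, end)` tuples with 1-based inclusive coordinates; the function name is hypothetical. Note that htseq-count's `-i` option (default `gene_id`) chooses which GTF attribute groups exons into one counted feature.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Exon = Tuple[str, str, int, int]   # (gene_id, chrom, start, end)
Interval = Tuple[str, int, int]    # (chrom, start, end)


def union_gene_models(exons: List[Exon]) -> Dict[str, List[Interval]]:
    """Merge overlapping exons of all transcripts of each gene."""
    by_gene: DefaultDict[str, List[Interval]] = defaultdict(list)
    for gene, chrom, start, end in exons:
        by_gene[gene].append((chrom, start, end))
    merged: Dict[str, List[Interval]] = {}
    for gene, intervals in by_gene.items():
        intervals.sort()  # by chromosome, then start
        out: List[Interval] = []
        for chrom, start, end in intervals:
            if out and out[-1][0] == chrom and start <= out[-1][2]:
                # Overlaps the previous interval on the same chromosome: extend it.
                out[-1] = (chrom, out[-1][1], max(out[-1][2], end))
            else:
                out.append((chrom, start, end))
        merged[gene] = out
    return merged
```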

                      Comment


                      • #26
                        Hi Ryan,
                        when we check the sorted SAM with "-s -c 1,1 sorted A.sam" we get an error.
                        So this suggests the sorting is the problem.
                        Also, sorting the SAM file takes a long time. Is there any alternative way to sort the SAM? Let me remind you that the SAM file size is ~120 GB.



                        • #27
                          Yeah, if you get an error then likely something went wrong. Just stick to using BAM files (they're smaller anyway) and then just "samtools sort -n A.bam A.namesorted".



                          • #28
                            We are trying samtools and will let you know.
                            Thanks.



                            • #29
                              Hi Ryan,
                              we tried to sort the BAM file with samtools using the command 'samtools sort -n -m maxMem A.bam -0 sortA.bam'. However, the sort has been running for more than 24 hours and is not complete yet. A number of BAM files have also been generated in the folder.



                              • #30
                                Yeah, with 40 GB files that'll still take a while. There are some implementations of samtools that can do multithreaded sorting. Have a look through this thread for those. There's also biobambam (there's also a publication that I can't seem to find at the moment), which is supposed to be generally faster but I've never actually used.
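Current samtools 1.x releases can sort with several threads directly via `-@`, and `-m` sets the memory per thread; when that limit is exceeded, samtools writes temporary BAM files, which is likely what the extra files in the folder are. A minimal sketch that builds and runs such a name sort from Python, assuming samtools 1.x is on the `PATH` (older 0.1.x versions used the prefix syntax shown in post #27); the helper names are hypothetical.

```python
import subprocess
from typing import List


def samtools_name_sort_cmd(in_bam: str, out_bam: str,
                           threads: int = 4, mem_per_thread: str = "2G") -> List[str]:
    """Build a samtools 1.x command that sorts a BAM file by read name."""
    return [
        "samtools", "sort",
        "-n",                  # sort by read name instead of position
        "-@", str(threads),    # number of sorting/compression threads
        "-m", mem_per_thread,  # approximate memory per thread before spilling to temp files
        "-o", out_bam,         # output file
        in_bam,
    ]


def run_name_sort(in_bam: str, out_bam: str, threads: int = 4,
                  mem_per_thread: str = "2G") -> None:
    """Run the sort and raise CalledProcessError if samtools fails."""
    subprocess.run(samtools_name_sort_cmd(in_bam, out_bam, threads, mem_per_thread),
                   check=True)
```

For example, `run_name_sort("A.bam", "A.namesorted.bam", threads=8)`; the total memory used is roughly `threads` times `mem_per_thread`, so both should be chosen to fit the machine.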
