  • arrchi
    Member
    • Mar 2011
    • 46

    unique hits

    Hi there,

    I am looking for a way to find unique hits in my RNA-seq data. After searching online and in this community, I still can't find a good way to extract unique hits from a SAM file. Any input will be welcome.

    I used TopHat to align the data to hg19 and samtools -bq -1 to generate reliable hits.

    Here is part of the output:

    [Tophat_out]$ grep -w SRR087416.97659 accepted_hits_realiable.sam
    SRR087416.97659 0 chr1 11356 1 36M * 0 0 CAGCTAGGGACATTGCAGGGTCCTCTTGCTCAAGGT BBBCBCB9CBBBC@:+>3A97AACABA9@CCCCB9# NM:i:0 NH:i:4 CC:Z:chr12 CP:i:94218
    SRR087416.97659 16 chr12 94218 1 36M * 0 0 ACCTTGAGCAAGAGGACCCTGCAATGTCCCTAGCTG #9BCCCC@9ABACAA79A3>+:@CBBBC9BCBCBBB NM:i:0 NH:i:4 CC:Z:chr15 CP:i:102519779
    SRR087416.97659 16 chr15 102519779 1 36M * 0 0 ACCTTGAGCAAGAGGACCCTGCAATGTCCCTAGCTG #9BCCCC@9ABACAA79A3>+:@CBBBC9BCBCBBB NM:i:0 NH:i:4 CC:Z:chr2 CP:i:114359624

    My questions are: Does this mean that the read SRR087416.97659 maps to chr1, chr12, and chr15?

    If I understand the concept of a "unique hit" correctly, then this read cannot be counted as a unique hit. Am I right?

    Thanks,

    -A
  • rnaeye
    Member
    • May 2011
    • 80

    #2
    You are correct: the SRR087416.97659 sequence maps to more than one location in the genome, so you have no way of knowing which location it came from. A unique read should hit the genome only once, at one specific location. To find such reads, sort your output by sequence ID and keep the IDs that appear only once. Does this help?
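The filtering described above can be sketched in Python. This is a minimal illustration, not a tested pipeline: it assumes a tab-delimited SAM file, counts how many alignment lines each read ID has, and also checks the optional `NH:i:` tag (in the records quoted in this thread it is `NH:i:4`, which suggests four reported alignments for the read even though only three lines appear in the grep output). The function names and the `example.read` record are hypothetical and exist only for the demonstration.

```python
from collections import Counter

def read_ids_and_nh(sam_lines):
    """Yield (read_id, NH value or None) for each alignment line."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        nh = None
        for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
        yield fields[0], nh

def unique_read_ids(sam_lines):
    """Return IDs that occur on exactly one line and do not carry NH > 1."""
    counts = Counter()
    multi = set()
    for read_id, nh in read_ids_and_nh(sam_lines):
        counts[read_id] += 1
        if nh is not None and nh > 1:
            multi.add(read_id)
    return {rid for rid, n in counts.items() if n == 1 and rid not in multi}

def make_record(*fields):
    """Join fields with tabs, as in a real SAM line."""
    return "\t".join(fields)

example = [
    # Two of the records quoted in this thread (same read, two locations)
    make_record("SRR087416.97659", "0", "chr1", "11356", "1", "36M", "*", "0", "0",
                "CAGCTAGGGACATTGCAGGGTCCTCTTGCTCAAGGT",
                "BBBCBCB9CBBBC@:+>3A97AACABA9@CCCCB9#",
                "NM:i:0", "NH:i:4", "CC:Z:chr12", "CP:i:94218"),
    make_record("SRR087416.97659", "16", "chr12", "94218", "1", "36M", "*", "0", "0",
                "ACCTTGAGCAAGAGGACCCTGCAATGTCCCTAGCTG",
                "#9BCCCC@9ABACAA79A3>+:@CBBBC9BCBCBBB",
                "NM:i:0", "NH:i:4", "CC:Z:chr15", "CP:i:102519779"),
    # Hypothetical uniquely mapped read, NOT from the thread
    make_record("example.read", "0", "chr3", "5000", "50", "36M", "*", "0", "0",
                "CAGCTAGGGACATTGCAGGGTCCTCTTGCTCAAGGT",
                "BBBCBCB9CBBBC@:+>3A97AACABA9@CCCCB9#",
                "NM:i:0", "NH:i:1"),
]

print(unique_read_ids(example))  # {'example.read'}
```

Checking the NH tag as well as the line count matters when the file has already been filtered, because a multi-mapped read may be left with a single line after some of its alignments were removed.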


    • rnaeye
      Member
      • May 2011
      • 80

      #3
      By the way, which instrument is this output coming from? Thanks.


      • arrchi
        Member
        • Mar 2011
        • 46

        #4
        Thanks for your message.

        The result was generated on a Mac Pro with 8 GB of memory and a 1 TB hard drive. Is that what you wanted to know?


        • rnaeye
          Member
          • May 2011
          • 80

          #5
          My question was which DNA sequencing platform this output is from, such as Illumina, ABI SOLiD, etc. Thanks.


          • arrchi
            Member
            • Mar 2011
            • 46

            #6
            Oh. Sorry.

            This is Illumina RNA sequencing data.


            • arrchi
              Member
              • Mar 2011
              • 46

              #7
              Does anybody know what the value 0 in "SRR087416.97659 0" means? I checked the samtools manual, but it only says that if 0x1 is unset, no assumptions can be made about .....


              • arrchi
                Member
                • Mar 2011
                • 46

                #8
                I found an old post saying that flag 0 means "the read is not paired and mapped, forward strand".
                I hope that is true.
                Last edited by arrchi; 06-01-2011, 07:55 AM.
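The FLAG column is a bitwise field, so the reading quoted above can be checked by testing individual bits. The sketch below is only an illustration: the bit values follow the SAM format specification, while the table name `SAM_FLAGS`, the function `decode_flag`, and the short descriptions are choices made for this example. A flag of 0 has no bits set: 0x1 (paired) is unset, 0x4 (unmapped) is unset, and 0x10 (reverse strand) is unset, which is consistent with "not paired, mapped, forward strand". The flag 16 seen on the chr12 and chr15 records has only 0x10 set, meaning the read aligned to the reverse strand.

```python
# Bit values as defined in the SAM format specification;
# the descriptions are paraphrased for this example.
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]

def decode_flag(flag):
    """Return the descriptions of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]

print(decode_flag(0))   # []
print(decode_flag(16))  # ['read reverse strand']
```

An empty list for flag 0 means none of the listed properties apply, so the read is treated as unpaired, mapped, and on the forward strand.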
