  • BOWTIE - match with empty chr and 0 position

    Hi all,

    I am still learning how BOWTIE works, and I have run into something I do not understand. I ran BOWTIE with mostly default options (bowtie -p 8 -q --solexa1.3-quals --sam-nohead -S) against the human genome reference. When checking the SAM output, I saw records like the ones below:
    Code:
    HWUSI-EAS751_0001:1:1:0:852#0/1	16	gi|224589800|ref|NC_000001.10|	155633307	255	35M	*	0	0	TGAGACCAGCCTGACCAACAAGGTGAAACCCCGTN	CCCCA;ACCCCCCCCCCCCCCCBCCBBBAAAABB#	XA:i:1	MD:Z:34C0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:823#0/1	16	gi|224589817|ref|NC_000005.9|	55233504	255	35M	*	0	0	TAATCTTATCAGCACAATATAATCTAACAATACCN	CCCCBCACCCCCCCCCCCCCCCC@CCCCCCCCBB#	XA:i:1	MD:Z:34T0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:385#0/1	0	gi|224589811|ref|NC_000002.11|	230683139	255	35M	*	0	0	NCAGTAACTGACACATCTCAATAACTGCCTGAAGC	#CCCCCCCCCCCCCCCCCCCCCCBCCCCCCCCCAC	XA:i:1	MD:Z:0C34	NM:i:1
    HWUSI-EAS751_0001:1:1:0:1865#0/1	16	gi|224589805|ref|NC_000014.8|	47407772	255	35M	*	0	0	ATCTGACCCCAATTAGAACAGCTATTATGAAAAAN	BAB?B;AAC@CACBCCCBCCCBCCCCCCCCCCCC#	XA:i:1	MD:Z:34G0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:1878#0/1	16	gi|224589821|ref|NC_000009.11|	132898793	255	35M	*	0	0	GCAGGGGAACAGGTACCTCCGAGGGTGAGAGTCGN	@;@BBBBBBAABBBAAAAA?BBBBBBBBBBBB?B#	XA:i:1	MD:Z:34T0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:1348#0/1	0	gi|224589811|ref|NC_000002.11|	39229151	255	35M	*	0	0	NTCCTTTCACTTAAGAACATGTTATGGCCAGGCGC	#CCCCCCCCCCCCCCCCCCCCCCCCBABCCCCCBB	XA:i:1	MD:Z:0C34	NM:i:1
    HWUSI-EAS751_0001:1:1:0:1507#0/1	16	gi|224589800|ref|NC_000001.10|	15747351	255	35M	*	0	0	CCCAAGCTGGTCTGAAACTCCTGGGCTCAAGTGAN	A=@BCCBCCCCCCCCCCCBAABCCCCCCCCCCCC#	XA:i:1	MD:Z:34T0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:69#0/1	16	gi|224589818|ref|NC_000006.11|	74229138	255	35M	*	0	0	GGTCTCAAATTTCCACAAGGAGATATCAATGGTGN	CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC#	XA:i:1	MD:Z:34A0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:200#0/1	16	gi|224589809|ref|NC_000018.9|	55285040	255	35M	*	0	0	GGGAGGCTGAGGCAGAAGAATCTCTTGAATCCGGN	CCCCCCCCCCCCCCCCCCCB?B;@CCCCCB@ACC#	XA:i:1	MD:Z:34G0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:418#0/1	4	*	0	0	*	*	0	0	NATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATC	#BCCCCCBABCCACCCCAABAACCC@-@@9=8@>>	XM:i:0
    HWUSI-EAS751_0001:1:1:0:978#0/1	4	*	0	0	*	*	0	0	NCTCGCCGACGCCTCTCATCTCACACCTGTCCACG	###################################	XM:i:0
    My question is about the records with * as the reference name and 0 as the position. Can anybody explain what they are? Why does BOWTIE still report them if they do not match the reference? How do I get SAM output without them?

    Thanks,

    D.

  • #2
    These are unmatched reads - i.e. those that did not align to a reference sequence.

    What are you planning to do with the reads, such that having these in the output would be a problem?
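
For what it's worth, the SAM FLAG column (the second field) is what marks these records: bit 0x4 means the read is unmapped, which is why they show FLAG 4, reference name `*` and position 0. A minimal sketch of dropping them, assuming a tab-separated SAM stream; `drop_unaligned` is a made-up helper name for illustration, not something BOWTIE provides:

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def drop_unaligned(lines):
    """Yield header lines and records whose FLAG does not have the 0x4 bit set."""
    for line in lines:
        if line.startswith("@"):  # keep header lines, if any are present
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record, skip it
            continue
        if int(fields[1]) & UNMAPPED:  # unaligned read, e.g. FLAG 4
            continue
        yield line


# Hypothetical usage (file names are placeholders):
#   with open("in.sam") as src, open("aligned.sam", "w") as dst:
#       dst.writelines(drop_unaligned(src))
```

Testing the flag rather than comparing the reference name to `*` also catches any unmapped record that happens to carry a reference name.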

    • #3
      Originally posted by ffinkernagel
      These are unmatched reads - i.e. those that did not align to a reference sequence.
      Yeah, I guessed that, but I could not find any BOWTIE option to suppress them. Does anyone have experience with this?

      Originally posted by ffinkernagel
      What are you planning to do with the reads that having these would be a problem?
      They are certainly a problem. When I converted the SAM file to BED, the BED file contained many records that are not on any reference sequence, and UCSC reported an error when I tried to upload it. I want to remove them, but I do not know how. Do you have any suggestions, ffinkernagel?

      Thanks,

      D.

      • #4
        Well, several, depending on your ultimate goal.
        How do you convert to BED, and do you really want to display *reads* in the UCSC browser?
        It's just that I've found that to be problematic because of the (still large) size of the file you'd need to upload,
        and I'd advise at least splitting it 'by chromosome' anyhow, which would eliminate those reads as a side effect.

        • #5
          Originally posted by ffinkernagel
          How do you convert to BED
          I use a Python script from another user in the forum (http://seqanswers.com/forums/showpos...12&postcount=2) to convert to BED.
          Originally posted by ffinkernagel
          and do you really want to display *reads* in the UCSC browser
          Why not?
          Originally posted by ffinkernagel
          Only I've found that to be problematic because of the (still large) size of the file you'd need to upload,
          and I'd advise at least to split into 'by chromosome' anyhow, and well, that would eliminate those reads as a side effect.
          Yeah, that is the problem. What I want is to eliminate them from the SAM file itself, so that I can use the result with other software as well, not just the UCSC browser. How would you split the result by chromosome?

          Thanks,

          D.

          • #6
            Well, to just filter out the unaligned reads,
            replace the line that reads
            chrom = samFields[2]
            with
            chrom = samFields[2]
            if chrom == '*': # or whatever the offending chromosome 'name' is
                continue # indent with one tab or four spaces, matching what's already in the file.
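
To show where that check sits inside a conversion loop, here is a minimal, self-contained sketch. It is not Aron's script: `sam_to_bed` and the choice of BED columns are assumptions made for illustration, and the end coordinate assumes ungapped alignments like the 35M records in this thread.

```python
def sam_to_bed(lines):
    """Yield BED lines (chrom, start, end, name, score, strand) for aligned reads."""
    for line in lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        samFields = line.rstrip("\n").split("\t")
        chrom = samFields[2]
        if chrom == "*":  # unaligned read: no reference name, so skip it
            continue
        flag = int(samFields[1])
        start = int(samFields[3]) - 1  # SAM POS is 1-based, BED start is 0-based
        end = start + len(samFields[9])  # assumption: ungapped alignment
        strand = "-" if flag & 0x10 else "+"  # 0x10 = read aligned to reverse strand
        yield "\t".join([chrom, str(start), str(end), samFields[0], "0", strand])


# Hypothetical usage (file names are placeholders):
#   with open("in.sam") as src, open("out.bed", "w") as dst:
#       for bed_line in sam_to_bed(src):
#           dst.write(bed_line + "\n")
```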

            Splitting into separate chromosomes is a bit more than I'm willing to modify in Aron's script without his permission.
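
That said, independent of that script, a split can be sketched directly on the SAM file. This is only an illustration under assumptions: `split_by_chromosome` and `write_groups` are made-up names, everything is held in memory (fine for a test file, not for a full lane), and `|` in reference names such as `gi|224589800|ref|NC_000001.10|` is replaced so the names are usable as file names.

```python
from collections import defaultdict


def split_by_chromosome(lines):
    """Group aligned SAM records by reference name (third field)."""
    by_chrom = defaultdict(list)
    for line in lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.split("\t")
        if len(fields) < 3 or fields[2] == "*":  # malformed or unaligned
            continue
        by_chrom[fields[2]].append(line)
    return by_chrom


def write_groups(by_chrom, prefix):
    """Write one SAM file per reference name, named <prefix>.<chrom>.sam."""
    for chrom, records in by_chrom.items():
        safe = chrom.replace("|", "_")  # avoid shell-unfriendly characters
        with open(f"{prefix}.{safe}.sam", "w") as handle:
            handle.writelines(records)
```

Unaligned reads never get a group, so they drop out as a side effect, as mentioned earlier in the thread.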

            It would be easier to just get the Vancouver Short Read package and use their convertToBed utility.

            So long,
            Florian

            • #7
              Originally posted by ffinkernagel
              Well, to just filter out the not aligned positions,
              replace the line that reads
              chrom = samFields[2]
              with
              chrom = samFields[2]
              if chrom == '*': # or whatever is the offending chromosome 'name'
                  continue # one tab or four spaces, depending on what's already in the file.

              Splitting into separate chromosomes is a tad bit more than I'm willing to modify on Aron's script without his permission.

              Would be easier to just get the vancouver short read package and use their convertToBed utility.

              So long,
              Florian
              Thanks for your suggestion, Florian. I thought there would be some BOWTIE option that I was not aware of. Anyway, I still do not understand why BOWTIE reports unmatched reads. Isn't the software supposed to do the matching, i.e. to show only what matched?

              I checked the output from MAQ and I do not see unmatched reads there.

              Thanks,

              D.
