  • BOWTIE - match with empty chr and 0 position

    Hi all,

    I am still learning how BOWTIE works, and I have run into something I do not understand. I simply ran BOWTIE with the default options (bowtie -p 8 -q --solexa1.3-quals --sam-nohead -S) against the human genome reference. When checking the SAM output, I saw something like the following:
    Code:
    HWUSI-EAS751_0001:1:1:0:852#0/1	16	gi|224589800|ref|NC_000001.10|	155633307	255	35M	*	0	0	TGAGACCAGCCTGACCAACAAGGTGAAACCCCGTN	CCCCA;ACCCCCCCCCCCCCCCBCCBBBAAAABB#	XA:i:1	MD:Z:34C0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:823#0/1	16	gi|224589817|ref|NC_000005.9|	55233504	255	35M	*	0	0	TAATCTTATCAGCACAATATAATCTAACAATACCN	CCCCBCACCCCCCCCCCCCCCCC@CCCCCCCCBB#	XA:i:1	MD:Z:34T0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:385#0/1	0	gi|224589811|ref|NC_000002.11|	230683139	255	35M	*	0	0	NCAGTAACTGACACATCTCAATAACTGCCTGAAGC	#CCCCCCCCCCCCCCCCCCCCCCBCCCCCCCCCAC	XA:i:1	MD:Z:0C34	NM:i:1
    HWUSI-EAS751_0001:1:1:0:1865#0/1	16	gi|224589805|ref|NC_000014.8|	47407772	255	35M	*	0	0	ATCTGACCCCAATTAGAACAGCTATTATGAAAAAN	BAB?B;AAC@CACBCCCBCCCBCCCCCCCCCCCC#	XA:i:1	MD:Z:34G0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:1878#0/1	16	gi|224589821|ref|NC_000009.11|	132898793	255	35M	*	0	0	GCAGGGGAACAGGTACCTCCGAGGGTGAGAGTCGN	@;@BBBBBBAABBBAAAAA?BBBBBBBBBBBB?B#	XA:i:1	MD:Z:34T0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:1348#0/1	0	gi|224589811|ref|NC_000002.11|	39229151	255	35M	*	0	0	NTCCTTTCACTTAAGAACATGTTATGGCCAGGCGC	#CCCCCCCCCCCCCCCCCCCCCCCCBABCCCCCBB	XA:i:1	MD:Z:0C34	NM:i:1
    HWUSI-EAS751_0001:1:1:0:1507#0/1	16	gi|224589800|ref|NC_000001.10|	15747351	255	35M	*	0	0	CCCAAGCTGGTCTGAAACTCCTGGGCTCAAGTGAN	A=@BCCBCCCCCCCCCCCBAABCCCCCCCCCCCC#	XA:i:1	MD:Z:34T0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:69#0/1	16	gi|224589818|ref|NC_000006.11|	74229138	255	35M	*	0	0	GGTCTCAAATTTCCACAAGGAGATATCAATGGTGN	CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC#	XA:i:1	MD:Z:34A0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:200#0/1	16	gi|224589809|ref|NC_000018.9|	55285040	255	35M	*	0	0	GGGAGGCTGAGGCAGAAGAATCTCTTGAATCCGGN	CCCCCCCCCCCCCCCCCCCB?B;@CCCCCB@ACC#	XA:i:1	MD:Z:34G0	NM:i:1
    HWUSI-EAS751_0001:1:1:0:418#0/1	4	*	0	0	*	*	0	0	NATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATC	#BCCCCCBABCCACCCCAABAACCC@-@@9=8@>>	XM:i:0
    HWUSI-EAS751_0001:1:1:0:978#0/1	4	*	0	0	*	*	0	0	NCTCGCCGACGCCTCTCATCTCACACCTGTCCACG	###################################	XM:i:0
    My question concerns the records with * as the reference name and 0 as the position. Can anybody explain what those are? Why does BOWTIE report them if they do not match the reference? How do I get the SAM output without them?

    Thanks,

    D.

  • #2
    These are unmatched reads - i.e. those that did not align to a reference sequence.

    What are you planning to do with the reads that makes having these a problem?
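As a rough illustration of the point above: in the SAM output shown in the first post, the unmatched records have 4 in the second column (the FLAG field), and bit 0x4 of that field marks a read as unaligned. The sketch below checks that bit. The function name `is_unmapped` is made up for this example, and the two sample records are shortened copies of lines from the post (sequence and quality strings truncated, optional tags dropped).

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit that is set when the read has no alignment


def is_unmapped(sam_line):
    """Return True if a single SAM record line describes an unaligned read."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second tab-separated column
    return bool(flag & UNMAPPED_FLAG)


# Shortened copies of two records from the post (SEQ/QUAL truncated for brevity).
aligned = "\t".join([
    "HWUSI-EAS751_0001:1:1:0:852#0/1", "16", "gi|224589800|ref|NC_000001.10|",
    "155633307", "255", "35M", "*", "0", "0", "TGAG", "CCCC",
])
unaligned = "\t".join([
    "HWUSI-EAS751_0001:1:1:0:418#0/1", "4", "*",
    "0", "0", "*", "*", "0", "0", "NATC", "#BCC",
])

print(is_unmapped(aligned))    # False
print(is_unmapped(unaligned))  # True
```

Testing the flag bit rather than comparing the whole field to 4 should keep the check valid for records where other bits are set as well.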



    • #3
      Originally posted by ffinkernagel View Post
      These are unmatched reads - i.e. those that did not align to a reference sequence.
      Yeah, I guessed that, but I could not find any BOWTIE option that prevents those from being shown. Does anyone have any experience with this?

      Originally posted by ffinkernagel View Post
      What are you planning to do with the reads that having these would be a problem?
      Those would surely be a problem. When I converted the SAM file to BED, the BED file contained a lot of entries that are not in the reference, and UCSC reported an error when I tried to upload it. I do want to remove those, but I do not know what I should do. Do you have any suggestions, ffinkernagel?

      Thanks,

      D.
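One possible way to remove those records from the SAM output itself, sketched under the assumption that the file is plain tab-separated SAM text: keep header lines (which start with `@`) and keep only records whose FLAG field lacks the 0x4 "unmapped" bit. The function name `filter_aligned` and the three sample lines are invented for this example.

```python
def filter_aligned(lines):
    """Yield SAM header lines and records that are not flagged as unaligned."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header line, passed through unchanged
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record (e.g. a blank line)
        if int(fields[1]) & 0x4:
            continue  # FLAG bit 0x4 set: the read did not align
        yield line


# Tiny made-up input: one header line, one aligned read, one unaligned read.
records = [
    "@HD\tVN:1.0\n",
    "read1\t16\tchr1\t100\t255\t35M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
kept = list(filter_aligned(records))
print(len(kept))  # 2
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the list, and the result written out with `writelines`. The same flag test is what `samtools view -F 4` applies, if samtools is available.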



      • #4
        Well, several, depending on your ultimate goal.
        How do you convert to BED, and do you really want to display *reads* in the UCSC browser?
        I've found that to be problematic because of the (still large) size of the file you'd need to upload,
        and I'd advise at least splitting it 'by chromosome' anyhow, which would eliminate those reads as a side effect.



        • #5
          Originally posted by ffinkernagel View Post
          How do you convert to BED
          I use a Python script from another user in the forum (http://seqanswers.com/forums/showpos...12&postcount=2) to convert to BED.
          Originally posted by ffinkernagel View Post
          and do you really want to display *reads* in the UCSC browser
          Why not?
          Originally posted by ffinkernagel View Post
          Only I've found that to be problematic because of the (still large) size of the file you'd need to upload,
          and I'd advise at least to split into 'by chromosome' anyhow, and well, that would eliminate those reads as a side effect.
          Yeah, that is the problem. What I want is to eliminate them from the SAM file so that I can use my result with other software as well, not just the UCSC Browser. How could you split the result by chromosome?

          Thanks,

          D.



          • #6
            Well, to just filter out the unaligned reads,
            replace the line that reads
            chrom = samFields[2]
            with
            chrom = samFields[2]
            if chrom == '*':  # or whatever the offending chromosome 'name' is
                continue  # indent with one tab or four spaces, depending on what's already in the file.
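To show where such a check sits in a converter loop, here is a minimal, self-contained sketch. It is not the script referenced in the thread. Only the `samFields` name and the `chrom == '*'` test come from the post; the helper names, the six-column BED layout, and the sample records are assumptions made for this example.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by a CIGAR string (M, D, N, =, X ops)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def sam_to_bed(lines):
    """Yield six-column BED lines for the aligned records in SAM text."""
    for line in lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        samFields = line.rstrip("\n").split("\t")
        chrom = samFields[2]
        if chrom == '*':
            continue  # unaligned read: no reference name, so no BED entry
        start = int(samFields[3]) - 1  # SAM positions are 1-based, BED starts are 0-based
        end = start + reference_span(samFields[5])
        strand = "-" if int(samFields[1]) & 0x10 else "+"  # 0x10 marks reverse strand
        yield "\t".join([chrom, str(start), str(end), samFields[0], "0", strand])


# Made-up records: one reverse-strand alignment and one unaligned read.
sam = [
    "read1\t16\tchr1\t100\t255\t35M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
bed = list(sam_to_bed(sam))
print(bed[0])
```

Reference names like `gi|224589800|ref|NC_000001.10|` would probably also need mapping to names such as `chr1` before UCSC accepts the file; that step is left out here.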

            Splitting into separate chromosomes is a tad more than I'm willing to modify in Aron's script without his permission.

            It would be easier to just get the Vancouver Short Read package and use their convertToBed utility.

            So long,
            Florian
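For readers who want to try the 'by chromosome' split without touching an existing converter, a stand-alone sketch could look like the following. It groups records by the reference name in the third SAM column and drops unaligned reads along the way. The function name `split_by_chromosome` and the sample records are assumptions for this example, and the grouping is held in memory, which may not suit a full lane of reads.

```python
from collections import defaultdict


def split_by_chromosome(lines):
    """Group SAM record lines by reference name, dropping unaligned reads."""
    groups = defaultdict(list)
    for line in lines:
        if line.startswith("@"):
            continue  # header lines belong to no single chromosome
        fields = line.split("\t")
        chrom = fields[2]
        if chrom == "*":
            continue  # unaligned read, eliminated as a side effect of the split
        groups[chrom].append(line)
    return groups


# Made-up records on two reference sequences plus one unaligned read.
sam = [
    "read1\t16\tchr1\t100\t255\t35M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tchr2\t500\t255\t35M\t*\t0\t0\tACGT\tIIII\n",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "read4\t0\tchr1\t900\t255\t35M\t*\t0\t0\tACGT\tIIII\n",
]
groups = split_by_chromosome(sam)
for chrom in sorted(groups):
    print(chrom, len(groups[chrom]))
```

For large inputs, writing each record straight to a per-chromosome output file instead of collecting lists would keep memory use flat. Reference names containing characters such as `|` would need sanitising before being used as file names.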



            • #7
              Originally posted by ffinkernagel View Post
              Well, to just filter out the unaligned reads,
              replace the line that reads
              chrom = samFields[2]
              with
              chrom = samFields[2]
              if chrom == '*':  # or whatever the offending chromosome 'name' is
                  continue  # indent with one tab or four spaces, depending on what's already in the file.

              Splitting into separate chromosomes is a tad more than I'm willing to modify in Aron's script without his permission.

              It would be easier to just get the Vancouver Short Read package and use their convertToBed utility.

              So long,
              Florian
              Thanks for your suggestion, Florian. I thought there would be some BOWTIE option that I was not aware of. Anyway, I still do not understand why BOWTIE reports unmatched reads. Isn't the software supposed to do the matching job, i.e. to report only what is matched?

              I checked the output from MAQ and did not see such unmatched reads there.

              Thanks,

              D.
