  • Layla
    Member
    • Sep 2008
    • 58

    maq match command

    Dear all,

    I have a text file generated from the maq match command, aligning paired-end short reads to a reference genome. Any ideas how to filter out poor-quality reads from this file, i.e. those reads that have been mapped to more than one location in the genome?

    How are people generally dealing with multiple hits from a single read in the genome?

    Thanks for any help

    L
  • jnfass
    Member
    • Aug 2008
    • 88

    #2
    It all depends!

    Which text file are you referring to? Pileup? Dumped hits? If you specify how exactly you generated it, that will help others help you ...

    Also, for the map and downstream steps (consensus, SNP calling), maq takes a read that maps equally well (and satisfies the specified cutoffs) to multiple locations in the reference and places it at one of those locations, chosen randomly. This may or may not suit your needs, but as far as I know, there's no way to change it. You should be able to determine which reads map multiple times (and thus exclude them in a second round of mapping) by parsing the dumped hits file, which is specified with the '-H' option to the map/match command.

    Hope that helps
    ~Joe


    • Layla
      Member
      • Sep 2008
      • 58

      #3
      The file I was referring to was generated using the maq match command. I have re-run the maq match command using the -H and -u options. Do you think the multiple matches should be removed and the maq match command run again, or would it be sufficient to remove the multiple matches and move on to a pileup?

      At this stage I shall focus on correctly paired reads (flagged 18), remove multiple hits (flag of 0) and also low mapping quality scores (<30).
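A minimal Python sketch of the filter described above, run on the text produced by mapview. It assumes tab-separated output with the paired flag in the 6th column and the mapping quality in the 7th, as listed in the Maq manual; the function name, the column constants and the three example lines are made up for illustration, so check them against your own file.

```python
import io

# Assumed 0-based column positions in `maq mapview` text output.
# These follow the column order given in the Maq manual; verify on real data.
FLAG_COL = 5   # paired flag (18 = correctly paired)
MAPQ_COL = 6   # mapping quality

def filter_mapview(lines, wanted_flag=18, min_mapq=30):
    """Yield lines whose pair flag equals wanted_flag and whose mapping quality is >= min_mapq."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= MAPQ_COL:
            continue  # skip blank or truncated lines
        if int(fields[FLAG_COL]) == wanted_flag and int(fields[MAPQ_COL]) >= min_mapq:
            yield line

# Made-up example lines, truncated to the first seven columns (not real data).
example = io.StringIO(
    "read1/1\tchr1\t100\t+\t200\t18\t60\n"   # correct pair, high quality
    "read2/1\tchr1\t500\t-\t210\t18\t0\n"    # correct pair, mapping quality 0
    "read3/1\tchr2\t900\t+\t0\t64\t45\n"     # different pair flag
)
kept = list(filter_mapview(example))
```

With the defaults only the first example line is kept. Calling it with min_mapq=1 would drop only the reads whose mapping quality is 0.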

      Any other suggestions or comments on how people would go about cleaning their ChIP-seq data?

      Cheers

      L


      • jnfass
        Member
        • Aug 2008
        • 88

        #4
        Sounds like you're talking about the mapview output ... generated from the "mapview" command, using the binary map file generated by the map/match command. I'm not as familiar with that file - for instance, I didn't know that there was a flag for multiply mapped reads in mapview's output - but it sounds like you've got a good strategy for parsing that file and filtering your pairs.


        • Layla
          Member
          • Sep 2008
          • 58

          #5
          Thanx Joe..

          How did you convert the file created using the -H option of the ./maq match command? The -H option was meant to dump the multiple hits, and it created what looks like a binary file. The ./maq mapview conversion does not work on it as it does for out.map. Is there a way to convert this binary file to text?

          Cheers
          L


          • Owen
            Junior Member
            • Nov 2008
            • 1

            #6
            Layla, the file created with the -H option is not actually a binary file but a gzipped text file with information about the multiply mapped reads. I had the same confusion and finally figured this out!
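Since the file is gzipped text, it can be read without a separate decompression step. A small sketch using Python's standard gzip module; the helper name and the file name in the usage note are hypothetical, and nothing is assumed here about what the columns inside the file mean.

```python
import gzip
from itertools import islice

def head_gzip_text(path, n=5):
    """Return the first n lines of a gzip-compressed text file, without trailing newlines."""
    # "rt" opens the compressed stream in text mode, so lines come back as str.
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]
```

For example, head_gzip_text("multi_hits.gz") would return the first five lines of a -H output saved under that (made-up) name, which is enough to inspect the layout before writing a parser.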


            • bioinfosm
              Senior Member
              • Jan 2008
              • 483

              #7
              I found the same thing Owen explains: it's gzipped!
              As for multiply mapped reads in the mapview result, reads with a mapping quality of 0 are the ones mapped to multiple locations, so using -q 1 should do the trick of excluding multiply-mapped reads.

              Thoughts?
              --
              bioinfosm


              • Layla
                Member
                • Sep 2008
                • 58

                #8
                I think -q 1 sounds like a valid option. Or you can simply grep for reads with the 0 flag and remove them before downstream processing.

                Whilst on this note, anyone used SISSR instead of Maq? And if so, any thoughts on what to do with the data after SISSR gives 80,000 potential binding sites with p values < 0.001, high tag counts and fold changes?

                The data never simplifies!!!!

                L
