  • koustavpal
    Member
    • Aug 2012
    • 14

    problem running multiz with MAF files

    I downloaded some pairwise alignment files from UCSC in the axtNet format and converted them to the MAF format. A typical alignment block looks like this:

    a score=21045.000000
    s chr10 1454 357 + 94855758 aataaaaattattggtccccattcctagtgattccaa
    s chr14 106421274 333 - 108792865 aataaaaattctttggccccattcttagtgagtcc

    I ran multiz on these files, and then while running phastCons I found out that the organisms had not been specified, so I converted the source names to the appropriate format:

    a score=21045.000000
    s organism1.chr10 1454 357 + 94855758 aataaaaattattggtccccattcctag
    s organism2.chr14 106421274 333 - 108792865 aataaaaattctttggccccattctta

    When I run these files, I get this error:

    line 11 of organism1.organism2.maf : inconsistent row size

    The problem occurs in every file where I have made this change.
    It would be really helpful if someone could point out the problem here.
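
    For context: each `s` line in a MAF block carries the fields src, start, size, strand, srcSize and the alignment text, and every row in a block must have the same text length (gaps included). Below is a minimal Python sketch for locating blocks whose rows differ in length. The function name `inconsistent_blocks` is made up for illustration, and it is only an assumption that this is the condition multiz reports as "inconsistent row size".

```python
def inconsistent_blocks(lines):
    """Return (line number of the 'a' line, row lengths) for each MAF block
    whose 's' rows do not all have the same alignment-text length."""
    problems = []
    start, lengths = None, []
    # The trailing empty string is a sentinel so the last block gets checked.
    for number, line in enumerate(list(lines) + [""], start=1):
        fields = line.split()
        is_a = bool(fields) and fields[0] == "a"
        if is_a or not fields:
            # Block boundary: report the block that just ended, if needed.
            if start is not None and len(set(lengths)) > 1:
                problems.append((start, lengths))
            start, lengths = (number if is_a else None), []
        elif fields[0] == "s" and len(fields) >= 7 and start is not None:
            # Field 7 of an 's' line is the alignment text (with gaps).
            lengths.append(len(fields[6]))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_maf.py some_alignment.maf
    with open(sys.argv[1]) as handle:
        for line_number, row_lengths in inconsistent_blocks(handle):
            print(f"block at line {line_number}: row lengths {row_lengths}")
```

    A block that starts right after another block without a blank line in between is still treated as a new block here, so the check itself does not depend on the separators being present.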

    The multiz command I use is:

    multiz chr1.organism1.organism2.maf chr1.organism1.organism3.maf chr1.unused > chr1.organism1.organism2.organism3.maf


    And the command I used to change them was:

    awk '/a score/{print;getline;gsub(/chr/,"organism1.chr",$0);print;getline;gsub(/chr/,"organism2.chr",$0);print} /#/{print;}' chr1.organism1.organism2.maf > chr1.organism1.organism2.maf2
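
    Note that the awk command above writes only the `a score` line, the two lines that follow it, and lines containing `#`; the blank line that terminates each MAF block is never printed. Whether that is what triggers the error was not confirmed in this thread. Here is a minimal Python sketch that adds the species prefix while passing every other line through unchanged, assuming exactly the pairwise layout shown above (the function name `add_species_prefix` and the species labels are placeholders):

```python
def add_species_prefix(lines, species):
    """Prefix the src field of the i-th 's' row in each block with species[i].

    Header lines, blank block separators and any other lines are kept as-is.
    """
    out = []
    row = 0
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "a":
            row = 0  # new alignment block: restart the row counter
            out.append(line)
        elif fields and fields[0] == "s" and row < len(species):
            fields[1] = f"{species[row]}.{fields[1]}"
            # MAF fields are whitespace-delimited, so rejoining with single
            # spaces only changes the column padding, not the content.
            out.append(" ".join(fields))
            row += 1
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    import sys

    # Usage: python prefix_maf.py in.maf organism1 organism2 > out.maf
    with open(sys.argv[1]) as handle:
        source = [line.rstrip("\n") for line in handle]
    print("\n".join(add_species_prefix(source, sys.argv[2:])))
```

    Because every input line produces exactly one output line, the block separators survive and line numbers in any later error message still match the original file.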
  • koustavpal
    Member
    • Aug 2012
    • 14

    #2
    bump

    Not to spam or anything, but I really need help on this. So, bump!


    • milo0615
      Member
      • Dec 2012
      • 39

      #3
      Hi koustavpal,

      Were you able to figure it out? What was causing the problem? Which phastCons command did you use? I am in the same situation. Please let me know.


      Thank you,

      -Milo


      • koustavpal
        Member
        • Aug 2012
        • 14

        #4
        Hi milo,

        I figured out the problem and got the entire pipeline working a long time ago, so it's a bit hard for me to remember how I did it. I'll try to help out as much as I can. Basically, the first and only document I referred to extensively to solve it was the UCSC 28-way vertebrate alignment track documentation here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2099589/

        Maybe you can start there if you haven't already.

        On a side note, this particular pipeline is a long and tedious one, so unless you are trying to do something which hasn't already been done I would strongly recommend against going forward with this. Maybe if you told me what you are trying to accomplish I can help you out with that.


        • milo0615
          Member
          • Dec 2012
          • 39

          #5
          Hi koustavpal,

          Thank you for the quick reply. I am currently aligning two de novo plant genomes (one wild, one domesticated), using a related plant genome as the reference. My goal is to analyse genetic diversity and differentiation within and between the domesticated and wild plants. I have already completed the alignments using LASTZ and combined the MAF alignments using MULTIZ, but I am unsure what my next step should be. What would you recommend? Should I continue with the LASTZ pipeline, or do you have a better method in mind? I would really appreciate your help.


          Regards,

          -Milo

