Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #91
    Quick additional question about maq2sam-long. I am finding that when I move maq alignments that include paired end data into sam format I loose the unique identifiers for each of the reads. Instead, both reads of the pair end up receiving the same name. This would be fine, except that when I then export the data for use with the Integrative Genomics Viewer I can't visualize the pairing information as I would like to.

    Is there a way to get maq's map files into SAM format while preserving the unique pair names and pairing information? Perhaps this stripping of unique names is appropriate and I am confused about the standard SAM format? Any help would be appreciated.

    Comment


    • #92
      Originally posted by wjeck View Post
      Quick additional question about maq2sam-long. I am finding that when I move maq alignments that include paired end data into sam format I loose the unique identifiers for each of the reads. Instead, both reads of the pair end up receiving the same name. This would be fine, except that when I then export the data for use with the Integrative Genomics Viewer I can't visualize the pairing information as I would like to.

      Is there a way to get maq's map files into SAM format while preserving the unique pair names and pairing information? Perhaps this stripping of unique names is appropriate and I am confused about the standard SAM format? Any help would be appreciated.
      If it is in proper SAM format, then there in the flag field there should be two bits that identify if it is "read1" or "read2" (see 2.2.2 http://samtools.sourceforge.net/SAM1.pdf). The latest trunk C version has a "-X" option in the "samtools view" command that can better display the flag field for you.

      Comment


      • #93
        Originally posted by nilshomer View Post
        If it is in proper SAM format, then there in the flag field there should be two bits that identify if it is "read1" or "read2" (see 2.2.2 http://samtools.sourceforge.net/SAM1.pdf). The latest trunk C version has a "-X" option in the "samtools view" command that can better display the flag field for you.
        This seems to be the crux of the problem, then. For whatever reason that information is not surviving the conversion by maq2sam-long. Specifically neither the 0x0040 nor 0x0080 is being tacked on to identify the first or second read in the pair

        I can hack a workaround, but I imagine I am not the only one who will run into this problem.

        Comment


        • #94
          Originally posted by wjeck View Post
          This seems to be the crux of the problem, then. For whatever reason that information is not surviving the conversion by maq2sam-long. Specifically neither the 0x0040 nor 0x0080 is being tacked on to identify the first or second read in the pair

          I can hack a workaround, but I imagine I am not the only one who will run into this problem.
          You could always move past MAQ, onto its successor BWA (same author). Most likely the SAM output of BWA is compliant (hl3 is the author of the C version of SAMtools, MAQ and BWA). There also other aligners out there that I believe support the SAM format, like Bowtie, SOAP or BFAST (my own admittedly).

          Comment


          • #95
            Originally posted by xguo View Post
            I got a list of candidate SNPs using BWA and samtools for RNASeq data, and am trying to weigh various filtering options. "samtools.pl varFilter" gives a list of filtering criterion with default setting. The maximum read depth is set at 100. Given that duplicate reads have been removed by "samtools rmdup", do I still need to limit the maximum read depth?
            Yes, you need this unless you are doing target sequencing in which case the read depth is expected to vary a lot.

            Originally posted by xguo
            The mapping quality is called RMS. Is it different from the maximum mapping quality used by maq?
            RMS=root mean square. It is more stable than maximum mapping quality in filtration.

            Originally posted by xguo
            Consensus and SNP quality are not included in samtools.pl varFilter. What do you think are reasonable thresholds for them?
            You should try it out. Different project may prefer different threshold.

            In maq, the average number of hits of reads covering the candidate SNP site is said to give the copy number of the flanking region in the reference genome. Is it still possible to get this value since BWA uses a different alignment algorithm than maq?
            You can only calculate by yourself. The number of occurrences is stored in the X0 tag which is specific to BWA.

            Comment


            • #96
              Originally posted by griffon42 View Post
              To add a bit to xguo's questions regarding the varFilter in samtools:

              I'm also struggling to understand the differences in varFilter relative to MAQ's SNPfilter. Using the exact same set of bowtie alignments (converted to either SAM format or MAQ format) I get dramatically different (3x) numbers of SNPs post-filtering step in samtools varFilter versus MAQ's SNPfilter.

              Can anyone (Heng?) shed some additional light on the differences in SNP calling policies employed by samtools varFilter?

              Thanks!
              The caller is pretty much the same, but the default rules used in filtering are slightly different. When I use maq and samtools to call SNPs on maq alignment, 99% of them are identical. What parameters are you using (in particular the maximum depth)?

              Comment


              • #97
                Originally posted by wjeck View Post
                This seems to be the crux of the problem, then. For whatever reason that information is not surviving the conversion by maq2sam-long. Specifically neither the 0x0040 nor 0x0080 is being tacked on to identify the first or second read in the pair

                I can hack a workaround, but I imagine I am not the only one who will run into this problem.
                Could you send a small example to me? If read names are ended up with "/[12]", they should be converted in principle.

                By the way, thank Nils very much for replying these questions. I am not here in the forum as often as before.

                Comment


                • #98
                  samtools rmdup

                  It looks like removal of duplicates for paired end reads is limited to cases when both reads map to the same chromosome. Is this correct? If yes, is there an alternative?

                  Comment


                  • #99
                    @ech

                    You may try picard. Its implementation of merge and rmdup are better.

                    Comment


                    • blat psl files and sam

                      is there a converter of blat psl files to sam format?

                      Comment


                      • SAM import core dump

                        Hi,

                        I created a sam file from BWA samse and am trying to import it into sam so I can view and sort alignments. I get the following core dump:

                        [sam_header_read2] 1 sequences loaded.
                        Parse error at line 262110: sequence and quality are inconsistent
                        /local/scratch/1250714001.6494533: line 8: 3753 Aborted (core dumped) ../samtools-0.1.5c_x86_64-linux/./samtools import ref_list.txt 15251.align.sam 15251

                        Also, there is no clear specification on this but what should ref_list.txt contain?

                        Comment


                        • Originally posted by samt View Post
                          Hi,

                          I created a sam file from BWA samse and am trying to import it into sam so I can view and sort alignments. I get the following core dump:

                          [sam_header_read2] 1 sequences loaded.
                          Parse error at line 262110: sequence and quality are inconsistent
                          /local/scratch/1250714001.6494533: line 8: 3753 Aborted (core dumped) ../samtools-0.1.5c_x86_64-linux/./samtools import ref_list.txt 15251.align.sam 15251

                          Also, there is no clear specification on this but what should ref_list.txt contain?
                          For the ref_list.txt, you can use the .fai file that is created when you index your reference FASTA file (see: samtools faidx).

                          As for the SAM format output of BWA, it could be the incorrect ref_list.txt, or the incorrect output of BWA (see lh3, who is the author).

                          Comment


                          • I am having a big problem and I would appreciatte the help of anybody,

                            Everytime I try to go from the SAM file to the BAM file I get an error similar to this.

                            [bwa_aln_core] convert to sequence coordinate... 0.09 sec
                            [bwa_aln_core] refine gapped alignments... 0.04 sec
                            [bwa_aln_core] print alignments... [sam_header_read2] 7719 sequences loaded.
                            [sam_read1] reference '2921' is recognized as '*'.
                            Parse error at line 164507: invalid CIGAR character
                            Aborted

                            THis is how my reference looks like:

                            >gi|148727254|ref|NM_015183.1
                            GTGCTGAAGTAGAGGTAGTACAGCATGGCTAGACTGTTGTGAGAGGCTCAGAGAAAGCAGAGGGTGAGATGGATGAGTCC
                            AGCATTCTAAGACGAAGAGGGCTCCAGAAGGAGCTGAGTCTCCCCAGAAGAGGAAGTTTGATAGATTCCCAGAAGTGGAA
                            TTGCTTGGTCAAACGTTGCCGAACAAGCAACCGGAAAAGCTTAATAGGCAATGGGCAGTCACCAGCATTGCCTCGACCAC
                            .
                            .
                            .
                            >gi|34303925|ref|NM_152268.2
                            GCGCGCTGGCCCGGCACGGCGGTGGTCTTGCGGGAGGCGTGGGCTGGGATTGCGGTGCCTGTGCTTCCCGGTGCCAGGGT
                            GTCATGGAAGGGCTGCTG...

                            and it keeps going like that....

                            My REF_LIST looks like this

                            gi|148727254|ref|NM_015183.1 7586 30 80 81
                            gi|34303925|ref|NM_152268.2 2364 7740 80 81
                            gi|187829199|ref|NM_031284.4 2565 10164 80 81
                            gi|57634540|ref|NM_004156.2 1807 12791 80 81


                            I created using samtools faidx

                            What could be happening? Thanks in advanced, i really need to get this fixed.

                            Comment


                            • I also got the following error while producing the SAM file

                              GAGACTNAGCACNCAACGGA -",883&,,:#/1-"6&2)-":#6=67*#9,9&/0&3)-"3+=4<-"&5
                              /mnt/scratch/awadalla/czuldieg/392Tmod/sources/I392T:35_18_1596 4 * 0 0 * * 0 0 NCCGGCATAGCTNTACCNTTAGACATAGCTCCGTCNCAGACNAACCCCA -"6;:<9>7=5<$-"4?>5-"67:&5=;8&6/0=2.7=-"37*3=-"1:
                              [bns_coor_pac2real] bug! Coordinate is longer than sequence (4294967295>=4929). Abort!
                              Aborted

                              Comment


                              • Originally posted by luisczul View Post
                                I am having a big problem and I would appreciatte the help of anybody,

                                Everytime I try to go from the SAM file to the BAM file I get an error similar to this.

                                [bwa_aln_core] convert to sequence coordinate... 0.09 sec
                                [bwa_aln_core] refine gapped alignments... 0.04 sec
                                [bwa_aln_core] print alignments... [sam_header_read2] 7719 sequences loaded.
                                [sam_read1] reference '2921' is recognized as '*'.
                                Parse error at line 164507: invalid CIGAR character
                                Aborted

                                THis is how my reference looks like:

                                >gi|148727254|ref|NM_015183.1
                                GTGCTGAAGTAGAGGTAGTACAGCATGGCTAGACTGTTGTGAGAGGCTCAGAGAAAGCAGAGGGTGAGATGGATGAGTCC
                                AGCATTCTAAGACGAAGAGGGCTCCAGAAGGAGCTGAGTCTCCCCAGAAGAGGAAGTTTGATAGATTCCCAGAAGTGGAA
                                TTGCTTGGTCAAACGTTGCCGAACAAGCAACCGGAAAAGCTTAATAGGCAATGGGCAGTCACCAGCATTGCCTCGACCAC
                                .
                                .
                                .
                                >gi|34303925|ref|NM_152268.2
                                GCGCGCTGGCCCGGCACGGCGGTGGTCTTGCGGGAGGCGTGGGCTGGGATTGCGGTGCCTGTGCTTCCCGGTGCCAGGGT
                                GTCATGGAAGGGCTGCTG...

                                and it keeps going like that....

                                My REF_LIST looks like this

                                gi|148727254|ref|NM_015183.1 7586 30 80 81
                                gi|34303925|ref|NM_152268.2 2364 7740 80 81
                                gi|187829199|ref|NM_031284.4 2565 10164 80 81
                                gi|57634540|ref|NM_004156.2 1807 12791 80 81


                                I created using samtools faidx

                                What could be happening? Thanks in advanced, i really need to get this fixed.
                                I also ran into the almost the same problem. I'm still stuck with it. I got sam from bwa samse. I created ref_list.fai file using samtools faidx on ref.fa. However errors popped up as follows:

                                [sam_header_read2] 24 sequences loaded.
                                Parse error at line 1: invalid CIGAR character

                                Could anyone help me fix this problem? Thanks a lot.

                                The content of my ref_list.fai is as follows:
                                gi|224384768|gb|CM000663.1| 249250621 87 70 71
                                gi|224384767|gb|CM000664.1| 243199373 252811519 70 71
                                gi|224384766|gb|CM000665.1| 198022430 499485256 70 71
                                gi|224384765|gb|CM000666.1| 191154276 700336665 70 71
                                gi|224384764|gb|CM000667.1| 180915260 894221804 70 71
                                gi|224384763|gb|CM000668.1| 171115067 1077721655 70 71
                                gi|224384762|gb|CM000669.1| 159138663 1251281310 70 71
                                gi|224384761|gb|CM000670.1| 146364022 1412693470 70 71
                                gi|224384760|gb|CM000671.1| 141213431 1561148494 70 71
                                gi|224384759|gb|CM000672.1| 135534747 1704379348 70 71
                                gi|224384758|gb|CM000673.1| 135006516 1841850394 70 71
                                gi|224384757|gb|CM000674.1| 133851895 1978785663 70 71
                                gi|224384756|gb|CM000675.1| 115169878 2114549816 70 71
                                gi|224384755|gb|CM000676.1| 107349540 2231365066 70 71
                                gi|224384754|gb|CM000677.1| 102531392 2340248259 70 71
                                gi|224384753|gb|CM000678.1| 90354753 2444244474 70 71
                                gi|224384752|gb|CM000679.1| 81195210 2535890098 70 71
                                gi|224384751|gb|CM000680.1| 78077248 2618245328 70 71
                                gi|224384750|gb|CM000681.1| 59128983 2697438054 70 71
                                gi|224384749|gb|CM000682.1| 63025520 2757411825 70 71
                                gi|224384748|gb|CM000683.1| 48129895 2821337798 70 71
                                gi|224384747|gb|CM000684.1| 51304566 2870155351 70 71
                                gi|224384746|gb|CM000685.1| 155270560 2922192927 70 71
                                gi|224384745|gb|CM000686.1| 59373566 3079681725 70 71

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Essential Discoveries and Tools in Epitranscriptomics
                                  by seqadmin




                                  The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                  04-22-2024, 07:01 AM
                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-25-2024, 11:49 AM
                                0 responses
                                19 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-24-2024, 08:47 AM
                                0 responses
                                18 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                62 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                60 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X