Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    Originally posted by seqsyd View Post
    can any 1 tell me please how can i install SAMtools on my windows operating system...Help Please...thanks
    Have you tried the provided Windows binaries on sourceforge? i.e. samtools-0.1.12a_i386-win32.zip

    [I'd suggest getting hold of a Linux (or Mac OS X) machine if you can, since not that many of the NGS tools will work on Windows]

    Comment

    • ElMichael
      Member
      • Jun 2009
      • 31

      optional fields in SAM format

      Could anybody advise me the best way to parse optional fields in SAM file? I'm interested in tags X0 and XA, and the problem is that they do not have a permanent place in the output string (e.g., sometimes 13th or 14th, 18th, 19th or 20th). Certainly, I can do it myself in perl, but such parsing would take too much time and resources.
      What I'm looking for is:
      1) either a way to control output tags (their number and order)
      2) or (better) some bioperl module, like blast parser, that could parse it effectively.
      thanks!

      Comment

      • lh3
        Senior Member
        • Feb 2008
        • 686

        For perl, very easy and efficient:

        perl -ne 'print "$1\n" if /XA:i\d+)/'

        Comment

        • ElMichael
          Member
          • Jun 2009
          • 31

          Originally posted by lh3 View Post
          For perl, very easy and efficient:

          perl -ne 'print "$1\n" if /XA:i\d+)/'
          It's exactly what I want to avoid - checking each line with IF condition.

          Comment

          • lh3
            Senior Member
            • Feb 2008
            • 686

            This is fast. If present, the ordering of tags are fixed. You can

            if /X0:i\d+).*XA:Z\S+)/

            to grep out everything with a single pass. You need to read and split after all. Regex matching should be of similar speed, if not faster. The bad implementation is to loop through the optional tags and then try to find matching.
            Last edited by lh3; 04-15-2011, 06:59 AM.

            Comment

            • ElMichael
              Member
              • Jun 2009
              • 31

              Thank you, Heng!

              Comment

              • Strand SI
                The Avadis NGS Team
                • Feb 2011
                • 26

                I converted some ELAND export files to SAM and now tried to make BAM files out of those, but it is complaining about @SQ headers not being there?!
                Does the export2sam.pl tool not create a SAM file format that is correct??

                Comment

                • oudacontrol
                  Junior Member
                  • Jan 2011
                  • 3

                  I am trying to view a portion of a sorted, indexed bam file and I am getting the error:
                  [sam_header_read2] 26 sequences loaded.
                  [main_samview] random alignment retrieval only works for indexed BAM files.

                  It is already sorted and indexed.... I have no idea how to resolve this.

                  Comment

                  • jnfass
                    Member
                    • Aug 2008
                    • 88

                    @oudacontrol: You should post the command you used. It seems like the command may have been mis-formatted, but there's no way to tell if you don't include it.

                    Comment

                    • Bruins
                      Member
                      • Feb 2010
                      • 78

                      Dear all,

                      I'm had the same problem as oudacontrol above and Jeckow earlier in this thread.

                      Trying to get samtools view to extract one chromosome from a sorted and indexed bam file fails:
                      Code:
                      [sam_header_read2] 84 sequences loaded.
                      [main_samview] random alignment retrieval only works for indexed BAM files.
                      It took me a long while to figure out what caused it, so I thought I'd post my simple solution for poor sobs like me...

                      Code:
                      samtools sort in.bam in.sorted
                      Code:
                      samtools index in.sorted.bam
                      Code:
                      samtools view -bh -t human_g1k_v37.fa.fai -o in.sorted.chr9.bam in.sorted.bam 9
                      As it turned out, I should not have used the -t option. Without it, view finished as it should have, leaving me with a nice chr9-only bam file! (at least, it's a 279M file, I'll have to check later if trackster or igv will load it )

                      Cheers,
                      Bruins

                      Comment

                      • Hena
                        Member
                        • Nov 2009
                        • 19

                        Does anyone have nice perl snippet to calculate the end point of the read in a sam file?

                        Comment

                        • pengchy
                          Senior Member
                          • Feb 2009
                          • 116

                          maq2sam-long problems

                          Dear all,

                          Originally posted by xmluo View Post
                          I used maq2sam-long to convert maq output to sam format, but all pairing information is missed in the results: MRNM is "*", and MPOS and ISIZE are 0. Can you recommend how to get these information? Thanks, Mei

                          I run into the same questions.

                          Another question is about the statistics for the results of out.map and out.bam.

                          1. how can I calculate the mapping rate?
                          95%(1903176 / 2000000) from maq or 95.68%(1820937 / 1903176) from the converted bam?

                          2. how can I calculate the unique mapped pair-end reads number?
                          I have conceived a method: according to the sam format spcification, the TAG "H0:i:1 H1:i:0", give the Number of perfect hits (H0) and Number of 1-di erence hits (H1), so we can deduce that one pair is uniquely mapped if and only if any one end's H0:i:1 or "H0:i:0 and H1:i:1", can it work?

                          3. how to get the MRNM, MPOS and ISIZE?
                          In my case, the ISIZE is normal. I found that the flag value in the second column is normal too. The mate read also given at another line, so we can fill the MRNM and MPOS columns accordingly. can it work?

                          Code:
                          ##suppose we got the output file of "maq map" is out.map. then
                          ##got out.sm
                          samtools-0.1.18/misc/maq2sam-long out.map  > out.sam
                          ##got the sorted bam file out.bam
                          samtools-0.1.18/samtools view -ut reference.fa.fai out.sam | samtools  sort - out
                          ##got the statistics results
                          ##maq
                          maq-0.7.1/maq.pl statmap nohup.out > out.map.stat
                          samtools flagstat out.bam > out.bam.stat
                          
                          -bash-3.2$ more out.map.stat 
                          
                          -- == statmap report ==
                          
                          -- # single end (SE) reads: 0
                          -- # mapped SE reads: 0 (/ 0 = NA%)
                          -- # paired end (PE) reads: 2000000
                          -- # mapped PE reads: 1903176 (/ 2000000 = 95.15%)
                          -- # reads that are mapped in pairs: 1756127 (/ 1903176 = 92.27%)
                          -- # Q>=30 reads that are moved to meet mate-pair requirement: 2962 (/ 1756127 = 0.16%)
                          -- # Q<30 reads that are moved to meet mate-pair requirement: 203102 (11.56%)
                          
                          -bash-3.2$ more out.bam.stat 
                          1903176 + 0 in total (QC-passed reads + QC-failed reads)
                          0 + 0 duplicates
                          1820937 + 0 mapped (95.68%:nan%)
                          1903176 + 0 paired in sequencing
                          951588 + 0 read1
                          951588 + 0 read2
                          1673888 + 0 properly paired (87.95%:nan%)
                          1738698 + 0 with itself and mate mapped
                          82239 + 0 singletons (4.32%:nan%)
                          1738698 + 0 with mate mapped to a different chr
                          1471758 + 0 with mate mapped to a different chr (mapQ>=5)
                          Any suggestions are appreciated, Thank you all.

                          pengchy

                          Comment

                          • pengchy
                            Senior Member
                            • Feb 2009
                            • 116

                            Hi,
                            I have found the solution.
                            1. the mapping rate is 1820937/2000000

                            2. the unique mapped reads can be got from:
                            for the single end: "H0:i:1 H1:i:A" or "H0:i:0 H1:i:1" or "H0:i:0 H1:i:0" ##here, "A" stand for any number
                            for the pair end: any end has above tags.

                            3. Yes, it is.
                            Last edited by pengchy; 10-21-2011, 05:51 PM. Reason: a mistake

                            Comment

                            • dtc
                              Junior Member
                              • Oct 2012
                              • 3

                              I have the same problem now, do you have any solution now?
                              Thanks Alexey

                              Comment

                              • Hena
                                Member
                                • Nov 2009
                                • 19

                                Originally posted by dtc View Post
                                I have the same problem now, do you have any solution now?
                                Thanks Alexey
                                I did my own version of this in the end using CIGAR string. However it does not take into account 'X', '=', 'N' or 'P' tags.

                                Code:
                                sub last_pos {
                                  my @read = @_;
                                  local $_;
                                  my ($match,$del) = (0,0);
                                  while ($read[$CIGAR] =~ m/(\d+)M/g) {
                                    $match += $1;
                                  }
                                  while ($read[$CIGAR] =~ m/(\d+)D/g) {
                                    $del += $1;
                                  }
                                  return $read[$POS] + $match + $del -1;
                                }
                                So counting the match/mismatch and deletion tags should give the number of nucleotides it will span in reference.

                                Comment

                                Latest Articles

                                Collapse

                                • SEQadmin2
                                  Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                  by SEQadmin2


                                  I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                                  Here are nine questions we think about, in roughly the order they matter, before...
                                  06-18-2026, 07:11 AM
                                • SEQadmin2
                                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                  by SEQadmin2


                                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                  ...
                                  06-02-2026, 10:05 AM
                                • SEQadmin2
                                  Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                  by SEQadmin2


                                  With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                  Introduction

                                  Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                  05-22-2026, 06:42 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, 06-17-2026, 06:09 AM
                                0 responses
                                21 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-09-2026, 11:58 AM
                                0 responses
                                39 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-05-2026, 10:09 AM
                                0 responses
                                46 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-04-2026, 08:59 AM
                                0 responses
                                49 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...