  • Anelda
    Member
    • May 2010
    • 30

    454AlignmentInfo.tsv correlation to contigs

    Hi all,

    I have done reference mapping using gsMapper and want to calculate the depth of coverage for the contigs obtained. According to the documentation, I can look at 454AlignmentInfo.tsv. But I find 594 header lines in this file, as opposed to 210 contigs in 454AllContigs.fna.

    Based on the documentation, I expected both files to contain the same number of contigs.

    Could someone please help me figure out how to determine depth of coverage from the gsMapper output?

    Thanks in advance!

    Anelda
  • flxlex
    Moderator
    • Nov 2008
    • 412

    #2
    The difference is caused by the fact that the lower length limit of the 454AllContigs.fna file is 100 bp, while there are alignment stretches in the 454AlignmentInfo.tsv that are shorter than 100 bp. You could try the newbler alignment with '-a 0' (setting the minimum length for the 454AllContigs.fna file to 0). In this way, all alignment stretches are reported.


    • Anelda
      Member
      • May 2010
      • 30

      #3
      Thanks for your answer. I'm also wondering about the following. When I look at the two files mentioned above, the .fna isn't a subset of the .tsv. I can't correlate entries in the .tsv with contigs in the .fna: the start and end coordinates overlap several contigs. With the lower limit for contig length set at 100, I expected 454AllContigs.fna to contain only a subset of the alignments reported in the .tsv. Can you explain the relationship between these two files? And how can I calculate the coverage for the contigs reported in 454AllContigs.fna if I only want those longer than 100 bp?


      • Anelda
        Member
        • May 2010
        • 30

        #4
        Originally posted by flxlex
        The difference is caused by the fact that the lower length limit of the 454AllContigs.fna file is 100 bp, while there are alignment stretches in the 454AlignmentInfo.tsv that are shorter than 100 bp. You could try the newbler alignment with '-a 0' (setting the minimum length for the 454AllContigs.fna file to 0). In this way, all alignment stretches are reported.
        Hi there,

        I just ran the software again using -a 0. The results are again not what I expected. I get more contigs in the 454AllContigs.fna file, but one fewer header in the 454AlignmentInfo.tsv.
        With -a 100: Contigs=210; Alignments=594
        With -a 0: Contigs=267; Alignments=593

        Any ideas about what's going on?

        Thanks!


        • flxlex
          Moderator
          • Nov 2008
          • 412

          #5
          If you check the lengths of the contigs in the 454AllContigs.fna file, are there indeed smaller ones?

          Code:
          grep '>' 454AllContigs.fna
          You can compare the contig names and lengths with the lengths of the alignments by extracting this info using this awk command:

          Code:
          awk 'NR>1 {if(/^>/){if(c!="")print c"\t"l;c=$1}else{l=$1}} END{if(c!="")print c"\t"l}' 454AlignmentInfo.tsv
          This will list the contig name and the length of the alignment. There will be small differences from the lengths in 454AllContigs.fna, but those are probably due to small indels.
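
For anyone who prefers Python, here is a minimal sketch of the same kind of extraction. It assumes (not confirmed in this thread) that the first line of 454AlignmentInfo.tsv is a column header, that each block starts with a line beginning with '>', and that the first column of every other row is an integer position. The function name alignment_blocks is made up for illustration. For each block it reports the name on the '>' line together with the first position, the last position, and the number of rows.

```python
import io


def alignment_blocks(handle):
    """Yield (name, first_pos, last_pos, n_rows) for each '>' block.

    Assumed layout (hypothetical, check against your own file):
    line 1 is a column header, block headers start with '>',
    and data rows carry an integer position in column 1.
    """
    name, first, last, rows = None, None, None, 0
    next(handle, None)  # skip the column-header line, like NR>1 in awk
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            if name is not None:
                yield name, first, last, rows
            name, first, last, rows = fields[0], None, None, 0
        elif name is not None and fields[0].isdigit():
            pos = int(fields[0])
            if first is None:
                first = pos
            last = pos
            rows += 1
    if name is not None:
        yield name, first, last, rows  # do not forget the final block


# Tiny made-up sample, only to show the call pattern.
SAMPLE = (
    "Position\tReference\tConsensus\tDepth\n"
    ">refA\t585\n"
    "585\tA\tA\t3\n"
    "586\tC\tC\t4\n"
    ">refA\t2255\n"
    "2255\tG\tG\t2\n"
)

if __name__ == "__main__":
    for block in alignment_blocks(io.StringIO(SAMPLE)):
        print(*block, sep="\t")
```

To run it on real output, replace io.StringIO(SAMPLE) with open("454AlignmentInfo.tsv"). Printing both the first and the last position makes it easier to compare each block against the coordinate ranges in the 454AllContigs.fna headers.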


          • Anelda
            Member
            • May 2010
            • 30

            #6
            There are contigs of size 50, 91 etc in the AllContigs.fna.

            When I run your awk command, I get the following:
            <cut>
            >gi|269213353|ref|NZ_GG729831.1| 763109
            >gi|269213353|ref|NZ_GG729831.1| 763729
            >gi|269213353|ref|NZ_GG729831.1| 769339
            >gi|269213353|ref|NZ_GG729831.1| 771418
            >gi|269213353|ref|NZ_GG729831.1| 771589
            >gi|269213353|ref|NZ_GG729831.1| 773766
            >gi|269213353|ref|NZ_GG729831.1| 781018
            >gi|269213353|ref|NZ_GG729831.1| 781518
            >gi|269213353|ref|NZ_GG729831.1| 797325
            >gi|269213353|ref|NZ_GG729831.1| 801751
            >gi|269213353|ref|NZ_GG729831.1| 809673
            >gi|269213353|ref|NZ_GG729831.1| 817641
            >gi|269213353|ref|NZ_GG729831.1| 819096
            >gi|269213353|ref|NZ_GG729831.1| 819131
            >gi|269213353|ref|NZ_GG729831.1| 819538
            >gi|269213353|ref|NZ_GG729831.1| 823520
            >gi|269213353|ref|NZ_GG729831.1| 827505
            >gi|269213353|ref|NZ_GG729831.1| 832562
            >gi|269213353|ref|NZ_GG729831.1| 877197
            >gi|269213353|ref|NZ_GG729831.1| 878796
            >gi|269213353|ref|NZ_GG729831.1| 879406
            >gi|269213353|ref|NZ_GG729831.1| 880657
            <cut>

            This does not reflect the lengths of the contigs: my longest contig in 454AllContigs.fna is 104,444 nt. The contig names also don't correspond to the names given in the 454AllContigs.fna file. Am I doing something wrong?

            I have run the software from the command line as well as the GUI, with the same results.

            Thanks so much for your help!


            • flxlex
              Moderator
              • Nov 2008
              • 412

              #7
              I would expect the headers of the 454AllContigs.fna file to look something like this:

              Code:
              >contig00003  gi|269213353|ref|NZ_GG729831.1|, 1..1270  length=1293   numreads=741
              This reflects the contig (contig00003) built from the reads aligned to the reference (gi|269213353|ref|NZ_GG729831.1|) from position 1 to 1270, with a contig length of 1293 bp (I guess because of an insertion in the mapped reads relative to the reference).

              Is that what you see?
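
If it helps with the comparison, a header in this form can be split into its parts with a regular expression. This is only a sketch based on the single example header above. The pattern and the function name parse_contig_header are my own and may need adjusting if your headers differ.

```python
import re

# Pattern derived from the one example header in this post; treat it as
# an assumption about the format, not a specification.
HEADER_RE = re.compile(
    r"^>(?P<contig>\S+)\s+(?P<ref>\S+),\s+(?P<start>\d+)\.\.(?P<end>\d+)"
    r"\s+length=(?P<length>\d+)\s+numreads=(?P<numreads>\d+)"
)


def parse_contig_header(line):
    """Return a dict of header fields, or None if the line does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("start", "end", "length", "numreads"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    example = (
        ">contig00003  gi|269213353|ref|NZ_GG729831.1|, 1..1270  "
        "length=1293   numreads=741"
    )
    info = parse_contig_header(example)
    # Reference span versus contig length, to spot insertions or deletions.
    print(info["contig"], info["end"] - info["start"] + 1, info["length"])
```

Feeding every line from grep '>' 454AllContigs.fna through parse_contig_header gives the reference name and coordinate range of each contig in a form that is easy to sort or join against other tables.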


              • Anelda
                Member
                • May 2010
                • 30

                #8
                When I run the awk command on 454AlignmentInfo.tsv I get this (first 4 lines):

                >gi|269213352|ref|NZ_GG729830.1| 2015
                >gi|269213352|ref|NZ_GG729830.1| 2592
                >gi|269213352|ref|NZ_GG729830.1| 2778
                >gi|269213352|ref|NZ_GG729830.1| 3619

                When I run grep ">" on 454AllContigs.fna I get this (first 4 lines):

                >contig00001 gi|269213352|ref|NZ_GG729830.1|, 585..2015 length=1432 numreads=341
                >contig00002 gi|269213352|ref|NZ_GG729830.1|, 2255..2646 length=393 numreads=17
                >contig00003 gi|269213352|ref|NZ_GG729830.1|, 2687..2789 length=103 numreads=2
                >contig00004 gi|269213352|ref|NZ_GG729830.1|, 3410..3619 length=210 numreads=70

                I.e. in 454AlignmentInfo.tsv there is no contig name? And the number of alignments does not correspond to the number of contigs in 454AllContigs.fna?

                Thanks!


                • flxlex
                  Moderator
                  • Nov 2008
                  • 412

                  #9
                  Originally posted by Anelda
                  When I run the awk command on 454AlignmentInfo.tsv I get this (first 4 lines):

                  >gi|269213352|ref|NZ_GG729830.1| 2015
                  >gi|269213352|ref|NZ_GG729830.1| 2592
                  >gi|269213352|ref|NZ_GG729830.1| 2778
                  >gi|269213352|ref|NZ_GG729830.1| 3619
                  Oops. The awk command will give you the reference name, and the last base of the alignment section (so, not the length of the alignment).

                  When I run grep ">" on 454AllContigs.fna I get this (first 4 lines):

                  >contig00001 gi|269213352|ref|NZ_GG729830.1|, 585..2015 length=1432 numreads=341
                  >contig00002 gi|269213352|ref|NZ_GG729830.1|, 2255..2646 length=393 numreads=17
                  >contig00003 gi|269213352|ref|NZ_GG729830.1|, 2687..2789 length=103 numreads=2
                  >contig00004 gi|269213352|ref|NZ_GG729830.1|, 3410..3619 length=210 numreads=70
                  So, you see that the end coordinates of the ranges (585..2015, 2255..2646, etc.) roughly correspond to the last bases from the awk script, but not perfectly (I don't know why).

                  I.e. in 454AlignmentInfo.tsv there is no contig name? And the number of alignments does not correspond to the number of contigs in 454AllContigs.fna?
                  Correct on the first point, and apparently on the second as well. I have looked at one of my own assemblies and saw the same difference, and it does not make sense. Perhaps time to contact your Roche rep? This could indicate a bug or something we are missing...
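
On the original question of depth of coverage: this thread does not settle how alignment blocks map onto contigs, but a per-block average can still be computed directly from 454AlignmentInfo.tsv. The sketch below is hedged accordingly. It assumes a tab-separated file with one column-header line, blocks introduced by lines starting with '>', an integer position in the first column, and a depth value in a column whose index you pass in yourself. That index (depth_col) is an assumption to check against the header line of your own file, and the function names are hypothetical.

```python
import io


def _summarise(name, positions, depths):
    """Build one (name, first_pos, last_pos, mean_depth) record."""
    return (name, positions[0], positions[-1], sum(depths) / len(depths))


def mean_depth_per_block(handle, depth_col):
    """Return (name, first_pos, last_pos, mean_depth) for each '>' block.

    depth_col is the zero-based index of the depth column. Which column
    holds the depth you want is an assumption to verify in your file.
    """
    results = []
    name, positions, depths = None, [], []
    next(handle, None)  # skip the column-header line
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith(">"):
            if name is not None and depths:
                results.append(_summarise(name, positions, depths))
            name, positions, depths = fields[0], [], []
        elif (name is not None and len(fields) > depth_col
              and fields[0].isdigit()):
            positions.append(int(fields[0]))
            depths.append(float(fields[depth_col]))
    if name is not None and depths:
        results.append(_summarise(name, positions, depths))
    return results


# Made-up rows for illustration only; real files have more columns.
SAMPLE = (
    "Position\tReference\tConsensus\tDepth\n"
    ">refA\t585\n"
    "585\tA\tA\t3\n"
    "586\tC\tC\t4\n"
    ">refA\t2255\n"
    "2255\tG\tG\t2\n"
)

if __name__ == "__main__":
    for name, first, last, depth in mean_depth_per_block(
            io.StringIO(SAMPLE), depth_col=3):
        print(name, first, last, round(depth, 2), sep="\t")
```

To keep only the longer stretches, filter the result on the reference span, for example last - first + 1 >= 100. Because contig lengths can differ slightly from the reference span, as seen in the headers above, that filter is an approximation of the 100 bp cutoff used for 454AllContigs.fna rather than an exact match.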
