  • singletons+contigs for 454 data

    Hi,
    I would like to create a single file that contains both the singletons and the contigs. Is there a way to create such a file with the 454 software, or do I need a script that extracts the singleton names from the read status file, pulls the corresponding sequences from the FASTA file, and appends them to the contigs file?
    Thanks a lot!

  • #2
    I don't know of any script. Keep in mind, though, that 454 sometimes uses only PART of a read (the "left" end) and then lists the "right" end in 454ReadStatus.txt under the name "XXXXXX_right" with status Singleton, so you'll need to decide what you want to do with those. For example:

    % grep Singleton 454ReadStatus.txt
    GHFU8EI02CHPMJ_left Singleton
    GHFU8EI02B9FPY Singleton
    GHFU8EI02CJNN4_right Singleton
    GHFU8EI02CA0E9 Singleton
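
A minimal sketch for separating whole-read singletons from the `_left`/`_right` halves shown above. It assumes the accession is the first whitespace-separated column of 454ReadStatus.txt and the status is the second; the function name `singleton_summary` is made up for illustration and is not part of the 454 software.

```shell
# Print "accession<TAB>kind" for every Singleton entry, where kind is
# "whole", "left" or "right" and any _left/_right suffix is stripped.
# Assumption: column 1 = accession, column 2 = read status.
singleton_summary() {
    awk '$2 == "Singleton" {
        kind = "whole"
        if ($1 ~ /_left$/)  kind = "left"
        if ($1 ~ /_right$/) kind = "right"
        base = $1
        sub(/_(left|right)$/, "", base)   # recover the original read name
        print base "\t" kind
    }' "$1"
}
```

Calling `singleton_summary 454ReadStatus.txt | cut -f2 | sort | uniq -c` would then show how many singletons are partial reads before deciding how to treat them.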


    To get the .FASTA sequences from the .SFF file, you'll need to use "sffinfo":

    % sffinfo -seq file.sff > file.fasta

    Also, if you did paired end sequencing, the 454Scaffolds.fna file does NOT CONTAIN those contigs in 454Contigs.fna which failed to scaffold.



    • #3
      I thought that _left and _right should arise from paired end reads and not from split reads.

      Basically what I do is to:

      1) Grab the reads of choice from the 454ReadStatus.txt file and, optionally, the 454TrimStatus file

      2) Use sfffile to create a temporary sff file with just those reads

      3) Use sffinfo to extract the sequences.

      The rough steps are:

      grep -F "$(printf '\t')Singleton" 454ReadStatus.txt > /tmp/Singleton.tmp

      sfffile -o /tmp/Singleton.sff /tmp/Singleton.tmp mysff.sff

      sffinfo -s /tmp/Singleton.sff > Singleton.tfa



      • #4
        Originally posted by Torst
        Also, if you did paired end sequencing, the 454Scaffolds.fna file does NOT CONTAIN those contigs in 454Contigs.fna which failed to scaffold.
        That is not entirely true: 454Scaffolds.txt contains the scaffolds (at least two contigs with gap(s)) AND all unscaffolded contigs of at least 2kb. IMO they shouldn't have done that, but rather outputted a separate unscaffolded-contig file...

        Originally posted by westerman
        sfffile -o /tmp/Singleton.sff /tmp/Singleton.tmp mysff.sff
        I guess you mean

        Code:
        sfffile -o /tmp/Singleton.sff -i /tmp/Singleton.tmp mysff.sff
        (note the '-i')



        • #5
          flxlex,

          Originally posted by flxlex
          That is not entirely true: 454Scaffolds.txt contains the scaffolds (at least two contigs with gap(s)) AND all unscaffolded contigs of at least 2kb. IMO they shouldn't have done that, but rather outputted a separate unscaffolded-contig file...
          Hmm, it appears you are correct. Thank you for replying! I had not noticed the "1 contig scaffolds" because, as you said, it is inconsistent and they get renamed to "scaffoldNNNNNN". But yes, when I examine 454Scaffolds.txt I can see many scaffolds that are made up of only 1 contig. I find it hard to accept that they would use a different threshold for "contigs becoming scaffolds" and "large contigs", and NOT output a separate unscaffolded-contigs file too.

          Also, you suggest the cut-off is 2kbp, but in my example 10 of the 22 contigs are between 1356bp and 1870bp, which suggests maybe the cutoff is 1kbp?
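
One way to check the cut-off empirically is to list every single-contig scaffold with its length and look at the shortest ones. The sketch below assumes 454Scaffolds.txt uses an AGP-style layout (column 1 = scaffold name, column 3 = end coordinate on the scaffold, column 5 = component type, with `W` for a contig and `N` for a gap); the function name `single_contig_scaffolds` is hypothetical.

```shell
# Print "scaffold<TAB>length" for scaffolds built from exactly one contig,
# shortest first. Assumption: AGP-style columns as described above.
single_contig_scaffolds() {
    awk '$5 == "W" { n[$1]++; end[$1] = $3 }   # count contigs per scaffold
         END { for (s in n) if (n[s] == 1) print s "\t" end[s] }' "$1" |
        sort -k2,2n
}
```

The first line of `single_contig_scaffolds 454Scaffolds.txt` would then give the smallest unscaffolded contig that was promoted to a scaffold, which hints at the threshold actually used.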

          Either way - thank you muchly for catching my error!



          • #6
            Originally posted by flxlex

            I guess you mean

            Code:
            sfffile -o /tmp/Singleton.sff -i /tmp/Singleton.tmp mysff.sff
            (note the '-i')
            Yes, that is what I get for pulling the code out of a script that I use instead of typing it in directly. Thanks for the correction.
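
For the FASTA-based route described in the original question (rather than the sfffile route), a rough sketch might look like the following. It assumes a reads FASTA such as the one produced by `sffinfo -s`, where the read name is the first word of each header, and a 454ReadStatus.txt with accession and status in the first two columns. The function name `add_singletons` is made up for illustration. Singletons listed as `_left` or `_right` halves are skipped, because those names do not appear in the reads FASTA.

```shell
# Usage: add_singletons READSTATUS READS_FASTA CONTIGS_FASTA > combined.fna
# Writes the contigs first, then every read whose status is Singleton.
add_singletons() {
    cat "$3"
    awk '
        # First file: remember the accessions marked Singleton.
        NR == FNR { if ($2 == "Singleton") keep[$1] = 1; next }
        # Second file: on each header, decide whether to print this record.
        /^>/ { name = substr($1, 2); printing = (name in keep) }
        printing
    ' "$1" "$2"
}
```

For example, `add_singletons 454ReadStatus.txt file.fasta 454Contigs.fna > contigs_plus_singletons.fna`, where the output file name is arbitrary.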
