
  • Any nice scripts for showing contig length by frequency?

    Hiya,

    Has anyone got a script that can nicely show contig length against the frequency of contigs at that length?

    I've got a bunch of transcriptomes from different platforms and I'd like a nice way of presenting the differences. I could do something using wc -m and output to OpenOffice, but surely there is a more elegant way of doing it...

    Otherwise, call me lazy and I'll get on with it!

    Cheers

  • #3
    Here are some other solutions on this thread: http://seqanswers.com/forums/showthread.php?t=15856



    • #4
      Since SES pointed to the script I had posted before I thought I should provide the updated version of it. The version I posted in the previous thread was tailored for getting distributions of miRNAs; this version is more generalized. It will also accept either FASTA or FASTQ input. The script is written in Perl and requires modules Bio::SeqIO (from BioPerl) and Statistics::Descriptive::Full. Command line options:

      Code:
      -i|--input   input file name (if none the script will accept input from STDIN)
      -f|--format  sequence file format <fasta|fastq> (default fasta)
      -s|--start   smallest contig length bin for distribution <n> (integer, default 10)
      -e|--end     largest contig length bin for distribution <n> (integer, default 1000)
      -b|--bin     size of each bin for distribution <n> (integer, default 20)
      Output is some summary statistics (total # of sequences, total sequence length, mean, median, etc.) followed by the contig length distribution data suitable for input to Open Office or Excel (or gnuplot if you wish to extend the script).
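      For readers without the attachment, here is a minimal Python sketch of the same idea: read sequence lengths, print a few summary statistics, then print a binned length distribution as tab-separated text. It is not the Perl script described above. The function names (read_fasta_lengths, bin_lengths) are made up for illustration, it handles FASTA only, and putting lengths outside the start/end range into separate "below" and "above" counts is an assumption about how such a script might behave. The defaults mirror the -s, -e and -b options listed above.

```python
import io
import statistics


def read_fasta_lengths(handle):
    """Yield the length of each record in a FASTA stream (hypothetical helper)."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def bin_lengths(lengths, start=10, end=1000, bin_size=20):
    """Count lengths per bin of width bin_size over [start, end).

    Returns (below, counts, above), where counts maps each bin's lower
    bound to the number of sequences in it. Treating out-of-range lengths
    as separate 'below' and 'above' tallies is an assumption.
    """
    counts = {edge: 0 for edge in range(start, end, bin_size)}
    below = above = 0
    for n in lengths:
        if n < start:
            below += 1
        elif n >= end:
            above += 1
        else:
            counts[start + ((n - start) // bin_size) * bin_size] += 1
    return below, counts, above


# Small in-memory demo; swap io.StringIO(...) for open("contigs.fasta").
demo = io.StringIO(
    ">c1\n" + "A" * 12 + "\n>c2\n" + "C" * 34 + "\n>c3\n" + "G" * 4 + "\n" + "T" * 4 + "\n"
)
demo_lengths = list(read_fasta_lengths(demo))

print("sequences\t%d" % len(demo_lengths))
print("total_length\t%d" % sum(demo_lengths))
print("mean\t%.1f" % statistics.mean(demo_lengths))
print("median\t%.1f" % statistics.median(demo_lengths))

below, counts, above = bin_lengths(demo_lengths)
print("bin_start\tcount")
for edge, count in counts.items():
    if count:  # skip empty bins to keep the demo output short
        print("%d\t%d" % (edge, count))
print("below_range\t%d" % below)
print("above_range\t%d" % above)
```

      The tab-separated output can be pasted into a spreadsheet or fed to a plotting tool, and running it once per transcriptome gives columns that can be plotted side by side.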
      Attached Files

