  • Unfamiliar SAM file format outputted by Rockhopper program

    Hi,

    I'm using Rockhopper to analyze E. coli RNA-Seq data.

    I'm not familiar with the SAM format output by Rockhopper.
    Has anyone seen this format before, or does anyone have ideas on how to convert it to the standard format, which I could then view in IGV or on the UCSC Genome Browser? I'm quite comfortable with both Python and R, but I really don't understand the current format, so I can't convert it.
    The data is paired-end.

    Here are the first fifteen lines of the SAM file.
    I've put more lines in the attached file.

    Code:
    [blancha@lg-1r14-n04 samFiles]$ samtools view -h -f 2 IK_21C-EM9-1_R1.sam | more
    @HD	VN:1.0	SO:unsorted
    @SQ	SN:gi|556503834|ref|NC_000913.3|	LN:4641652	SP:Escherichia coli str. K-12 substr. MG1655
    @PG	ID:Rockhopper	PN:Rockhopper	VN:2.03
    D69F08P1:403:C6Y8VACXX:5:1101:1436:2236 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	2527763	255	50M	=	2527927	213	TGGCAAATGGCATCCCGATGGCAAACATTCTGTTCCCCACATCGGTGATC	BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIII
    +	131	gi|556503834|ref|NC_000913.3|	2527927	255	49M	=	2527763	-213	CGCAACTGGTCCAGCCCCTGAAGCGTCCGCTTTAAGCTTTATCGGCGCT	BBBFFFFFFFFFFIIIIIIIFIIIIFIIIIIIIIIIIIIIIIIIIIIFF
    D69F08P1:403:C6Y8VACXX:5:1101:1606:2216 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3441734	255	50M	=	3441811	126	CGACAACCGTTATGAGGGATCGGAGTCACATCAGTAATGTTAGTGATGCG	BBBFBFF<F0<FFIIIIIF7FFFFFIIIIIIFFFBFFFF<FFFB7B7B<F
    +	131	gi|556503834|ref|NC_000913.3|	3441811	255	49M	=	3441734	-126	GAATCTGGAAGTTATGGTTAAAGGTCCGGGTCCAGGCCGCGAAACTACT	BBBBFBFFFBF<FFFIIB<FFFIBFFFFF7BBFFFFFIFFIFF<FFFFB
    D69F08P1:403:C6Y8VACXX:5:1101:1955:2210 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3471221	255	50M	=	3471324	152	CCCGTACGGTGGTGATTGCAGCGGTCAGAGTAGTTTTACCGTGGTCAACG	BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFFFIIIIFFIIIIIIIIIII
    +	131	gi|556503834|ref|NC_000913.3|	3471324	255	49M	=	3471221	-152	GCTCTCTCCTGAAGGGGAGAGCACTATAGTAAGGAATATAGCCGTGTCT	BBBFFFFFFFFFFIIIIIIIIIIIIIIFIFFIIIIIIIIIIIIIFIIII
    D69F08P1:403:C6Y8VACXX:5:1101:2133:2203 1:N:0:AGTCAAC	115	gi|556503834|ref|NC_000913.3|	1719838	255	50M	=	1719872	83	AAGAGACAGACCTACCATTGAAACAACCAATACGCGTTTAATCATTGAAA	BBBFFFFFFFFFFIIIIIIFFIIIFIIIIIIFFFBFBFFFIIIFFFFFFB
    +	179	gi|556503834|ref|NC_000913.3|	1719872	255	49M	=	1719838	-83	GCTTGCGTGGCGTTTCATGGTGAACAGGAGATTTTTCAATGATTAAACG	BBBFFFFFFFFFFFFFIIIIBFBFFIIIFFBFFFIIIIBFFBFIFBBFB
    D69F08P1:403:C6Y8VACXX:5:1101:1916:2222 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3444439	255	50M	=	3444490	100	CCCACGACCACCGGTTTTACCGAGGCCAGAACCGATACCACGACCCAGGC	BBBFFFFFFFFFFFFFFFFIFFII<BBFFFFIIFFIF<<<BF<BBFBF7B
    +	131	gi|556503834|ref|NC_000913.3|	3444490	255	49M	=	3444439	-100	TGCGTTTAAATACTCTGTCTCCGGCCGAAGGCTCCAAAAAGGCGGGTAA	BB<FFFFFFFFFFFBFFBBBFBFFFFFFFB7BFFIBFFFBFB<BBB0<B
    D69F08P1:403:C6Y8VACXX:5:1101:2117:2249 1:N:0:AGTCAAC	115	gi|556503834|ref|NC_000913.3|	639393	255	50M	=	639501	157	GGCGACGCCAACGCCGCTATGGCGTGAAAGACGAAGGAAATTTAGATTTT	<BBFBFFFBBFBFFFIFFBFFIIIIIFBFFIIIIF7<BF<BBBBBBBBB<
    +	179	gi|556503834|ref|NC_000913.3|	639501	255	49M	=	639393	-157	GTAAAATCAAAGCAGCACAGTACGTAGCTTCTCACCCAGGTGAAGTTTG	B<BFFFFFFFFFFFBFFFFBBFFFFFFIIIFFFIFFBFFFFIBFFBFFF
    Thank you for your help.
    Last edited by blancha; 07-09-2015, 04:11 PM. Reason: Put lines from SAM file in Code box

  • #2
    It mostly looks like a normal SAM file; the specification is here: https://samtools.github.io/hts-specs/SAMv1.pdf
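The FLAG (second) column is a bit field defined in that specification. As a minimal sketch, the following Python decodes the four FLAG values that appear in the excerpt; the bit descriptions are paraphrased from the SAMv1 specification, and the names `SAM_FLAG_BITS` and `decode_flag` are illustrative only.

```python
# Minimal sketch: decode SAM FLAG values.
# Bit meanings are paraphrased from the SAMv1 specification.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "not passing quality controls",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list[str]:
    """Return the description of every bit that is set in a SAM FLAG value."""
    return [desc for bit, desc in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # The four FLAG values that occur in the excerpt from the first post.
    for flag in (67, 131, 115, 179):
        print(flag, "->", ", ".join(decode_flag(flag)))
```

Decoded, 67 and 131 are read 1 and read 2 of a properly paired pair with neither mate reverse-complemented, while 115 and 179 are the same except that both mates are reverse-complemented. So in every pair in the excerpt, both mates carry the same strand bits; in a standard forward/reverse Illumina paired-end library, the two mates of a pair would normally map to opposite strands.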

    However, the second line has "+" for the read name, which is odd, to say the least. Can you run head on the input FASTQ file to show the first 8 lines?
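For context on where a "+" could come from: a FASTQ record is normally four lines (an "@" header holding the read name, the sequence, a separator line starting with "+", and the quality string), so the first 8 lines cover two records. A minimal Python sketch that reads records and checks that layout, assuming uncompressed FASTQ with exactly one sequence line per record; `read_fastq` is a hypothetical helper, not part of any library.

```python
from itertools import islice
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        # Take the next four lines; an empty list means end of file.
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, sep, qual = record
        # Line 1 must be the "@" header; line 3 is the "+" separator, not a read name.
        if not header.startswith("@"):
            raise ValueError(f"header line does not start with '@': {header!r}")
        if not sep.startswith("+"):
            raise ValueError(f"separator line does not start with '+': {sep!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header, seq, sep, qual
```

For example, iterating over `islice(read_fastq(handle), 2)` on an open read-2 file would show the first two headers; if those begin with "@" and a normal read name, the "+" in the SAM output most likely comes from the separator line rather than from the input.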

    Edit: looking at the attachment, it appears that either you have an unusual FASTQ file in which read 2 is always named "+", or Rockhopper has a bug that causes it to report the read name incorrectly.
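One possible workaround is to copy each read-1 name onto the "+" record that follows it. A minimal Python sketch, assuming (as in the excerpt) that every "+" record immediately follows its read-1 mate; it also drops the Illumina comment after the first space (for example `1:N:0:AGTCAAC`), because the SAM specification does not allow spaces in QNAME. `fix_qnames` and `fix_sam_file` are hypothetical names, not part of Rockhopper or samtools.

```python
from typing import Iterable, Iterator, Optional


def fix_qnames(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines with the "+" mate names replaced by the preceding read-1 name.

    Assumption: each "+" record directly follows its named read-1 record.
    """
    last_name: Optional[str] = None
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "+":
            if last_name is None:
                raise ValueError("'+' record without a preceding named mate")
            fields[0] = last_name
            last_name = None  # each read-1 name is reused for exactly one mate
        else:
            # Keep only the text before the first space; SAM QNAMEs may not contain spaces.
            fields[0] = fields[0].split(" ", 1)[0]
            last_name = fields[0]
        yield "\t".join(fields) + "\n"


def fix_sam_file(in_path: str, out_path: str) -> None:
    """Write a copy of in_path with repaired QNAMEs to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(fix_qnames(src))
```

The repaired file could then be sorted and indexed for IGV with `samtools sort -o fixed.sorted.bam fixed.sam` followed by `samtools index fixed.sorted.bam` (the file names are placeholders).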

    • #3
      Thank you, Brian.
      You're right that the only problem with the format is the "+" in place of the read name on every other line.
      The "+" simply stands for the mate from the paired FASTQ file.
      If this were the only issue I had with Rockhopper, I would be happy.

      The main problem is that when I view the alignments in IGV, at least half of the reads consist mostly of mismatches relative to the reference genome.
      I've tried all of the mate-orientation settings: fr, ff, rf, and rr.
      I can't figure out why Rockhopper insists on aligning reads to what appear to be the wrong locations.
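One way to see which mate orientation the aligned pairs actually have, independent of the setting chosen, is to tally the strand bits in the FLAG column. A minimal Python sketch, hypothetical and not part of Rockhopper: `strand_pattern_counts` counts (read 1 strand, read 2 strand) combinations for properly paired read-1 records. It ignores positions, so it separates same-strand pairs from opposite-strand pairs but does not tell which mate lies upstream.

```python
from collections import Counter
from typing import Iterable


def strand_pattern_counts(lines: Iterable[str]) -> Counter:
    """Count (read-1 strand, read-2 strand) combinations among properly paired read-1 records."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("@"):
            continue  # skip header lines
        flag = int(line.split("\t")[1])
        # Only read 1 (0x40) of a paired (0x1), properly paired (0x2) record is counted;
        # its own strand bit (0x10) and its mate's strand bit (0x20) describe the pair.
        if flag & 0x1 and flag & 0x2 and flag & 0x40:
            read1 = "-" if flag & 0x10 else "+"
            read2 = "-" if flag & 0x20 else "+"
            counts[(read1, read2)] += 1
    return counts
```

Applied to the twelve alignment records in the excerpt from the first post, this gives four ("+", "+") pairs, two ("-", "-") pairs and no opposite-strand pairs.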

      I think I'll just give up on the software, even though it appears to be widely used in respected publications for E. coli RNA-Seq analysis.
