Hi,
I'm using Rockhopper to analyze E. coli RNA-Seq data.
I'm not familiar with the SAM format output by Rockhopper.
Has anyone seen this format before, or does anyone have ideas on how to convert it to the traditional format, which I could then view in IGV or on the UCSC Genome Browser? I'm quite comfortable with both Python and R, but I really don't understand the current format, so I'm unable to convert it.
The data is paired-end.
Here are the first fourteen lines from the SAM file.
I've put more lines in the attached file.
Thank you for your help.
Code:
[blancha@lg-1r14-n04 samFiles]$ samtools view -h -f 2 IK_21C-EM9-1_R1.sam | more
@HD VN:1.0 SO:unsorted
@SQ SN:gi|556503834|ref|NC_000913.3| LN:4641652 SP:Escherichia coli str. K-12 substr. MG1655
@PG ID:Rockhopper PN:Rockhopper VN:2.03
D69F08P1:403:C6Y8VACXX:5:1101:1436:2236 1:N:0:AGTCAAC 67 gi|556503834|ref|NC_000913.3| 2527763 255 50M = 2527927 213 TGGCAAATGGCATCCCGATGGCAAACATTCTGTTCCCCACATCGGTGATC BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIII
+ 131 gi|556503834|ref|NC_000913.3| 2527927 255 49M = 2527763 -213 CGCAACTGGTCCAGCCCCTGAAGCGTCCGCTTTAAGCTTTATCGGCGCT BBBFFFFFFFFFFIIIIIIIFIIIIFIIIIIIIIIIIIIIIIIIIIIFF
D69F08P1:403:C6Y8VACXX:5:1101:1606:2216 1:N:0:AGTCAAC 67 gi|556503834|ref|NC_000913.3| 3441734 255 50M = 3441811 126 CGACAACCGTTATGAGGGATCGGAGTCACATCAGTAATGTTAGTGATGCG BBBFBFF<F0<FFIIIIIF7FFFFFIIIIIIFFFBFFFF<FFFB7B7B<F
+ 131 gi|556503834|ref|NC_000913.3| 3441811 255 49M = 3441734 -126 GAATCTGGAAGTTATGGTTAAAGGTCCGGGTCCAGGCCGCGAAACTACT BBBBFBFFFBF<FFFIIB<FFFIBFFFFF7BBFFFFFIFFIFF<FFFFB
D69F08P1:403:C6Y8VACXX:5:1101:1955:2210 1:N:0:AGTCAAC 67 gi|556503834|ref|NC_000913.3| 3471221 255 50M = 3471324 152 CCCGTACGGTGGTGATTGCAGCGGTCAGAGTAGTTTTACCGTGGTCAACG BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFFFIIIIFFIIIIIIIIIII
+ 131 gi|556503834|ref|NC_000913.3| 3471324 255 49M = 3471221 -152 GCTCTCTCCTGAAGGGGAGAGCACTATAGTAAGGAATATAGCCGTGTCT BBBFFFFFFFFFFIIIIIIIIIIIIIIFIFFIIIIIIIIIIIIIFIIII
D69F08P1:403:C6Y8VACXX:5:1101:2133:2203 1:N:0:AGTCAAC 115 gi|556503834|ref|NC_000913.3| 1719838 255 50M = 1719872 83 AAGAGACAGACCTACCATTGAAACAACCAATACGCGTTTAATCATTGAAA BBBFFFFFFFFFFIIIIIIFFIIIFIIIIIIFFFBFBFFFIIIFFFFFFB
+ 179 gi|556503834|ref|NC_000913.3| 1719872 255 49M = 1719838 -83 GCTTGCGTGGCGTTTCATGGTGAACAGGAGATTTTTCAATGATTAAACG BBBFFFFFFFFFFFFFIIIIBFBFFIIIFFBFFFIIIIBFFBFIFBBFB
D69F08P1:403:C6Y8VACXX:5:1101:1916:2222 1:N:0:AGTCAAC 67 gi|556503834|ref|NC_000913.3| 3444439 255 50M = 3444490 100 CCCACGACCACCGGTTTTACCGAGGCCAGAACCGATACCACGACCCAGGC BBBFFFFFFFFFFFFFFFFIFFII<BBFFFFIIFFIF<<<BF<BBFBF7B
+ 131 gi|556503834|ref|NC_000913.3| 3444490 255 49M = 3444439 -100 TGCGTTTAAATACTCTGTCTCCGGCCGAAGGCTCCAAAAAGGCGGGTAA BB<FFFFFFFFFFFBFFBBBFBFFFFFFFB7BFFIBFFFBFB<BBB0<B
D69F08P1:403:C6Y8VACXX:5:1101:2117:2249 1:N:0:AGTCAAC 115 gi|556503834|ref|NC_000913.3| 639393 255 50M = 639501 157 GGCGACGCCAACGCCGCTATGGCGTGAAAGACGAAGGAAATTTAGATTTT <BBFBFFFBBFBFFFIFFBFFIIIIIFBFFIIIIF7<BF<BBBBBBBBB<
+ 179 gi|556503834|ref|NC_000913.3| 639501 255 49M = 639393 -157 GTAAAATCAAAGCAGCACAGTACGTAGCTTCTCACCCAGGTGAAGTTTG B<BFFFFFFFFFFFBFFFFBBFFFFFFIIIFFFIFFBFFFFIBFFBFFF
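For reference, two things in the output above appear to differ from standard SAM: the first mate's read name still carries the FASTQ comment after a space (`1:N:0:AGTCAAC`), and the second mate is named `+` instead of sharing the first mate's name. A minimal Python sketch of one possible clean-up follows. It rests on assumptions not confirmed by the output: that the real file is tab-delimited (the paste shows spaces), and that every `+` record directly follows its mate. The function name `normalize_qnames` is hypothetical, and this is not a documented Rockhopper conversion.

```python
def normalize_qnames(lines):
    """Yield SAM lines whose QNAME has no whitespace and is shared by both mates.

    Assumptions (not confirmed for Rockhopper output):
      * fields are tab-delimited, as in standard SAM;
      * a record whose QNAME is "+" is the mate of the record just before it.
    """
    previous_qname = None
    for line in lines:
        line = line.rstrip("\n")
        # Header lines (and blank lines) pass through unchanged.
        if not line or line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        qname = fields[0]
        if qname == "+" and previous_qname is not None:
            # Second mate: reuse the name of the preceding first mate.
            qname = previous_qname
        else:
            # First mate: drop the FASTQ comment that follows the space.
            qname = qname.split(" ", 1)[0]
            previous_qname = qname
        fields[0] = qname
        yield "\t".join(fields)
```

It could be applied by opening the SAM file, passing the file object to `normalize_qnames`, and writing each yielded line to a new file. Whether the result then loads in IGV would still need checking, for example after sorting and indexing it with samtools.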