I am new to PacBio sequencing analysis and am doing some initial test runs in Falcon.
I downloaded a dataset (PacBio RSI reads) from SRA and converted it to *.fastq. It looks like this:
@SRR1168519.1 length=302
ATTTTTGTCTGTCCGATTCTGATAGCAGGC
GCATATCAGATGAATCTGATGAGTCAACACTGGTTGGTTCGTTGCTCAGTAGTTATGTTCGTGTGGAGCGTCGTATTGGTATCGAGTCTGATTGTCAGTCATCGATGGTCATTAGTCACGTCCTTCCAGTAGTTCGTATCAACATGCTTCACTATTCTTGTTGTTGTAGATGTTATTCGTATTAGTGTGAGTGTCAGTAGTTACGCGTACAGTATCGGGATTTCGTAGCAGCGCGCGGCGTTGCGGAGTCAAGATTCATGGCTGGACTACGG
+SRR1168519.1 length=302
!"!!!"#$"##!!!"!!"!"#""""#$#"!"!""!!!!!""%"""!"!"#""!#"!!!"!#"!#!!!"!!!"""!!!!"""#!!"#"!"!""!"!!!!""#!!!""!!!"!#!"###"#""!"!!!##!#!#!"!"""!"$$!!"#"$""#"!!"!!#"!!#!!!"!"""!!""%#"$#"$"#"!!!"!!!!!"!!!"!"!"!$#%&%%$"""""""!#"!"!!""##"$!!!!!!!$$!!!!!#!!"!!!!%!"$"!!"""!!!!!!!"!!!!!!$$#"!"!!!"!$$#"!$!!!""!"""
Even after converting it to FASTA format with Falcon-formatter, it does NOT work in Falcon.
I know that the FASTA headers require strict formatting, with information such as the movie name, run start time, SMRT barcode, etc., and should look like this:
>m140913_050931_42139_c100713652400000001823152404301535_s1_p0/9/1607_26058 RQ=0.831
TGGCATCTCATAAAGCCGCGCGGACGGGCAATAGCACTGGTTCGATTGTCTGGTGTTTATTCCCGGCTGT
TGGGCTGAGTTTGTGATCCCGGTGAACTTCTCGCATGCCGACAGCATCATGATCGGTGCGCTGTCTCCCT
GGCAAATAGAAGTTGTTCAATAACGCGCGCGACTGGCCGTTGGCCTCGGGCGGTTAGCGATGCATCGATG
TTTGCTGGGCTGCTAATTGTGCCCGATAATATGGTTGGTTCGGCACTAAACGACCAGCAAAAAAAAGCGT
GGGAGAACAGATGAAATTATTTACGCGGTAGTTCGTTTCGCCGCTGGCGGATTGTGATTTTGCTGGCTTG
GTCTTACCGTTTTCCTCTACGCGGCCCAATGCTGAGCTGGGTATCTATTCGTTATACGGCTCTGAAGGCT
My questions:
1. Can I make up dummy values equivalent to ">m140913_050931_42139_c100713652400000001823152404301535_s1_p0/9/1607_26058" to make Falcon work properly?
2. Or is there another way to process the *.sra file I downloaded so that it works properly in Falcon? I downloaded the *.sra file and then ran fastq-dump.
Does this process lose the header information, such as "m140913_050931_42139_c100713652400000001823152404301535_s1_p0/9"?
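To make question 1 concrete, here is a rough Python sketch of the kind of dummy header I have in mind. It reads FASTQ records and writes FASTA with headers shaped like movie/ZMW/start_end. The movie name DUMMY_MOVIE, the RQ=0.850 value, and the function names are all made up for illustration; only the header layout is copied from the example above. I have not confirmed that Falcon accepts the result.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical movie name: invented, only mimics the shape of a real one.
DUMMY_MOVIE = "m000000_000000_00000_c000000000000000000000000000000_s1_p0"


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; tolerates sequences wrapped over lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header.startswith("@"):
            continue  # skip blank or stray lines between records
        parts = []
        for line in it:
            line = line.rstrip("\n")
            if line.startswith("+"):
                break  # separator line: sequence is complete
            parts.append(line)
        seq = "".join(parts)
        # Consume quality lines by length, since a quality string
        # may itself begin with '@'.
        seen = 0
        while seen < len(seq):
            qual = next(it, None)
            if qual is None:
                break
            seen += len(qual.rstrip("\n"))
        yield header[1:], seq


def to_pacbio_fasta(records: Iterable[Tuple[str, str]],
                    movie: str = DUMMY_MOVIE,
                    width: int = 70) -> Iterator[str]:
    """Yield FASTA lines with made-up movie/zmw/start_end headers."""
    for zmw, (_name, seq) in enumerate(records, start=1):
        # Each read gets its own dummy ZMW number; RQ is a placeholder value.
        yield f">{movie}/{zmw}/0_{len(seq)} RQ=0.850"
        for i in range(0, len(seq), width):
            yield seq[i:i + width]
```

I would call it along the lines of `to_pacbio_fasta(read_fastq(open("SRR1168519.fastq")))` and write each yielded line to a .fasta file. The original SRA read names are discarded, so a separate mapping would be needed to trace reads back.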
Looking forward to your help!
Many thanks!