Hi, this is my first post!
I've downloaded some ChIP-Seq data from the SRA ( http://www.ncbi.nlm.nih.gov/sra/SRX000425?report=full# ). The originating paper ( http://www.ncbi.nlm.nih.gov/pubmed/18477713 ) says that the reads are SOLiD paired-end, with 25 bp from each end, but the reads themselves are 52 bases long! It seems that the two ends have been ligated and sequenced together.
I am wondering where the extra two bases come from, but the larger problem is that all the alignment programs out there seem to expect paired-end reads to come as TWO lists of reads, one for each half of the pair. (There are two archives attached to this SRA accession, but I'm fairly certain that they're not paired: the read counts of the two archives add up to the number of reads reported in the paper, and the reported number of matches is more than half that number. Besides, the read IDs don't appear to match, and the reads are too long.)
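One way to test the "not paired" suspicion is to compare the bead IDs in the two archives. The sketch below is only an illustration: it assumes the header layout seen in the sample reads in this post, and it assumes that the last three numbers before the tag are panel, x and y coordinates. The names `bead_id` and `shared_fraction` are made up for this example and are not part of any tool.

```python
import re

# Assumed header layout (taken from the sample reads in this post):
#   @SRR015241.1 CLARA_..._16bit_26_88_34_F3 length=50
# The trailing "<panel>_<x>_<y>_<tag>" part is treated as the bead ID.
HEADER_RE = re.compile(r"_(\d+)_(\d+)_(\d+)_([A-Za-z0-9]+)$")


def bead_id(header_line):
    """Return ((panel, x, y), tag) parsed from a FASTQ header, or None."""
    fields = header_line.split()
    if len(fields) < 2:
        return None
    match = HEADER_RE.search(fields[1])
    if match is None:
        return None
    panel, x, y, tag = match.groups()
    return (int(panel), int(x), int(y)), tag


def shared_fraction(headers_a, headers_b):
    """Fraction of bead IDs in A that also occur in B (0.0 if A is empty)."""
    def ids(headers):
        parsed = (bead_id(h) for h in headers)
        return {p[0] for p in parsed if p is not None}

    ids_a, ids_b = ids(headers_a), ids(headers_b)
    if not ids_a:
        return 0.0
    return len(ids_a & ids_b) / len(ids_a)
```

If the two archives really held the two halves of each pair, the shared fraction should be close to 1; a value near 0 would support treating them as independent runs.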
What tools can I use to map these reads to a reference genome?
For reference, here are some sample reads:
first file:
Code:
@SRR015241.1 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_34_F3 length=50
T32322133300002330031001022230020232002203222030231
+SRR015241.1 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_34_F3 length=50
!21(()+%'+%40*.%%**)&%&*&%%%&%%%%%%%%%%%%%%%(+%%%%'
@SRR015241.2 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_269_F3 length=50
T01212120333223322020022322232232232222022232033230
+SRR015241.2 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_269_F3 length=50
!,*+*()+*(%'+)%%%&%+&%%'%%%%%%%%%%%%%%%%%%%%'+%%%%%
@SRR015241.3 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_369_F3 length=50
T32023002222000323202022222323322200222200220003032
+SRR015241.3 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_369_F3 length=50
!(*)%%%+'%%%*%%%%&%%%%%%%%%%%%%%%%%%%%%%%%%%%+%%%%%
@SRR015241.4 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_487_F3 length=50
T32021200310022332200020032222332303202203222030030
+SRR015241.4 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_487_F3 length=50
!9)'+*)4')%&&%)%%('&%%'%'%%%%%%%)%%%%%%%%%%%%+%%%%(
second file:
Code:
@SRR015242.1 CLARA_20071207_2_CelmonAmp7797_8bit_1000_115_30_F3 length=50
T03231223000321133333031113002130221200322111211011
+SRR015242.1 CLARA_20071207_2_CelmonAmp7797_8bit_1000_115_30_F3 length=50
!:9<3:99<*8;8<)0<;<%-8;2%%3*5%*.8<,1;6;*%&..'%%-*,%
@SRR015242.2 CLARA_20071207_2_CelmonAmp7797_8bit_1000_115_68_F3 length=50
T01032120003210102101003202002003021300310100313323
+SRR015242.2 CLARA_20071207_2_CelmonAmp7797_8bit_1000_115_68_F3 length=50
!<*7-;3291:/*0306/';'6<8&/;13'/,6%5&,''*+3--/+*4&%&
@SRR015242.3 CLARA_20071207_2_CelmonAmp7797_8bit_1000_115_217_F3 length=50
T30000002320022232001023330002000220231323302003320
+SRR015242.3 CLARA_20071207_2_CelmonAmp7797_8bit_1000_115_217_F3 length=50
!,%&'''---+5%%%*-(-2%37''%-&%%+(3-&%*%%'%*&2''%3.%%
@SRR015242.4 CLARA_20071207_2_CelmonAmp7797_8bit_1000_115_312_F3 length=50
T01301202310020020101002322020221212212112020001111
+SRR015242.4 CLARA_20071207_2_CelmonAmp7797_8bit_1000_115_312_F3 length=50
!52/601)1,&3:%5691*-':74),'%%%&%&+(*)&%)'&&,'&)*)*%
@SRR015242.5 CLARA_20071207_2_CelmonAmp7797_8bit_1000_115_482_F3 length=50
T30202333031100120210330331030310222032111001231300
+SRR015242.5 CLARA_20071207_2_CelmonAmp7797_8bit_1000_115_482_F3 length=50
!;<<<;;<<<<<1<<1;<56/<<:9:1;31;/<;%%/89/'99<'08)<%0
(all the reads start with T?)
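To pin down the read length and the leading `T`, the records can be measured directly. This is a rough sketch, assuming these are SOLiD color-space reads in four-line FASTQ records, where the first character is a primer base rather than a sequenced one and the remaining characters are color calls (`0`-`3`, or `.` for a missing call). The helper names `read_fastq` and `describe` are hypothetical.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(records) - 3, 4):
        header, seq, plus, qual = records[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        yield header, seq, qual


def describe(seq):
    """Split a read into (leading base, number of colors, colors valid?).

    Assumption: the first character is the primer base and everything
    after it is a color call.
    """
    primer_base, colors = seq[0], seq[1:]
    return primer_base, len(colors), set(colors) <= set("0123.")
```

Running `describe` over the sequence line of each record reports how many color calls follow the first character, which can then be compared with the `length=` value in the header.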