Technical or biological difference between the two datasets?
Hello,
I have two ChIP-seq samples for the same protein, one in embryonic stem (ES) cells and one in retinoic acid-induced cells. I obtained around 800 peaks in ES cells and around 7500 peaks in induced cells. The protocol, antibody, peak-calling parameters (MACS), and the person who did the experiments are all the same. The number of reads obtained in both samples is similar, with a similar level of background. When I look at peaks in my new dataset, they show good enrichment compared to the old one at the same region (~50% higher enrichment). I want to know: is this a real biological difference, or is it because of deep sequencing that I see good enrichment of tags in the new dataset that is not seen in the old one? How can I rule out technical problems, if there are any? Any suggestions are most welcome. Thanks
-
Originally posted by Pravara_@bioinformatics
i know chip-seq data always present in following format
chr4 130135336 130135360 U0 0 -
chr1 110547319 110547343 U0 0 -
chr10 63922216 63922240 U0 0 -
chr2 71081880 71081904 U0 0 +
I used SISSRS for such files (bed files)
I'm assuming you're getting these datasets from GEO. If so, the formats of the files are normally described there. Otherwise, #1-3 I'm not familiar with. #4 is a BED format file, you could use this in SISSRS like above. #5 is a BAM format file, that can be directly used in things like MACS and can also be converted to BED using bamtools if whatever program you prefer can't use BAM format. #6 looks like a modified BED format, it's actually close to the format I usually keep things in. I imagine you can put a "chr" in front of the number in the first column and add two columns of periods between columns 3 and 4 to make it a usable BED file. #7 and #8 look like the output of a peak finder. #9 is probably also the output of a peak finder, since the regions are quite broad and there's no strand information. #10 is another BED file. Presumably it was intended for visualization in the genome browser since someone bothered to fill in the itemRgb field.
BTW, it's probably best to only compare results within a single peak caller. Otherwise, differences in peaks you see between datasets may be due solely to the different algorithms behind the peak callers. Also, it can sometimes be easier to just realign things yourself and thereby produce a BED or BAM format file, since that's pretty quick.
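For the #6 case, the fix suggested above (put "chr" in front of the first column and add two columns of periods between columns 3 and 4) could look roughly like the Python sketch below. The function names `to_bed6` and `convert_file` are made up for illustration, and the sketch assumes every line has exactly four whitespace-separated fields (chromosome, start, end, strand) like the sample shown.

```python
def to_bed6(line):
    """Turn a 'chrom start end strand' line (file #6 style) into a six-column BED line."""
    chrom, start, end, strand = line.split()
    # Prefix "chr" unless the chromosome name already carries it.
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    # Periods fill the name and score columns, as suggested in the post above.
    return "\t".join([chrom, start, end, ".", ".", strand])


def convert_file(in_path, out_path):
    """Convert a whole file, skipping blank lines (paths are hypothetical)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(to_bed6(line) + "\n")


print(to_bed6("6 38662156 38662189 +"))
```

If a particular program rejects a period in the score column, a numeric placeholder such as 0 may be needed instead; that depends on the tool.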
-
Dear sir,
I am working with ChIP-seq data. I have tried SISSRS, QuEST, MACS, and SICER.
My problem is that I am not able to recognize the files. I have several file formats, all of them ChIP-seq data, but I don't know whether all of these files can be used with all of the software I mentioned above. Please let me know what kind of data each of these is.
i know chip-seq data always present in following format
chr4 130135336 130135360 U0 0 -
chr1 110547319 110547343 U0 0 -
chr10 63922216 63922240 U0 0 -
chr2 71081880 71081904 U0 0 +
I used SISSRS for such files (bed files)
now there are other formats also like
1.E2H2.aligned.txt
chr13 81419432 81419468 + 205E9.6.559265 2
chr11 44462781 44462817 + 205E9.6.559267 0
chr1 89426606 89426642 - 205E9.6.559270 3
chr12 103518323 103518359 - 205E9.6.559271 0
chrX 128953935 128953971 - 205E9.6.559272 2
chr19 4888146 4888182 - 205E9.6.559274 5
chr4 137770387 137770423 + 205E9.6.559275 1
2.densities.txt
chr1 25 -1
chr1 50 -1
chr1 75 -1
chr1 100 -1
chr1 125 -1
chr1 150 -1
chr1 175 -1
chr1 200 -1
3.chip3034_multi_hg18.txt
AGAGTGTTTCAAACCTGCTCCATGAA 13000 13
AGACGAAGTCTCACTCTGTCACCCAG 13000 164
ATTCCATTCCACTCTGTTCCATTCCA 11953 24
AGTAACCCTTATTCTACTTAATAATG 13000 2
ATGGTAGTTCACACCTATAATCCCCG 11953 11
ATTGGCCAGATGCAGAGGCTCACACC 11953 9
ATAGCACAAAGGCAATAACACTTAAT 10906 3
i used this file format for QuEST
4.bed file
chr1 454 489 CCTAACCCTAACCCTCGCGGTACCCTCAGCCGGCC 0 + - - 0,0,255
chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGTTCTC 0 - - - 255,0,0
chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGCTCTC 0 - - - 255,0,0
chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGTTCTC 0 - - - 255,0,0
5.bam files(these files are not opening in my system)
6.bed files
6 38662156 38662189 +
8 102050882 102050916 +
16 16805607 16805640 -
10 18950674 18950708 -
4 52586623 52586657 -
8 126508725 126508748 -
5 83713731 83713758 +
1 217224630 217224664 -
2 234129500 234129531 -
5 116295091 116295124 -
17 36024302 36024336 -
7.bed files
chr1 564621 564687 . 0 . 5.575970 3.58854 -1
chr1 569893 569962 . 0 . 7.441230 6.19321 -1
chr1 712868 713455 . 0 . 11.857200 11.4429 -1
chr1 713653 713670 . 0 . 7.278470 4.21542 -1
chr1 713880 714756 . 0 . 87.115402 246.909 -1
chr1 715081 715443 . 0 . 18.861601 21.5467 -1
chr1 761030 763152 . 0 . 99.675797 201.571 -1
8.peaks.txt
chr1 6216808 6219103 985 186 5.29979577395856 799 1.34744732317805e-129
chr6 158010381 158011325 686 65 10.5893955160332 621 1.43057401891788e-129
chr5 33110401 33111074 644 51 12.7903624851984 593 1.50406065933793e-129
chr3 197589215 197590103 652 54 12.2534188623185 598 3.17417576226315e-129
chr3 150539977 150541729 852 129 6.62571157437829 723 3.84605198529492e-129
9.bed file
chr1 5319 6069
chr1 15612 16329
chr1 81077 82406
chr1 227508 228733
chr1 456299 456770
chr1 477582 478232
chr1 501635 501985
chr1 584463 586213
10.bed file
chr14 68535052 68535087 Neg2 1 - 68535052 68535087 153,255,153
chr10 72774109 72774144 Neg3 1 - 72774109 72774144 153,255,153
chr6 163049829 163049864 Pos4 14 + 163049829 163049864 0,0,102
chr7 144599649 144599684 Neg5 1 - 144599649 144599684 153,255,153
chr9 106823345 106823380 Pos6 1 + 106823345 106823380 153,153,255
-
Hi, dpryan
That really does make sense. I tried fudging those fields and found it has no effect on the called peaks.
Thanks very much.
-
You could largely convert a BED format file to ELAND format. BED format files don't usually contain anything about mismatches to the reference sequence, so you'd have to fudge that. Also, you'd have to look up the sequence for each read, though that's trivial. Frankly, those are the biggest differences in the formats and I doubt that any of the peak finders actually care about those fields. So, in short, yeah, you could probably convert the file type enough to work with a one line command using awk.
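As a rough illustration of that kind of conversion, here is a Python sketch rather than an awk one-liner. The output column order (read name, sequence, match code, chromosome, position, strand as F/R) is an assumption made for illustration and not a verified ELAND layout, so check what your peak finder actually expects. The all-N sequence and the `U0` match code are the "fudged" fields, and the function name `bed_to_eland_like` is hypothetical.

```python
def bed_to_eland_like(line, read_id):
    """Map one BED line to a tab-delimited, ELAND-like row (column layout assumed)."""
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Strand is the sixth BED column; default to "+" if the file has fewer columns.
    strand = fields[5] if len(fields) > 5 else "+"
    # Fudged fields: a placeholder sequence of the right length and a "U0" match code.
    seq = "N" * (end - start)
    direction = "F" if strand == "+" else "R"
    # BED starts are 0-based; shifting to 1-based here is an assumption about the target.
    return "\t".join([f"read{read_id}", seq, "U0", chrom, str(start + 1), direction])


row = bed_to_eland_like("chr4 130135336 130135360 U0 0 -", 1)
print(row)
```

If the downstream program does read the sequence column, the placeholder would need to be replaced with the real read sequence looked up from the reference.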
-
Problem with file format in ChIP-Seq data analysis
Hi all,
I have a question about file formats in ChIP-Seq data analysis.
I only have aligned data in BED format. What should I do if I want to run the data through software that cannot read BED format, such as PeakSeq or QuEST? Is there any way to convert the BED file to ELAND or a similar format?
Thanks a lot.