  • sikidiri
    replied
    technical or biological difference between the two dataset?

    Hello,

    I have two ChIP-seq samples for the same protein, one from embryonic stem (ES) cells and one from retinoic acid-induced cells. I obtained around 800 peaks in ES cells and around 7500 peaks in induced cells. The protocol, antibody, peak-calling parameters (MACS), and the person who did the experiments are all the same. The number of reads obtained is similar in both samples, with a similar level of background. When I look at peaks in my new dataset, they show good enrichment compared to the old one at the same region (~50% higher enrichment). I want to know whether this is a real biological difference, or whether deep sequencing is giving me good tag enrichment in the new dataset that is not seen in the old one. How can I rule out technical problems, if there are any? Any suggestions are most welcome. Thanks


  • dpryan
    replied
    Originally posted by Pravara_@bioinformatics
    i know chip-seq data always present in following format

    chr4 130135336 130135360 U0 0 -
    chr1 110547319 110547343 U0 0 -
    chr10 63922216 63922240 U0 0 -
    chr2 71081880 71081904 U0 0 +

    I used SISSRS for such files (bed files)
    As you're finding out, there are a LOT of different file formats. Most of these are interchangeable. BED format can have anywhere between 3 and 12 columns. You tend to find data with the first 6 columns, but if you find pre-aligned paired-end sequences, they may have only the first 3 (required) columns. Also, this is all pre-aligned data as raw data will tend to be in fastq format.
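
    A minimal sketch in Python (not from the original thread) of checking whether a line has the basic BED shape described above: 3 to 12 whitespace-separated columns, with integer start and end coordinates in columns 2 and 3. The field names follow the standard BED column names; the function name `describe_bed_line` is hypothetical, and the start/end ordering check is only a heuristic for spotting files that are clearly not BED.

```python
# Standard BED column names, in order (only the first 3 are required).
BED_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes",
    "blockStarts",
]


def describe_bed_line(line):
    """Return a dict of BED fields if the line has a BED-like shape, else None.

    This only checks the shape of the line; it does not validate the
    meaning of the optional columns.
    """
    fields = line.split()
    if not 3 <= len(fields) <= 12:
        return None
    try:
        start, end = int(fields[1]), int(fields[2])
    except ValueError:
        return None
    # Heuristic: a BED interval has a non-negative start and end > start.
    if start < 0 or end <= start:
        return None
    return dict(zip(BED_FIELDS, fields))
```

    Under these assumptions, a line like "chr4 130135336 130135360 U0 0 -" is labelled as a 6-column BED record, while "chr1 25 -1" is rejected because the third column is not a valid end coordinate.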

    I'm assuming you're getting these datasets from GEO. If so, the formats of the files are normally described there. Otherwise, #1-3 I'm not familiar with. #4 is a BED format file, you could use this in SISSRS like above. #5 is a BAM format file, that can be directly used in things like MACS and can also be converted to BED using bamtools if whatever program you prefer can't use BAM format. #6 looks like a modified BED format, it's actually close to the format I usually keep things in. I imagine you can put a "chr" in front of the number in the first column and add two columns of periods between columns 3 and 4 to make it a usable BED file. #7 and #8 look like the output of a peak finder. #9 is probably also the output of a peak finder, since the regions are quite broad and there's no strand information. #10 is another BED file. Presumably it was intended for visualization in the genome browser since someone bothered to fill in the itemRgb field.
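
    The fix suggested for #6 can be sketched in Python roughly as follows. This is only an illustration, assuming each input line is "chromosome start end strand" separated by whitespace; the function name `to_bed6` and the `placeholder` parameter are hypothetical. Some tools may expect a numeric score column, in which case a placeholder such as "0" may work better than a period.

```python
def to_bed6(line, placeholder="."):
    """Turn 'chrom start end strand' into a tab-separated 6-column BED line.

    The name and score columns (4 and 5) are filled with a placeholder,
    and 'chr' is prepended to the chromosome if it is missing.
    """
    chrom, start, end, strand = line.split()
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return "\t".join([chrom, start, end, placeholder, placeholder, strand])


def convert_lines(lines, placeholder="."):
    """Convert an iterable of lines, skipping blank ones."""
    return [to_bed6(l, placeholder) for l in lines if l.strip()]
```

    For example, the line "6 38662156 38662189 +" would become "chr6", "38662156", "38662189", ".", ".", "+" joined by tabs.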

    BTW, it's probably best to only compare results within a single peak caller. Otherwise, differences in peaks you see between datasets may be due solely to the different algorithms behind the peak callers. Also, it can sometimes be easier to just realign things yourself and thereby produce a BED or BAM format file, since that's pretty quick.


  • Pravara_@bioinformatics
    replied
    Dear sir

    I am working with ChIP-seq data. I have tried SISSRS, QuEST, MACS, and SICER.

    Sir, my problem is that I am not able to recognize the files. I have several file formats, all of them ChIP-seq data, but I don't know whether all of these files can be used with all of the software I mentioned above. Please let me know what kind of data each of these is.

    I know ChIP-seq data is always presented in the following format:

    chr4 130135336 130135360 U0 0 -
    chr1 110547319 110547343 U0 0 -
    chr10 63922216 63922240 U0 0 -
    chr2 71081880 71081904 U0 0 +

    I used SISSRS for such files (BED files).

    Now there are other formats also, like:

    1. E2H2.aligned.txt

    chr13 81419432 81419468 + 205E9.6.559265 2
    chr11 44462781 44462817 + 205E9.6.559267 0
    chr1 89426606 89426642 - 205E9.6.559270 3
    chr12 103518323 103518359 - 205E9.6.559271 0
    chrX 128953935 128953971 - 205E9.6.559272 2
    chr19 4888146 4888182 - 205E9.6.559274 5
    chr4 137770387 137770423 + 205E9.6.559275 1

    2. densities.txt

    chr1 25 -1
    chr1 50 -1
    chr1 75 -1
    chr1 100 -1
    chr1 125 -1
    chr1 150 -1
    chr1 175 -1
    chr1 200 -1

    3. chip3034_multi_hg18.txt

    AGAGTGTTTCAAACCTGCTCCATGAA 13000 13
    AGACGAAGTCTCACTCTGTCACCCAG 13000 164
    ATTCCATTCCACTCTGTTCCATTCCA 11953 24
    AGTAACCCTTATTCTACTTAATAATG 13000 2
    ATGGTAGTTCACACCTATAATCCCCG 11953 11
    ATTGGCCAGATGCAGAGGCTCACACC 11953 9
    ATAGCACAAAGGCAATAACACTTAAT 10906 3

    I used this file format for QuEST.

    4. bed file

    chr1 454 489 CCTAACCCTAACCCTCGCGGTACCCTCAGCCGGCC 0 + - - 0,0,255
    chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGTTCTC 0 - - - 255,0,0
    chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGCTCTC 0 - - - 255,0,0
    chr1 512 547 TTTCGGTGGTACTCTGAAGGCGGAGCACAGTTCTC 0 - - - 255,0,0


    5. bam files (these files do not open on my system)

    6. bed files

    6 38662156 38662189 +
    8 102050882 102050916 +
    16 16805607 16805640 -
    10 18950674 18950708 -
    4 52586623 52586657 -
    8 126508725 126508748 -
    5 83713731 83713758 +
    1 217224630 217224664 -
    2 234129500 234129531 -
    5 116295091 116295124 -
    17 36024302 36024336 -



    7. bed files

    chr1 564621 564687 . 0 . 5.575970 3.58854 -1
    chr1 569893 569962 . 0 . 7.441230 6.19321 -1
    chr1 712868 713455 . 0 . 11.857200 11.4429 -1
    chr1 713653 713670 . 0 . 7.278470 4.21542 -1
    chr1 713880 714756 . 0 . 87.115402 246.909 -1
    chr1 715081 715443 . 0 . 18.861601 21.5467 -1
    chr1 761030 763152 . 0 . 99.675797 201.571 -1


    8. peaks.txt

    chr1 6216808 6219103 985 186 5.29979577395856 799 1.34744732317805e-129
    chr6 158010381 158011325 686 65 10.5893955160332 621 1.43057401891788e-129
    chr5 33110401 33111074 644 51 12.7903624851984 593 1.50406065933793e-129
    chr3 197589215 197590103 652 54 12.2534188623185 598 3.17417576226315e-129
    chr3 150539977 150541729 852 129 6.62571157437829 723 3.84605198529492e-129

    9. bed file

    chr1 5319 6069
    chr1 15612 16329
    chr1 81077 82406
    chr1 227508 228733
    chr1 456299 456770
    chr1 477582 478232
    chr1 501635 501985
    chr1 584463 586213

    10. bed file

    chr14 68535052 68535087 Neg2 1 - 68535052 68535087 153,255,153
    chr10 72774109 72774144 Neg3 1 - 72774109 72774144 153,255,153
    chr6 163049829 163049864 Pos4 14 + 163049829 163049864 0,0,102
    chr7 144599649 144599684 Neg5 1 - 144599649 144599684 153,255,153
    chr9 106823345 106823380 Pos6 1 + 106823345 106823380 153,153,255


  • sp_wade
    replied
    Hi, dpryan
    That really does make sense. I tried fudging those fields and found it had no effect on the called peaks.
    Thx very much.


  • dpryan
    replied
    You could largely convert a BED format file to ELAND format. BED format files don't usually contain anything about mismatches to the reference sequence, so you'd have to fudge that. Also, you'd have to look up the sequence for each read, though that's trivial. Frankly, those are the biggest differences in the formats and I doubt that any of the peak finders actually care about those fields. So, in short, yeah, you could probably convert the file type enough to work with a one line command using awk.


  • sp_wade
    started a topic problem on file format in ChIP-Seq data analysis

    problem on file format in ChIP-Seq data analysis

    Hi all,
    I have a question about file formats in ChIP-Seq data analysis.
    I only have aligned data in BED format. What should I do if I want to run the data through software that does not recognize BED format, such as PeakSeq or QuEST? Is there any way to convert the BED file to ELAND or a similar format?
    Thanks a lot.
