  • Need help with BreakDancer for identifying structural variation

    Hi all:

    I am using BreakDancer to identify structural variation. My first step was to create a configuration file with bam2cfg.pl, but the cfg file is empty and no error message is printed. My data is from Complete Genomics, and I have already converted the raw data into a BAM file using samtools. Has anyone run into this situation, and could you please tell me how to solve it?

    The command I used is as follows:

    bam2cfg.pl -g -h result.bam > BRC6.cfg

    Thank you!

  • #2
    Hi, please tell us more about your data. Is it paired-end whole-genome sequence data? Does it carry information such as insert size, library, and read group?



    • #3
      Hi Xian

      I tried several times to submit my reply, but it always failed. Could you give me your email or something similar, so I can send more details? Thank you!



      • #4
        Hi Xian:

        Our data is paired-end whole-genome data. The raw data has been converted into a BAM file. Information about the BAM file follows.

        The header looks like this:

        @HD VN:1.0
        @SQ SN:chr1 LN:249250621 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr2 LN:243199373 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr3 LN:198022430 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr4 LN:191154276 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr5 LN:180915260 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr6 LN:171115067 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr7 LN:159138663 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr8 LN:146364022 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr9 LN:141213431 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr10 LN:135534747 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr11 LN:135006516 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr12 LN:133851895 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr13 LN:115169878 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr14 LN:107349540 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr15 LN:102531392 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr16 LN:90354753 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr17 LN:81195210 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr18 LN:78077248 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr19 LN:59128983 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr20 LN:63025520 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr21 LN:48129895 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chr22 LN:51304566 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chrX LN:155270560 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chrY LN:59373566 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @SQ SN:chrM LN:16571 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
        @RG ID:GS13694-FS3-L07 SM:GS00275-DNA_G05 LB:GS00648-CLS_D10 PU:GS13694-FS3-L07 CN:"Complete Genomics" DT:2010-07-24 PL:"Complete Genomics"
        @PG ID:cgatools VN:1.0.0 CL:"cgatools" "map2sam" "--reads=reads_GS13694-FS3-L07_002.tsv.bz2" "--mappings=mapping_GS13694-FS3-L07_002.tsv.bz2" "--library=/mnt/wd/GS000000776-DID/GS000000688-ASM/GS00275-DNA_G05/LIB/GS00648-CLS_D10/lib_DNB_GS00648-CLS_D10.tsv" "--mate-sv-candidates" "-s" "/home/cgi/src/hg19/hg19.crr"

        The content of the BAM looks like this:

        GS13694-FS3-L07-2:11807267 345 chr1 14785 0 10M5N23M * 0 0 CTCCTCCGGGCACCAACCCCAGGTCCTTTCCCA /0:,6:584(-(7;(&:7;0667767758*/7( GC:Z:28S2G3S GS:Z:TCTC GQ:Z:4%8*
        GS13694-FS3-L07-2:18823688 371 chr1 14786 0 10M5N23M = 14318 -468 TCCTCCGGGCACCAGCCCCAGGTCCTTTCCCAG .13:999919'6*&:;;8;87777774678777 GC:Z:28S2G3S GS:Z:CCCC GQ:Z:7858
        GS13694-FS3-L07-2:9361415 435 chr1 14790 0 23M6N10M = 15136 346 CCGGGCCCCTCACCAGCCCCAGGCCCAGAGATG 97570837878889:;;:574*-+3888586 GC:Z:3S2G28S GS:Z:GGGG GQ:Z:7+70
        GS13694-FS3-L07-2:15395247 409 chr1 14790 0 23M6N10M * 0 0 CCGGGCCCCTCACCAGCCCCAGGCGCAGAGATG 8*984356969975;:4;;;93(%&(+$3788, GC:Z:3S2G28S GS:Z:GAGG GQ:Z:8&)4
        GS13694-FS3-L07-2:23014034 435 chr1 14794 1 23M5N10M = 15142 348 GCCCCTCACCAGCCCCAGGTCCTAGAGAGGCCT 999832888888;;7;::;:9'((()%*#*5 GC:Z:3S2G28S GS:Z:CCCC GQ:Z:8:34

        Please help me find what is wrong with it. Thank you so much!
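
The second column of each record above is the SAM FLAG bit field, and decoding it shows what a downstream tool would see. The following is a minimal sketch, assuming the bit meanings defined in the SAM specification; the helper name `decode_flag` and the short bit labels are illustrative, not part of samtools or BreakDancer.

```python
# Bit values follow the FLAG field of the SAM specification.
# The short labels are illustrative names chosen for this sketch.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # FLAG values taken from the records quoted in this post.
    for flag in (345, 371, 435, 409):
        print(flag, ",".join(decode_flag(flag)))
```

All four distinct FLAG values in the excerpt (345, 371, 435, 409) have the 0x100 bit set, and the mapping-quality column is 0 or 1 on every record shown. Whether bam2cfg.pl discards such records is not established in this thread, but any filter on primary alignments or on a minimum mapping quality would drop all of the reads quoted here, so it is one thing worth checking.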



        • #5
          Hi,
          We could say that Complete Genomics data is "paired-end" with an insert size of 400-600, but it differs from Illumina data in the sense that each read is composed of 4 DNA fragments (5/10/10/10) with small gaps in between.
          More details about the data format are available here.
          I guess preprocessing is required to first reconstruct the read from the 4 fragments into the usual format, and then provide that to BreakDancer.
          I think I'll have to work with such data in the near future, so I'll post an update if I have a concrete idea or a script to deal with it.
          Best,
          Ramzi
          Research Scientist - Bioinformatics
          Sidra Medical and Research Center
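
The gapped read structure described above is visible in the CIGAR strings of the records posted earlier in the thread (for example `10M5N23M`), where `N` marks reference bases skipped between fragments. The following is a minimal sketch, assuming standard SAM CIGAR semantics; `cigar_lengths` is a hypothetical helper name, and the interpretation of the `GC`/`GS`/`GQ` tags is not covered here.

```python
import re

# One CIGAR element: a count followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume read bases
REF_OPS = set("MDN=X")    # operations that consume reference bases


def cigar_lengths(cigar):
    """Return (read_bases, reference_span, skipped_bases) for a CIGAR string."""
    read_len = ref_span = skipped = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in QUERY_OPS:
            read_len += n
        if op in REF_OPS:
            ref_span += n
        if op == "N":
            skipped += n
    return read_len, ref_span, skipped


if __name__ == "__main__":
    # CIGAR strings taken from the records quoted in post #4.
    for cigar in ("10M5N23M", "23M6N10M", "23M5N10M"):
        print(cigar, cigar_lengths(cigar))
```

For `10M5N23M` this gives 33 read bases over a 38-base reference span, which matches the 33-base sequences in the posted records. That suggests the converter already emits one joined sequence per half-DNB, with the intra-read gap encoded in the CIGAR rather than as separate fragments.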



          • #6
            Hi Ramzi:

            I think you are quite right about the data. But each record in this BAM file has a sequence that I think is already reconstructed from the 4 fragments. What do you think about that? Do we need to reconstruct it again from the fragments? Thank you for your time!

            Jigang



            • #7
              While looking for help on this application myself, I've come across a number of people saying they get empty config files. Have you tested the script on a standard Illumina PE file, or anything else, to see if it works at all? Obviously the CGI-data-to-BAM conversion might be an issue, but I'd check that the script works first. Perhaps you have done this already; if so, I apologize for the lame comment.
              Last edited by Jon_Keats; 09-21-2010, 12:14 PM. Reason: spelling



              • #8
                Hi Jon Keats:

                Thank you for your suggestion. I did not test the script with an Illumina PE file, but I have seen posts on this forum from people who said they created config files successfully. Do you also work on CGI data? Do you have any clue about this problem? By the way, I sent sample data to the author of the software to ask for help. He has not given me a solution yet; maybe he is trying to figure out how to deal with CGI data. So let's keep in touch about this problem. Thank you!



                • #9
                  Hi Jigang,

                  If you PM me your email, I'll send you a chopped-down (6Mb) BAM file that does work, so you can at least test the script on your end.



                  • #10
                    Hi Jon Keats:

                    Thank you so much! My email is [email protected].

                    Best wishes

                    Jigang



                    • #11
                      Hi,
                      I've tested BreakDancer on Illumina data (mate-pair / paired-end) and I had no problem generating the configuration file. You have to be careful to use a sorted BAM.
                      Is that the case?
                      Best,
                      Ramzi
                      Research Scientist - Bioinformatics
                      Sidra Medical and Research Center
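
One quick way to see what the file itself claims about sorting is the `SO:` field on the `@HD` header line. The following is a minimal sketch, assuming a tab-separated header such as the text printed by `samtools view -H result.bam`; `header_sort_order` is a hypothetical helper name.

```python
def header_sort_order(header_text):
    """Return the SO: value from the @HD line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


if __name__ == "__main__":
    # Shape of the @HD line posted earlier in this thread: no SO field.
    print(header_sort_order("@HD\tVN:1.0"))
    # A header that declares coordinate sorting.
    print(header_sort_order("@HD\tVN:1.0\tSO:coordinate"))
```

The header posted earlier in the thread reads `@HD VN:1.0` with no `SO:` field, so the file does not declare a sort order. A missing field does not prove the records are unsorted; it only means the order has to be confirmed from the records themselves.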



                      • #12
                        Hi Ramzi:

                        Thank you! But I have already sorted the BAM file, so maybe there is another problem with my data.

                        Best wishes

                        Jigang



                        • #13
                          Hi Jigang,
                          Most likely it's related to the read structure. Did you try contacting the people at CGI to ask about the best way to tackle structural variation analysis using their data?
                          Have you tried to visualize the BAM file using Tablet? Do paired reads show up?
                          Best,
                          Ramzi
                          Last edited by ramouz87; 09-22-2010, 06:41 AM.
                          Research Scientist - Bioinformatics
                          Sidra Medical and Research Center



                          • #14
                            Hi Ramzi:

                            CGI is still developing its methods for structural variation analysis. They said we can try to use the BAM files for that analysis. As you mentioned visualization, how can I visualize the data? And if the file can be used by BreakDancer, what should it look like? I have no idea about that. Could you give me some advice?

                            Thank you!

                            Jigang



                            • #15
                              Hi Jigang,

                              To view the data, you can use samtools view (see http://samtools.sourceforge.net/samtools.shtml). If the BAM file can be used by BreakDancer's bam2cfg.pl, the output of bam2cfg.pl should look like this:

                              readgroup:* platform:illumina map:*.bam readlen:75.00 lib:* num:10001 lower:86.83 upper:443.91 mean:315.09 std:43.92 exe:samtools view

                              Please visit the website http://breakdancer.sourceforge.net/index.html for more details.

                              -Xian
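
To check a generated cfg file against the expected shape above, each line can be split into its `key:value` fields. The following is a minimal sketch, assuming the fields are whitespace-separated as pasted in this post and that only the `exe` value contains a space; `parse_cfg_line` and `summarize_cfg` are hypothetical helper names, not part of BreakDancer.

```python
import re

# A value runs until the next " key:" token or the end of the line,
# which keeps "exe:samtools view" together as one value.
CFG_FIELD = re.compile(r"(\w+):(.*?)(?=\s+\w+:|$)")


def parse_cfg_line(line):
    """Split one bam2cfg.pl-style config line into a dict of strings."""
    return dict(CFG_FIELD.findall(line.strip()))


def summarize_cfg(text):
    """Return parsed entries for every non-blank line; an empty list means an empty cfg."""
    return [parse_cfg_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    example = (
        "readgroup:* platform:illumina map:*.bam readlen:75.00 lib:* "
        "num:10001 lower:86.83 upper:443.91 mean:315.09 std:43.92 "
        "exe:samtools view"
    )
    entry = parse_cfg_line(example)
    print(entry["mean"], entry["std"], entry["exe"])
    print(summarize_cfg(""))  # an empty cfg file yields no entries
```

An empty result from `summarize_cfg` corresponds to the problem reported at the top of this thread: the script ran but wrote no library entries.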
