**Xian** · 09-20-2010, 11:22 AM

Hi, please tell us more about your data. Is it paired-end whole-genome sequence data? Does it include information such as insert size, library, and read group?

**zhang1000** · 09-21-2010, 06:36 AM

Hi Xian

I tried several times to submit my reply, but it always failed. Could you give me your email or something, so I can give more details? Thank you!

**zhang1000** · 09-21-2010, 06:40 AM

Hi Xian:

Our data is paired-end whole-genome data. The raw data has been converted into a BAM file. The BAM file looks as follows:

The header is like:

@HD VN:1.0
@SQ SN:chr1 LN:249250621 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr2 LN:243199373 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr3 LN:198022430 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr4 LN:191154276 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr5 LN:180915260 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr6 LN:171115067 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr7 LN:159138663 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr8 LN:146364022 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr9 LN:141213431 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr10 LN:135534747 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr11 LN:135006516 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr12 LN:133851895 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr13 LN:115169878 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr14 LN:107349540 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr15 LN:102531392 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr16 LN:90354753 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr17 LN:81195210 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr18 LN:78077248 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr19 LN:59128983 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr20 LN:63025520 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr21 LN:48129895 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chr22 LN:51304566 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chrX LN:155270560 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chrY LN:59373566 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@SQ SN:chrM LN:16571 AS:GS000000688-ASM UR:/home/cgi/src/hg19/hg19.crr
@RG ID:GS13694-FS3-L07 SM:GS00275-DNA_G05 LB:GS00648-CLS_D10 PU:GS13694-FS3-L07 CN:"Complete Genomics" DT:2010-07-24 PL:"Complete Genomics"
@PG ID:cgatools VN:1.0.0 CL:"cgatools" "map2sam" "--reads=reads_GS13694-FS3-L07_002.tsv.bz2" "--mappings=mapping_GS13694-FS3-L07_002.tsv.bz2" "--library=/mnt/wd/GS000000776-DID/GS000000688-ASM/GS00275-DNA_G05/LIB/GS00648-CLS_D10/lib_DNB_GS00648-CLS_D10.tsv" "--mate-sv-candidates" "-s" "/home/cgi/src/hg19/hg19.crr"

The content of the bam is like:

GS13694-FS3-L07-2:11807267 345 chr1 14785 0 10M5N23M * 0 0 CTCCTCCGGGCACCAACCCCAGGTCCTTTCCCA /0:,6:584(-(7;(&:7;0667767758*/7( GC:Z:28S2G3S GS:Z:TCTC GQ:Z:4%8*
GS13694-FS3-L07-2:18823688 371 chr1 14786 0 10M5N23M = 14318 -468 TCCTCCGGGCACCAGCCCCAGGTCCTTTCCCAG .13:999919'6*&:;;8;87777774678777 GC:Z:28S2G3S GS:Z:CCCC GQ:Z:7858
GS13694-FS3-L07-2:9361415 435 chr1 14790 0 23M6N10M = 15136 346 CCGGGCCCCTCACCAGCCCCAGGCCCAGAGATG 97570837878889:;;:574*-+3888586 GC:Z:3S2G28S GS:Z:GGGG GQ:Z:7+70
GS13694-FS3-L07-2:15395247 409 chr1 14790 0 23M6N10M * 0 0 CCGGGCCCCTCACCAGCCCCAGGCGCAGAGATG 8*984356969975;:4;;;93(%&(+$3788, GC:Z:3S2G28S GS:Z:GAGG GQ:Z:8&)4
GS13694-FS3-L07-2:23014034 435 chr1 14794 1 23M5N10M = 15142 348 GCCCCTCACCAGCCCCAGGTCCTAGAGAGGCCT 999832888888;;7;::;:9'((()%*#*5 GC:Z:3S2G28S GS:Z:CCCC GQ:Z:8:34

Please help me find what is wrong with it. Thank you so much!

**ramouz87** · 09-21-2010, 06:59 AM

Hi,
We could say that Complete Genomics data is "paired-end" with a 400-600 insert size, but it differs from Illumina data in the sense that each read is composed of 4 DNA fragments (5/10/10/10) with small gaps in between.
More details about the data format are available here.
I guess some preprocessing is required to first reconstruct the read from the 4 fragments into the usual format, and then provide that to BreakDancer.
I think I'll have to work with such data in the near future, so I'll post an update if I have a concrete idea or a script for handling it.
Best,
Ramzi
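
The gapped read structure described above is visible in the CIGAR strings of the posted records (for example `10M5N23M`). As a minimal sketch, assuming only the standard SAM CIGAR operators, the following Python splits a CIGAR string and reports how many read bases and reference bases it covers. The function names are illustrative and not part of BreakDancer or cgatools.

```python
import re

# One CIGAR element: a length followed by a standard SAM operator.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operator) tuples."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings that the pattern only partially matched.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def query_length(ops):
    """Bases consumed on the read: M, I, S, = and X."""
    return sum(n for n, op in ops if op in "MIS=X")


def reference_span(ops):
    """Bases consumed on the reference: M, D, N, = and X."""
    return sum(n for n, op in ops if op in "MDN=X")


ops = parse_cigar("10M5N23M")
print(ops, query_length(ops), reference_span(ops))
```

For `10M5N23M` this gives a 33-base read spanning 38 reference bases, which matches the 33-base sequences in the posted records, with the `N` element accounting for the gap.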

**zhang1000** · 09-21-2010, 08:17 AM

Hi Ramzi:

I think you are quite right about the data. But in this BAM file there is a sequence that I think is already the sequence reconstructed from the 4 fragments. What do you think about that? Do you think we need to reconstruct it again from the fragments? Do you have any ideas? Thank you for your time!

Jigang

**Jon_Keats** · 09-21-2010, 12:13 PM

Looking for help on this application myself, I've come across a number of people saying they get empty config files. Have you tested the script on a standard Illumina PE file or anything else to see if it works at all? Obviously the CGI data to BAM conversion might be an issue, but I'd check that the script works first. Perhaps you have done this already; if so, I apologize for the lame comment.

**zhang1000** · 09-21-2010, 12:40 PM

Hi Jon Keats:

Thank you for your suggestion. I did not test the script using an Illumina PE file. But I have seen some messages on this forum from people who said they created config files successfully. Do you also work on CGI data? Do you have any clue about this problem? By the way, I sent sample data to the author who developed this software to ask for help. He has not given me a solution yet. Maybe he is trying to figure out how to deal with CGI data. So let's keep in touch about this problem. Thank you!

**Jon_Keats** · 09-21-2010, 01:27 PM

Hi Jigang,

If you PM your email I'll email you a chopped down (6Mb) bam file that does work so you can test the script on your end at least.

**zhang1000** · 09-21-2010, 05:12 PM

Hi Jon Keats:

Thank you so much! My email is [email protected].

Best wishes

Jigang

**ramouz87** · 09-22-2010, 12:08 AM

Hi,
I've tested BreakDancer on Illumina data (mate pair / paired end) and had no problem generating the configuration file. You have to be careful to use a sorted BAM.
Is that the case?
Best,
Ramzi
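
One quick way to see whether a BAM declares itself as sorted is to look for the `SO:` tag on the `@HD` header line (the header text can be printed with `samtools view -H`). The sketch below is only an illustration and assumes the header has already been read into a string; `header_sort_order` is a hypothetical helper, not a samtools or BreakDancer function.

```python
def header_sort_order(header_text):
    """Return the SO value from the @HD line, or None if it is not declared."""
    for line in header_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "@HD":
            for field in fields[1:]:
                if field.startswith("SO:"):
                    return field[len("SO:"):]
            return None  # @HD line present but no SO tag
    return None  # no @HD line at all


# The @HD line posted earlier in this thread carries no SO tag.
print(header_sort_order("@HD VN:1.0\n@SQ SN:chr1 LN:249250621"))
print(header_sort_order("@HD VN:1.0 SO:coordinate"))
```

A missing `SO:` tag does not prove that the file is unsorted. It only means the header does not say so, and the records themselves would need to be checked.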

**zhang1000** · 09-22-2010, 05:56 AM

Hi Ramzi:

Thank you! But I have already sorted the BAM file, so maybe there is another problem with my data!

Best wishes

Jigang

**ramouz87** · 09-22-2010, 06:38 AM

Hi Jigang,
Most likely it's related to the read structure. Did you try contacting people at CGI to ask about the best way to tackle structural variation analysis using their data?
Have you tried to visualize the BAM file using Tablet? Do paired reads show up?
Best,
Ramzi
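
Short of a viewer, the pairing information can also be read from the FLAG column of the posted records (345, 371, 435, 409). The following sketch decodes a FLAG value using the bit definitions from the SAM specification. The short names in the table are chosen for this example only.

```python
# Bit definitions from the SAM specification; names are illustrative.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


for flag in (345, 371, 435, 409):
    print(flag, decode_flag(flag))
```

All four values have the paired bit set, so the records are marked as paired. They also all have the 0x100 (not primary) bit set, and the posted mapping qualities are 0 or 1. Whether that is related to the empty config file is not established in this thread.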

**zhang1000** · 09-22-2010, 08:01 AM

Hi Ramzi:

CGI is developing methods for structural variation analysis. They said we can try using the BAM files for that analysis. As you mentioned, how can I visualize the data? And if it can be used by BreakDancer, what should it look like? I have no idea about that. Could you give me some advice?

Thank you!

Jigang

**Xian** · 09-24-2010, 07:01 AM

Hi Jigang,

To visualize the data, you can use samtools view, documented at http://samtools.sourceforge.net/samtools.shtml. If the data can be used by BreakDancer's bam2cfg.pl, the output of bam2cfg.pl should look like:

readgroup:* platform:illumina map:*.bam readlen:75.00 lib:* num:10001 lower:86.83 upper:443.91 mean:315.09 std:43.92 exe:samtools view

Please visit the website http://breakdancer.sourceforge.net/index.html for more details.

-Xian
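
To check a config line like the one above programmatically, a small parser is enough. This is a sketch only: it assumes the `key:value` fields are whitespace-separated as displayed in this post, and that a token without a colon (such as `view`) continues the previous value. `parse_cfg_line` is a hypothetical helper and not part of BreakDancer.

```python
def parse_cfg_line(line):
    """Parse one bam2cfg.pl-style line of key:value fields into a dict."""
    record = {}
    key = None
    for token in line.split():
        if ":" in token:
            key, value = token.split(":", 1)
            record[key] = value
        elif key is not None:
            # Token without a colon: treat it as part of the previous value.
            record[key] += " " + token
    return record


line = ("readgroup:* platform:illumina map:*.bam readlen:75.00 lib:* "
        "num:10001 lower:86.83 upper:443.91 mean:315.09 std:43.92 "
        "exe:samtools view")
cfg = parse_cfg_line(line)
print(cfg["mean"], cfg["std"], cfg["exe"])
```

An empty output file from bam2cfg.pl would yield no such lines at all, so checking that at least one line parses with `mean` and `std` fields is a simple sanity test.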

Need help in Breakdancer for identifying structural variation