
**dschika** · 01-07-2011, 07:41 AM

Hey FelipeAd,

Can you explain in more detail what you have done? How did you compare the contigs? Did you use the -cdna option in the assembly?

**FelipeAd** · 01-10-2011, 12:19 AM

Thank you for your reply.

Well, I analyzed the 4 libraries separately with Newbler, but the sequences of the resulting contigs showed no similarity.

**pmiguel** · 01-10-2011, 04:08 AM

No similarity using what? Sequence-level similarity from a program like BLAST? Or do you mean something else?

How many contigs do you have, and how many contig bases in total?

Two libraries from the same organism should have some sequence similarity.

--
Phillip

**FelipeAd** · 01-12-2011, 06:55 AM

I compared the libraries using blastx and found no similarity, particularly in one pair of libraries (in the other pair there was similarity, but only with a very relaxed threshold).

**pmiguel** · 01-12-2011, 07:27 AM

Hi FelipeAd,
We are trying to help you out, but you are not answering all the questions we ask. Would you please re-read my previous post and answer all the questions?

--
Phillip

**FelipeAd** · 01-12-2011, 08:05 AM

Well, I have a small number of contigs. The numbers of large contigs (meaning >100 bp in length) were 120 and 140 for the two libraries, with 78,000 and 100,000 total bases in these contigs, respectively.

A large number of bases, though, seems to be unassembled (?), at least in large contigs, since I started with almost 20,000,000 bases for each library...

**pmiguel** · 01-12-2011, 08:36 AM

My guess is that you are not sequencing very deep into those libraries. But, that said, I would expect at least some of the contigs to be similar unless this "stress" is very drastically changing the transcription profile of the tissue you are studying.

Just to be clear, if you make a blast database from one set of contigs and blast the other set of contigs against it (just blastn -- tblastx should not be necessary), you are saying you get no significant hits? Or is your analysis different from what I describe here?

--
Phillip
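
A minimal sketch of the comparison Phillip describes, assuming NCBI BLAST+ (`makeblastdb`, `blastn`) is installed. The file names, database name, and E-value cutoff are placeholders chosen for illustration, not values from this thread. The code only builds the two command lines and counts how many query contigs have at least one hit in the tabular output.

```python
import shlex


def blastn_commands(db_fasta, query_fasta, db_name="libA_contigs",
                    out_tsv="libB_vs_libA.tsv", evalue="1e-10"):
    """Build BLAST+ argument lists: make a nucleotide database from one
    contig set, then search the other contig set against it.
    All default names and the E-value cutoff are illustrative assumptions."""
    make_db = ["makeblastdb", "-in", db_fasta, "-dbtype", "nucl",
               "-out", db_name]
    search = ["blastn", "-query", query_fasta, "-db", db_name,
              "-evalue", evalue, "-outfmt", "6", "-out", out_tsv]
    return make_db, search


def count_queries_with_hits(tabular_lines):
    """Count distinct query IDs (first column) in -outfmt 6 output."""
    return len({line.split("\t")[0] for line in tabular_lines if line.strip()})


if __name__ == "__main__":
    # Hypothetical paths to the contig FASTA files of the two assemblies.
    for cmd in blastn_commands("libA/454AllContigs.fna",
                               "libB/454AllContigs.fna"):
        print(shlex.join(cmd))
```

To actually run the searches, each list can be passed to `subprocess.run(cmd, check=True)`. Comparing `count_queries_with_hits` with the total number of query contigs gives a rough idea of how much the two contig sets overlap.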

**westerman** · 01-12-2011, 08:36 AM

Originally posted by FelipeAd:

> I compared the libraries using blastx and found no similarity, particularly in one pair of libraries (in the other pair there was similarity, but only with a very relaxed threshold).

If you are truly using blastx to compare the contigs of two libraries then I believe you are using the wrong tool. blastx compares nucleotide sequences against a protein database. You would want to use either tblastx or blastn to compare nucleotides vs. nucleotides.

As Phillip says, comparing tissues of the same plant to each other should bring up some level of similarity. Ergo you either (1) are doing something wrong or (2) created libraries (and thus sequences) from two different species. Even in case #2 I would expect some similarity, so I am leaning towards case #1.

Perhaps if you can write out the specific steps and commands you are using then we can help.

As for the assembly reducing 20,000,000 bases down to around 100,000 bases, I see nothing inherently wrong with that. It could mean that you simply had 200x coverage. Of course it could also indicate a problem. A better metric is "what percentage of my reads are found in the contigs?"
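
The two numbers in the paragraph above can be worked out as follows. This is a sketch under assumptions: the 200x figure only holds if essentially all reads went into the contigs, and the read-status parser assumes a tab-separated table with the read accession in the first column, a status label (such as `Assembled` or `Singleton`) in the second, and one header line, which is how I understand Newbler's `454ReadStatus.txt` to be laid out. Check your own file before relying on it.

```python
from collections import Counter


def mean_coverage(read_bases, contig_bases):
    """Average depth if every read base ended up in the contigs."""
    return read_bases / contig_bases


def read_status_fractions(status_lines, has_header=True):
    """Fraction of reads per status label.

    Assumed layout (not verified against any particular Newbler version):
    tab-separated, column 1 = read accession, column 2 = status.
    """
    counts = Counter()
    for i, line in enumerate(status_lines):
        if has_header and i == 0:
            continue  # skip the header row
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            counts[fields[1]] += 1
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Figures quoted in the thread: ~20,000,000 read bases, ~100,000 contig bases.
    print(mean_coverage(20_000_000, 100_000))
```

If the fraction of fully assembled reads turns out to be small, the real depth of the contigs is far below 200x, and most of the data sits in singletons or discarded reads.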

**pmiguel** · 01-12-2011, 08:47 AM

I took his post to mean that most of the reads had not, in fact, assembled. If this is the case, and you take the (say) 1% most highly expressed transcripts from a stressed tissue and from an unstressed one and blast them against each other, it is not completely impossible that you would get no hits. But it would be surprising.

--
Phillip

**flxlex** · 01-13-2011, 02:42 AM

What about assembling all reads in one go, and tearing apart which contig contains reads from what library afterward (using the ace file)? Requires some scripting though...

**pmiguel** · 01-13-2011, 04:24 AM

Actually that is what we do at Purdue. I like it because you annotate the contigs and then have read counts and depths from each component sample. So you get DGE-like results with zero reference information.

Of course Newbler's read tearing behavior makes this questionable -- but you can force Newbler to assemble a read into only one contig.

For someone who can't script, though, this would be a tough one. The assembly itself would be trivial with the gsAssembler GUI. But then figuring out which read came from which sample afterward...

--
Phillip
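
A rough sketch of the "assemble everything, then split by library" idea, assuming the standard ACE layout in which each contig starts with a `CO <name> ...` line and each of its reads is listed on an `AF <read> <U|C> <start>` line. Two things here are assumptions for illustration: that a read's library can be recognized from a prefix of its accession (the `prefix_to_library` mapping is hypothetical and has to come from your own run records), and that pieces of a split read carry a suffix after a `.` that can be stripped to recover the accession.

```python
from collections import defaultdict


def reads_per_library(ace_lines, prefix_to_library):
    """Count distinct reads per library for each contig in an ACE file.

    prefix_to_library is a hypothetical mapping such as
    {"AAAAAAA01": "control", "BBBBBBB01": "stress"}.
    """
    reads = defaultdict(lambda: defaultdict(set))
    contig = None
    for line in ace_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "CO":
            contig = fields[1]          # start of a new contig record
        elif fields[0] == "AF" and contig is not None:
            # Assumption: anything after the first "." marks a piece of a
            # split read, so strip it to get the base accession.
            accession = fields[1].split(".")[0]
            library = next((lib for prefix, lib in prefix_to_library.items()
                            if accession.startswith(prefix)), "unknown")
            # A set keeps a read that was split into pieces from being
            # counted twice within one contig.
            reads[contig][library].add(accession)
    return {c: {lib: len(accs) for lib, accs in libs.items()}
            for c, libs in reads.items()}
```

Called as `reads_per_library(open("454Contigs.ace"), mapping)`, this returns one dictionary of per-library read counts per contig, which is the table you need for a DGE-like comparison. The file name is only an example.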

**FelipeAd** · 01-13-2011, 07:12 AM

Originally posted by pmiguel:

> My guess is that you are not sequencing very deep into those libraries. But, that said, I would expect at least some of the contigs to be similar unless this "stress" is very drastically changing the transcription profile of the tissue you are studying.
>
> Just to be clear, if you make a blast database from one set of contigs and blast the other set of contigs against it (just blastn -- tblastx should not be necessary), you are saying you get no significant hits? Or is your analysis different from what I describe here?
>
> --
> Phillip

Thank you for your reply. My initial approach (using blastx) was to apply a Fisher's exact test between the two libraries, say stressed vs. unstressed. In this test I get no significant matches. I am now running the blastn to see whether I can get some similarity.

I am very concerned, though, about the non-similarity even with the blastx, since both of the libraries were blasted against known databases with several GO hits.
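
For reference, a Fisher's exact test works on a 2x2 table of counts, for example how many sequences in each library do or do not carry a given annotation. It says nothing about sequence similarity between the contigs themselves. The sketch below is a plain-Python two-sided version, written only to show what the test computes. The table layout and the example counts are illustrative assumptions, not data from this thread.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Illustrative layout: rows = library (stressed, unstressed),
    columns = sequences with / without a given annotation.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the
    # observed one (small tolerance for floating-point ties).
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


if __name__ == "__main__":
    # Made-up counts: 30 of 120 contigs annotated in one library,
    # 12 of 140 in the other.
    print(fisher_exact_two_sided(30, 90, 12, 128))
```

With only 120 to 140 contigs per library, the counts per annotation term are small, so few terms can reach significance whatever the underlying biology.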

**FelipeAd** · 01-13-2011, 07:16 AM

Originally posted by flxlex:

> What about assembling all reads in one go, and tearing apart which contig contains reads from what library afterward (using the ace file)? Requires some scripting though...

This is my next step. I am now re-running all the libraries as one assembly. This is why I started another thread asking how to find which reads comprise which contigs. I could not find this in the output files, though I have not checked the .ace files yet.
I suppose it is also better to have each contig in a separate .ace file, is that right?

**FelipeAd** · 01-13-2011, 07:26 AM

Originally posted by westerman:

> If you are truly using blastx to compare the contigs of two libraries then I believe you are using the wrong tool. blastx compares nucleotide sequences against a protein database. You would want to use either tblastx or blastn to compare nucleotides vs. nucleotides.
>
> As Phillip says, comparing tissues of the same plant to each other should bring up some level of similarity. Ergo you either (1) are doing something wrong or (2) created libraries (and thus sequences) from two different species. Even in case #2 I would expect some similarity, so I am leaning towards case #1.
>
> Perhaps if you can write out the specific steps and commands you are using then we can help.
>
> As for the assembly reducing 20,000,000 bases down to around 100,000 bases, I see nothing inherently wrong with that. It could mean that you simply had 200x coverage. Of course it could also indicate a problem. A better metric is "what percentage of my reads are found in the contigs?"

More specifically, I initially ran blast2go (blastx) and then applied a Fisher's exact test to my two libraries of interest. I do not know whether that was the mistake, but I will try a blastn of one library against the other.

Another issue I did not mention is that I used very strict assembly parameters: I set the minimum overlap length to 90 and the minimum overlap identity to 95.

Maybe this gives contigs that are highly specific to each library? I mean, could the fact that only a small part of the genome is highly expressed, combined with the strict alignment requirement, lead to this problem?


Compare contigs between libraries (Newbler)
