  • poisson200
    replied
    Hi flxlex,
    Thanks for the quick reply and the answers.

    Originally posted by flxlex
    Taking all contigs and isotigs into a CD-HIT run might collapse paralogues, though...
    Looking at CD-hit, if it is set to cluster at 98% identity or greater, I think that should be stringent enough not to collapse any paralogs (paralogs would have to come from a very recent gene duplication event or from a CNV for that to happen), but it is a good point to bear in mind.

    To correct: for my purposes cd-hit-est should be set to 0.98; the default is 0.9.
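    A minimal Python sketch of the cd-hit-est call discussed here. It only builds the command line; the file names combined.fna and combined_nr98.fna are placeholders, and the word-size (-n) table reflects my reading of the CD-HIT user guide's recommendations per identity range, so check it against the guide for your version.

```python
import shlex

# Suggested cd-hit-est word sizes (-n) per identity range, as I read the
# CD-HIT user guide; lower identity thresholds need shorter words.
WORD_SIZE_BY_THRESHOLD = [
    (0.95, 10),
    (0.90, 8),
    (0.88, 7),
    (0.85, 6),
    (0.80, 5),
    (0.75, 4),
]


def cd_hit_est_command(infile, outfile, identity=0.98):
    """Build (but do not run) a cd-hit-est command line as a list."""
    for lower_bound, word in WORD_SIZE_BY_THRESHOLD:
        if identity >= lower_bound:
            break
    else:
        raise ValueError("word-size table only covers identity >= 0.75")
    return ["cd-hit-est", "-i", infile, "-o", outfile,
            "-c", str(identity), "-n", str(word)]


# Placeholder file names: the concatenated isotigs + contigs, and the output.
cmd = cd_hit_est_command("combined.fna", "combined_nr98.fna")
print(shlex.join(cmd))
# cd-hit-est -i combined.fna -o combined_nr98.fna -c 0.98 -n 10
```

    To actually run it, pass the list to subprocess.run(cmd, check=True) on a machine with cd-hit-est on the PATH.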

    Thanks again,

    John.
    Last edited by poisson200; 10-28-2010, 05:36 AM.

  • flxlex
    replied
    Originally posted by poisson200
    To clarify: would combining the 454Isotigs.fna and 454AllContigs.fna files into a single file and then running CD-hit give you a comprehensive, non-redundant set of transcripts from your 454 transcriptome for further analyses?
    Hmm, that could actually work; I hadn't thought of that. I had always thought of running CD-HIT per isogroup with some looping script. Taking all contigs and isotigs into a single CD-HIT run might collapse paralogues, though...
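    The per-isogroup approach could look something like the sketch below. It assumes each 454Isotigs.fna header carries a gene=isogroupNNNNN tag (check your own headers); the toy records are made up. Each per-group file written by write_groups could then be passed to cd-hit-est in a loop, so that isotigs are only ever compared within their own isogroup.

```python
import re
from collections import defaultdict
from pathlib import Path


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_isogroup(records):
    """Group isotig records by the (assumed) gene=isogroupNNNNN header tag."""
    groups = defaultdict(list)
    for header, seq in records:
        match = re.search(r"gene=(\S+)", header)
        key = match.group(1) if match else "no_isogroup"
        groups[key].append((header, seq))
    return dict(groups)


def write_groups(groups, outdir):
    """Write one FASTA file per isogroup, ready for a per-group CD-HIT run."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    for name, recs in groups.items():
        with open(outdir / f"{name}.fna", "w") as fh:
            for header, seq in recs:
                fh.write(f">{header}\n{seq}\n")


# Toy input in the assumed header layout.
example = """>isotig00001  gene=isogroup00001  length=12  numContigs=2
ACGTACGTACGT
>isotig00002  gene=isogroup00001  length=8  numContigs=1
ACGTACGT
>isotig00003  gene=isogroup00002  length=6  numContigs=1
GGGCCC
""".splitlines()
groups = group_by_isogroup(read_fasta(example))
print({k: len(v) for k, v in groups.items()})
# {'isogroup00001': 2, 'isogroup00002': 1}
```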

    Are there any single reads anywhere else that are neither contigs nor isotigs but are still useful?
    Yep, but so far Newbler does not output them in a separate file. You can get the IDs of the singleton reads from the 454ReadStatus.txt file. Further, check this post:

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc
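    A small sketch for pulling the singleton IDs out of 454ReadStatus.txt. It assumes the file is tab-separated with a header row, the read accession in the first column and the read status (e.g. Assembled, Singleton, Outlier) in the second; the read names in the example are invented.

```python
import csv
import io


def reads_with_status(handle, status="Singleton"):
    """Return accessions whose status column equals `status`.

    Assumes a tab-separated 454ReadStatus.txt with a header row whose
    first two columns are the read accession and its status.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    return [row[0] for row in reader if len(row) > 1 and row[1] == status]


# Toy file contents in the assumed layout.
example = io.StringIO(
    "Accno\tRead Status\t5' Contig\t5' Position\t5' Strand\n"
    "READ0001\tAssembled\tcontig00001\t10\t+\n"
    "READ0002\tSingleton\n"
    "READ0003\tSingleton\n"
    "READ0004\tOutlier\n"
)
singletons = reads_with_status(example)
print(singletons)  # ['READ0002', 'READ0003']
```

    For a real file, open it with open("454ReadStatus.txt", newline="") and pass the handle in; the resulting accession list can then be used to extract those reads with whatever tool you already use for subsetting reads.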

  • poisson200
    replied
    mmmmm, Ponder

    Hello.
    I also want to make sure every possible sequence is used in my further data analyses:

    Originally posted by flxlex
    "Isotigs are transcripts, built out of the contigs."
    Originally posted by cram
    "Unfortunately, the only way to make sure your further analyses are using all your data is to take the 454Isotigs.fna plus the larger contigs from those isogroups where proper isotig formation failed."
    Originally posted by flxlex
    CD-hit would help
    Thanks, flxlex; that program is a real help.

    To clarify: would combining the 454Isotigs.fna and 454AllContigs.fna files into a single file and then running CD-hit give you a comprehensive, non-redundant set of transcripts from your 454 transcriptome for further analyses?

    Are there any single reads anywhere else that are neither contigs nor isotigs but are still useful?

    Thank you for any advice,

    John.

  • westerman
    replied
    Originally posted by jordi
    Hi all!
    Which could be the reason for this discrepancy?
    I suspect that some of the reads are being split among the contigs. Such reads would be counted twice.
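    The arithmetic behind this suspicion can be checked with the numbers jordi posted. The five read categories from 454NewblerMetrics.txt add up exactly to the 22460 total, and presumably only fully and partially assembled reads (6603 + 5359 = 11962) end up in contigs. The numreads= sum from 454AllContigs.fna is 348 higher, which fits reads being counted in more than one contig, although the arithmetic alone does not prove it. The sum_numreads helper assumes a numreads= tag in each contig header; the two headers in its example are invented.

```python
import re


def sum_numreads(headers):
    """Sum the numreads= values found in 454AllContigs.fna header lines."""
    total = 0
    for header in headers:
        match = re.search(r"numreads=(\d+)", header)
        if match:
            total += int(match.group(1))
    return total


example_total = sum_numreads([
    ">contig00001  length=500   numreads=40",
    ">contig00002  length=300   numreads=12",
])
print(example_total)  # 52

# Figures from the 454NewblerMetrics.txt excerpt in jordi's post.
metrics = {
    "numberAssembled": 6603,
    "numberPartial": 5359,
    "numberSingleton": 8674,
    "numberRepeat": 1101,
    "numberOutlier": 723,
}
reads_in_contigs = metrics["numberAssembled"] + metrics["numberPartial"]
summed_from_headers = 12310  # jordi's numreads= sum over 454AllContigs.fna

print(sum(metrics.values()))                   # 22460
print(reads_in_contigs)                        # 11962
print(summed_from_headers - reads_in_contigs)  # 348
```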

  • jordi
    replied
    Hi all!
    I did a Newbler transcriptome assembly a year ago, and it was very difficult to find information about the output (flxlex, thank you very much for your blog!). I tried to find out how many reads were assembled, and I got different results depending on which file I looked at. For instance, according to 454AllContigs.fna, 12310 reads were assembled for a sample identified by a MID tag (multiplexed run); I summed the numreads= values in the last column of the headers. However, the 454NewblerMetrics.txt file gives:
    numberAssembled = 6603;
    numberPartial = 5359;
    numberSingleton = 8674;
    numberRepeat = 1101;
    numberOutlier = 723;
    Total reads = 22460
    Which could be the reason for this discrepancy?
    I did the assembly with Newbler release 1.1.03.24.
    Regards,

  • flxlex
    replied
    Originally posted by CHRYSES
    I guess you meant: Different "isotigs" within the same isogroup represent (...)
    Yep. Thanks...

  • CHRYSES
    replied
    Originally posted by flxlex
    Different isogroups within the same isogroup represent alternative splice variants.
    I guess you meant: Different "isotigs" within the same isogroup represent (...)

  • flxlex
    replied
    Originally posted by litali
    3. In the file "454 graph" there is a scaffold section; however, we did not do paired-end sequencing, so what is the basis for this scaffold?
    The scaffolding is not really scaffolding here, just a description of the relation between the contigs and the isotigs. The same description is given in different ways in the 454IsotigsLayout.txt and 454Isotigs.txt files.

  • cram
    replied
    1. In the file 454AllContigs, there are some "contigs" with one or a few nucleotides.
    What are those "contigs"?
    These very small contigs seem to be produced when Newbler has difficulty resolving the edges of real contigs. We often see them in very highly abundant transcripts, presumably because the number of sequencing errors is high enough to make Newbler think these are real variations. So if the edge of an exon looks like:


    ...CATGCATGAAA
    ...CATGCATGAAA
    ...CATGCATGAAA
    ...CATGCATGAAAA
    ...CATGCATGAAAA


    Newbler might consider that fourth 'A' in the last two reads to be a separate exon/contig.
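    To see how many of these tiny contigs an assembly contains, a length filter over 454AllContigs.fna is enough. A sketch with a self-contained FASTA parser; the 50 bp cut-off and the toy records are arbitrary choices, not Newbler defaults.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def short_contigs(records, min_length=50):
    """Return (contig name, length) for contigs shorter than min_length bp."""
    return [(h.split()[0], len(s)) for h, s in records if len(s) < min_length]


# Toy records: one 120 bp contig and one 1 bp "contig".
example = (
    ">contig00001  length=120   numreads=35\n" + "ACGT" * 30 + "\n"
    ">contig00002  length=1   numreads=12\nA\n"
)
tiny = short_contigs(read_fasta(example))
print(tiny)  # [('contig00002', 1)]
```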


    2. Some isogroups include only contigs and no isotigs (the first 2 groups in our case), and the short "contigs" from the previous question are also assigned to these isogroups. So what is such an isogroup? Is it all the same gene? Different genes? Why are there no isotigs?
    The isotigs are computed by traversing the contig graph, and Newbler has limits to how deep it will recurse when doing this. So if you have a bunch of these false contigs, it will eventually give up on trying to produce isotigs. You can try increasing the default limits, but in my experience even the maximum allowed values are not always sufficient.
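    A toy illustration of why a few false contigs can defeat isotig construction. This is not Newbler's algorithm, and the path cap is not one of its parameters; it simply enumerates all start-to-end paths through a small acyclic contig graph and gives up past a fixed number of paths. Each spurious "bubble" (two alternative contigs between the same neighbours) doubles the number of paths, so k bubbles give 2**k candidate isotigs.

```python
def enumerate_paths(graph, start, end, max_paths=100):
    """Enumerate all start->end paths in a DAG, giving up past max_paths.

    Returns a list of paths (lists of node names), or None once more than
    max_paths paths have been found.
    """
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            paths.append(path)
            if len(paths) > max_paths:
                return None  # too many alternatives: give up
            continue
        for nxt in graph.get(node, ()):
            stack.append((nxt, path + [nxt]))
    return paths


def bubble_chain(k):
    """Toy graph with k consecutive bubbles of two alternative contigs each."""
    graph = {}
    prev = "c0"
    for i in range(1, k + 1):
        a, b, join = f"alt{i}a", f"alt{i}b", f"c{i}"
        graph[prev] = [a, b]
        graph[a] = [join]
        graph[b] = [join]
        prev = join
    return graph, "c0", prev


print(len(enumerate_paths(*bubble_chain(3))))  # 8
print(enumerate_paths(*bubble_chain(10)))      # None (1024 paths > 100)
```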

    Which of the files are recommended for further analysis, such as BLAST? The 454Isotigs.fna? The 454AllContigs.fna (and if so, how should all the very short sequences be treated)?
    Unfortunately, the only way to make sure your further analyses are using all your data is to take the 454Isotigs.fna plus the larger contigs from those isogroups where proper isotig formation failed.
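    The selection described here could be sketched as follows. The contig-to-isogroup mapping is assumed to have been extracted beforehand (its source and format are not covered here), isogroups with isotigs are recognised by an assumed gene= tag in 454Isotigs.fna headers, and the 200 bp cut-off for a "larger" contig is a placeholder. Contigs missing from the mapping are kept if long enough, on the assumption that it is safer not to drop them.

```python
import re


def select_for_analysis(isotigs, contigs, contig_to_isogroup, min_length=200):
    """Combine all isotigs with long contigs from isogroups lacking isotigs.

    isotigs: (header, seq) pairs from 454Isotigs.fna (gene=... assumed)
    contigs: (header, seq) pairs from 454AllContigs.fna
    contig_to_isogroup: contig name -> isogroup name (assumed input)
    """
    with_isotigs = set()
    for header, _ in isotigs:
        match = re.search(r"gene=(\S+)", header)
        if match:
            with_isotigs.add(match.group(1))

    selected = list(isotigs)
    for header, seq in contigs:
        name = header.split()[0]
        group = contig_to_isogroup.get(name)  # None if unmapped: kept if long
        if group not in with_isotigs and len(seq) >= min_length:
            selected.append((header, seq))
    return selected


# Toy data: isogroup00003 produced an isotig, isogroup00001 did not.
isotigs = [("isotig00001  gene=isogroup00003  length=240", "ACGT" * 60)]
contigs = [
    ("contig00001  length=300  numreads=50", "ACGTA" * 60),
    ("contig00002  length=1  numreads=9", "A"),
    ("contig00003  length=250  numreads=40", "TTGCA" * 50),
]
contig_to_isogroup = {
    "contig00001": "isogroup00001",
    "contig00002": "isogroup00001",
    "contig00003": "isogroup00003",
}
selected = select_for_analysis(isotigs, contigs, contig_to_isogroup)
print([h.split()[0] for h, _ in selected])  # ['isotig00001', 'contig00001']
```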

  • litali
    replied
    more about cDNA

    Thanks a lot. I have read your blog, which explains things very well. Still, some questions remain:
    1. In the file 454AllContigs, there are some "contigs" with one or a few nucleotides.
    What are those "contigs"?
    2. Some isogroups include only contigs and no isotigs (the first 2 groups in our case), and the short "contigs" from the previous question are also assigned to these isogroups. So what is such an isogroup? Is it all the same gene? Different genes? Why are there no isotigs?
    3. In the file "454 graph" there is a scaffold section; however, we did not do paired-end sequencing, so what is the basis for this scaffold?
    4. Which of the files are recommended for further analysis, such as BLAST? The 454Isotigs.fna? The 454AllContigs.fna (and if so, how should all the very short sequences be treated)?

  • flxlex
    replied
    Isotigs are transcripts, built out of the contigs. Different isogroups within the same isogroup represent alternative splice variants. This makes the isogroup the equivalent of a gene.

    Take this with a grain of salt, though: it is based on mining the contig graph for subgraphs (isogroups) and traversing all possible paths through those subgraphs (isotigs). We find, for example, that small variations (SNPs, indels) generate almost identical isotigs. So clustering the isotigs with CD-hit might help.
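    To show what such clustering does, here is a simplified sketch of CD-HIT-style greedy incremental clustering: sequences are taken longest first, and each joins the first cluster representative it matches at or above the identity threshold; otherwise it starts a new cluster. Identity is approximated as matched bases (from Python's difflib) divided by the length of the shorter sequence. Real CD-HIT uses word filtering and banded alignment, so use it rather than this for real data. The toy isotigs are invented.

```python
from difflib import SequenceMatcher


def identity(shorter, longer):
    """Approximate identity: matched bases / length of the shorter sequence."""
    if not shorter:
        return 0.0
    matcher = SequenceMatcher(None, shorter, longer, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(shorter)


def greedy_cluster(seqs, threshold=0.98):
    """Greedy incremental clustering of a name -> sequence dict.

    Returns representative name -> list of member names.
    """
    clusters = {}
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    for name in order:
        for rep in clusters:
            # Representatives are never shorter than later sequences.
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


base = "ACGT" * 25                 # 100 bp toy isotig
snp = base[:50] + "T" + base[51:]  # same isotig with one substitution
seqs = {"isotig1": base, "isotig2": snp, "isotig3": "A" * 60}
clusters = greedy_cluster(seqs)
print(clusters)
# {'isotig1': ['isotig1', 'isotig2'], 'isotig3': ['isotig3']}
```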

    Visualizing the graph is a wish we all have.
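    One way to get at least a rough picture is to convert the contig graph to Graphviz DOT. The sketch assumes that in 454ContigGraph.txt contig lines start with a numeric index followed by the contig name and length, and that connection lines start with C followed by contig index, end, contig index, end and read count, all tab-separated; verify this against your own file before trusting the result. Other line types are ignored, and the toy lines are invented.

```python
def contig_graph_to_dot(lines):
    """Convert (assumed) 454ContigGraph.txt contig and C lines to DOT."""
    labels = {}
    edges = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].isdigit() and len(fields) >= 3:
            # Assumed: index, contig name, length, ...
            labels[fields[0]] = f"{fields[1]}\\n{fields[2]} bp"
        elif fields[0] == "C" and len(fields) >= 6:
            # Assumed: C, index, end, index, end, read count
            edges.append(tuple(fields[1:6]))
    out = ["graph contigs {"]
    for idx, label in labels.items():
        out.append(f'  n{idx} [label="{label}"];')
    for src, src_end, dst, dst_end, reads in edges:
        out.append(f'  n{src} -- n{dst} [label="{src_end}->{dst_end} ({reads})"];')
    out.append("}")
    return "\n".join(out)


toy = [
    "1\tcontig00001\t350\t12.5",
    "2\tcontig00002\t1\t40.0",
    "C\t1\t3'\t2\t5'\t18",
]
dot = contig_graph_to_dot(toy)
print(dot)
```

    Write the string to a file and render it with Graphviz, e.g. dot -Tpng graph.dot -o graph.png. For a whole transcriptome the picture will be unreadable, so restricting the input to the contigs of a single isogroup is probably more useful.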

  • litali
    started a topic cDNA analysis 454 assembler

    cDNA analysis 454 assembler

    Hello,
    Could anybody explain, from their own experience, the output files from a 454 cDNA assembly (isotigs, contigs, graph, etc.)? For example, which file should be used for further analysis, the 454AllIsotigs or the AllContigs, and what exactly is the difference? How can the graph be visualized? It is impossible to make sense of the graph.txt output file. Thanks a lot!
