  • Zam
    replied
    It's fine. I have more questions - shall we take it offline? I'm zam AT well.ox.ac.uk


  • rururara
    replied
    Definitely yes. Is there any concern about that? Would you mind sharing? Anyway, I would like to try an approach where I assemble the parental reads into scaffolds and use them as a reference sequence to align the other two progeny against. What do you think?


  • Zam
    replied
    Hi there, when you say one of the samples is parental, does that mean you have two parents and 2 F1 samples, and you have sequenced one parent and both progeny?
    Zam


  • rururara
    replied
    Hi Zam & fcr,

    Yup, we are not on the same team. Hehe. Papaya is diploid. I have 3 samples, and one of the samples is a parental line. I'm not sure of the depth of coverage yet, as I have not received any sequencing information from the company, but I will soon. The papaya samples are sequenced on the HiSeq platform.


  • Zam
    replied
    Yes, and to explain that in more detail:
    Rururara:

    1. If you have one diploid sample, you can de novo discover variants using Cortex, and then use your contigs/scaffolds to assign them coordinates. This is what Fernando meant by "CoordinatesOnly", an option for Cortex's new wrapper script.

    2. If you have several samples, then you can do two things
    a) You can also use the Cortex "population filter" to classify putative variants as repeat/error/polymorphism. This method is robust to reference assembly errors (it catches missing or collapsed repeats in the reference), and it will give you a high-quality set of variants.
    b) You could use this method to look into the quality of the reference and annotate regions which you trust and do not trust.

    Zam


  • fcr
    replied
    Hi Zam,

    Rururara is not working on the same project as me. If papaya is a diploid, he could probably use the papaya scaffolds with the "Coordinates Only" option during the calling with cortex_var (actually an accompanying script called runcalls.pl). Right?

    Cheers,
    Fernando


  • Zam
    replied
    Hi Rururara
    Are you working on the same project as Fernando or a different one? If different, how many samples are you trying to discover SNPs in, what are their depths of coverage, and what technology was used? Finally, sorry for my ignorance, but what is the ploidy of papaya?
    regards
    Zam


  • rururara
    replied
    De novo SNP calling in absence of complete reference assembly

    Hi all,

    What about an incomplete reference genome, like papaya's? The available data for papaya are scaffolds and contigs. Is it possible to use the papaya scaffolds as a reference to align my reads against? In my case, the objective is to discover SNPs.


  • fcr
    replied
    Hi,

    Yes, Zam got it right. I want to start calling SNPs now. The assembly is unfinished and polishing it is going to take time (~1000,000 scaffolds now). In response to Zam, the assembly is based on one individual, and the estimated coverage is 60X.

    The other 10 individuals have 20X coverage, and I want to use them for SNP calling and perhaps "pilot" genotype calling. I think it is worth making progress on this now, even if a second round of calling based on the finished assembly will later be needed to verify or reject candidate regions of interest.

    lh3: Thanks for your comment about the reference bias when estimating the population statistics...I will keep that in mind.

    Cheers,
    Fernando


  • Zam
    replied
    Just to clarify one thing (and agree with Heng): my understanding is that Fernando doesn't want to have to wait until his assembly is finished (I mean done/completed, not finished by manual finishers), and wants to get on with it and start calling now. That's what got me nervous about artefacts.


  • lh3
    replied
    With 60X, you should be able to get an assembly decent enough for most analyses. This is true for human. Nonetheless, Zam is right that misassembly may cause artifacts. You have to live with it. If you are careful enough, you can greatly reduce the effect of that. Also beware that there will be reference bias when estimating population statistics (i.e. individuals closer to the reference will be mapped better).


  • Zam
    replied
    Hi Fernando


    >True, the distribution of coverage will include regions above 30x.
    One of the examples in our paper is of SNP calling in 10 samples each sampled to 6x,
    for example.

    2. Actually, you could call on 10 individuals with much less than 256Gb of RAM. You need 256Gb to hold ALL of their genomes at the same time. But lots of the genome is either monomorphic, or doesn't consist of things Cortex can call. So you could do those 10 samples in ~80Gb of RAM (for comparison I've just done 85 humans in 320Gb of RAM).
    The trick is to call on the joint graph (1 colour, probably needs 80Gb RAM) and then pull out just the variants and make a graph just of the variants. Then "multicolourise" the graph and make a 10-colour graph of the variants only, and genotype everyone in that.
    Uses far less memory.
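
    As a rough illustration of why colouring only the variants is cheaper, here is a toy sketch in Python. It is not Cortex's algorithm: real calling works on bubbles in a de Bruijn graph, whereas this stand-in simply treats k-mers not shared by every sample as "variant" k-mers. The function names and the two short sequences are made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def variant_only_colours(samples, k):
    """Toy stand-in for 'call on the joint graph, then colour only the variants'.

    samples: dict mapping sample name -> sequence (a stand-in for its reads).
    Returns (joint, variant, colours), where colours maps each sample to the
    variant k-mers it carries.
    """
    per_sample = {name: kmers(seq, k) for name, seq in samples.items()}
    joint = set().union(*per_sample.values())        # single-colour joint graph
    shared = set.intersection(*per_sample.values())  # monomorphic k-mers
    variant = joint - shared                         # k-mers that differ between samples
    colours = {name: ks & variant for name, ks in per_sample.items()}
    return joint, variant, colours


# Two hypothetical samples differing at one base.
samples = {"s1": "ACGTACGGA", "s2": "ACGTTCGGA"}
joint, variant, colours = variant_only_colours(samples, k=3)

# Entries needed if every k-mer is stored in every colour, versus variants only.
full_entries = len(joint) * len(samples)
variant_entries = len(variant) * len(samples)
print(full_entries, variant_entries)
```

    The per-colour table only has to hold the variant k-mers, so its size scales with the amount of variation rather than with genome size.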

    How much coverage do your 10 samples have? Is the 60x individual a different sample?

    I'm not saying it is too risky with scaffolds, just that if you find something really exciting, you need to do some work making sure it's not an artefact. I've seen people have to work very hard to avoid problems with the chimp genome.

    best

    Zam


  • fcr
    replied
    Hi Zam,

    Thanks a lot.

    Cortex:
    1. True, the distribution of coverage will include regions above 30x.
    2. What are the computational needs for 10 individuals with a 2.9 Gbp genome? You stated "10 humans on a 256Gb RAM server". How long does this take? Would it be possible to call SNPs with less RAM?

    What to do:
    This is a 60X coverage genome. I would assume that many of the scaffolds are bona fide and that many of the changes (from adding more libraries) are going to affect mainly the connections among scaffolds rather than disrupt the scaffolds themselves...but I might be wrong and shouldn't guess. The main interests are: 1. to develop a genome-wide set of markers and 2. to do some population inferences by estimating Fst, Pi and Ne.

    So you think it is too risky to use scaffolds?


    Cheers,
    Fernando


  • Zam
    replied
    Re Cortex:
    1. You have much more than 30x coverage if you have many samples at 20x
    2. It's not as simple as "you need 30x" for Cortex. But you are absolutely right that an assembly approach will be less sensitive to SNPs.
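
    To make point 1 concrete, here is a minimal sketch of the pooled-depth arithmetic, assuming all samples are loaded into one joint graph so that their depths simply add up. The function name is made up for illustration, and the figures are the 10 samples at 20x mentioned elsewhere in this thread.

```python
def pooled_coverage(per_sample_depths):
    """Total depth available when all samples are pooled into one joint graph."""
    return sum(per_sample_depths)


# Ten samples at 20x each pool to well over a 30x threshold.
depths = [20] * 10
total = pooled_coverage(depths)
print(total, total > 30)
```

    Pooled depth helps discovery of variants that are present in several samples; a variant private to one sample is still supported only by that sample's own depth.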


    Re what to do
    - It depends on what you want to achieve. Do you want a small, conservative set of SNPs for building a genetic map, or a big, sensitive set for some other purpose?

    If you have the time, then try both methods (mapping/assembly) and compare. If you are doing population genetic studies, then experience suggests that you will need to be very careful with SNP calls based on an assembly that is not high quality, as it is easy for assembly artefacts to look like interesting scientific finds in your SNPs.


  • lh3
    replied
    I think you should map your reads to the assembly and then do SNP calling. SAMtools should in principle work, but I have not tried.
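
    A minimal sketch of what such a map-then-call run might look like, written as a Python helper that only assembles the shell commands and does not run them. The choice of bwa mem and bcftools is an assumption (the post above names only SAMtools), these are present-day command forms rather than those available when the post was written, and all file names are hypothetical.

```python
def mapping_commands(scaffolds_fa, reads_1, reads_2, sample):
    """Build shell commands for mapping reads to a draft assembly and calling SNPs.

    Nothing is executed here; the commands are only assembled as strings.
    Tool choice (bwa, samtools, bcftools) is an assumption for illustration.
    """
    bam = f"{sample}.bam"
    vcf = f"{sample}.vcf"
    return [
        # Index the scaffolds so they can serve as the mapping reference.
        f"bwa index {scaffolds_fa}",
        # Map paired-end reads and write a coordinate-sorted BAM.
        f"bwa mem {scaffolds_fa} {reads_1} {reads_2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # Pile up against the scaffolds and keep variant sites only.
        f"bcftools mpileup -f {scaffolds_fa} {bam} | bcftools call -mv -o {vcf}",
    ]


cmds = mapping_commands("scaffolds.fa", "s1_R1.fq", "s1_R2.fq", "s1")
for cmd in cmds:
    print(cmd)
```

    Calls made this way inherit any misassembly in the scaffolds, so candidate SNPs near scaffold ends or in regions of unusual depth deserve extra scrutiny.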
