  • cement_head
    replied
    Originally posted by howen View Post
    You can try Mason. Maybe that is what you need.
    Thanks for the URL to the software and the paper.


  • howen
    replied
    You can try Mason. Maybe that is what you need.


  • abolia
    replied
    Yup, got you..

    Thanks for the help.

    Ashini.


  • Brian Bushnell
    replied
    The coverage distribution would look like Colorado/New Mexico, flat with plateaus:

    [image not preserved]

    But without the scree slopes at the bottom.
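
To see why merging two flat read sets gives plateaus rather than a bell shape, here is a minimal Python sketch of the expected per-base depth. The function name, the example coordinates, and the depths are hypothetical. It assumes non-overlapping targets and ignores edge effects at the ends of each region.

```python
def expected_depth(genome_len, targets, target_depth, background_depth):
    """Expected per-base depth when flat target reads are merged with flat whole-genome reads."""
    # Whole-genome reads contribute the same depth everywhere.
    profile = [background_depth] * genome_len
    # Target reads add a constant amount inside each region only.
    for start, end in targets:  # BED-style: 0-based start, exclusive end
        for i in range(start, end):
            profile[i] += target_depth
    return profile


profile = expected_depth(10, [(3, 6)], target_depth=30, background_depth=5)
print(profile)
```

The profile only ever takes two values (background, or background plus target depth), so adding whole-genome reads raises the floor but does not round off the plateau.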


  • abolia
    replied
    Hi Brian,

    One last question: I have an idea and just want to confirm it with someone knowledgeable.

    What if I use getfasta to extract the regions given in the BED file from my rearranged fasta file (let's call it fasta file 1), and then use randomreads.sh to generate reads from that chopped fasta file (let's call it fasta file 2, i.e. the output of getfasta)?
    In the next step, generate more random reads from the original fasta file 1.
    Then merge the two fastq files from these two steps. So basically we are throwing in some additional reads so that the distribution becomes more Gaussian-like rather than flat.

    Does this make sense? Do you think it might work, or am I missing something here?

    Thanks again,
    Ashini.


  • abolia
    replied
    Thanks Brian,
    Yeah I don't think there is anything like that either. I will give it a try and try to do it myself.

    Thanks for your reply.
    Ashini.


  • Brian Bushnell
    replied
    The distribution would be relatively flat. I don't have any tools that can simulate read generation from baited regions, and I don't know of any. It sounds like something you would have to write yourself. But, RandomReads generates reads annotated by their genomic origin, so it's possible to generate a flat distribution with it, then postprocess them and randomly discard reads with a probability based on the distance from the center of the nearest bait to achieve your goal. It would take a bit of work, of course.
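
A minimal Python sketch of that postprocessing idea, under stated assumptions: the read midpoints have already been extracted from the read annotations (the header parsing is not shown, since its exact format is not given here), and a Gaussian with a user-chosen `sigma` is used as the keep probability. The names `keep_probability`, `thin_reads`, and `sigma` are hypothetical.

```python
import math
import random


def keep_probability(read_mid, bait_centers, sigma):
    """Gaussian weight of the distance from a read midpoint to the nearest bait center."""
    distance = min(abs(read_mid - center) for center in bait_centers)
    return math.exp(-0.5 * (distance / sigma) ** 2)


def thin_reads(read_mids, bait_centers, sigma, seed=0):
    """Randomly discard reads; reads far from every bait center are dropped more often."""
    rng = random.Random(seed)
    return [
        mid for mid in read_mids
        if rng.random() < keep_probability(mid, bait_centers, sigma)
    ]
```

Reads centered exactly on a bait are always kept, and the keep rate falls off smoothly with distance, so a flat input distribution comes out roughly bell-shaped around each bait. Other falloff functions would work the same way.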


  • abolia
    replied
    Thanks GenoMax, your answer looks good. But won't this generate a very uniform distribution around the regions specified in the BED file? I want a more Gaussian-like distribution for my reads. Any thoughts on whether it can do this?

    Thanks,
    Ashini.


  • GenoMax
    replied
    Originally posted by abolia View Post
    Hi all,
    I want to generate reads for one of my rearranged genomes using a list of genomic coordinates in a BED file. The whole idea is to generate random DNA fragments from designated target regions.

    I tried using the Wessim simulator, but it throws a bunch of errors, and when I contacted their team, they said they no longer work on that project.

    Does anyone have any idea how to do this? Any help would be really great.

    Thanks,
    Ashini.
    If I understand this right ...

    You could use getfasta from BedTools (http://bedtools.readthedocs.org/en/l.../getfasta.html) to extract regions that you are interested in as fasta. Then use @Brian's randomreads.sh program.
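
For readers unfamiliar with the extraction step, here is a rough Python sketch of what it amounts to. This is an illustration, not bedtools itself: `extract_regions` is a hypothetical helper that takes sequences already loaded into a dict. It relies on BED coordinates being 0-based with an exclusive end, and it names each record `chrom:start-end`.

```python
def extract_regions(sequences, bed_lines):
    """Cut BED intervals out of sequences (dict of chrom -> sequence string)."""
    records = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        start, end = int(start), int(end)
        # BED is 0-based, half-open, which matches Python slicing directly.
        records.append((f"{chrom}:{start}-{end}", sequences[chrom][start:end]))
    return records


genome = {"chr1": "ACGTACGTAC"}
for name, seq in extract_regions(genome, ["chr1\t2\t6\ttarget_A"]):
    print(f">{name}\n{seq}")
```

Each extracted region becomes its own short reference sequence, which is why a read simulator run on that output draws reads only from the targets.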


  • abolia
    replied
    Hi all,
    I want to generate reads for one of my rearranged genomes using a list of genomic coordinates in a BED file. The whole idea is to generate random DNA fragments from designated target regions.

    I tried using the Wessim simulator, but it throws a bunch of errors, and when I contacted their team, they said they no longer work on that project.

    Does anyone have any idea how to do this? Any help would be really great.

    Thanks,
    Ashini.


  • cement_head
    replied
    Ok, thanks!


  • Brian Bushnell
    replied
    This depends on exactly what you want to do with transcripts shorter than 250bp, but... there are 2 ways to do this:

    randomreads.sh ref=transcriptome.fa out=synth.fq.gz reads=100000 len=100 paired interleaved mininsert=250 maxinsert=250

    Or, if you shred it some way so you already have 250bp single-ended reads:

    bbfakereads.sh in=shreds.fq out=pairs.fq length=100

    I wrote that specifically for this purpose. Incidentally, both of these commands will produce interleaved reads; you can convert between interleaved and dual-file paired, or between fasta and fastq, with reformat.sh, if you have things in the wrong format.

    The first command will only produce inserts of exactly 250bp, and the second will only produce inserts of exactly the length of the input sequences.
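
For checking the output by hand, here is a small Python sketch of how one pair relates to its insert under the usual paired-end convention: read 1 is the left end of the fragment, and read 2 is the right end on the opposite strand. This illustrates the convention only. It is not the code of either tool, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def pair_from_fragment(fragment, read_len=100):
    """Return (read1, read2) for one insert."""
    if len(fragment) < read_len:
        raise ValueError("fragment is shorter than the read length")
    read1 = fragment[:read_len]                       # forward read from the left end
    read2 = reverse_complement(fragment[-read_len:])  # reverse read from the right end
    return read1, read2
```

With a 250bp insert and 100bp reads, the middle 50 bases are not covered by either read, and 100,000 inserts yield 100,000 pairs (200,000 reads).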


  • cement_head
    replied
    (Hopefully) last question(s):

    I'd like to take a transcriptome (as a single FASTA file containing all the transcripts), specifically the ZF transcriptome, and fragment it in silico into 250 bp "inserts". Then I'd like to generate a pair of 100 bp PE reads from each fragment.

    In other words, if I have 100,000 fragments of 250 bp, I'd like to end up with 200,000 PE reads (100,000 pairs), one pair for each of the 100,000 fragments. I know this is artificial, but because we're trying to check code, we'd like to be very defined and controlled in this first test.

    Thanks,
    Andor


  • Brian Bushnell
    replied
    Incidentally, there's another tool that will do that too, Shred:

    shred.sh in=ref.fasta out=reads.fastq length=200


    The difference is that RandomReads will make reads in a random order from random locations, ensuring flat coverage on average, but it won't ensure 100% coverage unless you generate many fold depth. Shred, on the other hand, gives you exactly 1x depth and exactly 100% coverage (and is not capable of modelling errors). So, the use-cases are different.
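
A toy Python model of that difference, assuming a single reference sequence, non-overlapping shreds, and error-free reads. It is a sketch of the two sampling strategies, not of either tool's implementation, and all function names are hypothetical.

```python
import random


def shred_intervals(genome_len, length):
    """Consecutive, non-overlapping pieces; the final piece may be shorter."""
    return [(start, min(start + length, genome_len))
            for start in range(0, genome_len, length)]


def random_intervals(genome_len, length, n_reads, seed=0):
    """Reads of fixed length starting at uniformly random positions."""
    rng = random.Random(seed)
    starts = (rng.randrange(0, genome_len - length + 1) for _ in range(n_reads))
    return [(start, start + length) for start in starts]


def depth(genome_len, intervals):
    """Per-base depth for a list of half-open (start, end) intervals."""
    cov = [0] * genome_len
    for start, end in intervals:
        for i in range(start, end):
            cov[i] += 1
    return cov


shred_cov = depth(1000, shred_intervals(1000, 200))
random_cov = depth(1000, random_intervals(1000, 200, n_reads=5))
print("shred: min depth", min(shred_cov), "max depth", max(shred_cov))
print("random: mean depth", sum(random_cov) / 1000,
      "fraction covered", sum(1 for d in random_cov if d > 0) / 1000)
```

Both runs place 1,000 bases of reads on a 1,000-base reference, so the mean depth is 1x in each case. Only the shredded set is guaranteed to touch every base exactly once, while the random set leaves gaps unless the depth is raised.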


  • cement_head
    replied
    Originally posted by Brian Bushnell View Post
    Yes, it will. Any fasta is acceptable. You can't do anything regarding custom differential expression, though; it tries to generate a flat distribution.
    Great! That's perfect - I need to "shred" a transcriptome so that I can use it to test a tool I'm making for molecular indexing. Thanks
