  • carmeyeii
    replied
    Hi!

    I'm analyzing a "second-hand" dataset generated using SOLiD 4. It is a transcriptome mate-pair library with 52 x 37 nt reads, and I cannot for the life of me find the protocol that was used to generate those specific read lengths. I have F3 and R3 reads, so I am assuming it is a circularization protocol, but I do not know what the size-selection parameters were or how the circles were cut to produce the final fragments. This information would be very valuable for more accurate mapping.

    Any knowledge would be greatly appreciated!

    Thanks a lot,

    Carmen

  • haomself
    replied
    GATK does work on SHRiMP2-produced SAM files from SOLiD paired-end reads. Here are the steps:
    1. Align with SHRiMP2 using the --single-best-mapping and --all-contigs flags.
    2. Use Picard to fix the read group.
    3. Run GATK.
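    Step 2 is needed because GATK expects every read to carry read-group information. In practice this is done with Picard (AddOrReplaceReadGroups) on the BAM; the sketch below is only a hypothetical pure-Python illustration of what such a fix amounts to at the SAM-text level: one @RG header line plus an RG:Z tag on every alignment record. The function name and the ID/sample/library values are made up for illustration; PL:SOLID is the SAM platform value for SOLiD.

```python
def add_read_group(sam_text: str, rg_id: str, sample: str, library: str,
                   platform: str = "SOLID") -> str:
    """Add a single @RG header line and an RG:Z tag to every alignment record.

    Hypothetical illustration only: any existing @RG lines and RG:Z tags are
    replaced by the new read group.
    """
    header, records = [], []
    for line in sam_text.splitlines():
        if not line:
            continue
        if line.startswith("@"):
            # Keep all header lines except stale @RG lines.
            if not line.startswith("@RG\t"):
                header.append(line)
        else:
            fields = line.split("\t")
            # Keep the 11 mandatory columns and any optional tags except RG.
            fields = fields[:11] + [f for f in fields[11:] if not f.startswith("RG:Z:")]
            fields.append(f"RG:Z:{rg_id}")
            records.append("\t".join(fields))
    # Append the new read group after the existing header (@HD stays first).
    header.append(f"@RG\tID:{rg_id}\tSM:{sample}\tLB:{library}\tPL:{platform}")
    return "\n".join(header + records) + "\n"
```

    For real data, Picard working on BAM is the safer route; the point here is just what GATK needs to find in the file.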

  • idonaldson
    replied
    For the sake of completeness, there is also PerM, which does handle paired-end reads (as separate files). I stopped using it because it ignores any read containing an 'N'. It is not a gapped aligner. It is still being developed.
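    Before trying PerM it may be worth checking how many reads would be discarded. A minimal sketch, assuming FASTA-style input (such as a .csfasta file) with one sequence line per record. In colour-space files missing calls are usually written as '.', so the sketch flags both '.' and 'N'; exactly which characters PerM keys on is an assumption here, and the function name is hypothetical.

```python
from typing import Tuple


def count_reads_with_missing_calls(fasta_text: str, missing: str = "N.") -> Tuple[int, int]:
    """Return (reads containing a missing call, total reads)."""
    total = flagged = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines, '#' comment lines and '>' header lines.
        if not line or line.startswith(("#", ">")):
            continue
        total += 1
        if any(ch in missing for ch in line):
            flagged += 1
    return flagged, total
```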

  • colindaven
    replied
    SHRiMP 2.2.2 does seem to be an alternative.

    I can align ~60M single-end 60 bp exome reads to the human genome in about 10 hours using 47 threads. That is about a third of the runtime of novoalign-CS on this machine.
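    For planning purposes, those figures work out roughly as follows. This is a back-of-the-envelope sketch: the implied novoalign-CS time simply inverts the "about a third" estimate and is not a measured number.

```python
reads = 60e6      # ~60M single-end reads (from the post)
hours = 10.0      # wall-clock time (from the post)
threads = 47      # threads used (from the post)

reads_per_hour = reads / hours                                 # 6.0M reads/hour
reads_per_sec_per_thread = reads / (hours * 3600) / threads    # ~35 reads/s per thread
implied_novoalign_hours = hours * 3                            # "about a third of the runtime"

print(f"{reads_per_hour / 1e6:.1f}M reads/hour")
print(f"{reads_per_sec_per_thread:.1f} reads/s per thread")
print(f"implied novoalign-CS runtime: ~{implied_novoalign_hours:.0f} h")
```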

    We will be testing SNP calls from alignments in the next few weeks so I can't say anything yet.

  • NestorNotabilis
    replied
    Originally posted by Zaag
    1) GATK eats my Lifescope 2.5 BAM files raw: I analyzed 2 whole genomes and some capture experiments without any of the pain I had with Bioscope and earlier versions of Lifescope.

    2) I think the new version of BWA also stopped supporting colour space.
    Thanks for that! I've checked and you're right: as of release 0.6, BWA has dropped colour-space support (although the online documentation still alludes to it). This is very disappointing, though, given what I've been hearing, not unexpected.

    I have checked Lifescope 2.5 output with GATK, and yes, it appears to work fine. This is very good news, so thanks for bringing it to my attention!

  • Zaag
    replied
    1) GATK eats my Lifescope 2.5 BAM files raw: I analyzed 2 whole genomes and some capture experiments without any of the pain I had with Bioscope and earlier versions of Lifescope.

    2) I think the new version of BWA also stopped supporting colour space.

  • NestorNotabilis
    replied
    Originally posted by h2karen
    With regard to ECC with paired-end runs, I don't think it is currently possible. It seems ECC analysis has only been customized for forward reads, and the mix of reverse and forward reads in paired-end runs will make analysis difficult. Please see the link below for a complete response to this issue:
    http://solid.community.appliedbiosys...om/thread/1182
    Many thanks for the info, and yes, I am aware of that issue. I'm afraid I wasn't clear about the point I was making, which is that, because ECC is now available, I fear third-party developers will put less effort into producing and maintaining software that can cope with colour-space data in addition to base-space. It is notable that Bowtie 2 cannot use colour space even though Bowtie 1 can and, it would seem from the above comment, BWA may also pull out of supporting colour space. As far as using SOLiD paired-end data is concerned, this is a bit of a double whammy: users can't use paired-end data in base space because of the point you rightly highlight, but if they use colour space there are few (and, it would seem, ever fewer) options for mapping the data easily.

  • h2karen
    replied
    With regard to ECC with paired-end runs, I don't think it is currently possible. It seems ECC analysis has only been customized for forward reads, and the mix of reverse and forward reads in paired-end runs will make analysis difficult. Please see the link below for a complete response to this issue:
    http://solid.community.appliedbiosys...om/thread/1182

  • NestorNotabilis
    replied
    Originally posted by colindaven
    We can tell you about Novoalign after we receive it. Apparently it is the best aligner for SOLiD reads, according to even the authors of competing packages.

    A few notes:

    1. Lifescope will be 10,000 euros a year from what I've heard.

    2. I wouldn't use Bowtie on genomic data, just for transcriptomes.

    3. There is some debate on the BWA mailing list about dropping colour space altogether.

    Perhaps you missed SHRiMP2 as well, though I am not sure whether it can deal with paired-end reads.

    Good point about SHRiMP2: yes, it does appear to have a paired-end option, and I'll check it out. Thanks for pointing that out. (My memory of using SHRiMP before is that it needed a lot of compute resources, specifically memory, so it'll be interesting to see how it copes with paired-end data.)

    I am disappointed to hear that BWA may be dropping colour-space support altogether, though not entirely surprised: it has always appeared to perform particularly poorly with colour-space data, and I see there is some debate that this may be due to a flaw in its colour-space implementation. Further, with ECC technology now available on the 5500xl, I wonder how many other developers might also drop colour-space support?

    I'm curious to know why you would avoid using Bowtie on WGS data. Is it purely because of the indel issue? I've found Bowtie tends to give marginally better mappings than BWA, although the resulting BAM files are problematic when it comes to using the GATK pipeline. I'm guessing you're suggesting its suitability for RNA-seq because of TopHat?

    Lastly, yes... 10,000 euros a year for LifeScope. Is it worth it? Hmm. Discuss! (I suspect the few of us who are ACTUALLY using LifeScope will seriously consider spending that money on alternative commercial options rather than forking out for what is a very fragile and poorly designed piece of software. However, it's fair to say its mapper does a good job of making the most of colour-space data, at least in terms of coverage.)

    I look forward to what you have to say about Novoalign. Does anyone else have experience of using this mapper with SOLiD paired-end data?

  • colindaven
    replied
    We can tell you about Novoalign after we receive it. Apparently it is the best aligner for SOLiD reads, according to even the authors of competing packages.

    A few notes:

    1. Lifescope will be 10,000 euros a year from what I've heard.

    2. I wouldn't use Bowtie on genomic data, just for transcriptomes.

    3. There is some debate on the BWA mailing list about dropping colour space altogether.

    Perhaps you missed SHRiMP2 as well, though I am not sure whether it can deal with paired-end reads.

  • NestorNotabilis
    started a topic Mapping SOLiD colorspace paired end reads

    Mapping SOLiD colorspace paired end reads

    Hi, I'm trying to get a sense of the current consensus on best practice for mapping SOLiD colour-space paired-end reads. As far as I can tell, the options appear to be rather limited:

    1. LifeScope.

    Advantages: designed specifically for paired-end colour-space reads and maps the most reads (which may also be a disadvantage...). Disadvantages: very slow and very resource-hungry. Overly complicated command-line interface. Can only run on a limited range of hardware/software. Difficult to leverage into GATK. And to rub salt into the wound, AB are planning to charge for it in the near future.

    2. Bowtie.

    Advantages: fast. Resource-efficient. Multithreaded. Easy-to-use command-line interface. Disadvantages: tricky to leverage into GATK. Most importantly, Bowtie cannot accommodate indels. And Bowtie 2 will not accept colour space.

    3. BWA

    Advantages: fast. Efficient. Multithreaded. Accommodates indels. Easy-to-use interface. Easily leveraged into the GATK pipeline (to a point; see below). BUT! There is NO current support for paired-end SOLiD data (mate-pair yes, but not, it would seem, paired-end). The current workaround would appear to be to reverse the F5 reads (and associated QVs), preferably trimming heavily at the 3' end to minimise the potential problem of a reversed error profile (though how much of an issue is this?). It also creates all sorts of issues when leveraging into GATK (particularly as BWA does not include colour-space data in the BAM, a prerequisite for GATK recalibration).
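    The F5 reversal workaround can be sketched as follows. This is a hypothetical illustration, not BWA's code or a documented recipe: it assumes reads in the usual csfasta form (a primer base followed by colour digits, e.g. 'T0123'), one integer QV per colour, and the standard SOLiD encoding in which colour = base1 XOR base2 with A=0, C=1, G=2, T=3. Complementing a base flips both bits of its code, so colours are unchanged by complementation, and reverse-complementing a colour-space read amounts to reversing the colour string; only the leading base has to be recomputed by decoding the read. The optional `keep` argument does the suggested 3' trimming before reversal. Whether a given aligner expects exactly this representation is an assumption.

```python
from typing import List, Optional, Tuple

BASES = "ACGT"  # position of each base is its 2-bit code (A=0, C=1, G=2, T=3)


def decode_last_base(primer: str, colours: str) -> str:
    """Walk the colour string from the primer base to recover the final base."""
    code = BASES.index(primer)
    for c in colours:
        code ^= int(c)  # next base = previous base XOR colour
    return BASES[code]


def reverse_f5_read(read: str, quals: List[int],
                    keep: Optional[int] = None) -> Tuple[str, List[int]]:
    """Reverse-complement a colour-space read such as 'T0123' and its QVs.

    If `keep` is given, only the first `keep` colours (and QVs) are kept,
    i.e. the 3' end is trimmed before reversing.
    """
    primer, colours = read[0], read[1:]
    if len(colours) != len(quals):
        raise ValueError("expected one QV per colour call")
    if keep is not None:
        colours, quals = colours[:keep], quals[:keep]
    if "." in colours:
        raise ValueError("cannot decode through a missing colour call")
    # New leading base = complement of the decoded final base (code XOR 3).
    last = decode_last_base(primer, colours)
    new_primer = BASES[BASES.index(last) ^ 3]
    return new_primer + colours[::-1], quals[::-1]
```

    For example, 'T0123' decodes to TTGAT; its reverse complement ATCAA has colours 3210, so the sketch returns ('A3210', [15, 20, 25, 30]) for QVs [30, 25, 20, 15]. Note that the original first colour, which involves the primer base, ends up at the 3' end of the reversed read.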

    4. BFAST

    I am still exploring this option. It would appear to accept paired-end colour-space data natively and can be leveraged into GATK pipelines, but the interface is a bit challenging, the documentation a bit opaque, and parts of the process can be very slow.

    Is there anything I've missed? What are people's preferred strategies, particularly if they want to leverage into GATK for recalibration purposes? Are we all going to be using ECC-generated base-space data, so that development of colour-space-compatible tools will dry up? Any thoughts on this subject would be most welcome!
