  • Brian Bushnell
    replied
    For K=3 and the input file:

    Code:
    >
    AAAAA
    You would want an output file:

    Code:
    >
    AAA
    >
    AAA
    >
    AAA
    Is that correct? I don't have anything that will do that; sorry. What did you want to use it for?
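
    Since nothing in the suite does this, the listing described above can be sketched in a few lines of Python. This is a minimal sketch, not a BBTools feature: the function names are made up for illustration, only the forward strand is read (no reverse complements or canonical k-mers), and every k-mer is written as its own FASTA record under the header of the sequence it came from.

```python
def all_kmers(seq, k):
    """Return every k-mer in seq, in order, keeping repeats."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def kmers_as_fasta(lines, k):
    """Build FASTA text with one record per k-mer occurrence."""
    out = []
    for header, seq in read_fasta(lines):
        for kmer in all_kmers(seq, k):
            out.append(">" + header)
            out.append(kmer)
    return "\n".join(out)


# The K=3 example from this post: one sequence, AAAAA, with an empty header.
print(kmers_as_fasta([">", "AAAAA"], 3))
```

    For the AAAAA input this prints three records, each containing AAA. For a real file, pass an open file handle instead of the list.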


  • jweger1988
    replied
    Hi Brian,

    I've been using kmercountexact, and it's been very useful for getting the kmers with their counts.

    I'm wondering whether any of your tools can give me a list of all of the kmers of a given length, whether they are unique or not. Basically, a list that also includes the redundant kmers, without counts.

    Thanks in advance for your help.


  • jweger1988
    replied
    @HESmith, you weren't being a jerk at all. It's a fair point.

    I've created a new thread. http://seqanswers.com/forums/showthr...975#post207975. Thanks for any help you can provide.


  • GenoMax
    replied
    @jweger1988: I second @HESmith's suggestion.

    Create a new thread with any errors or problems you have encountered with annotation programs.

    Much as we love the BBMap suite, there will always be functionality it does not offer, and for that you will need a different program.


  • HESmith
    replied
    Not trying to be a jerk, but have you tried variant calling with the test data that are included with the software? Success would eliminate installation and command errors as the source of your problem.

    Also, a general rule for online troubleshooting of software problems is to 1) provide the exact command syntax that you used, and 2) report the exact error message that you received.
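
    One way to follow both points is to let a small wrapper record the command and the error text verbatim. The sketch below is only an illustration: `run_and_report` is a hypothetical helper, and the demo command is a stand-in that fails on purpose, not an invocation of any tool discussed in this thread.

```python
import shlex
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return a text report for a forum post."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return "\n".join([
        "Command: " + shlex.join(cmd),
        "Exit code: " + str(result.returncode),
        "Standard error:",
        result.stderr.strip(),
    ])


# Stand-in command that fails on purpose; replace it with the real tool invocation.
demo = [sys.executable, "-c", "import sys; sys.exit('chromosome not found')"]
print(run_and_report(demo))
```

    Pasting the resulting report into a post gives readers the exact command and the exact message rather than a paraphrase.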


  • jweger1988
    replied
    @HESmith, thanks for the response.

    I've tried Annovar, SnpEff and a few others. Do you have any experience with these, or other recommendations? It's totally possible I'm just messing this up.

    I am working with a virus that is not in any of the databases. I tried to build my own database entry and got an error message (from SnpEff) about the file not having the correct chromosomes set.

    Any clues?
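
    One check that does not depend on any annotation tool is whether the sequence names agree across the VCF, the annotation file and the reference FASTA. Whether a name mismatch is what caused the error described above is an assumption; the thread does not confirm it. The sketch below uses hypothetical function names and made-up in-memory inputs (`myvirus`, `myvirus.1`) purely for illustration.

```python
def _first_column(line):
    """Return the first tab-separated field of a line."""
    return line.rstrip("\n").split("\t", 1)[0]


def vcf_contigs(lines):
    """Collect sequence names from the CHROM column of VCF data lines."""
    return {_first_column(line) for line in lines
            if line.strip() and not line.startswith("#")}


def gff_contigs(lines):
    """Collect sequence names from the first column of GFF feature lines."""
    names = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence, not features
        if line.strip() and not line.startswith("#"):
            names.add(_first_column(line))
    return names


def fasta_contigs(lines):
    """Collect sequence names (first word of each header) from FASTA lines."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()}


def report_mismatches(vcf, gff, fasta):
    """Return names used in one file but absent from another."""
    v, g, f = vcf_contigs(vcf), gff_contigs(gff), fasta_contigs(fasta)
    return {
        "in VCF but not in GFF": sorted(v - g),
        "in VCF but not in FASTA": sorted(v - f),
        "in GFF but not in FASTA": sorted(g - f),
    }


# Made-up inputs; with real data, pass open file handles instead of lists.
vcf = ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\tID\tREF\tALT\n", "myvirus\t10\t.\tA\tG\n"]
gff = ["##gff-version 3\n", "myvirus.1\tsrc\tCDS\t1\t300\t.\t+\t0\tID=cds1\n"]
fasta = [">myvirus some description\n", "ATGGCC\n"]
for label, names in report_mismatches(vcf, gff, fasta).items():
    print(label + ":", names)
```

    Any non-empty list points at a name that one file uses and another does not.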


  • HESmith
    replied
    @jweger1988, what tools have you tried for VCF annotation and what problems have you encountered?


  • jweger1988
    replied
    Hi Brian,

    Is there a tool in the suite that will take a .gff or other annotation file and annotate a .vcf from callvariants.sh? By annotation I mean adding the effect of the variant (e.g., the amino acid change). I've tried to do it with a bunch of other tools, but I'm having trouble.

    I'm sure it's not currently supported, but is there any possibility that one could call variants against a .gbk or .gff reference, as opposed to a fasta, to get the coding changes directly?

    Thanks in advance.
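
    To make "the result of the variant" concrete, here is a minimal sketch of the arithmetic behind an amino acid change for a single-base substitution. It is not how callvariants.sh or any annotation tool works. It assumes a forward-strand, unspliced coding sequence, the standard genetic code, and a position already converted from the VCF coordinate to a 1-based offset within the CDS; `aa_change` is a hypothetical name.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto the string above.
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def aa_change(cds, pos, alt):
    """Describe the effect of a single-base substitution in a forward-strand CDS.

    cds: coding sequence starting at the first base of the start codon
    pos: 1-based position of the substituted base within cds
    alt: the alternate base
    """
    cds = cds.upper()
    start = (pos - 1) // 3 * 3   # index of the first base of the affected codon
    offset = (pos - 1) % 3       # position of the substituted base in that codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return f"{ref_aa}{start // 3 + 1}{alt_aa}"


# G>A at CDS position 4 turns codon 2 from GCC (Ala) into ACC (Thr).
print(aa_change("ATGGCC", 4, "A"))
```

    Reverse-strand genes, spliced features, indels and alternative codon tables all need extra handling that this sketch leaves out.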


  • darthsequencer
    replied
    Originally posted by Brian Bushnell
    No, it doesn't change the behavior otherwise. But for clustering you may want to disable duplicate removal (or you won't get accurate cluster sizes) with "am=f ac=f" and you need to add some clustering flags, "fo c pc".

    So the command might look like:

    dedupe.sh in=file.fa out=file.fa minidentity=99 am=f ac=f fo c pc
    Hi, I gave that a shot, and dedupe throws a bunch of errors and produces about twice as many clusters as it does without clustering.

    Here's a copy of the log: https://www.dropbox.com/s/t8aqj7fgf2...er.fa.log?dl=0


  • Brian Bushnell
    replied
    No, it doesn't change the behavior otherwise. But for clustering you may want to disable duplicate removal (or you won't get accurate cluster sizes) with "am=f ac=f" and you need to add some clustering flags, "fo c pc".

    So the command might look like:

    dedupe.sh in=file.fa out=file.fa minidentity=99 am=f ac=f fo c pc


  • darthsequencer
    replied
    Originally posted by darthsequencer
    Great! Yeah I'd like to know the duplicate membership for containments too.
    Hi Brian - I think "renameclusters=t" in dedupe.sh will do what I want with regard to tracking which contigs exceed the "minidentity" threshold.

    I run dedupe.sh like this: dedupe.sh in=file.fa out=file.fa minidentity=99

    Will adding "renameclusters=t" change what dedupe does, besides renaming the sequences according to the cluster they belong to?


  • sk8bro
    replied
    minid and idfilter

    Hi Brian,

    Thank you for making the deterministic flag. Results in that regard look great.

    I am confused by how minid and idfilter are working, however. I will email you a small dataset that recreates the 'unexpected' behavior I am encountering if that is helpful.

    My expectation is that this command should only allow alignments where BOTH the forward and the reverse read independently align with at least 75% pairwise identity to the reference; i.e., overlap between the reads is NOT taken into account, and the setting does NOT refer to the identity of a merged contig against the reference. Note: I usually output four FASTQ files for mapped/unmapped pairs, but here I switched to SAM output and set the idtag flag to give more information.
    Code:
    bbsplit.sh\
     -Xmx23g\
     averagepairdist=200\
     deterministic=t\
     k=8\
     minid=0.75\
     idfilter=0.75\
     maxindel=20\
     nzo=f\
     po=t\
     ambiguous2=split\
     ref=test_ref\
     idtag=t\
     in=Input_1P.fastq\
     in2=Input_2P.fastq\
     outm=Mapped.sam\
     outu=Unmapped.sam
    Here is the head of Mapped.sam. How are these reads mapping with identities below the threshold (YI:f:73.10, YI:f:64.14, etc.)?
    Code:
    @HD     VN:1.4  SO:unsorted
    @SQ     SN:006_Koxy_tonB_PC     LN:317
    @SQ     SN:022_NDM_PC   LN:267
    @PG     ID:BBMap        PN:BBMap        VN:37.22        CL:java -Djava.library.path=/install/bbmap/jni/ -ea -Xmx23g align2.BBSplitter ow=t fastareadlen=500 minhits=1 minratio=0.56 maxindel=20 qtrim=rl untrim=t trimq=6 -Xmx23g averagepairdist=200 deterministic=t k=8 minid=0.75 idfilter=0.75 maxindel=20 nzo=f po=t ambiguous2=split ref=test_ref idtag=t in=Input_1P.fastq in2=Input_2P.fastq outm=Mapped.sam outu=Unmapped.sam
    gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 83      006_Koxy_tonB_PC        209     18      4=2X2=37I100=   =       6       -311    ATCAAGCTGTTTGCCGGGAATAACAACCAGCGCGGGGCGGCGTTTGACGTTACCGGGGCGCTGGACGATAACGATCGCGTGGCGGCGCGCTTAAGCGGCATGACCCGCTATGCAGACTCGCAGTTTGATACCTTAAAAGAGCAGC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:39 AM:i:18 YI:f:73.10
    gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 163     006_Koxy_tonB_PC        6       9       87=1X2=47I1X1=1X3=2X    =       209     311     AATGATGGGCGATACCAACTCGCACAGCTCGCTGGTGGTCGATCCGTGGTTCCTGGAAAATATCGAAGTGGTGCGCGGCCCGGCCTCAGTGCTGTACGGCCGCTCTTCGCCCGGCGGCATCGTCGCCCTCACCTCGCGTAAACCC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:52 AM:i:9  YI:f:64.14
    gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 83      006_Koxy_tonB_PC        209     18      4=2X2=37I100=   =       6       -311    ATCAAGCTGTTTGCCGGGAATAACAACCAGCGCGGGGCGGCGTTTGACGTTACCGGGGCGCTGGACGATAACGATCGCGTGGCGGCGCGCTTAAGCGGCATGACCCGCTATGCAGACTCGCAGTTTGATACCTTAAAAGAGCAGC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:39 AM:i:18 YI:f:73.10
    gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 163     006_Koxy_tonB_PC        6       9       87=1X2=47I1X1=1X3=2X    =       209     311     AATGATGGGCGATACCAACTCGCACAGCTCGCTGGTGGTCGATCCGTGGTTCCTGGAAAATATCGAAGTGGTGCGCGGCCCGGCCTCAGTGCTGTACGGCCGCTCTTCGCCCGGCGGCATCGTCGCCCTCACCTCGCGTAAACCC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:52 AM:i:9  YI:f:64.14
    gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 83      006_Koxy_tonB_PC        209     18      4=2X2=37I100=   =       6       -311    ATCAAGCTGTTTGCCGGGAATAACAACCAGCGCGGGGCGGCGTTTGACGTTACCGGGGCGCTGGACGATAACGATCGCGTGGCGGCGCGCTTAAGCGGCATGACCCGCTATGCAGACTCGCAGTTTGATACCTTAAAAGAGCAGC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:39 AM:i:18 YI:f:73.10
    gi|828959694|gb|CP011636.1|_Klebsiella_oxytoca_strain_CAV1374,_complete_genome_(reversed)-1-146 163     006_Koxy_tonB_PC        6       9       87=1X2=47I1X1=1X3=2X    =       209     311     AATGATGGGCGATACCAACTCGCACAGCTCGCTGGTGGTCGATCCGTGGTTCCTGGAAAATATCGAAGTGGTGCGCGGCCCGGCCTCAGTGCTGTACGGCCGCTCTTCGCCCGGCGGCATCGTCGCCCTCACCTCGCGTAAACCC  ?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????  NM:i:52 AM:i:9  YI:f:64.14
    Thanks, Kate
    Last edited by GenoMax; 05-17-2017, 03:39 PM.
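
    The YI values in the SAM records above can be recomputed from the CIGAR strings as matches divided by all aligned columns (matches, mismatches, inserted and deleted bases). That formula reproduces both values shown, 73.10 and 64.14, but this is an observation on these two records, not a statement of how BBMap defines the tag. The sketch below, with the hypothetical helper `cigar_identity`, makes it easy to check records independently of the mapper's own filtering.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_identity(cigar):
    """Percent identity as matches / (matches + mismatches + inserted + deleted bases).

    Expects a SAM 1.4 CIGAR that uses '=' and 'X'; 'M' cannot be scored
    because it does not say whether a base matched.
    """
    counts = {}
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    if "M" in counts:
        raise ValueError("CIGAR uses M; identity cannot be derived from it alone")
    matches = counts.get("=", 0)
    columns = matches + counts.get("X", 0) + counts.get("I", 0) + counts.get("D", 0)
    return 100.0 * matches / columns if columns else 0.0


# The two CIGAR strings from the records above, checked against a 75% cutoff.
for cigar in ("4=2X2=37I100=", "87=1X2=47I1X1=1X3=2X"):
    identity = cigar_identity(cigar)
    print(cigar, f"{identity:.2f}", "below 75" if identity < 75 else "at or above 75")
```

    For the first record this is 106 matches over 145 columns, and for the second 93 over 145; in both cases the 37 and 47 inserted bases account for most of the loss.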


  • GenoMax
    replied
    If you want to use dedupe with PE files, you can do it this way:
    Code:
    reformat.sh in1=R1.fq in2=R2.fq out=stdout.fq | dedupe.sh in=stdin.fq out=stdout.fq | reformat.sh in=stdin.fq out1=new1.fq out2=new2.fq
    Clumpify is more flexible than dedupe. You can select what happens to the duplicates. You may be interested in this option:
    Code:
    dedupe=t optical=f
    All duplicates are detected, whether optical or not.  All copies except one are removed for each duplicate.
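
    As a plain illustration of "all copies except one are removed", here is a minimal sketch that treats a read pair as a duplicate only when both mates are exactly identical to a pair seen earlier. It does not attempt to reproduce Clumpify's or Dedupe's actual algorithms or options; `dedupe_pairs` and the read names are hypothetical.

```python
def dedupe_pairs(pairs):
    """Keep the first copy of each read pair whose two sequences are both identical.

    pairs: iterable of (name, seq1, seq2) tuples
    """
    seen = set()
    kept = []
    for name, seq1, seq2 in pairs:
        key = (seq1, seq2)
        if key not in seen:
            seen.add(key)
            kept.append((name, seq1, seq2))
    return kept


# Made-up pairs: read2 duplicates read1; read3 shares only the first mate.
pairs = [
    ("read1", "ACGT", "TTGA"),
    ("read2", "ACGT", "TTGA"),
    ("read3", "ACGT", "CCGA"),
]
for name, seq1, seq2 in dedupe_pairs(pairs):
    print(name, seq1, seq2)
```

    Keying on both mates together is the reason paired reads have to be kept together (for example, interleaved) before duplicate removal.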


  • jweger1988
    replied
    Genomax,

    Awesome, changing CIGAR fixed the issue. Thanks.

    About Clumpify vs Dedupe, that's a good idea. I remember Brian saying somewhere that clumpify is not quite as good as Dedupe? What would be your recommended settings?

    Thanks


  • GenoMax
    replied
    Not sure if this is applicable, but BBMap produces SAM v1.4 CIGAR strings by default. If your variant calling program expects the v1.3 format, you can either add the sam=1.3 option when you align or convert an existing file with reformat.sh in=your.bam out=new.bam sam=1.3.
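
    The difference between the two CIGAR styles is that v1.4 distinguishes matches ("=") from mismatches ("X"), while v1.3 writes both as "M". The sketch below only illustrates that difference on a single CIGAR string. It is not a replacement for the sam=1.3 option, and `cigar_14_to_13` is a hypothetical name.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_14_to_13(cigar):
    """Rewrite '=' and 'X' as 'M' and merge adjacent runs of the same operation."""
    merged = []
    for length, op in CIGAR_OP.findall(cigar):
        op = "M" if op in "=X" else op
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)


# A v1.4 CIGAR taken from SAM output posted elsewhere in this thread.
print(cigar_14_to_13("4=2X2=37I100="))
```

    Here the leading 4=2X2= collapses into 8M, so the string becomes 8M37I100M; the alignment itself is unchanged, only its description is coarser.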

    Dedupe can be used with PE files but only if reads are interleaved. You may want to use clumpify instead of dedupe.
