  • robinvvelzen
    replied
    Originally posted by jollymrt
    Yes, the commands have to be executed in MySQL.
    First you need to select the distinct rows into a different table (the holdup table), then delete all the rows in the similarSequences table, and then insert the rows from the holdup table. I hope this clears it up.
    Yes, it does. Thanks very much!

    In the meantime I had redone the whole orthoMCL pipeline and for some reason got no more duplicate errors. Possibly some table entries were accidentally copied the last time. This is just to inform others who may run into the same problem.


  • jollymrt
    replied
    Yes, the commands have to be executed in MySQL.
    First you need to select the distinct rows into a different table (the holdup table), then delete all the rows in the similarSequences table, and then insert the rows from the holdup table. I hope this clears it up.
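
For anyone following along, the three steps described above can be sketched roughly as below. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL, the table is a toy version of similarSequences, and the evalue_exp column and the final DROP TABLE are additions for the example, not something stated in the thread.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL instance (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE similarSequences "
    "(query_id TEXT, subject_id TEXT, evalue_exp INTEGER)"  # toy columns
)
conn.executemany(
    "INSERT INTO similarSequences VALUES (?, ?, ?)",
    [
        ("sp1|A", "sp1|B", -50),
        ("sp1|A", "sp1|B", -50),  # exact duplicate row
        ("sp1|E", "sp1|F", -10),
    ],
)
before = conn.execute("SELECT COUNT(*) FROM similarSequences").fetchone()[0]

# Step 1: copy only the distinct rows into a holdup table.
conn.execute("CREATE TABLE holdup AS SELECT DISTINCT * FROM similarSequences")
# Step 2: empty the original table.
conn.execute("DELETE FROM similarSequences")
# Step 3: put the distinct rows back.
conn.execute("INSERT INTO similarSequences SELECT * FROM holdup")
# Optional cleanup (not mentioned in the thread).
conn.execute("DROP TABLE holdup")
conn.commit()

after = conn.execute("SELECT COUNT(*) FROM similarSequences").fetchone()[0]
print(before, after)  # 3 2
```

The SQL statements themselves are plain enough that they should carry over to the mysql prompt largely unchanged, but check them against a copy of your table first.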


  • robinvvelzen
    replied
    I am also having the same errors as the OP.

    Apart from trying ways to fix it, I am wondering what causes the duplicate error in the orthoMCL pipeline. Given that duplicates are to be expected after an all-vs-all BLAST, I would expect the orthoMCL scripts to deal with them appropriately.

    Is this a matter of orthomclBlastParser, orthomclLoadBlast or orthomclPairs not doing a proper job?

    Originally posted by jollymrt
    To check whether you have duplicate entries, use the following command:

    select query_id, subject_id, count(*) from similarSequences group by query_id, subject_id having count(*) > 1;

    This command will list the query/subject pairs that occur more than once.

    Then you can create a new table that will have only distinct rows:

    create table holdup as select distinct * from similarSequences;
    Thanks for the help, but I have a few questions:
    1. Looking at those commands, I assume they are to be executed within MySQL. Is that correct?
    2. Will they simply replace the similarSequences table with itself with duplicates removed, or do I need to do more to be able to continue the analysis?
    3. Will this (changing the table midway through the orthoMCL pipeline) not compromise the analysis?

    Thanks again!


  • jollymrt
    replied
    To check whether you have duplicate entries, use the following command:

    select query_id, subject_id, count(*) from similarSequences group by query_id, subject_id having count(*) > 1;

    This command will list the query/subject pairs that occur more than once.

    Then you can create a new table that will have only distinct rows:

    create table holdup as select distinct * from similarSequences;
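
One thing worth checking after this step: select distinct * only collapses rows that are identical in every column, so a query/subject pair that appears twice with different scores will still be duplicated in the holdup table. The sketch below illustrates this with Python's built-in sqlite3 as a stand-in for MySQL. The table is a toy one, and only query_id and subject_id are taken from the commands above; the evalue_exp column and the sample IDs are made up for the example.

```python
import sqlite3

# Toy stand-in for the similarSequences table (assumed, simplified columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE similarSequences "
    "(query_id TEXT, subject_id TEXT, evalue_exp INTEGER)"
)
conn.executemany(
    "INSERT INTO similarSequences VALUES (?, ?, ?)",
    [
        ("sp1|A", "sp1|B", -50),  # exact duplicate of the next row
        ("sp1|A", "sp1|B", -50),
        ("sp1|C", "sp1|D", -30),  # same pair as the next row, different score
        ("sp1|C", "sp1|D", -28),
        ("sp1|E", "sp1|F", -10),  # unique pair
    ],
)

# Pairs that occur more than once, with how often they occur.
dup_pairs = conn.execute(
    "SELECT query_id, subject_id, COUNT(*) FROM similarSequences "
    "GROUP BY query_id, subject_id HAVING COUNT(*) > 1 "
    "ORDER BY query_id, subject_id"
).fetchall()

# DISTINCT * removes only rows that match in every column.
conn.execute("CREATE TABLE holdup AS SELECT DISTINCT * FROM similarSequences")

# Re-run the duplicate check on the holdup table.
still_dup = conn.execute(
    "SELECT query_id, subject_id FROM holdup "
    "GROUP BY query_id, subject_id HAVING COUNT(*) > 1"
).fetchall()

print(dup_pairs)  # both the A/B and C/D pairs are reported
print(still_dup)  # only the C/D pair survives DISTINCT *
```

If the second check still returns rows, the remaining duplicates differ in some score column and need to be resolved by pair rather than by whole row.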


  • guyleonard
    replied
    Originally posted by jollymrt
    The duplicate entry error can be removed by selecting only the distinct rows in the similarSequences table.
    Never mind. There was a duplicate or two that I had missed. Sorting and then using the tool 'uniq' works, but you have to use the -w option with a number (I used 40) to limit the comparison to just the accessions. Sometimes the duplicates had different score values and so were effectively unique... Phew.

    Any chance you could expand on that?

    I have the same error as the OP, my file is 6.5GB.

    I've gone through the file and removed duplicates...or at least I thought I had.

    I managed to find a list of duplicate accessions and so removed them from similarSequences with AWK. I then also sorted the file on the first column and used uniq to remove any adjacent duplicates...

    Every time the same error:
    Duplicate entry 'didi|DDB_G0279353-didi|DDB_G0283451' for key 'better_hit_ix' at /home/cs02gl/programs/orthomclSoftware-v2.0.3/bin/orthomclPairs line 693, <F> line 14.

    Looking at those accessions in the file (using grep) reveals no duplicates for that pair.
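
As a rough illustration of limiting the match to the accessions rather than the whole line, here is a minimal Python sketch that keeps only the first line seen for each (query, subject) pair. It assumes the similarSequences file is tab-delimited with the query accession in the first column and the subject accession in the second; the function name and the sample lines are made up for the example. It also holds every pair in memory, which may be a concern for a file of several GB.

```python
from typing import Iterable, Iterator


def dedupe_by_pair(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the first line seen for each (query, subject) pair."""
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1])
        if key not in seen:
            seen.add(key)
            yield line


# Hypothetical sample lines: the first two share a pair but differ in scores,
# so a whole-line comparison would treat them as unique.
sample = [
    "sp1|A\tsp1|B\tsp1\tsp1\t1.0\t-50\t98.0\t100\n",
    "sp1|A\tsp1|B\tsp1\tsp1\t2.0\t-48\t97.5\t100\n",
    "sp1|C\tsp1|D\tsp1\tsp1\t3.0\t-30\t80.0\t90\n",
]
kept = list(dedupe_by_pair(sample))
print(len(kept))  # 2
```

Note that this keeps whichever line comes first for a pair, which is not necessarily the best-scoring hit. Sort the input accordingly if that matters for your analysis.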
    Last edited by guyleonard; 01-07-2013, 03:39 AM.


  • jollymrt
    replied
    The duplicate entry error can be removed by selecting only the distinct rows in the similarSequences table.


  • flipwell
    started a topic OrthoMCL duplicate entry error

    OrthoMCL duplicate entry error

    I have made it up to step 10 of the OrthoMCL process (finding protein pairs) but have become stuck with a duplicate entry error:

    Duplicate entry '5206|CNAG_00003-5206|CNE00390' for key 'better_hit_ix' at /data/ngs/apps/orthomclSoftware/bin/orthomclPairs line 693, <F> line 14

    I thought maybe it was something to do with the way I ran BLAST, so I reran it with the recommended parameters, but I still get an error, although a different one:

    Duplicate entry '5206|CNAG_00006-5206--181-0-5206-5206|CNAG_00006' for key 'ss_qtaxexp_ix' at /data/ngs/apps/orthomclSoftware/bin/orthomclPairs line 693, <F> line 14

    Has anyone had a similar issue?

    Thanks for your help
