

Cutadapt Demultiplex




  • Cutadapt Demultiplex

    Hi all,

    I'm working in R and trying to demultiplex some data using cutadapt.

    I have three files:
    barcodes.fastq.gz (which I have converted to FASTA format in R)

    When I run the following code:

    system2(cutadapt, args = c("-e 1 -g file:Y:/Vault/Tary_meta/Kim_2019/barcodes.fasta -o trimmed-Y:/Vault/Tary_meta/Kim_2019/cutadapt/Demulti/{name}.1.fastq.gz -p trimmed-Y:/Vault/Tary_meta/Kim_2019/cutadapt/Demulti/{name}.2.fastq.gz input.1.forward.fastq.gz input.2.reverse.fastq.gz --action=none"))

    I first receive several warnings:
    WARNING: Adapter 'CTAAGAGCCCGA' (regular 5') was specified multiple times! Please make sure that this is what you want.

    and the following error
    OSError: [Errno 22] Invalid argument: 'trimmed-Y:/Vault/Tary_meta/Kim_2019/cutadapt/Demulti/M03190:41:000000000-BBDRY:1:1101:15702:1567.1.fastq.gz

    Unfortunately I can't seem to find this error when searching through the cutadapt manual and past forum posts. Does anyone know what this error means and how I might resolve this?

  • #2
    It looks like there might be a couple of issues with the command you're running:

    The warning message about the adapter being specified multiple times suggests that you might have multiple entries for the same adapter sequence in your barcodes.fasta file. You can check this by opening the file in a text editor and searching for the adapter sequence. If there are multiple entries, you should remove the duplicates so that each adapter sequence is only listed once.
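A quick way to check for repeated sequences without scrolling through the file by hand is sketched below. It is only an illustration: the function name and the records `read_2` and `read_3` are made up, and the first record simply reuses the read ID and sequence quoted in the error messages above.

```python
from collections import defaultdict


def find_duplicate_sequences(fasta_text):
    """Return {sequence: [record names]} for sequences listed more than once."""
    records = []  # each entry is [name, sequence]
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            # Sequence lines may be wrapped, so append to the current record.
            records[-1][1] += line.upper()

    names_by_seq = defaultdict(list)
    for name, seq in records:
        names_by_seq[seq].append(name)
    return {seq: names for seq, names in names_by_seq.items() if len(names) > 1}


# Hypothetical file contents for illustration only.
example = (
    ">M03190:41:000000000-BBDRY:1:1101:15702:1567\n"
    "CTAAGAGCCCGA\n"
    ">read_2\n"
    "CTAAGAGCCCGA\n"
    ">read_3\n"
    "ACGTACGTACGT\n"
)

for seq, names in find_duplicate_sequences(example).items():
    print(seq, "is listed under", len(names), "names:", ", ".join(names))
```

If this reports a large number of duplicates, the FASTA probably holds one record per read rather than one record per sample barcode.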

    The error message about the invalid argument points to the output file path. cutadapt replaces the {name} placeholder with the name of the matching adapter record from barcodes.fasta, and in your error it has been replaced with 'M03190:41:000000000-BBDRY:1:1101:15702:1567', which looks like a sequencer read ID rather than a sample name. The Y:/ path suggests you are on Windows, where colons are not allowed in file names, so the output file cannot be created. Make sure the record names in barcodes.fasta are valid file names (for example, sample IDs without colons). The "trimmed-" prefix in front of the drive letter ("trimmed-Y:/...") is also worth checking, since it leaves a colon in the middle of the path.

    You might also want to try running the command without the --action=none option to see if there are any additional error messages that might provide more information about the problem.
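Because {name} ends up inside the output file name, it can help to test each record name from the barcode FASTA before running cutadapt. The sketch below assumes a Windows file system (inferred from the Y:/ path) and uses the set of characters that Windows reserves in file names; the function name and the `sample_01` name are made up for illustration.

```python
# Characters that Windows does not allow inside a file name.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def invalid_filename_chars(name):
    """Return the sorted, de-duplicated characters in `name` that Windows rejects."""
    return sorted(
        {ch for ch in name if ch in WINDOWS_FORBIDDEN or ord(ch) < 32}
    )


# The name from the error message, plus a hypothetical sample-style name.
for candidate in ["M03190:41:000000000-BBDRY:1:1101:15702:1567", "sample_01"]:
    bad = invalid_filename_chars(candidate)
    status = "ok" if not bad else "contains " + " ".join(bad)
    print(candidate, "->", status)
```

Any name that is flagged here would produce an output path that cannot be opened, whatever the rest of the command looks like.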


    • #3
      Thanks for your answer! I have a few follow-up questions, if that's okay.

      On the adapter section, I forgot to add context earlier, but this is 16S rRNA amplicon data from a microbiome study, so would multiple reads per adapter make sense in that case?

      Is there a way to check through cutadapt that the {name} placeholder is being replaced with a valid filename? And if I'm supposed to do this manually, do you know where I can look to find the 'name' value in the barcodes file?

      Thanks heaps for your help!


      • #4
        dbro970, no problem. I'm happy to help when I can.

        For the adapter section, it is expected that many reads carry the same barcode in 16S rRNA amplicon data, since every read from a given sample shares that sample's barcode. The warning is about something different, though: it means the same sequence appears more than once in barcodes.fasta itself. For demultiplexing, the barcode file would normally list each barcode once, under the name of its sample. If barcodes.fasta was converted directly from barcodes.fastq.gz, it may contain one record per read instead, which would explain both the repeated warnings and the read ID showing up in the output file name.

        Regarding the use of the "{name}" placeholder in the output file name, you can try running Cutadapt with the "--debug" option. This prints additional debugging information to the console, which may help show which file names are being opened for output.

        For example, I would try to modify your command as:

        system2(cutadapt, args = c("-e 1 -g file:Y:/Vault/Tary_meta/Kim_2019/barcodes.fasta -o trimmed-Y:\\Vault\\Tary_meta\\Kim_2019\\cutadapt\\Demulti\\{name}.1.fastq.gz -p trimmed-Y:\\Vault\\Tary_meta\\Kim_2019\\cutadapt\\Demulti\\{name}.2.fastq.gz input.1.forward.fastq.gz input.2.reverse.fastq.gz --action=none --debug"))

        I could be wrong, but I believe you should see output similar to the following:

        [debug] Writing to trimmed-Y:\Vault\Tary_meta\Kim_2019\cutadapt\Demulti\M03190_41_000000000-BBDRY_1_1101_15702_1567.1.fastq.gz and trimmed-Y:\Vault\Tary_meta\Kim_2019\cutadapt\Demulti\M03190_41_000000000-BBDRY_1_1101_15702_1567.2.fastq.gz
        This confirms that the "{name}" placeholder has been replaced with a valid filename for the given input read.

        As for finding the value of the "{name}" placeholder in the barcodes file: in a FASTA file, the name is the header line (the text after ">") above each barcode sequence, and Cutadapt substitutes the name of the matching barcode for "{name}". So each header should be a sample name or ID that is also usable as part of a file name. If your barcodes and sample IDs are currently listed in a separate sample sheet or tab-separated file, you would need to turn that into a FASTA file with one record per sample.
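As a rough sketch of what such a barcode file could look like, the snippet below builds FASTA text from sample/barcode pairs and refuses names that would not work in a file name, as well as barcodes that are listed twice. The sample names `sampleA` and `sampleB`, the second barcode, and the function name are all hypothetical; only the first barcode is taken from the warning quoted earlier in the thread.

```python
import re

# Characters Windows rejects in file names, plus whitespace to keep headers simple.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\s]')


def barcode_fasta(samples):
    """Build FASTA text from (sample_name, barcode) pairs, one record per sample."""
    seen = {}  # barcode -> sample name that already uses it
    lines = []
    for name, barcode in samples:
        barcode = barcode.upper()
        if not name or _FORBIDDEN.search(name):
            raise ValueError(f"name not usable in a file name: {name!r}")
        if barcode in seen:
            raise ValueError(
                f"barcode {barcode} is used by both {seen[barcode]!r} and {name!r}"
            )
        seen[barcode] = name
        lines.append(f">{name}\n{barcode}\n")
    return "".join(lines)


# Hypothetical sample sheet for illustration only.
print(barcode_fasta([("sampleA", "CTAAGAGCCCGA"), ("sampleB", "ACGTACGTACGT")]))
```

With names like these, "{name}.1.fastq.gz" would expand to something like sampleA.1.fastq.gz, which the file system can create.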

        Let me know how this all goes! Hopefully what I've written is helpful.


        • #5

          Sorry for letting this thread languish for so long. I ran the code with the --debug option and got this error:

          Traceback (most recent call last):
          File "script.py", line 2, in <module>
          File "cutadapt\__main__.py", line 1016, in main_cli
          File "cutadapt\__main__.py", line 1078, in main
          File "cutadapt\__main__.py", line 504, in open_output_files
          File "cutadapt\__main__.py", line 607, in open_demultiplex_out
          File "cutadapt\utils.py", line 204, in xopen
          File "cutadapt\utils.py", line 51, in open_raise_limit
          File "xopen\__init__.py", line 1075, in xopen
          File "xopen\__init__.py", line 958, in _open_gz
          File "gzip.py", line 58, in open
          File "gzip.py", line 173, in __init__
          OSError: [Errno 22] Invalid argument: 'trimmed-Y:/Vault/Tary_meta/Kim_2019/cutadapt/Demulti/M03190:41:000000000-BBDRY:1:1101:15702:1567.1.fastq.gz'
          [23228] Failed to execute script 'script' due to unhandled exception!

          I don't really understand this output, so I'm unsure what the next step would be to correct this.
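One way to read the traceback above: the last frames are in gzip.py, so the failure happens when the output file itself is opened, and the path in the OSError line is the one being rejected. The sketch below is only a diagnostic aid under the assumption that this is a Windows path, where a colon is normally accepted only in a leading drive prefix such as "Y:". The function name is made up for illustration.

```python
import re


def misplaced_colons(path):
    """Return positions of colons that are not part of a leading drive prefix."""
    # A drive prefix is a single letter followed by a colon at the very start.
    start = 2 if re.match(r"^[A-Za-z]:", path) else 0
    return [i for i, ch in enumerate(path) if ch == ":" and i >= start]


# The path copied from the OSError line in the traceback.
failing = (
    "trimmed-Y:/Vault/Tary_meta/Kim_2019/cutadapt/Demulti/"
    "M03190:41:000000000-BBDRY:1:1101:15702:1567.1.fastq.gz"
)

positions = misplaced_colons(failing)
print(len(positions), "colons outside a drive prefix")
for i in positions:
    # Show a little context around each offending colon.
    print("  position", i, "near", repr(failing[max(0, i - 8):i + 1]))
```

For this path the check flags both the colon after "trimmed-Y" (the prefix sits in front of the drive letter) and the colons inside the read ID that replaced {name}.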

