  • archie.chauhan
    replied
    Thanks for the response. The pattern looks the same in almost all the samples. Is this a problem with the library preparation, or just a problem with the sequencing run?

    I want to elaborate on my second question (this applies to both R1 and R2):
    some samples have files R1_001 to ...20.fastq.gz and others R1_001 to ...35.fastq.gz. Why do different samples have different numbers of files? What does this suggest?

    Can you please let me know which software to use for clipping the adapter sequences and the indices, and for further downstream processing?

    Thanks a lot, sir!



  • Heisman
    replied
    1. Having stretches of Ns like that is not normal. I'm not sure what the cause would be. You should check with the sequencing core that ran those samples to see whether anything was wrong with the run as a whole. If so, you may be able to get them to rerun it for free.

    2. No, your reads are 100 bp. The number at the end of the file name is not the read length; it is most likely just a running index for the files the reads were split into.
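One way to check whether the N stretches really sit at the same cycles across a whole file is to tally, for each read position, the fraction of reads with an N call. The following is only a minimal sketch, assuming standard four-line FASTQ records; the function name `n_fraction_per_cycle` and the two demo reads are made up for illustration and are not from the run discussed here.

```python
import io
from collections import Counter


def n_fraction_per_cycle(handle):
    """Return, for each read position, the fraction of reads with an 'N' there."""
    n_counts = Counter()  # reads with an N call at each position
    depth = Counter()     # reads long enough to cover each position
    for i, line in enumerate(handle):
        if i % 4 != 1:    # the sequence is the 2nd line of each 4-line record
            continue
        for pos, base in enumerate(line.strip()):
            depth[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return [n_counts[p] / depth[p] for p in range(len(depth))]


# Two made-up records; both have N at positions 2 and 3 (0-based).
demo = io.StringIO(
    "@r1\nACNNT\n+\n<<##<\n"
    "@r2\nGGNNA\n+\n<<##<\n"
)
fractions = n_fraction_per_cycle(demo)
print(fractions)  # [0.0, 0.0, 1.0, 1.0, 0.0]
```

For a real file, the same function could be fed a handle from `gzip.open(path, "rt")`. Positions where the fraction is close to 1 in every sample and lane would point to specific cycles of the run rather than to individual libraries.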



  • archie.chauhan
    replied
    We got our Illumina paired-end data from a 2x100 bp run, processed with CASAVA 1.8 (demultiplexed FASTQ files). Since this is our very first run and we are new to downstream Illumina data processing, I would appreciate it if you could answer our queries:
    1) Our data for almost all the lanes looks as below. Is this normal? The position of the Ns is almost the same in each sample from different lanes. If not, what is the cause of such data?

    ********************************************************************************************************
    @DJG64KN1:78:C0MG3ACXX:4:1101:1119:1986 1:Y:0:GCCAATA
    TTCTCCCCTTNNNNNNNNNNNTTCTTTGAACCCACNNNNNNNNTATCATGACTACTTATGTAANNNNNNNTACACAGCCACCATTTCTGANNNCTGCTCA
    +
    <<<?@???@@###########228???????????########--<=????????????;?@@#####################################
    @DJG64KN1:78:C0MG3ACXX:4:1101:1212:1989 1:Y:0:GCCAATA
    TATGAAAAATNNNNNNNNNNAATGTTATAATTTCTANGNNNNNGAGGGCTATTTATAGTCTAANNNNNTCAACTATGCTAATTATCACAATTAGCCCCTT
    +
    <<<@?@@@?@##########42@=@??@@?@?????#0#####00==????????>???@@@@#####,,9==>>?>???>>?????==========<<<
    @DJG64KN1:78:C0MG3ACXX:4:1101:1473:1987 1:Y:0:GCCAATA
    CTTACATATANNNNNNNNNNNAAAAGTAAGTTTGAGNCNNNNNTCCAATTTAGATGAAGAATCNNNNNACATTTCATATTTTTAATAGATACTTAACTAT
    +
    <<<@@@@@@@###########22==@>???@???)=#0#####00<????>???>??=?9;?;#####,,9==?>=>>>?????;=:===26;===<===
    @DJG64KN1:78:C0MG3ACXX:4:1101:1253:1997 1:Y:0:GCCAATA
    ATTTGTATTANNNNNNNNNTCAAAAATTAAGATGAGTATNNNNTGAAGTAAACATGATTTGGCNNNNNTGAAAACATAGACGAGATAGGAAAATAGAAAG
    +
    <<<@@@@@@@#########34=@@@@@@??@????????####00=??????????>?@@@@?#####--=???><>?>??<<<<<======<=======
    @DJG64KN1:78:C0MG3ACXX:4:1101:1385:1998 1:Y:0:GCCAATA
    AACCAAAGCTNNNNNNNNNAATTAAAGTCATTTCTCAACNNNNAGTATCAACATCTATACATANNNNNATTATCGATCAGTTATATAAAGTTCTTTTCTA
    +
    <<<@@@@@@@#########32@@@@?@????????????####00<=???????????@@@@>#####-,9=????=?<??????===============
    @DJG64KN1:78:C0MG3ACXX:4:1101:1667:1982 1:Y:0:GCCAATA
    ANGACTTAAGNNNNNNNNNNNTCCAGAGATAATTANNNNNNNNTTTTTTTCTTATTTATGAGNNNNNNNAACATCCAAAAAACTATTGTATTTTTGTGTC
    +
    <#0@@@@@@@###########22@>@>????????########00<????????????????######################################
    @DJG64KN1:78:C0MG3ACXX:4:1101:1519:1984 1:Y:0:GCCAATA
    TNCCCATTTTNNNNNNNNNNNCTTATTCACAAATCNNNNNNNNAACTTACAGTAGTTTTCATNNNNNNNAAAAACAGTTCAAACTGCAATTGTATTTGTG
    +
    9#0<@@(.@@##########################################################################################
    @DJG64KN1:78:C0MG3ACXX:4:1101:1594:1985 1:Y:0:GCCAATA
    TTATAATCAANNNNNNNNNNNAAAAAAAAAGCCCGNNNNNNNNAATTAAACATTGTTAAACCANNNNNNAACATTGTTAAACCAATAATAAGCAGTTATT
    +
    <<<@?@?@??###########22@@?????8>???#################################################################
    @DJG64KN1:78:C0MG3ACXX:4:1101:1644:1989 1:Y:0:GCCAATA
    AGATGAGTAANNNNNNNNNNTACATGCTCGAACGCTNTNNNNNGAGCAAATACGTTTTAAAACNNNNNAAGTTAAAACAACTTCTTGAAAATGAATCAAG
    +
    <<<@??@@@?##########32=?????????????#-#####.-<=??9;>??????@@???#####################################
    @DJG64KN1:78:C0MG3ACXX:4:1101:1809:1988 1:Y:0:GCCAATA
    TAGCCTTATCNNNNNNNNNNNCCAAACTAGACACCTNANNNNNCAACACTATGCCTTCTTTAANNNNNAAATGACATTTTTCCCAATTAAGAACAAGGTG
    *****************************************************************************************************************

    2) We have got around 21-30 FASTQ files per lane for both read 1 and read 2, named like: SJL-2b_ACAGTGA_L008_R1_001.fastq.gz ..................... SJL-2b_ACAGTGA_L008_R2_021.fastq.gz.
    Does this mean that the read length of this sample is only 21 bp?
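The file names in question 2 look like they follow the pattern sample_index_lane_read_number, where the trailing three digits count files rather than bases. As a rough sketch under that assumption, the files can be grouped by sample, lane and read to see how many numbered pieces each one has. The regular expression, the function name `group_chunks`, and the `_002` file name below are illustrative assumptions, not taken from the CASAVA documentation.

```python
import re
from collections import defaultdict

# Assumed layout: <sample>_<index>_L<lane>_R<read>_<file number>.fastq.gz
NAME_RE = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)


def group_chunks(filenames):
    """Map (sample, lane, read) to the sorted list of file numbers seen."""
    groups = defaultdict(list)
    for name in filenames:
        m = NAME_RE.match(name)
        if m is None:  # skip names that do not fit the assumed layout
            continue
        key = (m["sample"], m["lane"], "R" + m["read"])
        groups[key].append(int(m["chunk"]))
    return {key: sorted(nums) for key, nums in groups.items()}


names = [
    "SJL-2b_ACAGTGA_L008_R1_002.fastq.gz",  # hypothetical second file
    "SJL-2b_ACAGTGA_L008_R1_001.fastq.gz",
    "SJL-2b_ACAGTGA_L008_R2_001.fastq.gz",
]
chunks = group_chunks(names)
print(chunks)
```

If the numbers are file indices, a sample with more reads would simply end up with more files, and R1 and R2 of the same sample and lane should have the same count.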



  • avm
    replied
    Thank you for explaining and for quick answers!!



  • Heisman
    replied
    Originally posted by avm:
    "I have reads of 75 bp, but the mean insert size in one sample is 239 bp. Is the barcode that long?"

    No, that (probably) means that there is a 239 bp DNA strand between the two adapters that were ligated on. Defined this way, the insert size is independent of the read length.

    To explain better: I think some people define the fragment size as the distance between the points where the adapters are ligated, and the insert size as the distance between the ends of the two reads in paired-end sequencing. With that definition, the insert size would vary with read length. However, I do not believe that usage is universal, and I always refer to insert size as the total length of DNA between the points where the adapters are ligated.

    If you use software where you have to specify the insert size, check to make sure that you understand what definition the software is using.
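To make the two definitions concrete, here is a small arithmetic sketch. It assumes that the 239 bp figure is the full adapter-to-adapter length and that both mates have the same length; the function name `inner_distance` is just an illustrative label for the gap between the ends of the two reads.

```python
def inner_distance(insert_size, read_length):
    """Unsequenced gap between the two mates of a pair.

    insert_size is the full length of DNA between the adapters.
    A negative result means the two reads overlap.
    """
    return insert_size - 2 * read_length


gap_75 = inner_distance(239, 75)    # 239 - 150 = 89
gap_100 = inner_distance(239, 100)  # 239 - 200 = 39
print(gap_75, gap_100)  # 89 39
```

The adapter-to-adapter length stays at 239 bp whether the run is 75 or 100 cycles, while the gap between the read ends shrinks from 89 bp to 39 bp. That is why it matters which of the two numbers a given program expects.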



  • avm
    replied
    Okay,

    I have reads of 75 bp, but the mean insert size in one sample is 239 bp. Is the barcode that long?

    Thank you for making it clearer!



  • Heisman
    replied
    You should read through the stickies in the Illumina library prep section. That said:

    Read length and cycles are related terms. With Illumina, one base is sequenced per cycle, so 100 cycles gives you 100 bp reads. The read length is the number of bases sequenced. It does not matter where along the strand the bases are sequenced (although typically sequencing starts right after the sequencing primer site, which is ligated on to each fragment during library prep).

    Insert refers to the sequence between the universal adapters that are ligated on. These adapters typically have the flow cell adapter sequence and sequencing primer. They can also have an index or a barcode. If they have a barcode, the barcode is considered to be included in the insert since it will be sequenced at the beginning of the read. An index would not be sequenced at the beginning of the read and thus does not contribute to the insert size.

    Fragment size is similar to insert size: it is the average size of your fragments. It should be specified whether this refers to the size before or after the adapters are ligated on.
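As a small illustration of the barcode point above: if a barcode is read at the start of the read, it uses up some of the cycles, so fewer bases of the read come from the sample itself. This is only a sketch; the 6-base barcode, the placeholder sequence, and the function name `split_inline_barcode` are hypothetical.

```python
def split_inline_barcode(read, barcode_len):
    """Split a read into (barcode, remaining sample bases)."""
    if barcode_len > len(read):
        raise ValueError("barcode is longer than the read")
    return read[:barcode_len], read[barcode_len:]


# A made-up 100-cycle read: 6 barcode bases followed by 94 sample bases.
read = "ACGTAC" + "T" * 94
barcode, sample_bases = split_inline_barcode(read, 6)
print(barcode, len(sample_bases))  # ACGTAC 94
```

An index that is read separately would not take bases away from the read in this way.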



  • avm
    started a topic Read lengths, inserts, fragment size...


    Hi,

    I am new to sequencing and bioinformatics and am trying to get the terms right.

    I am doing WGS of E. coli genomes on Illumina HiSeq machines.

    Read length: Is that the length of the DNA fragment between the tags being replicated on the flow cell?

    Inserts: The actual sequence you get from the machine?

    Fragment size: Not sure...

    Cycles: I have samples run in 75 and 100 cycles, meaning what?

    Please help me, so confused...
