  • Paired Reads Mapping Across Chromosomes

    Hello all,

    I have some Illumina HiSeq paired-end whole-genome human data in which 2-3% of read pairs have their two ends mapping to different chromosomes. I'm using bwa with mostly default parameters, such as
    Code:
    bwa aln -t 16 -q 10 reference.fa r1.fastq > r1.sai
    ...
    bwa sampe reference.fa r1.sai r2.sai r1.fastq r2.fastq > r1.sam
    I also tried mapping some of the cross-chromosomal reads with bowtie as single-end reads, and it placed them in the same locations.

    The reads map uniquely with high mapping quality (e.g. 37 on the Phred scale) and seem to occur randomly, with an even distribution across chromosomes. Bwa reports X0=1 and X1=0, indicating that there are no alternative mappings for either end (and even when I add flags to allow more edit distance and gap opens, it still doesn't find any). The cross-chromosomal read pairs always seem to be single cases at a location, which makes me think it can't be a biological effect.
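
For anyone who wants to measure the same thing on their own data, here is a minimal sketch (not the poster's actual method) that counts read pairs whose mate maps to a different chromosome, straight from SAM text. The function name `cross_chrom_fraction` and its parameters are made up for illustration. It relies only on the standard SAM columns (FLAG, RNAME, MAPQ, RNEXT) and on the `X0`/`X1` optional tags that bwa aln/sampe writes. Filters are applied to the first read of each pair only, so treat the result as an estimate.

```python
def cross_chrom_fraction(sam_lines, min_mapq=0, require_unique=False):
    """Count pairs whose two ends map to different reference sequences.

    Each pair is counted once, via its first-in-pair read (FLAG 0x40).
    Returns (cross_pairs, total_pairs, fraction).
    """
    total = 0
    cross = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname, mapq, rnext = fields[2], int(fields[4]), fields[6]
        # Keep only paired reads that are first in the pair.
        if not (flag & 0x1) or not (flag & 0x40):
            continue
        # Drop pairs with an unmapped end, and secondary/supplementary records.
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        if mapq < min_mapq:
            continue
        # Optionally require bwa's "one best hit, no suboptimal hits" tags.
        if require_unique and not {"X0:i:1", "X1:i:0"} <= set(fields[11:]):
            continue
        total += 1
        # RNEXT is "=" when the mate is on the same reference sequence.
        if rnext != "=" and rnext != rname:
            cross += 1
    fraction = cross / total if total else 0.0
    return cross, total, fraction
```

A typical call would be `cross_chrom_fraction(open("r1.sam"), min_mapq=30, require_unique=True)`, where `r1.sam` is the output of the `bwa sampe` command above.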

    I'm wondering if anybody knows an explanation for these kinds of reads, and what the best way to treat them is?

    Many thanks!

    Simon

  • #2
    What about the other reads in the same regions? Are they mapped specifically, or do they have alternative locations?

    What is the library? If it's PCR-derived, that may be an issue.


    • #3
      Hi,

      What about the other reads in the same regions?
      The other reads look fine; they're all mapped normally. Even the cross-chromosomal reads seem to be mapped well (as in, specifically and mostly without mismatches). When the normally paired reads show a SNP, it is often also present in a read whose mate is on another chromosome. So to me it seems like the reads are good but there is some sort of mixup in the pairing, and I don't know how that would happen.

      What is the library? If it's PCR-derived, that may be an issue.
      There definitely was PCR in the prep, and we got varying rates of duplicates (there were multiple runs, and the duplicate rate varied between 10% and 50%). Is it possible that PCR can mix up the ends of two reads? (Sorry, I don't know much about the biology side of this, I'm coming from a comp-sci background!)

      One theory we had was about whether it could be from the optics ... if the imaging could somehow confuse the ends of two molecules if they are close enough together?

      Cheers & Thanks!

      Simon


      • #4
        I think Illumina have done a pretty good job of reducing mix-ups of paired ends. I don't think we ever see any evidence of it. I would not expect any sequencing errors to affect around 2-3% of reads.

        We have seen libraries made using long PCR in which reads and read pairs jump from regions close to primer binding sites to other regions.


        • #5
          Just wondering if you found out anything more about this issue, as the data I am currently working on gives the same result following BWA alignment.

          Thanks


          • #6
            Hi flipwell,

            In the end, some more experienced people informed us that this can occur as a natural result of the library prep process. I am fuzzy on the details, but at a high level, I think some fragments only get an adapter on one end, and if two of those join together you can get what looks like a paired-end fragment with adapters on each end but is actually a combination of two fragments from completely different genomic locations. I'm sure somebody more knowledgeable around here could give a much more correct explanation.

            Cheers,

            Simon


            • #7
              Yes, I would agree with the above post, and modify my previous one. We do seem to see a lot of chimeric reads, and I expect most of them are chimeras from library prep. Their abundance also varies a lot between samples: some have almost none and others appear chock-full. But they always appear in the same clusters when mapped, and at equal frequency in each orientation across "breakpoints", e.g. for ABCDE we get equal frequency of AD as DA.
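
One rough way to check the "AD as often as DA" observation is to tally, for each pair of chromosomes, how often read 1 lands on one and read 2 on the other, and then compare the two directions. The sketch below does this at chromosome level only, which is an assumption: the regions A to E in the post may well be finer than whole chromosomes. The function names `chrom_pair_counts` and `orientation_balance` are hypothetical.

```python
from collections import Counter


def chrom_pair_counts(sam_lines):
    """Count (read1_chrom, read2_chrom) for pairs split across chromosomes."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        # Use the first-in-pair read so each pair is counted once.
        if not (flag & 0x1) or not (flag & 0x40):
            continue
        # Skip pairs with an unmapped end and secondary/supplementary records.
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        rname, rnext = fields[2], fields[6]
        if rnext == "=" or rnext == rname:
            continue  # both ends on the same chromosome
        counts[(rname, rnext)] += 1
    return counts


def orientation_balance(counts):
    """Map each unordered chromosome pair (a, b), a < b, to (n_ab, n_ba)."""
    balance = {}
    for (first, second), n in counts.items():
        key = tuple(sorted((first, second)))
        n_ab, n_ba = balance.get(key, (0, 0))
        if (first, second) == key:
            n_ab += n
        else:
            n_ba += n
        balance[key] = (n_ab, n_ba)
    return balance
```

If the two numbers for a chromosome pair are roughly equal, that matches the symmetric pattern described above. Deciding what counts as "roughly equal" would need a proper statistical test, which this sketch does not attempt.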

              If anyone has any further information on the formation or identification of chimeric reads, I'd really appreciate it, as I still cling to the hope that there is some biology in there.
