  • sex/gender tags for BAM files?

    My analysis of next-generation sequencing data is gender/sex specific (should I consider the Y chromosome, or two X chromosomes?).

    I am unaware of any flag in .bam files (or samtools) for the gender of an individual (male or female). Is there a tag for gender that I don't know about?

    I am also unaware of any gender consideration in aligners such as BWA, etc. Do these aligners in fact consider gender when aligning?

    Finally, I couldn't find any gender / sex information on databases such as 1000 Genomes. Is it hidden away somewhere?

  • #2
    I don't think aligner algorithms are gender specific and bam files do not have gender flags. The gender/relationship data is passed into NGS analysis using pedigree files.
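
A minimal sketch of reading sex from a pedigree file, assuming the common six-column PED layout (family, individual, father, mother, sex, phenotype) in which sex is coded 1 for male and 2 for female. The function name and the sample IDs are made up for illustration.

```python
# Sex codes in the fifth column of a PED (pedigree) file.
SEX_CODES = {"1": "male", "2": "female"}


def parse_ped_sex(ped_text):
    """Map individual ID -> 'male', 'female' or 'unknown' from PED text."""
    sexes = {}
    for line in ped_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Columns: family, individual, father, mother, sex, phenotype
        individual_id, sex_code = fields[1], fields[4]
        sexes[individual_id] = SEX_CODES.get(sex_code, "unknown")
    return sexes


# Hypothetical two-sample pedigree, for illustration only.
example_ped = """\
FAM1 sampleA 0 0 1 -9
FAM1 sampleB 0 0 2 -9
"""
ped_sexes = parse_ped_sex(example_ped)
```

The resulting dictionary can then be used to decide, per sample, how chrX and chrY should be treated downstream.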

    • #3
      Hi, I am coming back to this question for another reason:
      It's true that if you have reads from a female, you will have twice as many reads aligned to chrX as you would from a male... so for RNA-seq or ChIP-seq, this would introduce a bias in finding peaks or expressed transcripts, wouldn't it?

      Second, merging .bam files from male and female replicates with samtools generates an error of the type: different target sequence name: 'chrY' != 'chrM'
      because chrY is absent from the female .bam header...

      Any idea how I could merge these files? Should I just add a chrY line to the header of the female .bam? Is that sufficient? And how can I do it, given that .bam files are compressed binary files?

      Thanks for your help!
      Last edited by spacup; 12-02-2013, 03:00 AM.
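
The error message above suggests that the two headers are compared position by position. As a rough sketch of that idea (not the actual samtools code), the Python below lists the @SQ names of two headers and reports the first position where they differ. The shortened headers and lengths are hypothetical.

```python
def sequence_names(header_text):
    """Return reference names from the @SQ lines of a SAM header, in order."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def first_mismatch(names_a, names_b):
    """Return (index, name_a, name_b) at the first difference, or None."""
    for i in range(max(len(names_a), len(names_b))):
        a = names_a[i] if i < len(names_a) else None
        b = names_b[i] if i < len(names_b) else None
        if a != b:
            return (i, a, b)
    return None


# Hypothetical, shortened headers; real ones list every chromosome.
male_header = (
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chrX\tLN:800\n"
    "@SQ\tSN:chrY\tLN:300\n"
    "@SQ\tSN:chrM\tLN:16\n"
)
female_header = (
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chrX\tLN:800\n"
    "@SQ\tSN:chrM\tLN:16\n"
)
mismatch = first_mismatch(sequence_names(male_header),
                          sequence_names(female_header))
```

With these toy headers the first difference is chrY against chrM, the same pair named in the error.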

      • #4
        For downstream analysis, you would need to account for gender in your model fit (so "Counts ~gender + SomeFactor ..."), which is done by associating samples with factors. If you align to gender-specific genomes and need to subsequently merge files, then simply reheader the female samples and use a genome sorted such that chrY is last (that way you can simply swap in a new header without needing to modify any of the reads).
        Last edited by dpryan; 12-04-2013, 02:21 AM.
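
The formula "Counts ~gender + SomeFactor" in the post above is R-style model notation. As an illustration of what associating samples with factors amounts to, the Python sketch below builds the matching design matrix by hand. The sample names, the two-level coding and the choice of reference levels are assumptions made for this example.

```python
def design_matrix(samples, sex_reference="female", factor_reference="control"):
    """One row per sample: [intercept, sex indicator, factor indicator]."""
    rows = []
    for sample in samples:
        rows.append([
            1,                                              # intercept
            0 if sample["sex"] == sex_reference else 1,      # sex term
            0 if sample["factor"] == factor_reference else 1,  # factor of interest
        ])
    return rows


# Hypothetical samples, for illustration only.
samples = [
    {"name": "s1", "sex": "female", "factor": "control"},
    {"name": "s2", "sex": "male", "factor": "control"},
    {"name": "s3", "sex": "female", "factor": "treated"},
    {"name": "s4", "sex": "male", "factor": "treated"},
]
design = design_matrix(samples)
```

Because sex has its own column, a count difference that is only due to sex is absorbed by that term instead of being attributed to the factor of interest.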

        • #5
          Thanks dpryan for your fast answer. The question is how to reheader the female sample, since .bam files are compressed binary files?

          EDIT: OK, I found this:
          samtools reheader <in.header.sam> <in.bam>

          I didn't know this command, I will try it!
          Last edited by spacup; 12-02-2013, 03:23 AM.
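
A minimal sketch of preparing the header file that samtools reheader expects, assuming the header text has already been dumped with samtools view -H. The helper name, the file names in the comments and the toy lengths are hypothetical; the real chrY length should be taken from the reference (for example its .fai index).

```python
def add_sequence(header_text, name, length):
    """Insert an @SQ line for `name` after the last existing @SQ line."""
    lines = header_text.splitlines()
    sq_indices = [i for i, line in enumerate(lines) if line.startswith("@SQ")]
    for i in sq_indices:
        if f"SN:{name}" in lines[i].split("\t"):
            return header_text  # sequence already declared, nothing to do
    new_line = f"@SQ\tSN:{name}\tLN:{length}"
    insert_at = sq_indices[-1] + 1 if sq_indices else len(lines)
    lines.insert(insert_at, new_line)
    return "\n".join(lines) + "\n"


# Hypothetical, shortened female header (toy lengths).
female_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chrX\tLN:800\n"
    "@SQ\tSN:chrM\tLN:16\n"
    "@PG\tID:aligner\n"
)
new_header = add_sequence(female_header, "chrY", 300)

# Surrounding shell steps (file names are placeholders):
#   samtools view -H female.bam > female.header.sam
#   ... write new_header to female.newheader.sam ...
#   samtools reheader female.newheader.sam female.bam > female.reheadered.bam
```

Note that samtools reheader writes the new BAM to standard output, so the result has to be redirected to a file. Appending chrY at the end only yields mergeable files if the other header also lists chrY last.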

          • #6
            I see the need to account for gender in downstream analysis, but for what kind of data would it make a difference for the alignment (to include or exclude chrY)?

            • #7
              One could argue that you might get slightly more accurate alignments when dealing with female samples if you exclude chrY from the genome. Honestly, though, I suspect the benefit is very minor and likely outweighed by the increased headaches caused.

              • #8
                Hi, I see that my previous message has not been published...
                The idea is not to align the data: once you have a .bam file, the data are already mapped. I need to combine .bam files to perform peak calling.

                I downloaded ENCODE data for FAIRE analysis to try a peak caller, and they combine their replicates for their analysis. As I wanted to run an analysis similar to ENCODE's with another peak caller for comparison, I wanted to combine their data too, but one replicate is male and the other is female.

                By the way, the command samtools reheader <in.header.sam> <in.bam> worked, thanks!
                But my results are quite different from the ENCODE ones...

                • #9
                  Just be very careful when you reheader a BAM file. If you change the order of the chromosomes then everything will be screwed up. If the order is such that chrY is last and everything else is the same, then things should work OK.
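
The warning above follows from how BAM stores alignments: each record refers to its chromosome by index into the header's sequence list, and reheadering replaces the header without touching the records. The sketch below shows which name each old chromosome would end up with after a header swap. The function names and the shortened chromosome lists are assumptions for illustration.

```python
def reheader_effect(old_names, new_names):
    """Map each old reference name to the name at the same index afterwards."""
    return {
        old: (new_names[i] if i < len(new_names) else None)
        for i, old in enumerate(old_names)
    }


def is_safe_reheader(old_names, new_names):
    """True if the new header keeps every old sequence at its old index."""
    return new_names[:len(old_names)] == old_names


# Hypothetical, shortened chromosome lists.
female_order = ["chr1", "chrX", "chrM"]
chry_before_chrm = ["chr1", "chrX", "chrY", "chrM"]  # chrY inserted mid-list
chry_last = ["chr1", "chrX", "chrM", "chrY"]         # chrY appended

relabelled = reheader_effect(female_order, chry_before_chrm)
safe_when_last = is_safe_reheader(female_order, chry_last)
safe_when_inserted = is_safe_reheader(female_order, chry_before_chrm)
```

In this toy case, swapping in the header with chrY before chrM would relabel every chrM read as chrY, which is the kind of silent corruption the post warns about.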

                  • #10
                    Originally posted by spacup
                    I downloaded ENCODE data for FAIRE analysis to try a peak caller, and they combine their replicates for their analysis. As I wanted to run an analysis similar to ENCODE's with another peak caller for comparison, I wanted to combine their data too, but one replicate is male and the other is female.
                    I still don't understand. Why not just merge the files as they are? Why the reheader?

                    If the biological replicates are of different gender, maybe it would be best to exclude all reads mapping to chrX and chrY. The inactive copy of chrX will be quite different from the active ones.
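
A minimal sketch of that exclusion on SAM text (for example the output of samtools view -h), assuming the chromosomes are named chrX and chrY as in this thread. It keeps header lines and drops alignments whose reference name (third column) is a sex chromosome. The function name and the toy reads are hypothetical.

```python
SEX_CHROMOSOMES = frozenset({"chrX", "chrY"})


def drop_sex_chromosome_reads(sam_lines, excluded=SEX_CHROMOSOMES):
    """Yield header lines and alignments not mapped to an excluded reference."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] not in excluded:  # column 3 is RNAME
            yield line


# Hypothetical SAM records, for illustration only.
sam_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchrX\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchrY\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = list(drop_sex_chromosome_reads(sam_lines))
kept_read_names = [l.split("\t")[0] for l in kept if not l.startswith("@")]
```

Only the autosomal read survives the filter here. The @SQ lines are left in place, so the header still describes the full reference.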

                    • #11
                      Originally posted by dpryan
                      One could argue that you might get slightly more accurate alignments when dealing with female samples if you exclude chrY from the genome. Honestly, though, I suspect the benefit is very minor and likely outweighed by the increased headaches caused.
                      You would probably also create bias on other chromosomes, e.g. reads that then map uniquely to chrX.

                      • #12
                        If it's a female sample then that's not bias, it's increased accuracy. Regarding why merging doesn't work, if the female samples were aligned to a genome lacking chrY, then samtools will refuse to merge since the headers are different.

                        • #13
                          Originally posted by dpryan
                          If it's a female sample then that's not bias, it's increased accuracy. Regarding why merging doesn't work, if the female samples were aligned to a genome lacking chrY, then samtools will refuse to merge since the headers are different.
                          If there are only female samples in the whole study, then I agree. Otherwise, the increased accuracy, which you couldn't achieve in male samples, might lead to false positive results?

                          So ENCODE performs/provides alignments of female samples without chrY?

                          • #14
                            In this case, the difference in mapping would be accounted for in the downstream statistics. If you wanted to directly compare the genders, though, then this would be an issue. I actually haven't a clue whether the original samples were aligned to a genome lacking chrY or not. From what spacup wrote, that's just my presumption.
