  • cycomatto
    Junior Member
    • Dec 2011
    • 7

    sex/gender tags for BAM files?

    My analysis of next-generation sequencing data is gender / sex specific (should I consider the Y chromosome, or two X chromosomes? )

    I am unaware of any flag in .bam files (or samtools) for the gender of an individual (male or female). Is there a tag for gender that I don't know about?

    I am also unaware of any gender consideration in aligners such as BWA. Do these aligners in fact consider gender when aligning?

    Finally, I couldn't find any gender / sex information on databases such as 1000 Genomes. Is it hidden away somewhere?
  • vivek_
    PhD Student
    • Jul 2012
    • 164

    #2
    I don't think aligner algorithms are gender specific and bam files do not have gender flags. The gender/relationship data is passed into NGS analysis using pedigree files.
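As a rough illustration of the pedigree-file route, here is a minimal sketch that reads sample sex from PED-style lines. It assumes the common six-column layout (family ID, individual ID, father ID, mother ID, sex, phenotype) with sex coded as 1 = male and 2 = female; the function name and the example records are made up for illustration.

```python
# Sketch: read sample sex from PED-style pedigree lines.
# Assumed columns: family ID, individual ID, father ID, mother ID, sex, phenotype.
SEX_CODES = {"1": "male", "2": "female"}  # any other code is treated as unknown


def read_sex_from_ped(lines):
    """Return {individual_id: 'male' | 'female' | 'unknown'}."""
    sex_by_sample = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 columns, got: {line!r}")
        sex_by_sample[fields[1]] = SEX_CODES.get(fields[4], "unknown")
    return sex_by_sample


# Hypothetical records, not taken from any real pedigree file.
example_ped = [
    "FAM1 father1 0 0 1 1",
    "FAM1 mother1 0 0 2 1",
    "FAM1 child1 father1 mother1 0 2",
]
sex = read_sex_from_ped(example_ped)
print(sex)
```

The resulting mapping can then be used to attach a sex factor to each sample in downstream analysis.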

    Comment

    • spacup
      Member
      • Apr 2013
      • 17

      #3
      Hi, I come back to this question for another reason:
      It's true that if you have reads from a female, you will have twice as many aligned reads on chrX as for a male... so for RNA-seq or ChIP-seq, this would introduce a bias in finding peaks or expressed transcripts, wouldn't it?

      Second, merging .bam files from male and female replicates with samtools generates an error of the type: different target sequence name: 'chrY' != 'chrM'
      because chrY is absent from the female .bam header...

      Any idea how I could merge these files? Should I just add a chrY line to the header of the female .bam? Is that sufficient? And how can I do it, given that .bam files are compressed binary files?

      Thanks for your help!
      Last edited by spacup; 12-02-2013, 03:00 AM.

      Comment

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        For downstream analysis, you would need to account for gender in your model fit (so "Counts ~gender + SomeFactor ..."), which is done by associating samples with factors. If you align to gender-specific genomes and need to subsequently merge files, then simply reheader the female samples and use a genome sorted such that chrY is last (that way you can simply swap in a new header without needing to modify any of the reads).
        Last edited by dpryan; 12-04-2013, 02:21 AM.
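To make the reheader idea concrete, here is a minimal sketch that takes the text of a female sample's SAM header and appends an @SQ line for chrY after the existing @SQ lines, leaving the order of the other sequences untouched. The function name, contig names and lengths are placeholders, not values from any real genome build. Note that this only yields mergeable headers if the male files also list chrY last.

```python
def add_contig_last(header_text, name, length):
    """Return header text with an @SQ line for `name` placed after all existing @SQ lines."""
    lines = header_text.rstrip("\n").split("\n")
    sq_idx = [i for i, line in enumerate(lines) if line.startswith("@SQ")]
    if not sq_idx:
        raise ValueError("header has no @SQ lines")
    for i in sq_idx:
        if f"SN:{name}" in lines[i].split("\t"):
            return header_text  # contig already listed; leave the header unchanged
    new_line = f"@SQ\tSN:{name}\tLN:{length}"
    insert_at = sq_idx[-1] + 1  # directly after the last existing @SQ line
    return "\n".join(lines[:insert_at] + [new_line] + lines[insert_at:]) + "\n"


# Toy header with placeholder lengths.
female_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chrX\tLN:800\n"
    "@PG\tID:aligner\n"
)
new_header = add_contig_last(female_header, "chrY", 500)
print(new_header, end="")
```

In practice the header text would come from something like `samtools view -H female.bam`, and the edited header would be applied with `samtools reheader new_header.sam female.bam > female.reheadered.bam`; the file names here are only examples.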

        Comment

        • spacup
          Member
          • Apr 2013
          • 17

          #5
          Thanks dpryan for your fast answer. The question is how to reheader the female sample, since .bam files are compressed binary files?

          EDIT: ok, i found this :
          samtools reheader <in.header.sam> <in.bam>

          I didn't know this command, I will try it!
          Last edited by spacup; 12-02-2013, 03:23 AM.

          Comment

          • volks
            Member
            • Jun 2010
            • 80

            #6
            i see the need to account for gender in downstream analysis, but for what kind of data would it make a difference for the alignment (to include/not include chrY)?

            Comment

            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              One could argue that you might get slightly more accurate alignments when dealing with female samples if you exclude chrY from the genome. Honestly, though, I suspect the benefit is very minor and likely outweighed by the increased headaches caused.

              Comment

              • spacup
                Member
                • Apr 2013
                • 17

                #8
                Hi, I see that my previous message has not been published...
                The idea is not to align the data: when you have a .bam file, the data are already mapped. I need to combine .bam files to perform peak calling.

                I downloaded ENCODE data for FAIRE analysis to try a peak caller, and they combine their replicates for their analysis. As I wanted to run an analysis similar to ENCODE's with another peak caller for comparison, I wanted to combine their data too, but one is male and the other is female.

                By the way, the command samtools reheader <in.header.sam> <in.bam> worked, thanks!
                But my results are quite different from ENCODE ones...

                Comment

                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #9
                  Just be very careful when you reheader a BAM file. If you change the order of the chromosomes then everything will be screwed up. If the order is such that chrY is last and everything else is the same, then things should work OK.
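The reason for the caution is that BAM records refer to their reference sequence by its position in the header's sequence list, not by name, so reordering the @SQ lines silently reassigns reads to other chromosomes. A minimal sketch of a sanity check before reheadering, assuming the only intended change is appending sequences at the end (function names and toy lengths are illustrative):

```python
def sq_entries(header_text):
    """Return the (name, length) pairs of the @SQ lines, in header order."""
    entries = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        entries.append((tags["SN"], int(tags["LN"])))
    return entries


def reheader_is_safe(old_header, new_header):
    """True if the new header keeps every old sequence at the same index."""
    old = sq_entries(old_header)
    new = sq_entries(new_header)
    return new[:len(old)] == old


# Toy headers with placeholder lengths.
old_hdr = "@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrX\tLN:800\n"
appended = old_hdr + "@SQ\tSN:chrY\tLN:500\n"
reordered = "@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrY\tLN:500\n@SQ\tSN:chrX\tLN:800\n"

print(reheader_is_safe(old_hdr, appended))
print(reheader_is_safe(old_hdr, reordered))
```

Here the appended header passes and the reordered one fails, because chrX would move from index 1 to index 2.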

                  Comment

                  • volks
                    Member
                    • Jun 2010
                    • 80

                    #10
                    Originally posted by spacup
                    I downloaded ENCODE data for FAIRE analysis to try a peak caller, and they combine their replicates for their analysis. As I wanted to run an analysis similar to ENCODE's with another peak caller for comparison, I wanted to combine their data too, but one is male and the other is female.
                    i still don't understand. why not just merge the files as they are? why the reheader?

                    if the biological replicates are of different gender, maybe it would be best to exclude all reads mapping to chrX and chrY. in female samples, the inactive copy of chrX will be quite different from the active one.
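A minimal sketch of that exclusion step, working on SAM text lines (for example the output of `samtools view -h`). It assumes the sex chromosomes are named chrX and chrY and looks only at the read's own reference name (column 3), not at the mate's; the function name and the toy records are illustrative.

```python
def drop_sex_chromosomes(sam_lines, excluded=("chrX", "chrY")):
    """Yield header lines unchanged and alignment lines not mapped to `excluded`."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines are kept as they are
            continue
        rname = line.split("\t")[2]  # column 3 of a SAM record is the reference name
        if rname not in excluded:
            yield line


# Toy SAM content with placeholder values.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chrX\tLN:800",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchrX\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchrY\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = list(drop_sex_chromosomes(example_sam))
print(len(kept))
```

Only the two header lines and the chr1 read survive the filter in this toy example.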

                    Comment

                    • volks
                      Member
                      • Jun 2010
                      • 80

                      #11
                      Originally posted by dpryan
                      One could argue that you might get slightly more accurate alignments when dealing with female samples if you exclude chrY from the genome. Honestly, though, I suspect the benefit is very minor and likely outweighed by the increased headaches caused.
                      you would probably also create bias on other chromosomes, e.g. reads then mapping uniquely to chrX ..

                      Comment

                      • dpryan
                        Devon Ryan
                        • Jul 2011
                        • 3478

                        #12
                        If it's a female sample then that's not bias, it's increased accuracy. Regarding why merging doesn't work, if the female samples were aligned to a genome lacking chrY, then samtools will refuse to merge since the headers are different.

                        Comment

                        • volks
                          Member
                          • Jun 2010
                          • 80

                          #13
                          Originally posted by dpryan
                          If it's a female sample then that's not bias, it's increased accuracy. Regarding why merging doesn't work, if the female samples were aligned to a genome lacking chrY, then samtools will refuse to merge since the headers are different.
                          if there are only female samples in the whole study then i agree. otherwise the increased accuracy, which you couldn't achieve in male samples, might lead to false positive results?

                          so ENCODE performs/provides alignments of female samples without chrY?

                          Comment

                          • dpryan
                            Devon Ryan
                            • Jul 2011
                            • 3478

                            #14
                            In this case, the difference in mapping would be accounted for in the downstream statistics. If you wanted to directly compare the genders, though, then this would be an issue. I actually haven't a clue whether the original samples were aligned to a genome lacking chrY or not. From what spacup wrote, that's just my presumption.

                            Comment
