[ensembl-dev] How to get the gene name and transcript name via MethodLinkSpeciesSet & GenomicAlignBlock?
Carla Cummins
carlac at ebi.ac.uk
Tue Jul 24 14:23:15 BST 2018
Hi Yan,
Just to clarify, the method I mentioned fetches all genes for the range
of the genomic align object that you've called it on. So
$genomic_align->get_Slice->get_all_Genes() fetches all genes located on
the region covered by the $genomic_align, rather than the whole
chromosome. A slice simply refers to a region of genomic sequence.
Regarding fetching genomic sequences for homologies and expanding to
surrounding sequence, it can be achieved like so (note: this a code
snippet and should be merged with the homolgy2.pl example):
my $all_homologies = $homology_adaptor->fetch_all_by_MethodLinkSpeciesSet($this_mlss);
foreach my $homology ( @$all_homologies ) {
my $gene_members = $homology->get_all_Members();
foreach my $gm ( @$gene_members ) {
my $this_slice = $gm->get_Slice;
# extend the slice by 100bp on 5' end and 200bp on 3' end
my $expanded_slice = $this_slice->expand(100, 200);
print $gm->display_label . "\t" . $expanded_slice->seq . "\n";
}
}
Please note that this code will print the raw genomic sequence, not an
aligned sequence. There is also a method on the GenomicAlignBlockAdaptor
object that would allow you to fetch alignments for a given slice,
meaning you could fetch LASTZ genomic_align_blocks for each homology
member slice above:
$genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
$slice);
Hope this helps,
Carla
On 24/07/2018 13:41, Yan P. Yuan wrote:
> Hi Carla,
>
> thank you for your reply.
>
> Yes, I'm actually interested more in the orthologs' CDS, sometimes
> also beyond their borders. The method you've mentioned was used
> actually to fetch all genes on a chromosome. Then after this, I'd then
> used the mentioned way to retrieve their genomic alignments.
>
> The example "homology2.pl" you've shown can be used to retrieve all
> ortholog pairs (esp. one-to-one's). I saw this example before, but
> couldn't get it to retrieve their genomic sequences. Is there an
> access method in the object "Homology" to retrieve their genomic
> alignments (like what I've shown before), surrounding a certain region
> of a CDS?
>
> Best regards
>
> Yan
>
>
> On 24/07/18 13:27, Carla Cummins wrote:
>>
>> Hi Yan,
>>
>> Since LASTZ alignments are on the whole genome, it's important to
>> note that genes may not be present on every part of the sequence. To
>> see which gene(s) are on a given aligned sequence region, you can use:
>>
>> $genomic_align->get_Slice->get_all_Genes();
>>
>> However, at the end of the message, you mention homologies.
>> Homologies and whole genome alignments are really separate objects in
>> our system. Inference of homologies is performed on a gene tree level
>> and are not directly related to the whole genome alignments
>> (http://www.ensembl.org/info/genome/compara/homology_types.html). Can
>> you confirm whether it is truly homologies that you are interested
>> in, or whether you are simply using the term to refer to well-aligned
>> regions (regardless of genes)? Here is a basic example of how to
>> retrieve human/mouse homologies using our API:
>> https://www.ebi.ac.uk/~muffato/workshops/2016_01_EBI/solutions_compara/homology2.pl
>>
>> Finally, since it's clear that there is some relationship between
>> homology and alignment (biologically speaking), we have a QC method
>> for our homologies that tests the quality of the alignment over a
>> pair of homologous genes (WGA score). This may be useful to you:
>> http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html
>>
>> Hope this helps?
>>
>> Best,
>>
>> Carla
>>
>>
>> On 24/07/2018 12:00, Yan P. Yuan wrote:
>>> Hi,
>>>
>>> Currently, I'm interested in genomic aligments between 2 species:
>>> homo sapience and mus musculus.
>>>
>>> If I use MethodLinkSpeciesSet-adaptor and GenomicAlignBlock-adaptor
>>> to fetch the genomic alignments, the output misses the corresponding
>>> gene names (or involved transcript names).
>>>
>>> Can someone point out how to get the gene names via the used
>>> methods? Or a different way to do it?
>>>
>>> Thanks a lot.
>>>
>>> Yan
>>>
>>> PS:
>>>
>>> The folllowing code was used:
>>>
>>> use strict;
>>> use warnings;
>>>
>>> use Bio::EnsEMBL::Registry;
>>> use Bio::AlignIO;
>>>
>>> my $reg = 'Bio::EnsEMBL::Registry';
>>>
>>> $reg->load_registry_from_db(
>>> -host=>'ensembldb.ensembl.org',
>>> -port=>5306,
>>> -user=>'anonymous',
>>> );
>>>
>>>
>>> my $mlss_adaptor = $reg->get_adaptor("Multi", "compara",
>>> "MethodLinkSpeciesSet");
>>>
>>> my $mlss =
>>> $mlss_adaptor->fetch_by_method_link_type_registry_aliases('LASTZ_NET',
>>> ['human', 'mouse']);
>>>
>>> my $species_slice_adaptor = $reg->get_adaptor("mouse",
>>> "core", "Slice");
>>>
>>> my $genomic_align_block_adaptor = $reg->get_adaptor("Multi",
>>> "compara", "GenomicAlignBlock");
>>>
>>> # Define a region to fetch
>>> my ($species_start, $species_end) = (143702058, 143702091);
>>> my $species_slice =
>>> $species_slice_adaptor->fetch_by_region("chromosome", 1,
>>> $species_start, $species_end, -1);
>>>
>>> my $all_genomic_align_blocks =
>>> $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
>>> $species_slice);
>>>
>>> my $alignIO = Bio::AlignIO->newFh(-interleaved=>0,
>>> -fh=>\*STDOUT,
>>> -format=>'clustalw',
>>> -idlength=>20);
>>>
>>> # print the restricted alignments
>>> foreach my $genomic_align_block ( @{ $all_genomic_align_blocks } ) {
>>> my $restricted_gab =
>>> $genomic_align_block->restrict_between_reference_positions($species_start,
>>> $species_end);
>>>
>>> foreach my $genomic_align ( @{
>>> $restricted_gab->get_all_GenomicAligns() } ) {
>>> printf("%s %s:%d-%d:%s:%s\n",
>>> $genomic_align->genome_db->name, $genomic_align->dnafrag->name,
>>> $genomic_align->dnafrag_start, $genomic_align->dnafrag_end,
>>> $genomic_align->dnafrag_strand, $genomic_align->aligned_sequence());
>>> }
>>> print "\n";
>>> }
>>>
>>> The output is then:
>>>
>>> homo_sapiens
>>> 1:193122704-193122731:1:TGTTCTTTATTTTTTTTTTCTCTT------TTCC
>>> mus_musculus
>>> 1:143702058-143702091:-1:TGTTCTTAGGTTTTTTTTTTTTTTGTAATCTCCC
>>>
>>>
>>> One way to get to the gene name/transcripts I've used is
>>> GeneMember-Adaptor & Homology-Adaptor, but then I'd miss the genomic
>>> alignments.
>>>
>>> This was taken from an ENSEMBL example:
>>>
>>> use strict;
>>> use warnings;
>>>
>>> # using this script to get the ortholog of a given gene of mus
>>> musculus
>>>
>>> use Bio::EnsEMBL::Registry;
>>>
>>> my $reg = 'Bio::EnsEMBL::Registry';
>>>
>>> $reg->load_registry_from_db(
>>> -host=>'ensembldb.ensembl.org',
>>> -port=>5306,
>>> -user=>'anonymous',
>>> );
>>>
>>>
>>> # Getting the gene_member_adaptor
>>> my $gene_member_adaptor =
>>> Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara',
>>> 'GeneMember');
>>>
>>> my $gene_member =
>>> $gene_member_adaptor->fetch_by_stable_id('ENSMUSG00000065782');
>>>
>>> # Getting the homology adaptor
>>> my $homology_adaptor =
>>> Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'Homology');
>>>
>>> my $homologies =
>>> $homology_adaptor->fetch_all_by_Member($gene_member);
>>> #homologies = hash array
>>>
>>> foreach my $homology ( @{$homologies} ) {
>>> foreach my $member ( @{$homology->get_all_Members} ) {
>>> if ( $member->taxon->scientific_name eq 'Homo sapiens' ) {
>>> print $member->description, "\n";
>>> }
>>> }
>>> }
>>>
>>>
>>> Output:
>>>
>>> Transcript:ENST00000391309 Gene:ENSG00000212611 Chr:X
>>> Start:114017033 End:114017102 Acc:RF00088
>>> Transcript:ENST00000384693 Gene:ENSG00000277846 Chr:11
>>> Start:62853663 End:62853732 Acc:RF00088
>>>
>>> Or is there a better way to get homology members only between 2
>>> species?
>>>
>>>
>>>
>>> _______________________________________________
>>> Dev mailing listDev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog:http://www.ensembl.info/
>>
>> --
>> Carla Cummins, Ph.D.
>> Ensembl Compara Developer
>> European Bioinformatics Institute (EMBL-EBI),
>> European Molecular Biology Laboratory,
>> Wellcome Trust Genome Campus, Hinxton,
>> Cambridge, CB10 1SD, United Kingdom
>> Room A3-145
>> Phone + 44 (0) 1223 49 4240
>
--
Carla Cummins, Ph.D.
Ensembl Compara Developer
European Bioinformatics Institute (EMBL-EBI),
European Molecular Biology Laboratory,
Wellcome Trust Genome Campus, Hinxton,
Cambridge, CB10 1SD, United Kingdom
Room A3-145
Phone + 44 (0) 1223 49 4240
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20180724/20e58e59/attachment.html>
More information about the Dev
mailing list