[ensembl-dev] How to get the gene name and transcript name via MethodLinkSpeciesSet & GenomicAlignBlock?
Yan P. Yuan
yuan at embl.de
Tue Jul 24 15:08:18 BST 2018
Hi Carla,
I meant that I've used the $slice->get_all_Genes in my other API scripts in the
context of a chromosome as the slice so that they'd run in parallel over all
chromosomes. What you mentioned is surely correct that
$genomic_align->get_Slice->get_all_Genes() is directly applied to the
corresponding genomic slice, rather to the whole chromosome.
OK, as I've interested in the orthologs' aligments, I'll try your mentioned method:
$genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
$slice)
to retrieve their genomic aligments.
Thaks a lot & regards
Yan
On 24/07/18 15:23, Carla Cummins wrote:
>
> Hi Yan,
>
> Just to clarify, the method I mentioned fetches all genes for the range of the
> genomic align object that you've called it on. So
> $genomic_align->get_Slice->get_all_Genes() fetches all genes located on the
> region covered by the $genomic_align, rather than the whole chromosome. A
> slice simply refers to a region of genomic sequence.
>
> Regarding fetching genomic sequences for homologies and expanding to
> surrounding sequence, it can be achieved like so (note: this a code snippet
> and should be merged with the homolgy2.pl example):
>
> my $all_homologies = $homology_adaptor->fetch_all_by_MethodLinkSpeciesSet($this_mlss);
> foreach my $homology ( @$all_homologies ) {
> my $gene_members = $homology->get_all_Members();
> foreach my $gm ( @$gene_members ) {
> my $this_slice = $gm->get_Slice;
> # extend the slice by 100bp on 5' end and 200bp on 3' end
> my $expanded_slice = $this_slice->expand(100, 200);
> print $gm->display_label . "\t" . $expanded_slice->seq . "\n";
> }
> }
>
> Please note that this code will print the raw genomic sequence, not an aligned
> sequence. There is also a method on the GenomicAlignBlockAdaptor object that
> would allow you to fetch alignments for a given slice, meaning you could fetch
> LASTZ genomic_align_blocks for each homology member slice above:
> $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
> $slice);
>
>
> Hope this helps,
>
> Carla
>
> On 24/07/2018 13:41, Yan P. Yuan wrote:
>> Hi Carla,
>>
>> thank you for your reply.
>>
>> Yes, I'm actually interested more in the orthologs' CDS, sometimes also
>> beyond their borders. The method you've mentioned was used actually to fetch
>> all genes on a chromosome. Then after this, I'd then used the mentioned way
>> to retrieve their genomic alignments.
>>
>> The example "homology2.pl" you've shown can be used to retrieve all ortholog
>> pairs (esp. one-to-one's). I saw this example before, but couldn't get it to
>> retrieve their genomic sequences. Is there an access method in the object
>> "Homology" to retrieve their genomic alignments (like what I've shown
>> before), surrounding a certain region of a CDS?
>>
>> Best regards
>>
>> Yan
>>
>>
>> On 24/07/18 13:27, Carla Cummins wrote:
>>>
>>> Hi Yan,
>>>
>>> Since LASTZ alignments are on the whole genome, it's important to note that
>>> genes may not be present on every part of the sequence. To see which gene(s)
>>> are on a given aligned sequence region, you can use:
>>>
>>> $genomic_align->get_Slice->get_all_Genes();
>>>
>>> However, at the end of the message, you mention homologies. Homologies and
>>> whole genome alignments are really separate objects in our system. Inference
>>> of homologies is performed on a gene tree level and are not directly related
>>> to the whole genome alignments
>>> (http://www.ensembl.org/info/genome/compara/homology_types.html). Can you
>>> confirm whether it is truly homologies that you are interested in, or
>>> whether you are simply using the term to refer to well-aligned regions
>>> (regardless of genes)? Here is a basic example of how to retrieve
>>> human/mouse homologies using our API:
>>> https://www.ebi.ac.uk/~muffato/workshops/2016_01_EBI/solutions_compara/homology2.pl
>>>
>>> Finally, since it's clear that there is some relationship between homology
>>> and alignment (biologically speaking), we have a QC method for our
>>> homologies that tests the quality of the alignment over a pair of homologous
>>> genes (WGA score). This may be useful to you:
>>> http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html
>>>
>>> Hope this helps?
>>>
>>> Best,
>>>
>>> Carla
>>>
>>>
>>> On 24/07/2018 12:00, Yan P. Yuan wrote:
>>>> Hi,
>>>>
>>>> Currently, I'm interested in genomic aligments between 2 species: homo
>>>> sapience and mus musculus.
>>>>
>>>> If I use MethodLinkSpeciesSet-adaptor and GenomicAlignBlock-adaptor to
>>>> fetch the genomic alignments, the output misses the corresponding gene
>>>> names (or involved transcript names).
>>>>
>>>> Can someone point out how to get the gene names via the used methods? Or a
>>>> different way to do it?
>>>>
>>>> Thanks a lot.
>>>>
>>>> Yan
>>>>
>>>> PS:
>>>>
>>>> The folllowing code was used:
>>>>
>>>> use strict;
>>>> use warnings;
>>>>
>>>> use Bio::EnsEMBL::Registry;
>>>> use Bio::AlignIO;
>>>>
>>>> my $reg = 'Bio::EnsEMBL::Registry';
>>>>
>>>> $reg->load_registry_from_db(
>>>> -host=>'ensembldb.ensembl.org',
>>>> -port=>5306,
>>>> -user=>'anonymous',
>>>> );
>>>>
>>>>
>>>> my $mlss_adaptor = $reg->get_adaptor("Multi", "compara",
>>>> "MethodLinkSpeciesSet");
>>>>
>>>> my $mlss =
>>>> $mlss_adaptor->fetch_by_method_link_type_registry_aliases('LASTZ_NET',
>>>> ['human', 'mouse']);
>>>>
>>>> my $species_slice_adaptor = $reg->get_adaptor("mouse", "core",
>>>> "Slice");
>>>>
>>>> my $genomic_align_block_adaptor = $reg->get_adaptor("Multi", "compara",
>>>> "GenomicAlignBlock");
>>>>
>>>> # Define a region to fetch
>>>> my ($species_start, $species_end) = (143702058, 143702091);
>>>> my $species_slice =
>>>> $species_slice_adaptor->fetch_by_region("chromosome", 1,
>>>> $species_start, $species_end, -1);
>>>>
>>>> my $all_genomic_align_blocks =
>>>> $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
>>>> $species_slice);
>>>>
>>>> my $alignIO = Bio::AlignIO->newFh(-interleaved=>0,
>>>> -fh=>\*STDOUT,
>>>> -format=>'clustalw',
>>>> -idlength=>20);
>>>>
>>>> # print the restricted alignments
>>>> foreach my $genomic_align_block ( @{ $all_genomic_align_blocks } ) {
>>>> my $restricted_gab =
>>>> $genomic_align_block->restrict_between_reference_positions($species_start,
>>>> $species_end);
>>>>
>>>> foreach my $genomic_align ( @{
>>>> $restricted_gab->get_all_GenomicAligns() } ) {
>>>> printf("%s %s:%d-%d:%s:%s\n", $genomic_align->genome_db->name,
>>>> $genomic_align->dnafrag->name, $genomic_align->dnafrag_start,
>>>> $genomic_align->dnafrag_end, $genomic_align->dnafrag_strand,
>>>> $genomic_align->aligned_sequence());
>>>> }
>>>> print "\n";
>>>> }
>>>>
>>>> The output is then:
>>>>
>>>> homo_sapiens 1:193122704-193122731:1:TGTTCTTTATTTTTTTTTTCTCTT------TTCC
>>>> mus_musculus 1:143702058-143702091:-1:TGTTCTTAGGTTTTTTTTTTTTTTGTAATCTCCC
>>>>
>>>>
>>>> One way to get to the gene name/transcripts I've used is GeneMember-Adaptor
>>>> & Homology-Adaptor, but then I'd miss the genomic alignments.
>>>>
>>>> This was taken from an ENSEMBL example:
>>>>
>>>> use strict;
>>>> use warnings;
>>>>
>>>> # using this script to get the ortholog of a given gene of mus musculus
>>>>
>>>> use Bio::EnsEMBL::Registry;
>>>>
>>>> my $reg = 'Bio::EnsEMBL::Registry';
>>>>
>>>> $reg->load_registry_from_db(
>>>> -host=>'ensembldb.ensembl.org',
>>>> -port=>5306,
>>>> -user=>'anonymous',
>>>> );
>>>>
>>>>
>>>> # Getting the gene_member_adaptor
>>>> my $gene_member_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
>>>> 'compara', 'GeneMember');
>>>>
>>>> my $gene_member =
>>>> $gene_member_adaptor->fetch_by_stable_id('ENSMUSG00000065782');
>>>>
>>>> # Getting the homology adaptor
>>>> my $homology_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
>>>> 'compara', 'Homology');
>>>>
>>>> my $homologies = $homology_adaptor->fetch_all_by_Member($gene_member);
>>>> #homologies = hash array
>>>>
>>>> foreach my $homology ( @{$homologies} ) {
>>>> foreach my $member ( @{$homology->get_all_Members} ) {
>>>> if ( $member->taxon->scientific_name eq 'Homo sapiens' ) {
>>>> print $member->description, "\n";
>>>> }
>>>> }
>>>> }
>>>>
>>>>
>>>> Output:
>>>>
>>>> Transcript:ENST00000391309 Gene:ENSG00000212611 Chr:X Start:114017033
>>>> End:114017102 Acc:RF00088
>>>> Transcript:ENST00000384693 Gene:ENSG00000277846 Chr:11 Start:62853663
>>>> End:62853732 Acc:RF00088
>>>>
>>>> Or is there a better way to get homology members only between 2 species?
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Dev mailing listDev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog:http://www.ensembl.info/
>>>
>>> --
>>> Carla Cummins, Ph.D.
>>> Ensembl Compara Developer
>>> European Bioinformatics Institute (EMBL-EBI),
>>> European Molecular Biology Laboratory,
>>> Wellcome Trust Genome Campus, Hinxton,
>>> Cambridge, CB10 1SD, United Kingdom
>>> Room A3-145
>>> Phone + 44 (0) 1223 49 4240
>>
>
> --
> Carla Cummins, Ph.D.
> Ensembl Compara Developer
> European Bioinformatics Institute (EMBL-EBI),
> European Molecular Biology Laboratory,
> Wellcome Trust Genome Campus, Hinxton,
> Cambridge, CB10 1SD, United Kingdom
> Room A3-145
> Phone + 44 (0) 1223 49 4240
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20180724/ef159256/attachment.html>
More information about the Dev
mailing list