[ensembl-dev] variation api

Will McLaren wm2 at ebi.ac.uk
Fri Nov 26 14:25:31 GMT 2010


Hi Andrea

Answers inline:

On 26 November 2010 14:02, Andrea Edwards <edwardsa at cs.man.ac.uk> wrote:
> Hi
>
> Given a novel variation feature, I was wondering how to find out if a dbSNP
> record existed for it. In other words, I was wondering if any known SNPs
> were co-located with it. I saw this code in Will McLaren's SNP predictor
> program:
>
>     # find any co-located existing VFs
>     my $existing_vf = "-";
>
>     if(defined($new_vf->adaptor->db) && $check_existing == 1) {
>         my $fs = $new_vf->feature_Slice;
>         if($fs->start > $fs->end) {
>             ($fs->{'start'}, $fs->{'end'}) = ($fs->{'end'}, $fs->{'start'});
>         }
>         foreach my $existing_vf_obj(@{$new_vf->adaptor->fetch_all_by_Slice($fs)}) {
>             $existing_vf = $existing_vf_obj->variation_name
>                 if $existing_vf_obj->seq_region_start == $new_vf->seq_region_start
>                 and $existing_vf_obj->seq_region_end == $new_vf->seq_region_end;
>         }
>     }
>
>
> questions:
>
> 1. I believe the feature slice for a feature is defined as the region
> exactly spanned by the feature. Does that mean it's one bp for a SNP?

Correct

>
> 2. Is it possible that the 'matching' dbSNP record could describe the SNP as
> being on the opposite strand to my SNP, or does Ensembl do any internal
> processing to make sure all SNPs are on the forward strand?

It is possible, but in most cases Ensembl flips SNPs onto the forward
strand. There are cases where this does not happen - for example, if a
SNP maps to two genomic locations (i.e. it has two variation features),
one on the forward strand and the other on the reverse, then we can't
flip either without making a mess of the SNP's alleles.

However, even if it were on the opposite strand, calling
get_all_VariationFeatures (or fetch_all_by_Slice in this case) would
still return any relevant SNPs; you can then check the strand with the
seq_region_strand() method.
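
As a rough sketch of that check, here is the position match with a strand
comparison added. It does not use the API itself: plain hashes stand in for
variation feature objects, and the helper name, hash keys and rsEXAMPLE IDs
are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not an Ensembl method. Each feature is a plain hash;
# with the real API these values would come from variation_name,
# seq_region_start, seq_region_end and seq_region_strand.
sub colocated_features {
    my ($new_vf, @existing) = @_;
    my @hits;
    for my $vf (@existing) {
        # same rule as the quoted script: both start and end must match
        next unless $vf->{start} == $new_vf->{start}
                and $vf->{end}   == $new_vf->{end};
        push @hits, {
            name        => $vf->{name},
            same_strand => ($vf->{strand} == $new_vf->{strand}) ? 1 : 0,
        };
    }
    return @hits;
}

# Made-up example data: one feature at the same position on the reverse
# strand, one at a neighbouring position.
my $new_vf = { start => 1000, end => 1000, strand => 1 };
my @known  = (
    { name => 'rsEXAMPLE1', start => 1000, end => 1000, strand => -1 },
    { name => 'rsEXAMPLE2', start => 1001, end => 1001, strand =>  1 },
);

for my $hit (colocated_features($new_vf, @known)) {
    printf "%s is co-located (%s strand)\n",
        $hit->{name}, $hit->{same_strand} ? 'same' : 'opposite';
}
```

Only rsEXAMPLE1 is reported, flagged as being on the opposite strand.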

>
> 3. Why might the feature slice start be greater than the feature slice end?
> I know this script is used for variations other than SNPs, so it might have
> something to do with that perhaps?

Insertions are defined in Ensembl in this way; for an insertion of any
size between A and B, start = B and end = A (where B = A + 1).
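
To illustrate with made-up coordinates, this is why the script swaps start
and end before fetching. The helper below is hypothetical (it is not part of
the API) and works on bare numbers rather than slice objects:

```perl
use strict;
use warnings;

# Hypothetical helper: return (low, high) so that low <= high,
# whichever order start and end arrive in.
sub ordered_coords {
    my ($start, $end) = @_;
    return $start > $end ? ($end, $start) : ($start, $end);
}

# An insertion between base A = 100 and base B = 101 is stored as
# start = B, end = A.
my ($start, $end) = (101, 100);
my ($low, $high)  = ordered_coords($start, $end);

printf "stored: %d-%d, query region: %d-%d\n", $start, $end, $low, $high;
```

For a SNP, where start equals end, the coordinates come back unchanged.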

>
> 4. Can I just confirm that variation_name will give the dbSNP ID if it is a
> known SNP? Could it give anything else? I presume the same SNP can be held
> in multiple databases used as external references by Ensembl.

If the SNP is from dbSNP then variation_name will give the dbSNP rsID.
For SNPs from other sources, IDs may differ. In the case of cow we
only have data from dbSNP anyway.

>
> 5. I must be reading this code wrong, because it doesn't look to me as if it
> concatenates a list of dbSNP IDs in the variable $existing_vf; it looks as
> though it is overwriting this variable with each SNP found.

Correct - there should only be one variation at each position, since
we merge any that have identical coordinates (both start and end must
be the same to be merged!)

Cheers

Will

>
>
> thanks a lot
>



