[ensembl-dev] Regulatory Features by CellType

Nathan Johnson njohnson at ebi.ac.uk
Wed Apr 13 09:59:51 BST 2011


Hi Fiona

Actually, I think the problem here is that you are using a very old version of the database. I assume you have read some recent documentation, or maybe gleaned from the website, that we now have a special 'MultiCell' build, comprising core regions across all the cell types. Originally we just had the one pan-cell-type build, with a feature set called 'RegulatoryFeatures'. There are no cell-type builds in the v53 database.

Access is still as described below.

Thanks

Nath



On 12 Apr 2011, at 17:28, Daniel Sobral wrote:

> Hi Fiona,
> 
> To fetch Regulatory Features, you should use the Regulatory Feature Adaptor.
> You could use the fetch_all function, common to all adaptors:
> 
> my @regulatory_features = @{$regulatory_feature_adaptor->fetch_all()};
> 
> This can take a while, as the adaptor fetches all the data associated with all regulatory features.
> Moreover, for the moment it also fetches all Regulatory Features, irrespective of cell type.
> 
> You should probably consider dividing it by slices and use fetch_all_by_Slice.
> This has the further advantage of only fetching, by default, the global MultiCell regulatory features.
> Note that annotation (e.g. Promoter-specific) is cell-specific only.
> 
> my @regulatory_features_slice = @{$regulatory_feature_adaptor->fetch_all_by_Slice($slice)};
> 
> For each of the features you retrieve like this, you can then explore their cell-specific context.
> This includes cell-specific annotation, as well as extra boundaries deriving from cell-specific histone and polymerase data.
> 
> foreach my $rf (@regulatory_features_slice){
>    my @cell_specific_rfs = @{$regulatory_feature_adaptor->fetch_all_by_stable_ID($rf->stable_id)};
>    foreach my $cell_specific_rf (@cell_specific_rfs){
>        #You could do the filtering here by feature_type->name
>        print $cell_specific_rf->stable_id."\t".$cell_specific_rf->cell_type->name."\t".$cell_specific_rf->feature_type->name."\n";
>    }
> }
> 
> If you want to filter by their annotation, for the moment you have to do it after fetching them all first.
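
A minimal sketch of the post-fetch filtering described above, assuming only the accessors already used in the loop (stable_id, cell_type->name, feature_type->name). The Named and MockRegFeat classes, the filter_by_feature_type helper and the EXAMPLE_* identifiers are hypothetical stand-ins so that the snippet runs without a database connection; they are not part of the Ensembl API. With the real API, the list would come from fetch_all_by_stable_ID as shown in the loop.

```perl
use strict;
use warnings;

# Hypothetical stand-in classes, for illustration only. They mimic just the
# accessors used in this thread and are NOT part of the Ensembl API.
package Named;
sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
sub name { return $_[0]{name} }

package MockRegFeat;
sub new {
    my ($class, %args) = @_;
    return bless {
        stable_id    => $args{stable_id},
        cell_type    => Named->new($args{cell_type}),
        feature_type => Named->new($args{feature_type}),
    }, $class;
}
sub stable_id    { return $_[0]{stable_id} }
sub cell_type    { return $_[0]{cell_type} }
sub feature_type { return $_[0]{feature_type} }

package main;

# Hypothetical helper: keep only the features whose feature_type name
# equals $wanted. The filtering happens after everything has been fetched.
sub filter_by_feature_type {
    my ($wanted, @features) = @_;
    return grep { $_->feature_type->name eq $wanted } @features;
}

# Made-up example data standing in for the result of fetch_all_by_stable_ID.
my @cell_specific_rfs = (
    MockRegFeat->new(stable_id => 'EXAMPLE_0001', cell_type => 'HeLa',
                     feature_type => 'Promoter Associated'),
    MockRegFeat->new(stable_id => 'EXAMPLE_0001', cell_type => 'MultiCell',
                     feature_type => 'RegulatoryFeature'),
    MockRegFeat->new(stable_id => 'EXAMPLE_0002', cell_type => 'HeLa',
                     feature_type => 'Gene Associated'),
);

my @promoters = filter_by_feature_type('Promoter Associated', @cell_specific_rfs);

foreach my $rf (@promoters) {
    print join("\t", $rf->stable_id, $rf->cell_type->name,
                     $rf->feature_type->name), "\n";
}
```

The same grep can be applied directly to the @cell_specific_rfs list inside the loop; a second condition on cell_type->name would restrict the result to a single cell line.
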
> 
> 
> The basic use pattern that the API is currently centered on is to fetch MultiCell Regulatory Features and then, for each individual regulatory feature, to fetch specific cell-type information. It is not currently optimized to fetch data from individual cell lines, although it is still relatively easy to do so. We are actively considering what users actually want to do and trying to design the API to make it easier and more efficient for them.
> 
> Hope this helps.
> 
> Regards,
> Daniel
> 
> On 12/04/2011 13:45, Fiona Nielsen wrote:
>> Still digging into the Ensembl Regulatory Features...
>> 
>> I am trying to retrieve the regulatory features defined on/by multiple
>> cell lines, e.g. I am trying to retrieve a dataset similar to the
>> BioMart query of:
>> -
>> Homo sapiens features (GRCh37.p2)
>> Filters
>>  Feature Type : Gene Associated,Non-Gene Associated,Promoter Associated,RegulatoryFeature,Unclassified
>>  Cell Type : MultiCell
>> Attributes
>>  Feature Set
>>  Feature Type
>>  Chromosome Name
>>  Start (bp)
>>  End (bp)
>>  Cell Type
>> -
>> 
>> However, the Feature Set Adaptor requires a CellType object to specify
>> the cell type, and the CellTypeAdaptor does not work with the name
>> 'MultiCell':
>> 
>>     my $ct_adaptor = $efg_db->get_CellTypeAdaptor();
>>     # fetch_by_name does not work with 'MultiCell'
>>     my $ct = $ct_adaptor->fetch_by_name('HeLa');
>>     my @rf_fsets = @{$fset_adaptor->fetch_all_by_CellType($ct)};
>> 
>>     foreach my $rf_fset(@rf_fsets){
>>  	$returnstring .= $rf_fset->name.",";
>>     }
>> 
>> What then is the best way to retrieve the regulatory features from Cell
>> Type = 'MultiCell'?
>> Is there a function that returns all possible Cell Type names? (by the
>> hypothesis that these features might be named differently in the
>> database)
>> 
>> Next, if I want only a subset of the results above, e.g. only the
>> Feature Type = 'Promoter Associated', do I then have to sort through
>> the result myself, or is there another way to specify both of these
>> filters through the API?
>> 
>> 
>> All suggestions are appreciated,
>> 
>> Thanks,
>> -Fiona-
>> 
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev

Nathan Johnson
Senior Scientific Programmer
Ensembl Regulation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD

http://www.ensembl.info/
http://twitter.com/#!/ensembl