[ensembl-dev] VEP annotating same variant differently?

Will McLaren wm2 at ebi.ac.uk
Wed May 27 14:48:23 BST 2015


Hi Konrad,

Thanks for the bug report.

A bizarre one-in-a-million chance meant that the regulatory feature in
question was getting discarded due to a clash of internal identifiers. This
is now fixed on release/79 and release/80.
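
For anyone who wants to check their own output for the same kind of
discrepancy, here is a minimal sketch (not part of VEP) that compares the
consequence terms reported for each variant in two VEP-annotated VCF files.
It assumes the consequence is the second pipe-delimited field of each CSQ
entry, as in the output quoted below; the function names are made up for
illustration.

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def consequence_terms(info):
    """Collect consequence terms from the CSQ key of a VCF INFO string.

    Assumption: field 2 of each CSQ entry is the consequence, with
    multiple terms joined by '&'. Check the CSQ header of your own file.
    """
    terms = set()
    for field in info.split(";"):
        if field.startswith("CSQ="):
            for entry in field[4:].split(","):
                parts = entry.split("|")
                if len(parts) > 1:
                    terms.update(parts[1].split("&"))
    return terms


def load(lines):
    """Map (chrom, pos, ref, alt) to the set of consequence terms."""
    result = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        key = (cols[0], cols[1], cols[3], cols[4])
        result.setdefault(key, set()).update(consequence_terms(cols[7]))
    return result


def diff(a, b):
    """List (key, only_in_a, only_in_b) for shared variants that differ."""
    out = []
    for key in sorted(a.keys() & b.keys()):
        if a[key] != b[key]:
            out.append((key, a[key] - b[key], b[key] - a[key]))
    return out


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open_text(sys.argv[1]) as fa, open_text(sys.argv[2]) as fb:
            for key, only_a, only_b in diff(load(fa), load(fb)):
                print(*key, sorted(only_a), sorted(only_b), sep="\t")
```

Run it with the two annotated files as arguments. Only variants present in
both files are compared, so a single-variant file can be checked against a
larger one.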

Will

On 26 May 2015 at 04:58, Konrad Karczewski <konradk at broadinstitute.org>
wrote:

> Hi Will, dev team,
>
> I've found what appears to be a strange issue in VEP (I always seem to
> find these corner cases). Running the same VEP command on two files, one
> with ~1K variants (spanning all 22 chromosomes) and one with a single
> variant (which is also in the first file), gives different
> regulatory_region_variant annotations for that variant.
>
> The two annotated files are available at
> http://www.broadinstitute.org/~konradk/vep/
>
> Command line call in both cases (minus input filename):
>
> perl
> /humgen/atgu1/fs03/konradk/vep/ensembl-tools-release-79/scripts/variant_effect_predictor/variant_effect_predictor.pl
> --everything --vcf --allele_number --no_stats
> --cache --offline --dir /humgen/atgu1/fs03/konradk/vep/gold/
> --force_overwrite --cache_version 79 --fasta
> /tmp/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --assembly GRCh37
> --tabix --plugin
> LoF,human_ancestor_fa:/humgen/atgu1/fs03/konradk/loftee_data//human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:mysql
> -i /humgen/atgu1/fs03/konradk/lof/exac_subset.vcf.gz -o
> /humgen/atgu1/fs03/konradk/lof/exac_subset.vep.vcf.gz
>
> In both cases, it appears to have loaded the regulatory features, but then
> returns different results.
>
> Variant when run in a larger set:
>
> 2015-05-25 22:40:17 - Retrieved 369524 regulatory features (0 mem, 395468
> cached, 0 DB, 25944 duplicates)
>
> 1 78340517 . T C 652.97 PASS
> CSQ=C|intron_variant|MODIFIER|FAM73A|ENSG00000180488|Transcript|ENST00000443751|protein_coding||14/14|ENST00000443751.2:c.1570-14T>C|||||||rs540912776|1||1|HGNC|24741||||ENSP00000393675||F8W7S1_HUMAN|UPI000206500B|||||||||||||||||||||||,C|intron_variant|MODIFIER|FAM73A|ENSG00000180488|Transcript|ENST00000370791|protein_coding||15/15|ENST00000370791.3:c.1681-14T>C|||||||rs540912776|1||1|HGNC|24741|YES||CCDS681.1|ENSP00000359827|FA73A_HUMAN|R4GMP2_HUMAN&B7ZLZ8_HUMAN|UPI00000722C6|||||||||||||||||||||||
>
> Variant when run on its own:
>
> 2015-05-25 22:49:19 - Retrieved 372 regulatory features (0 mem, 372
> cached, 0 DB, 0 duplicates)
>
> 1 78340517 . T C 652.97 PASS
>  CSQ=C|intron_variant|MODIFIER|FAM73A|ENSG00000180488|Transcript|ENST00000443751|protein_coding||14/14|ENST00000443751.2:c.1570-14T>C|||||||rs540912776|1||1|HGNC|24741||||ENSP00000393675||F8W7S1_HUMAN|UPI000206500B|||||||||||||||||||||||,C|intron_variant|MODIFIER|FAM73A|ENSG00000180488|Transcript|ENST00000370791|protein_coding||15/15|ENST00000370791.3:c.1681-14T>C|||||||rs540912776|1||1|HGNC|24741|YES||CCDS681.1|ENSP00000359827|FA73A_HUMAN|R4GMP2_HUMAN&B7ZLZ8_HUMAN|UPI00000722C6|||||||||||||||||||||||,C|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000539328|promoter_flanking_region||||||||||rs540912776|1||||||||||||||||||||||||||||||||||
>
> -Konrad