pywikipathways and bridgedbpy

by Kozo Nishida and Alexander Pico

pywikipathways 0.0.2
bridgedbpy 0.0.2

WikiPathways is a well-known repository for biological pathways that provides unique tools to the research community for content creation, editing and utilization [1].

Python is a general-purpose programming language that is widely used for statistical and exploratory data analysis.

pywikipathways leverages the WikiPathways API to communicate between Python and WikiPathways, allowing any pathway to be queried, interrogated and downloaded in both data and image formats. Queries are typically performed based on “Xrefs”, standardized identifiers for genes, proteins and metabolites. Once you have identified a pathway, you can use its WPID (WikiPathways identifier) to make additional queries.

bridgedbpy leverages the BridgeDb API [2] to provide a number of functions related to ID mapping and identifiers in general for genes, proteins and metabolites.

Used together, bridgedbpy provides convenience to the typical pywikipathways user by supplying the formal names and codes defined by BridgeDb and used by WikiPathways.

Prerequisites

In addition to pywikipathways, you’ll also need to install bridgedbpy:

[ ]:
!pip install pywikipathways bridgedbpy
[1]:
import pywikipathways as pwpw
import bridgedbpy as brdgdbp

Getting started

Let’s first look at some of the most basic functions from each package. For example, here’s how you check which species are currently supported by WikiPathways:

[2]:
org_names = pwpw.list_organisms()
[3]:
org_names
[3]:
['Unspecified',
 'Acetobacterium woodii',
 'Anopheles gambiae',
 'Arabidopsis thaliana',
 'Bacillus subtilis',
 'Beta vulgaris',
 'Bos taurus',
 'Caenorhabditis elegans',
 'Canis familiaris',
 'Clostridium thermocellum',
 'Danio rerio',
 'Daphnia magna',
 'Daphnia pulex',
 'Drosophila melanogaster',
 'Escherichia coli',
 'Equus caballus',
 'Gallus gallus',
 'Glycine max',
 'Gibberella zeae',
 'Homo sapiens',
 'Hordeum vulgare',
 'Mus musculus',
 'Mycobacterium tuberculosis',
 'Oryza sativa',
 'Pan troglodytes',
 'Populus trichocarpa',
 'Rattus norvegicus',
 'Saccharomyces cerevisiae',
 'Solanum lycopersicum',
 'Sus scrofa',
 'Vitis vinifera',
 'Xenopus tropicalis',
 'Zea mays',
 'Plasmodium falciparum',
 'Brassica napus']

You should see 30 or more species listed. This list is useful for subsequent queries that take an organism argument, to avoid misspelling.
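
As a minimal sketch of how such a list can guard against misspellings, the helper below checks a name against the supported names before it is passed on to a query. The function name `check_organism` is hypothetical (it is not part of pywikipathways), and the short `SUPPORTED` list is only a subset copied from the output above; in practice you would pass `org_names` instead.

```python
from difflib import get_close_matches

# Subset of the names returned by pwpw.list_organisms() above (illustration only).
SUPPORTED = ["Homo sapiens", "Mus musculus", "Escherichia coli", "Danio rerio"]


def check_organism(name, supported):
    """Return `name` unchanged if it is supported, otherwise raise ValueError.

    Hypothetical helper, not part of pywikipathways. When a close match
    exists, the error message suggests it.
    """
    if name in supported:
        return name
    suggestions = get_close_matches(name, supported, n=1, cutoff=0.6)
    hint = f" Did you mean {suggestions[0]!r}?" if suggestions else ""
    raise ValueError(f"{name!r} is not a supported organism.{hint}")


# A correctly spelled name passes through unchanged.
print(check_organism("Escherichia coli", SUPPORTED))

# A misspelled name raises an error that points to the likely intended name.
try:
    check_organism("Homo sapien", SUPPORTED)
except ValueError as err:
    print(err)
```

Raising an error rather than silently correcting the name is a deliberate choice here: a wrong guess would otherwise query the wrong species without any warning.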

However, some functions want the organism code rather than the full name. Using bridgedbpy’s get_organism_code function, we can get those:

[5]:
org_names[14]
[5]:
'Escherichia coli'
[6]:
brdgdbp.get_organism_code(org_names[14])
[6]:
'Ec'

Identifier System Names and Codes

Even more obscure are the various data sources that provide official identifiers, and how they are named and coded. Fortunately, BridgeDb defines these clearly and simply, and WikiPathways relies on these BridgeDb definitions.

For example, this is how we find the system code for Ensembl:

[7]:
brdgdbp.get_system_code("Ensembl")
[7]:
'En'

It’s “En”! That’s simple enough. But some are less obvious…

[8]:
brdgdbp.get_system_code("Entrez Gene")
[8]:
'L'

It’s “L” because the resource used to be named “Locus Link”. Sigh… Don’t try to guess these codes. Use the bridgedbpy function shown above to get the correct code. By the way, all the systems supported by BridgeDb are listed here: https://github.com/bridgedb/datasources/blob/main/datasources.tsv

How to use bridgedbpy with pywikipathways

Here are some specific combo functions that are useful. They let you skip worrying about system codes altogether!

  1. Getting all the pathways containing the HGNC symbol “TNF”:

[9]:
tnf_pathways = pwpw.find_pathway_ids_by_xref('TNF', brdgdbp.get_system_code('HGNC'))
tnf_pathways
[9]:
0     WP5071
1     WP5073
2     WP5115
3      WP176
4     WP2328
       ...
80    WP5088
81    WP5093
82    WP5094
83    WP5098
84    WP2513
Name: id, Length: 85, dtype: object

  2. Getting all the genes from a pathway as Ensembl identifiers:

[10]:
pwpw.get_xref_list('WP554', brdgdbp.get_system_code('Ensembl'))
[10]:
['ENSG00000092009',
 'ENSG00000100448',
 'ENSG00000100739',
 'ENSG00000105329',
 'ENSG00000113889',
 'ENSG00000130234',
 'ENSG00000130368',
 'ENSG00000135744',
 'ENSG00000143839',
 'ENSG00000144891',
 'ENSG00000151623',
 'ENSG00000159640',
 'ENSG00000164867',
 'ENSG00000168398',
 'ENSG00000179142',
 'ENSG00000180772',
 'ENSG00000182220']

  3. Getting all the metabolites from a pathway as ChEBI identifiers:

[12]:
pwpw.get_xref_list('WP554', brdgdbp.get_system_code('ChEBI'))
[12]:
['16973',
 '2718',
 '2719',
 '27584',
 '29108',
 '3165',
 '35457',
 '55438',
 '80128',
 '80129',
 'CHEBI:16973',
 'CHEBI:2718',
 'CHEBI:2719',
 'CHEBI:27584',
 'CHEBI:29108',
 'CHEBI:3165',
 'CHEBI:35457',
 'CHEBI:55438',
 'CHEBI:80128',
 'CHEBI:80129']
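
The ChEBI result above lists each metabolite twice, once as a bare number and once with a `CHEBI:` prefix. If you need one entry per metabolite, a small post-processing step can collapse the two forms. The sketch below is one possible approach; `normalize_chebi` is a hypothetical helper, not a pywikipathways function, and the choice of the prefixed form as the canonical one is an assumption.

```python
def normalize_chebi(ids):
    """Collapse bare and 'CHEBI:'-prefixed identifiers into one prefixed form.

    Hypothetical helper for illustration. The order of first appearance
    is preserved.
    """
    prefix = "CHEBI:"
    seen = set()
    result = []
    for raw in ids:
        # Strip the prefix if present so both spellings compare equal.
        bare = raw[len(prefix):] if raw.startswith(prefix) else raw
        if bare not in seen:
            seen.add(bare)
            result.append(prefix + bare)
    return result


# A few entries copied from the output above.
mixed = ["16973", "2718", "CHEBI:16973", "CHEBI:2718"]
print(normalize_chebi(mixed))
```

With the real data you would pass the list returned by `pwpw.get_xref_list(...)` instead of `mixed`.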

Other tips

If you ever find yourself with a system code (e.g., in a pywikipathways result) and you’re not sure what it means, you can look up its full name with this function:

[13]:
brdgdbp.get_full_name('Ce')
[13]:
'ChEBI'
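
The name-to-code and code-to-name lookups are inverses of each other. The sketch below illustrates that relationship using only the three pairs that appear in this notebook's outputs. It is an offline illustration, not a replacement for the bridgedbpy functions, and `describe_code` is a hypothetical name.

```python
# Name -> code pairs taken from the outputs shown in this notebook.
KNOWN_SYSTEM_CODES = {
    "Ensembl": "En",
    "Entrez Gene": "L",
    "ChEBI": "Ce",
}

# Invert the mapping to go from code back to full name.
CODE_TO_NAME = {code: name for name, code in KNOWN_SYSTEM_CODES.items()}


def describe_code(code):
    """Return the full name for a known code, or a readable placeholder."""
    return CODE_TO_NAME.get(code, f"unknown system code {code!r}")


for code in ("Ce", "L", "Xx"):
    print(code, "->", describe_code(code))
```

For any code outside these three, call `brdgdbp.get_full_name` rather than extending the table by hand.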

References

  1. Pico AR, Kelder T, Iersel MP van, Hanspers K, Conklin BR, Evelo C: WikiPathways: Pathway editing for the people. PLoS Biol 2008, 6:e184.

  2. Iersel M van, Pico A, Kelder T, Gao J, Ho I, Hanspers K, Conklin B, Evelo C: The BridgeDb framework: Standardized access to gene, protein and metabolite identifier mapping services. BMC Bioinformatics 2010, 11:5.