{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7gJOUxL2Bqhk"
   },
   "source": [
    "# Overview\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kozo2/pywikipathways/blob/main/docs/pywikipathways_Overview.ipynb)\n",
    "\n",
    "**by Kozo Nishida and Alexander Pico**\n",
    "\n",
    "**pywikipathways 0.0.2**\n",
    "\n",
    "*WikiPathways* is a well-known repository for biological pathways that provides unique tools to the research community for content creation, editing and utilization [1].\n",
    "\n",
    "**Python** is a powerful programming language and environment for statistical and exploratory data analysis.\n",
    "\n",
    "*pywikipathways* leverages the WikiPathways API to communicate between **Python** and WikiPathways, allowing any pathway to be queried, interrogated and downloaded in both data and image formats. Queries are typically performed based on “Xrefs”, standardized identifiers for genes, proteins and metabolites. Once you can identified a pathway, you can use the WPID (WikiPathways identifier) to make additional queries.\n",
    "\n",
    "## Prerequisites\n",
    "All you need is this **pywikipathways** package!\n",
    "To install pywikipathways, run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install pywikipathways"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "Qd_9CKFubDWI"
   },
   "outputs": [],
   "source": [
    "import pywikipathways as pwpw"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ldQX1B-YC8E8"
   },
   "source": [
    "## Getting started\n",
    "Lets first get oriented with what WikiPathways contains. For example, here’s how you check to see which species are currently supported by WikiPathways:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "CCrccZGTbJEr"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Unspecified',\n",
       " 'Acetobacterium woodii',\n",
       " 'Anopheles gambiae',\n",
       " 'Arabidopsis thaliana',\n",
       " 'Bacillus subtilis',\n",
       " 'Beta vulgaris',\n",
       " 'Bos taurus',\n",
       " 'Caenorhabditis elegans',\n",
       " 'Canis familiaris',\n",
       " 'Clostridium thermocellum',\n",
       " 'Danio rerio',\n",
       " 'Daphnia magna',\n",
       " 'Daphnia pulex',\n",
       " 'Drosophila melanogaster',\n",
       " 'Escherichia coli',\n",
       " 'Equus caballus',\n",
       " 'Gallus gallus',\n",
       " 'Glycine max',\n",
       " 'Gibberella zeae',\n",
       " 'Homo sapiens',\n",
       " 'Hordeum vulgare',\n",
       " 'Mus musculus',\n",
       " 'Mycobacterium tuberculosis',\n",
       " 'Oryza sativa',\n",
       " 'Pan troglodytes',\n",
       " 'Populus trichocarpa',\n",
       " 'Rattus norvegicus',\n",
       " 'Saccharomyces cerevisiae',\n",
       " 'Solanum lycopersicum',\n",
       " 'Sus scrofa',\n",
       " 'Vitis vinifera',\n",
       " 'Xenopus tropicalis',\n",
       " 'Zea mays',\n",
       " 'Plasmodium falciparum',\n",
       " 'Brassica napus']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwpw.list_organisms()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "n9cznBF5DGil"
   },
   "source": [
    "You should see 30 or more species listed. This list is useful for subsequent queries that take an *organism* argument, to avoid misspelling.\n",
    "\n",
    "Next, let’s see how many pathways are available for Human:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "5vVJAMrCDMs4"
   },
   "outputs": [],
   "source": [
    "hs_pathways = pwpw.list_pathways('Homo sapiens')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "KZb5zLfHDTuA"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>url</th>\n",
       "      <th>name</th>\n",
       "      <th>species</th>\n",
       "      <th>revision</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>WP100</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Glutathione metabolism</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>107114</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>WP106</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Alanine and aspartate metabolism</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>114258</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>WP107</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Translation factors</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>117851</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>WP111</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Electron transport chain: OXPHOS system in mit...</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>117097</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>WP117</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>GPCRs, other</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>117743</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1327</th>\n",
       "      <td>WP734</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Serotonin receptor 4/6/7 and NR3C signaling</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>117826</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1328</th>\n",
       "      <td>WP75</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Toll-like receptor signaling pathway</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>119233</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1329</th>\n",
       "      <td>WP78</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>TCA cycle (aka Krebs or citric acid cycle)</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>119082</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1330</th>\n",
       "      <td>WP80</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Nucleotide GPCRs</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>111167</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1331</th>\n",
       "      <td>WP98</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Prostaglandin synthesis and regulation</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>117172</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1332 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         id                                                url  \\\n",
       "0     WP100  https://www.wikipathways.org/index.php/Pathway...   \n",
       "1     WP106  https://www.wikipathways.org/index.php/Pathway...   \n",
       "2     WP107  https://www.wikipathways.org/index.php/Pathway...   \n",
       "3     WP111  https://www.wikipathways.org/index.php/Pathway...   \n",
       "4     WP117  https://www.wikipathways.org/index.php/Pathway...   \n",
       "...     ...                                                ...   \n",
       "1327  WP734  https://www.wikipathways.org/index.php/Pathway...   \n",
       "1328   WP75  https://www.wikipathways.org/index.php/Pathway...   \n",
       "1329   WP78  https://www.wikipathways.org/index.php/Pathway...   \n",
       "1330   WP80  https://www.wikipathways.org/index.php/Pathway...   \n",
       "1331   WP98  https://www.wikipathways.org/index.php/Pathway...   \n",
       "\n",
       "                                                   name       species revision  \n",
       "0                                Glutathione metabolism  Homo sapiens   107114  \n",
       "1                      Alanine and aspartate metabolism  Homo sapiens   114258  \n",
       "2                                   Translation factors  Homo sapiens   117851  \n",
       "3     Electron transport chain: OXPHOS system in mit...  Homo sapiens   117097  \n",
       "4                                          GPCRs, other  Homo sapiens   117743  \n",
       "...                                                 ...           ...      ...  \n",
       "1327        Serotonin receptor 4/6/7 and NR3C signaling  Homo sapiens   117826  \n",
       "1328               Toll-like receptor signaling pathway  Homo sapiens   119233  \n",
       "1329         TCA cycle (aka Krebs or citric acid cycle)  Homo sapiens   119082  \n",
       "1330                                   Nucleotide GPCRs  Homo sapiens   111167  \n",
       "1331             Prostaglandin synthesis and regulation  Homo sapiens   117172  \n",
       "\n",
       "[1332 rows x 5 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hs_pathways"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KFl0WBXGDvzQ"
   },
   "source": [
    "Yikes! That is a lot of information.\n",
    "Let’s break that down a bit:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "8sVREGDJRwDc"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function list_pathways in module pywikipathways.list_pathways:\n",
      "\n",
      "list_pathways(organism='')\n",
      "    List Pathways\n",
      "    \n",
      "    Retrieve list of pathways per species, including WPID, name,\n",
      "    species, URL and latest revision number.\n",
      "    \n",
      "    Args:\n",
      "        organism (str): A particular species.\n",
      "    \n",
      "    Returns:\n",
      "        pandas.DataFrame: A dataframe of pathway information.\n",
      "        \n",
      "    Examples:\n",
      "        >>> list_pathways('Mus musculus')\n",
      "            id  url     name    species revision\n",
      "        0       WP1     https://www.wikipathways.org/index.php/Pathway...       Statin pathway  Mus musculus    117947\n",
      "        1       WP10    https://www.wikipathways.org/index.php/Pathway...       IL-9 signaling pathway  Mus musculus    117067\n",
      "        2       WP103   https://www.wikipathways.org/index.php/Pathway...       Cholesterol biosynthesis        Mus musculus    116834\n",
      "        3       WP108   https://www.wikipathways.org/index.php/Pathway...       Selenium metabolism / selenoproteins    Mus musculus    117940\n",
      "        4       WP113   https://www.wikipathways.org/index.php/Pathway...       TGF-beta signaling pathway      Mus musculus    116497\n",
      "        ...     ...     ...     ...     ...     ...\n",
      "        230     WP79    https://www.wikipathways.org/index.php/Pathway...       Tryptophan metabolism   Mus musculus    104913\n",
      "        231     WP85    https://www.wikipathways.org/index.php/Pathway...       Focal adhesion  Mus musculus    116710\n",
      "        232     WP87    https://www.wikipathways.org/index.php/Pathway...       Nucleotide metabolism   Mus musculus    116529\n",
      "        233     WP88    https://www.wikipathways.org/index.php/Pathway...       Toll-like receptor signaling    Mus musculus    116521\n",
      "        234     WP93    https://www.wikipathways.org/index.php/Pathway...       IL-4 signaling pathway  Mus musculus    117991\n",
      "        235 rows × 5 columns\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(pwpw.list_pathways)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "pCo4bCRYSJk_"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1332, 5)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hs_pathways.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "roldgmRmSbB3"
   },
   "source": [
    "Ok. The help docs tell us that for each Human pathway we are getting a lot of information.\n",
    "A *pandas.DataFrame.shape* might be all you really want to know.\n",
    "Or if you’re interested in just one particular piece of information, check out these functions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "NWQeopT0Savj"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function list_pathway_ids in module pywikipathways.list_pathways:\n",
      "\n",
      "list_pathway_ids(organism='')\n",
      "    List Pathway WPIDs\n",
      "    \n",
      "    Retrieve list of pathway WPIDs per species.\n",
      "    Basically returns a subset of list_pathways result.\n",
      "    \n",
      "    Args:\n",
      "        organism (str): A particular species.\n",
      "    \n",
      "    Returns:\n",
      "        pandas.Series: A series of WPIDs.\n",
      "        \n",
      "    Examples:\n",
      "        >>> list_pathway_ids('Mus musculus')\n",
      "        0        WP1\n",
      "        1       WP10\n",
      "        2      WP103\n",
      "        3      WP108\n",
      "        4      WP113\n",
      "               ...  \n",
      "        230     WP79\n",
      "        231     WP85\n",
      "        232     WP87\n",
      "        233     WP88\n",
      "        234     WP93\n",
      "        Name: id, Length: 235, dtype: object\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(pwpw.list_pathway_ids)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "3tTl5WH6TBZm"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function list_pathway_names in module pywikipathways.list_pathways:\n",
      "\n",
      "list_pathway_names(organism='')\n",
      "    List Pathway Names\n",
      "    \n",
      "    Retrieve list of pathway names per species.\n",
      "    Basically returns a subset of list_pathways result.\n",
      "    \n",
      "    Args:\n",
      "        organism (str): A particular species.\n",
      "    \n",
      "    Returns:\n",
      "        pandas.Series: A series of names.\n",
      "        \n",
      "    Examples:\n",
      "        >>> list_pathway_names('Mus musculus')\n",
      "        0                            Statin pathway\n",
      "        1                    IL-9 signaling pathway\n",
      "        2                  Cholesterol biosynthesis\n",
      "        3      Selenium metabolism / selenoproteins\n",
      "        4                TGF-beta signaling pathway\n",
      "                               ...                 \n",
      "        230                   Tryptophan metabolism\n",
      "        231                          Focal adhesion\n",
      "        232                   Nucleotide metabolism\n",
      "        233            Toll-like receptor signaling\n",
      "        234                  IL-4 signaling pathway\n",
      "        Name: name, Length: 235, dtype: object\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(pwpw.list_pathway_names)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "id": "Z9BAIHD4TBPp"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function list_pathway_urls in module pywikipathways.list_pathways:\n",
      "\n",
      "list_pathway_urls(organism='')\n",
      "    List Pathway URLs\n",
      "    \n",
      "    Retrieve list of pathway URLs per species.\n",
      "    Basically returns a subset of list_pathways result.\n",
      "    \n",
      "    Args:\n",
      "        organism (str): A particular species.\n",
      "    \n",
      "    Returns:\n",
      "        pandas.Series: A series of URLs.\n",
      "        \n",
      "    Examples:\n",
      "        >>> list_pathway_urls('Mus musculus')\n",
      "        0      https://www.wikipathways.org/index.php/Pathway...\n",
      "        1      https://www.wikipathways.org/index.php/Pathway...\n",
      "        2      https://www.wikipathways.org/index.php/Pathway...\n",
      "        3      https://www.wikipathways.org/index.php/Pathway...\n",
      "        4      https://www.wikipathways.org/index.php/Pathway...\n",
      "                                     ...                        \n",
      "        230    https://www.wikipathways.org/index.php/Pathway...\n",
      "        231    https://www.wikipathways.org/index.php/Pathway...\n",
      "        232    https://www.wikipathways.org/index.php/Pathway...\n",
      "        233    https://www.wikipathways.org/index.php/Pathway...\n",
      "        234    https://www.wikipathways.org/index.php/Pathway...\n",
      "        Name: url, Length: 235, dtype: object\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(pwpw.list_pathway_urls)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-RcX6Am-TL4U"
   },
   "source": [
    "These return simple lists containing just a particular piece of information for each pathway result.\n",
    "\n",
    "Finally, there’s another way to find pathways of interest: by Xref. An Xref is simply a standardized identifier form an official source. WikiPathways relies on BridgeDb [2] to provide dozens of Xref sources for genes, proteins and metabolites. See the full list at https://github.com/bridgedb/datasources/blob/main/datasources.tsv\n",
    "\n",
    "With **pywikipathways**, the approach is simple.\n",
    "Take a supported identifier for a molecule of interest, e.g., an official gene symbol from HGNC, “TNF” and check the *system code* for the datasource, e.g., HGNC = H (this comes from the second column in the datasources.txt table linked to above), and then form your query:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "TM2nCqiYD9Y6"
   },
   "outputs": [],
   "source": [
    "tnf_pathways = pwpw.find_pathways_by_xref('TNF','H')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "id": "lElpOozlEFEz"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>score</th>\n",
       "      <th>id</th>\n",
       "      <th>url</th>\n",
       "      <th>name</th>\n",
       "      <th>species</th>\n",
       "      <th>revision</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.38982576</td>\n",
       "      <td>WP5071</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>PPAR-gamma pathway</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>116510</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.38982576</td>\n",
       "      <td>WP5073</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>PPAR Beta/Delta pathway</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>115726</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.38982576</td>\n",
       "      <td>WP5115</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Network map of SARS-CoV-2 signaling pathway</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>119638</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.27564844</td>\n",
       "      <td>WP176</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Folate metabolism</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>118404</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.27564844</td>\n",
       "      <td>WP2328</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Allograft Rejection</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>106557</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>0.27564844</td>\n",
       "      <td>WP5088</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Prostaglandin signaling</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>119639</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81</th>\n",
       "      <td>0.27564844</td>\n",
       "      <td>WP5093</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Opioid receptor pathway annotation</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>119684</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82</th>\n",
       "      <td>0.27564844</td>\n",
       "      <td>WP5094</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Orexin receptor pathway</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>119637</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83</th>\n",
       "      <td>0.27564844</td>\n",
       "      <td>WP5098</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>T-cell activation SARS-CoV-2</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>119538</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>0.27564844</td>\n",
       "      <td>WP2513</td>\n",
       "      <td>https://www.wikipathways.org/index.php/Pathway...</td>\n",
       "      <td>Nanoparticle triggered regulated necrosis</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>119820</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>85 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         score      id                                                url  \\\n",
       "0   0.38982576  WP5071  https://www.wikipathways.org/index.php/Pathway...   \n",
       "1   0.38982576  WP5073  https://www.wikipathways.org/index.php/Pathway...   \n",
       "2   0.38982576  WP5115  https://www.wikipathways.org/index.php/Pathway...   \n",
       "3   0.27564844   WP176  https://www.wikipathways.org/index.php/Pathway...   \n",
       "4   0.27564844  WP2328  https://www.wikipathways.org/index.php/Pathway...   \n",
       "..         ...     ...                                                ...   \n",
       "80  0.27564844  WP5088  https://www.wikipathways.org/index.php/Pathway...   \n",
       "81  0.27564844  WP5093  https://www.wikipathways.org/index.php/Pathway...   \n",
       "82  0.27564844  WP5094  https://www.wikipathways.org/index.php/Pathway...   \n",
       "83  0.27564844  WP5098  https://www.wikipathways.org/index.php/Pathway...   \n",
       "84  0.27564844  WP2513  https://www.wikipathways.org/index.php/Pathway...   \n",
       "\n",
       "                                           name       species revision  \n",
       "0                            PPAR-gamma pathway  Homo sapiens   116510  \n",
       "1                       PPAR Beta/Delta pathway  Homo sapiens   115726  \n",
       "2   Network map of SARS-CoV-2 signaling pathway  Homo sapiens   119638  \n",
       "3                             Folate metabolism  Homo sapiens   118404  \n",
       "4                           Allograft Rejection  Homo sapiens   106557  \n",
       "..                                          ...           ...      ...  \n",
       "80                      Prostaglandin signaling  Homo sapiens   119639  \n",
       "81           Opioid receptor pathway annotation  Homo sapiens   119684  \n",
       "82                      Orexin receptor pathway  Homo sapiens   119637  \n",
       "83                 T-cell activation SARS-CoV-2  Homo sapiens   119538  \n",
       "84    Nanoparticle triggered regulated necrosis  Homo sapiens   119820  \n",
       "\n",
       "[85 rows x 6 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tnf_pathways"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L73gzvkZEVhx"
   },
   "source": [
    "Ack! That’s a lot of information. We provide not only the pathway information, but also the search result score in case you want to rank results, etc. Again, if all you’re interested in is WPIDs, names or URLs, then there are these handy alternatives that will just return simple lists:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "id": "Yl_13aW8Dic9"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function find_pathway_ids_by_xref in module pywikipathways.find_pathways_by_xref:\n",
      "\n",
      "find_pathway_ids_by_xref(identifier, system_code)\n",
      "    Find Pathway WPIDs By Xref\n",
      "    \n",
      "    Retrieve list of pathway WPIDs containing the query Xref by\n",
      "    identifier and system code.\n",
      "    \n",
      "    Note:\n",
      "        there will be multiple listings of the same pathway if the\n",
      "        Xref is present mutiple times.\n",
      "        \n",
      "    Args:\n",
      "        identifier (str): The official ID specified by a data source or system\n",
      "        system_code (str): The BridgeDb code associated with the data source or system,\n",
      "            e.g., En (Ensembl), L (Entrez), Ch (HMDB), etc. See column two of\n",
      "            https://github.com/bridgedb/datasources/blob/main/datasources.tsv.\n",
      "    \n",
      "    Returns:\n",
      "        pandas.Series: A series of WPIDs.\n",
      "    \n",
      "    Examples:\n",
      "        >>> find_pathway_ids_by_xref('ENSG00000232810','En')\n",
      "        0     WP2813\n",
      "        1     WP4341\n",
      "        2     WP4673\n",
      "        3     WP1584\n",
      "        4     WP2571\n",
      "               ...  \n",
      "        82    WP5055\n",
      "        83    WP5093\n",
      "        84    WP5115\n",
      "        85    WP5094\n",
      "        86    WP5098\n",
      "        Name: id, Length: 87, dtype: object\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(pwpw.find_pathway_ids_by_xref)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function find_pathway_names_by_xref in module pywikipathways.find_pathways_by_xref:\n",
      "\n",
      "find_pathway_names_by_xref(identifier, system_code)\n",
      "    Find Pathway Names By Xref\n",
      "    \n",
      "    Retrieve list of pathway names containing the query Xref by\n",
      "    identifier and system code.\n",
      "    \n",
      "    Note:\n",
      "        there will be multiple listings of the same pathway if the\n",
      "        Xref is present mutiple times.\n",
      "        \n",
      "    Args:\n",
      "        identifier (str): The official ID specified by a data source or system\n",
      "        system_code (str): The BridgeDb code associated with the data source or system,\n",
      "            e.g., En (Ensembl), L (Entrez), Ch (HMDB), etc. See column two of\n",
      "            https://github.com/bridgedb/datasources/blob/main/datasources.tsv.\n",
      "    \n",
      "    Returns:\n",
      "        pandas.Series: A series of names.\n",
      "    \n",
      "    Examples:\n",
      "        >>> find_pathway_names_by_xref('ENSG00000232810','En')\n",
      "        0     Mammary gland development pathway - Embryonic ...\n",
      "        1       Non-genomic actions of 1,25 dihydroxyvitamin D3\n",
      "        2                                      Male infertility\n",
      "        3                             Type II diabetes mellitus\n",
      "        4                     Polycystic kidney disease pathway\n",
      "                                    ...                        \n",
      "        82                                   Burn wound healing\n",
      "        83                   Opioid receptor pathway annotation\n",
      "        84          Network map of SARS-CoV-2 signaling pathway\n",
      "        85                              Orexin receptor pathway\n",
      "        86                         T-cell activation SARS-CoV-2\n",
      "        Name: name, Length: 87, dtype: object\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(pwpw.find_pathway_names_by_xref)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function find_pathway_urls_by_xref in module pywikipathways.find_pathways_by_xref:\n",
      "\n",
      "find_pathway_urls_by_xref(identifier, system_code)\n",
      "    Find Pathway URLs By Xref\n",
      "    \n",
      "    Retrieve list of pathway URLs containing the query Xref by\n",
      "    identifier and system code.\n",
      "    \n",
      "    Note:\n",
      "        there will be multiple listings of the same pathway if the\n",
      "        Xref is present mutiple times.\n",
      "        \n",
      "    Args:\n",
      "        identifier (str): The official ID specified by a data source or system\n",
      "        system_code (str): The BridgeDb code associated with the data source or system,\n",
      "            e.g., En (Ensembl), L (Entrez), Ch (HMDB), etc. See column two of\n",
      "            https://github.com/bridgedb/datasources/blob/main/datasources.tsv.\n",
      "    \n",
      "    Returns:\n",
      "        pandas.Series: A series of URLs.\n",
      "    \n",
      "    Examples:\n",
      "        >>> find_pathway_urls_by_xref('ENSG00000232810','En')\n",
      "        0     https://www.wikipathways.org/index.php/Pathway...\n",
      "        1     https://www.wikipathways.org/index.php/Pathway...\n",
      "        2     https://www.wikipathways.org/index.php/Pathway...\n",
      "        3     https://www.wikipathways.org/index.php/Pathway...\n",
      "        4     https://www.wikipathways.org/index.php/Pathway...\n",
      "                                    ...                        \n",
      "        82    https://www.wikipathways.org/index.php/Pathway...\n",
      "        83    https://www.wikipathways.org/index.php/Pathway...\n",
      "        84    https://www.wikipathways.org/index.php/Pathway...\n",
      "        85    https://www.wikipathways.org/index.php/Pathway...\n",
      "        86    https://www.wikipathways.org/index.php/Pathway...\n",
      "        Name: url, Length: 87, dtype: object\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(pwpw.find_pathway_urls_by_xref)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Be aware*: a simple *len* call may be misleading here, since a given pathway is listed multiple times if the Xref is present multiple times."
   ]
  },
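  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the functions above return one entry per Xref occurrence, counting distinct pathways needs a deduplication step first. The sketch below is a minimal illustration that uses a plain list as a stand-in for the returned values (the list contents are invented for the example). On a real `pandas.Series`, `Series.drop_duplicates()` and `Series.nunique()` serve the same purpose.\n",
    "\n",
    "```python\n",
    "# Hypothetical stand-in for a query result: one entry per Xref occurrence.\n",
    "hits = ['WP554', 'WP554', 'WP1234', 'WP554', 'WP1234']\n",
    "\n",
    "# dict.fromkeys keeps the first occurrence of each ID and preserves order.\n",
    "unique_hits = list(dict.fromkeys(hits))\n",
    "\n",
    "print(len(hits))         # number of listings, duplicates included\n",
    "print(len(unique_hits))  # number of distinct pathways\n",
    "```"
   ]
  },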
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ldozfBTpEkpU"
   },
   "source": [
    "## My favorite pathways\n",
    "At this point, we should have one or more pathways identified from the queries above. Let’s assume we identified ‘WP554’, the ACE inhibitor pathway (https://wikipathways.org/instance/WP554). We will use its WPID (WP554) in subsequent queries.\n",
    "\n",
    "First off, we can get information about the pathway (if we didn’t already collect it above):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "id": "1LgCbfyREm0k"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id': 'WP554',\n",
       " 'url': 'https://www.wikipathways.org/index.php/Pathway:WP554',\n",
       " 'name': 'ACE inhibitor pathway',\n",
       " 'species': 'Homo sapiens',\n",
       " 'revision': '118788'}"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwpw.get_pathway_info('WP554')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "g4mR-2DrE1i4"
   },
   "source": [
    "Next, we can get all the Xrefs contained in the pathway, mapped to a datasource of our choice. How convenient! We use the same system codes as described above. So, for example, if we want all the genes listed as Entrez Genes from this pathway:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "id": "dvRV2-3gEmxZ"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['10159',\n",
       " '1215',\n",
       " '1511',\n",
       " '1585',\n",
       " '1636',\n",
       " '183',\n",
       " '185',\n",
       " '186',\n",
       " '3827',\n",
       " '4142',\n",
       " '4306',\n",
       " '4846',\n",
       " '59272',\n",
       " '5972',\n",
       " '623',\n",
       " '624',\n",
       " '7040']"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwpw.get_xref_list('WP554','L')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lwcDzKcEFCI2"
   },
   "source": [
    "Alternatively, if we want them listed as Ensembl IDs instead, then…"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "id": "AVnWpFQsEms0"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['ENSG00000092009',\n",
       " 'ENSG00000100448',\n",
       " 'ENSG00000100739',\n",
       " 'ENSG00000105329',\n",
       " 'ENSG00000113889',\n",
       " 'ENSG00000130234',\n",
       " 'ENSG00000130368',\n",
       " 'ENSG00000135744',\n",
       " 'ENSG00000143839',\n",
       " 'ENSG00000144891',\n",
       " 'ENSG00000151623',\n",
       " 'ENSG00000159640',\n",
       " 'ENSG00000164867',\n",
       " 'ENSG00000168398',\n",
       " 'ENSG00000179142',\n",
       " 'ENSG00000180772',\n",
       " 'ENSG00000182220']"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwpw.get_xref_list('WP554', 'En')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EboEUtoIFKvK"
   },
   "source": [
    "And, if we want the metabolites, drugs and other small molecules associated with the pathway, then we’d simply provide the system code of a chemical database, e.g., Ch (HMDB), Ce (ChEBI) or Cs (ChemSpider):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "id": "6WPfvXOEEmjv"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['HMDB0000016',\n",
       " 'HMDB0000037',\n",
       " 'HMDB0000464',\n",
       " 'HMDB0001035',\n",
       " 'HMDB00016',\n",
       " 'HMDB00037',\n",
       " 'HMDB0004246',\n",
       " 'HMDB00464',\n",
       " 'HMDB0061196',\n",
       " 'HMDB01035',\n",
       " 'HMDB04246',\n",
       " 'HMDB61196']"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwpw.get_xref_list('WP554', 'Ch')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "id": "bJ0JW2AEFUw3"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['16973',\n",
       " '2718',\n",
       " '2719',\n",
       " '27584',\n",
       " '29108',\n",
       " '3165',\n",
       " '35457',\n",
       " '55438',\n",
       " '80128',\n",
       " '80129',\n",
       " 'CHEBI:16973',\n",
       " 'CHEBI:2718',\n",
       " 'CHEBI:2719',\n",
       " 'CHEBI:27584',\n",
       " 'CHEBI:29108',\n",
       " 'CHEBI:3165',\n",
       " 'CHEBI:35457',\n",
       " 'CHEBI:55438',\n",
       " 'CHEBI:80128',\n",
       " 'CHEBI:80129']"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwpw.get_xref_list('WP554', 'Ce')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "id": "qyBJrO-yFUgo"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['102770', '110354', '150504', '23150106', '24774738', '266', '388341', '5932']"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwpw.get_xref_list('WP554', 'Cs')"
   ]
  },
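  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The HMDB and ChEBI results above list what appear to be the same compounds under two spellings (for example `HMDB00016` next to `HMDB0000016`, and `16973` next to `CHEBI:16973`). If you need one entry per compound, a small normalization step can help. The helpers below are a hypothetical sketch, not part of pywikipathways; they assume that the 7-digit HMDB form and the `CHEBI:` prefix are the spellings you want to keep.\n",
    "\n",
    "```python\n",
    "def normalize_hmdb(identifier):\n",
    "    # Pad the numeric part to 7 digits, e.g. HMDB00016 -> HMDB0000016.\n",
    "    digits = identifier[len('HMDB'):]\n",
    "    return 'HMDB' + digits.zfill(7)\n",
    "\n",
    "def normalize_chebi(identifier):\n",
    "    # Add the CHEBI: prefix when it is missing.\n",
    "    if identifier.startswith('CHEBI:'):\n",
    "        return identifier\n",
    "    return 'CHEBI:' + identifier\n",
    "\n",
    "# A few values copied from the outputs above.\n",
    "hmdb_ids = ['HMDB0000016', 'HMDB00016', 'HMDB0000037', 'HMDB00037']\n",
    "chebi_ids = ['16973', 'CHEBI:16973', '2718', 'CHEBI:2718']\n",
    "\n",
    "# Sets drop the duplicates that remain after normalization.\n",
    "unique_hmdb = sorted({normalize_hmdb(i) for i in hmdb_ids})\n",
    "unique_chebi = sorted({normalize_chebi(i) for i in chebi_ids})\n",
    "print(unique_hmdb)\n",
    "print(unique_chebi)\n",
    "```"
   ]
  },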
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "D24n2es3FVsj"
   },
   "source": [
    "It’s that easy!\n",
    "\n",
    "## Give me more\n",
    "We also provide methods for retrieving pathways as data files and as images. The native file format for WikiPathways is GPML, a custom XML specification. You can retrieve this format by…"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "gpml = pwpw.get_pathway('WP554')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'<?xml version=\"1.0\" encoding=\"UTF-8\"?>\\n<Pathway xmlns=\"http://pathvisio.org/GPML/2013a\" Name=\"ACE inhibitor pathway\" Organism=\"Homo sapiens\">\\n  <Comment Source=\"WikiPathways-description\">The core of this pathway was elucidated over a century ago and involves the conversion of angiotensinogen to angiotensin I (Ang I) by renin, its subsequent conversion to angiotensin II (Ang II) by angiotensin converting enzyme. Ang II activates the angiotensin II receptor type 1 to induce aldosterone synthesis, increasing water and salt resorption and potassium excretion in the kidney and increasing blood pressure.\\n\\nSource: PharmGKB (https://www.pharmgkb.org/pathway/PA2023)\\n\\nProteins on this pathway have targeted assays available via the [https://assays.cancer.gov/available_assays?wp_id=WP554 CPTAC Assay Portal]</Comment>\\n  <BiopaxRef>b93</BiopaxRef>\\n  <Graphics BoardWidth=\"991.0\" BoardHeight=\"651.0\"/>\\n  <DataNode TextLabel=\"NOS3\" GraphId=\"ab119\" Type=\"GeneProduct\">\\n    <Attribute Key=\"org.pathvisio.mo'"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gpml[:1000]"
   ]
  },
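  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since GPML is XML, the string returned by `get_pathway` can be inspected with Python's standard `xml.etree.ElementTree` module. The sketch below is self-contained: it parses a shortened, hand-written stand-in for the GPML shown above (the second DataNode and its GraphId are invented for illustration) and lists the text labels of its data nodes. With a live result, you would pass `gpml` in place of `snippet`.\n",
    "\n",
    "```python\n",
    "import xml.etree.ElementTree as ET\n",
    "\n",
    "# Shortened stand-in for a GPML document; not the full WP554 record.\n",
    "snippet = (\n",
    "    \"<Pathway xmlns='http://pathvisio.org/GPML/2013a' \"\n",
    "    \"Name='ACE inhibitor pathway' Organism='Homo sapiens'>\"\n",
    "    \"<DataNode TextLabel='NOS3' GraphId='ab119' Type='GeneProduct'/>\"\n",
    "    \"<DataNode TextLabel='ACE' GraphId='x0001' Type='GeneProduct'/>\"\n",
    "    \"</Pathway>\"\n",
    ")\n",
    "\n",
    "# GPML elements live in a namespace, so element lookups must be qualified.\n",
    "# The URI is taken from the output above; other GPML versions may differ.\n",
    "ns = {'gpml': 'http://pathvisio.org/GPML/2013a'}\n",
    "root = ET.fromstring(snippet.encode('utf-8'))\n",
    "\n",
    "labels = [node.get('TextLabel') for node in root.findall('gpml:DataNode', ns)]\n",
    "print(root.get('Name'))\n",
    "print(labels)\n",
    "```"
   ]
  },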
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "WikiPathways also provides a monthly data release archived at http://data.wikipathways.org. The archive includes timestamped GPML, GMT and SVG collections, organized by organism. There’s a Python function for grabbing files from the archive…"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "organism argument is not specified. Open http://data.wikipathways.org/current/gpml with your web browser and specify the organism.\n"
     ]
    }
   ],
   "source": [
    "pwpw.download_pathway_archive()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Called without an organism, the function simply prints the archive URL so you can look around (in case you don’t know what you are looking for). By default, the URL points to the latest collection of GPML files. However, if you provide an organism, then it will download that file to your current working directory or specified **destpath**. For example, here’s how you’d get the latest GMT file for mouse:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'wikipathways-20210810-gmt-Mus_musculus.gmt'"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwpw.download_pathway_archive(organism=\"Mus musculus\", format=\"gmt\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You might also want to specify an archive date, so that you can share your script and reproduce the same result at any time in the future. Remember, new pathways are being added to WikiPathways every month and existing pathways are improved continuously!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'wikipathways-20171010-gmt-Mus_musculus.gmt'"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwpw.download_pathway_archive(date=\"20171010\", organism=\"Mus musculus\", format=\"gmt\")"
   ]
  },
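  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A downloaded GMT file is plain text in the Gene Matrix Transposed layout: one gene set per line, with tab-separated fields for the set name, a description, and then the member genes. As a rough sketch of how such a file could be loaded, the hypothetical `parse_gmt` helper below (not part of pywikipathways) turns GMT text into a dictionary. The two sample lines are invented; for a real file you would pass in the text read from the downloaded file instead.\n",
    "\n",
    "```python\n",
    "def parse_gmt(text):\n",
    "    # Map each gene set name to its list of member gene identifiers.\n",
    "    gene_sets = {}\n",
    "    for line in text.splitlines():\n",
    "        if not line.strip():\n",
    "            continue  # skip blank lines\n",
    "        name, _description, *genes = line.split('\\t')\n",
    "        gene_sets[name] = genes\n",
    "    return gene_sets\n",
    "\n",
    "# Invented sample data in GMT layout: name, description, then genes.\n",
    "sample = '\\n'.join([\n",
    "    '\\t'.join(['Pathway A', 'description A', '11606', '11421']),\n",
    "    '\\t'.join(['Pathway B', 'description B', '18127']),\n",
    "])\n",
    "\n",
    "gene_sets = parse_gmt(sample)\n",
    "print(gene_sets)\n",
    "```"
   ]
  },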
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "1. Pico AR, Kelder T, Iersel MP van, Hanspers K, Conklin BR, Evelo C: **WikiPathways: Pathway editing for the people.** *PLoS Biol* 2008, **6**:e184+.\n",
    "\n",
    "2. Iersel M van, Pico A, Kelder T, Gao J, Ho I, Hanspers K, Conklin B, Evelo C: **The BridgeDb framework: Standardized access to gene, protein and metabolite identifier mapping services.** *BMC Bioinformatics* 2010, **11**:5+.\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "pywikipathways-Overview",
   "private_outputs": true,
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}