{ "cells": [
{ "cell_type": "markdown", "metadata": {}, "source": [ "Basic Example of Using the Self-Organizing Map (SOM) Class\n", "==========================================================\n", "\n", "This notebook provides a straightforward example of how to use the SOM class to train and evaluate \n", "a Self-Organizing Map (SOM) with predetermined hyperparameters. The process includes calculating \n", "fit metrics and visualizing the results.\n", "\n", "Key Features:\n", "-------------\n", "1. Predetermined Hyperparameters:\n", "   - The SOM is configured with a fixed set of hyperparameters, including scaling method, grid \n", "     dimensions, topology, neighborhood function, and number of epochs.\n", "\n", "2. Fit Metric Calculation:\n", "   - The Percent Variance Explained (PVE) and Topographic Error are calculated to evaluate the \n", "     quality of the SOM fit.\n", "\n", "3. Visualization:\n", "   - Component planes for individual features are generated to show their distribution across the \n", "     SOM grid.\n", "   - Categorical data distributions are visualized on the SOM grid to examine relationships \n", "     between numerical and categorical variables.\n", "\n", "Considerations:\n", "---------------\n", "This notebook serves as a basic example and is not intended for rigorous analysis or hyperparameter \n", "tuning. For more complex workflows, such as automated hyperparameter optimization, refer to scripts \n", "specifically designed for grid search or other tuning techniques." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Imports" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Standard imports\n", "import os\n", "import sys\n", "\n", "# Third party imports\n", "import pandas as pd\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "\n", "# Local imports\n", "notebook_dir = os.getcwd()\n", "parent_dir = os.path.abspath(os.path.join(notebook_dir, '..', '..'))\n", "sys.path.append(parent_dir)\n", "from SOM.utils.som_utils import SOM" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Load Data" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Load data\n", "train_dat_path = os.path.join(parent_dir, 'SOM', 'data', 'iris_training_data.csv')\n", "other_dat_path = os.path.join(parent_dir, 'SOM', 'data', 'iris_categorical_data.csv')\n", "\n", "train_dat = pd.read_csv(train_dat_path)\n", "other_dat = pd.read_csv(other_dat_path)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Train SOM" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Configure the SOM with fixed hyperparameters\n", "som = SOM(\n", "    train_dat=train_dat,\n", "    other_dat=other_dat,\n", "    scale_method=\"minmax\",\n", "    x_dim=5,\n", "    y_dim=2,\n", "    topology=\"hexagonal\",\n", "    neighborhood_fnc=\"gaussian\",\n", "    epochs=17,\n", ")\n", "\n", "# Train the SOM\n", "som.train_map()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 4. Get Fit Metrics" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Percent variance explained = 92.39063619063629%\n", "Topographic error = 0.5533333333333333\n" ] } ], "source": [ "# Get fit metrics\n", "pve = som.calculate_percent_variance_explained()\n", "topographic_error = som.calculate_topographic_error()\n", "\n", "print(f\"Percent variance explained = {pve}%\")\n", "print(f\"Topographic error = {topographic_error}\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 5. Plot SOMs (see output directory for figures)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot component planes\n", "som.plot_component_planes(output_dir=\"output/iris\")\n", "\n", "# Plot categorical data on the SOM grid\n", "som.plot_categorical_data(output_dir=\"output/iris\")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ],
"metadata": { "kernelspec": { "display_name": "chen5150", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.13" } }, "nbformat": 4, "nbformat_minor": 4 }