SOM-Seq¶
SOM-Seq is a Python toolbox with two modules: Seq_Sim, which generates synthetic single-cell sequencing data, and SOM, which clusters and visualizes data with Self-Organizing Maps (SOMs). The two modules can be used together or independently.
Installation¶
From source:
git clone https://github.com/caterer-z-t/SOM_Seq_Sim.git
cd SOM_Seq_Sim
pip install .
Quick Start¶
Generate Sequencing Data¶
seq-sim --num_samples 30 --fold_change 0.5 --config_file Seq_Sim/config.yml
Or equivalently:
python Seq_Sim/seq_sim.py \
--num_samples 30 \
--fold_change 0.5 \
--config_file Seq_Sim/config.yml
Output CSV files are written to the directory specified in config.yml (default: data/).
See Seq_Sim/config.yml for all configurable parameters (cell-type counts, batch structure, disease proportions, number of features, etc.).
Fit a SOM¶
Single hyperparameter set:
som-fit \
-t data/seq_sim_training_data.csv \
-c data/seq_sim_categorical_data.csv \
-o output/ \
-s zscore -x 5 -y 4 -p hexagonal -n gaussian -e 100
Hyperparameter tuning (pass multiple values for any flag; the best combination is selected automatically):
som-fit \
-t data/seq_sim_training_data.csv \
-c data/seq_sim_categorical_data.csv \
-o output/ \
-s zscore minmax -x 3 5 7 -y 3 5 -p rectangular hexagonal -n gaussian -e 50 100
Equivalently, run python SOM/som.py with the same flags.
Python API:
import pandas as pd

from SOM.utils.som_utils import SOM

# Load the training data (numeric features) and the categorical metadata.
train = pd.read_csv("data/seq_sim_training_data.csv")
meta = pd.read_csv("data/seq_sim_categorical_data.csv")

# Configure a 5 x 4 hexagonal SOM with z-score scaling.
som = SOM(
    train_dat=train,
    other_dat=meta,
    scale_method="zscore",
    x_dim=5,
    y_dim=4,
    topology="hexagonal",
    neighborhood_fnc="gaussian",
    epochs=100,
)
som.train_map()

# Report fit quality.
print(f"PVE: {som.calculate_percent_variance_explained():.1f}%")
print(f"Topographic error: {som.calculate_topographic_error():.3f}")

# Write plots to output/.
som.plot_component_planes(output_dir="output/")
som.plot_categorical_data(output_dir="output/")
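The two metrics printed above can be illustrated with a small standalone sketch. It uses common textbook definitions, which are an assumption here and may differ from what SOM-Seq computes internally: percent variance explained as 100 × (1 − residual sum of squares / total sum of squares), and topographic error as the fraction of samples whose best and second-best matching units are not neighbours on the grid. The function names, the rectangular grid, and the choice to count diagonal cells as neighbours are all illustrative and not part of the SOM-Seq API.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def two_best_units(sample, weights):
    """Grid positions of the best and second-best matching units."""
    ranked = sorted(weights, key=lambda pos: squared_distance(sample, weights[pos]))
    return ranked[0], ranked[1]


def percent_variance_explained(data, weights):
    """100 * (1 - residual sum of squares / total sum of squares)."""
    n = len(data)
    mean = [sum(col) / n for col in zip(*data)]
    total = sum(squared_distance(row, mean) for row in data)
    residual = sum(
        squared_distance(row, weights[two_best_units(row, weights)[0]])
        for row in data
    )
    return 100.0 * (1.0 - residual / total)


def topographic_error(data, weights):
    """Fraction of samples whose two best units are not grid neighbours."""
    errors = 0
    for row in data:
        (x1, y1), (x2, y2) = two_best_units(row, weights)
        # Assumption: rectangular grid, diagonal cells count as neighbours.
        if max(abs(x1 - x2), abs(y1 - y2)) > 1:
            errors += 1
    return errors / len(data)


# Toy 3 x 1 maps with one feature; keys are (x, y) grid positions.
ordered = {(0, 0): [0.0], (1, 0): [1.0], (2, 0): [2.0]}
folded = {(0, 0): [0.0], (1, 0): [2.0], (2, 0): [1.0]}  # same weights, shuffled
data = [[0.1], [0.9], [2.1]]

pve = percent_variance_explained(data, ordered)
te_ordered = topographic_error(data, ordered)
te_folded = topographic_error(data, folded)
```

Both toy maps hold the same weight vectors, so they explain the same share of variance. Only the folded map, where similar units sit far apart on the grid, has a nonzero topographic error. This is why the two metrics are reported together.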
CLI Reference¶
seq-sim / python Seq_Sim/seq_sim.py¶
| Flag | Description |
|---|---|
| --num_samples | Number of subjects to simulate |
| --fold_change | Disease-associated fold change magnitude |
| --config_file | Path to the configuration file (config.yml) |
som-fit / python SOM/som.py¶
| Flag | Description |
|---|---|
| -t | Path to training data CSV (numeric features) |
| -c | Path to categorical/metadata CSV (optional) |
| -o | Output directory for plots and metrics |
| -s | Scaling method, e.g. zscore or minmax |
| -x | SOM x-dimension (one or more integers) |
| -y | SOM y-dimension (one or more integers) |
| -p | Topology, e.g. rectangular or hexagonal |
| -n | Neighborhood function, e.g. gaussian |
| -e | Number of training epochs (one or more integers) |
| | Generate component plane plots (default: …) |
When more than one value is provided for any hyperparameter, the CLI performs a grid search over all combinations and selects the one with the highest score, defined as PVE − 100 × topographic_error, where PVE is the percent variance explained.
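The selection rule above can be sketched in a few lines of standalone Python. This is an illustration and not the SOM-Seq implementation: `grid_search`, `selection_score`, and the `evaluate` callable are hypothetical names, and keeping the first combination on a tie is an assumption. The grid mirrors the values in the hyperparameter-tuning command from Quick Start.

```python
from itertools import product


def selection_score(pve, topographic_error):
    """Score from the rule above: PVE minus 100 times the topographic error."""
    return pve - 100.0 * topographic_error


def grid_search(grid, evaluate):
    """Try every combination in `grid`; return (best_params, best_score).

    `evaluate` is a hypothetical callable that fits a SOM for one
    parameter dict and returns (pve, topographic_error).
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        pve, topo_err = evaluate(params)
        score = selection_score(pve, topo_err)
        # Strict comparison: on a tie, the first combination seen is kept.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Grid matching the hyperparameter-tuning command in Quick Start.
grid = {
    "scale_method": ["zscore", "minmax"],
    "x_dim": [3, 5, 7],
    "y_dim": [3, 5],
    "topology": ["rectangular", "hexagonal"],
    "neighborhood_fnc": ["gaussian"],
    "epochs": [50, 100],
}
n_combinations = len(list(product(*grid.values())))  # 2 * 3 * 2 * 2 * 1 * 2 = 48
```

Under this score, a map with a PVE of 80% and a topographic error of 0.05 scores 80 − 5 = 75. Each 0.01 of topographic error therefore costs as much as one percentage point of explained variance.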
Documentation¶
Full API reference and tutorials: som-seq-sim.readthedocs.io
Testing¶
pytest test/ -v
Contributing¶
See CONTRIBUTING.md. Please follow the Code of Conduct.
Citation¶
If you use SOM-Seq in your research, please cite:
Caterer Z., Pernat M., Hurd V. (2024). SOM-Seq: A Python Toolbox for Single-Cell Sequencing Simulation and Self-Organizing Map Analysis.
A machine-readable citation is available in CITATION.cff.
License¶
MIT — see LICENSE.