# Python API Reference
This page documents the public Python interface. For a guided walkthrough, see the Python Tutorial.
## PhytClust

The main class. Import it from the package root:

```python
from phytclust import PhytClust
```
### Constructor

```python
pc = PhytClust(
    tree,                          # Bio.Phylo tree, Newick string, or file path
    min_cluster_size=None,         # hard lower bound on cluster size
    outlier_size_threshold=None,   # clusters below this → outlier (-1)
    prefer_fewer_outliers=False,   # prefer solutions with fewer outlier groups
    optimize_polytomies=True,      # use native polytomy DP (vs. legacy dummy nodes)
    polytomy_mode="hard",          # "hard" or "soft"
    soft_polytomy_max_degree=18,   # safety limit for soft mode
    no_split_zero_length=False,    # block splitting at zero-length edges
    runtime_config=None,           # RuntimeConfig for plot/save defaults
    peak_config=None,              # default PeakConfig (can be overridden per run)
)
```
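The constructor comment says that with `outlier_size_threshold` set, clusters below the threshold become outliers labeled `-1`. The sketch below applies that rule to a finished leaf → cluster mapping so you can see its effect. It is a hypothetical helper, not PhytClust's implementation: it assumes "below" means strictly fewer leaves, and it relabels after clustering, whereas PhytClust may apply the rule at a different stage.

```python
from collections import Counter


def relabel_small_clusters(clusters: dict[str, int], threshold: int) -> dict[str, int]:
    """Send every leaf whose cluster has fewer than `threshold` leaves to -1.

    Hypothetical helper illustrating the documented outlier rule;
    it is not PhytClust's implementation.
    """
    sizes = Counter(clusters.values())  # cluster ID -> number of leaves
    return {
        leaf: cid if sizes[cid] >= threshold else -1
        for leaf, cid in clusters.items()
    }


clusters = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 2, "F": 2}
print(relabel_small_clusters(clusters, threshold=2))
# {'A': 0, 'B': 0, 'C': 0, 'D': -1, 'E': 2, 'F': 2}
```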
### `run(**kwargs) → dict`

The unified entry point. What it computes depends on which arguments you pass:

| Call pattern | Mode | What you get |
|---|---|---|
| `pc.run(k=5)` | exact-k | One cluster map for exactly `k` groups |
| `pc.run(top_n=3, max_k=120)` | global | Top `top_n` peaks from the score curve |
| `pc.run(by_resolution=True, num_bins=4)` | resolution | One peak per log bin (`num_bins` bins) |
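The `global` and `resolution` modes both pick peaks from the score curve. A minimal, stdlib-only sketch of the two selection strategies follows. It assumes a "peak" is an interior local maximum of the scores and that resolution bins are log-spaced over the range of k. The function names are hypothetical; PhytClust's actual peak detection is controlled by `PeakConfig` and may use different criteria.

```python
import math


def interior_peaks(scores: dict[int, float]) -> list[int]:
    """k values (excluding the ends) scoring strictly above both neighbours."""
    ks = sorted(scores)
    return [
        ks[i]
        for i in range(1, len(ks) - 1)
        if scores[ks[i]] > scores[ks[i - 1]] and scores[ks[i]] > scores[ks[i + 1]]
    ]


def top_n_peaks(scores: dict[int, float], top_n: int) -> list[int]:
    """'global'-style selection: the top_n highest-scoring peaks."""
    return sorted(interior_peaks(scores), key=scores.__getitem__, reverse=True)[:top_n]


def peaks_by_log_bin(scores: dict[int, float], num_bins: int) -> list[int]:
    """'resolution'-style selection: the best peak in each log-spaced bin of k."""
    ks = sorted(scores)
    lo, hi = math.log(ks[0]), math.log(ks[-1])
    width = (hi - lo) / num_bins
    best: dict[int, int] = {}  # bin index -> best peak k so far
    for k in interior_peaks(scores):
        # Bin index on a log scale, clamped so the largest k lands in the last bin.
        b = min(int((math.log(k) - lo) / width), num_bins - 1)
        if b not in best or scores[k] > scores[best[b]]:
            best[b] = k
    return [best[b] for b in sorted(best)]


scores = {1: 0.0, 2: 0.5, 3: 0.2, 4: 0.3, 5: 0.9,
          6: 0.4, 7: 0.45, 8: 0.1, 9: 0.6, 10: 0.2}
print(interior_peaks(scores))                # [2, 5, 7, 9]
print(top_n_peaks(scores, top_n=2))          # [5, 9]
print(peaks_by_log_bin(scores, num_bins=2))  # [2, 5]
```

In this sketch, log-spaced bins give small k values a bin of their own, so one strong peak at large k cannot crowd them out; bins that contain no interior peak contribute nothing.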
You can pass `peak_config=PeakConfig(...)` to override peak detection settings for a single run.
**Return value:** a dictionary with these keys:

| Key | Type | Description |
|---|---|---|
| `mode` | `str` | `"k"`, `"global"`, or `"resolution"` |
| `k_values` | `list[int]` | All selected k values |
| `selected_k` | `int` or `None` | The top-ranked k (first element of `k_values`), or `None` if no peaks were found |
| `clusters` | `dict` or `list[dict]` | Leaf → cluster ID mapping: a single dict in exact-k mode, a list of dicts in multi-k modes |
| `scores` | NumPy array or `None` | Score vector over all k values (`None` in exact-k mode) |
> **Backward-compatible aliases**
>
> `result["ks"]` and `result["peaks"]` are aliases of `result["k_values"]`. `result["k"]` is also present in exact-k mode. Prefer `k_values` and `selected_k` in new code.
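Because `clusters` is a dict in one mode and a list in the others, downstream code often normalizes it first. The helper below is hypothetical, not part of PhytClust. It turns any `run()` result into a `{k: mapping}` dict, reading only the keys documented in the table above and falling back to the legacy aliases. It assumes the list in multi-k modes is ordered like `k_values`, which the table implies but does not state.

```python
def clusterings_by_k(result: dict) -> dict[int, dict[str, int]]:
    """Normalize any run() result into {k: {leaf: cluster_id}}.

    Hypothetical helper. Assumes result["clusters"][i] belongs to
    result["k_values"][i] in multi-k modes, and falls back to the
    legacy "ks" / "k" aliases for older result dicts.
    """
    clusters = result["clusters"]
    if isinstance(clusters, dict):  # exact-k mode: a single mapping
        k = result.get("selected_k")
        if k is None:
            k = result["k"]
        return {k: clusters}
    k_values = result.get("k_values", result.get("ks", []))
    return dict(zip(k_values, clusters))  # multi-k modes: one mapping per k


# Toy results containing only the keys the helper reads.
exact = {"mode": "k", "k_values": [5], "selected_k": 5, "clusters": {"A": 0, "B": 1}}
multi = {"mode": "global", "k_values": [3, 8], "selected_k": 3,
         "clusters": [{"A": 0}, {"A": 1}]}
print(clusterings_by_k(exact))  # {5: {'A': 0, 'B': 1}}
print(clusterings_by_k(multi))  # {3: {'A': 0}, 8: {'A': 1}}
```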
### `get_clusters(k) → dict`

Compute (or return cached) cluster assignments for a specific `k`. Returns a dict mapping leaf names to cluster IDs. Raises `MissingDPTableError` if called before `run()`.

```python
clusters = pc.get_clusters(k=7)
# {"leaf_A": 0, "leaf_B": 0, "leaf_C": 1, ...}
```
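Downstream code often wants the inverse view: the leaves in each cluster. A small hypothetical helper (not part of PhytClust) that inverts a `get_clusters()` mapping follows. It keeps the `-1` outlier label as its own group, following the constructor's `outlier_size_threshold` convention.

```python
from collections import defaultdict


def leaves_by_cluster(clusters: dict[str, int]) -> dict[int, list[str]]:
    """Invert leaf -> cluster ID into cluster ID -> sorted leaf names."""
    groups = defaultdict(list)
    for leaf, cid in clusters.items():
        groups[cid].append(leaf)
    # Sort cluster IDs (so -1 outliers come first) and the leaves inside each.
    return {cid: sorted(groups[cid]) for cid in sorted(groups)}


clusters = {"leaf_A": 0, "leaf_B": 0, "leaf_C": 1, "leaf_D": -1}
print(leaves_by_cluster(clusters))
# {-1: ['leaf_D'], 0: ['leaf_A', 'leaf_B'], 1: ['leaf_C']}
```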
### `plot(**kwargs)`

Generate score-curve and/or tree plots. Key arguments:

```python
pc.plot(
    results_dir="results/",  # where to save PNGs
    save=True,               # write to disk (vs. interactive display)
    dpi=150,                 # PNG resolution
)
```

Styling comes from the `RuntimeConfig` set on the object (the constructor's `runtime_config` argument).
### `save(**kwargs)`

Write cluster assignments and metadata to disk:

```python
pc.save(
    results_dir="results/",
    filename="phytclust_results.tsv",
)
```
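This page does not list the exact columns `save()` writes. If you need a layout of your own, you can write a `get_clusters()` mapping with the standard `csv` module. The sketch below uses a hypothetical two-column `leaf`/`cluster` layout; it does not reproduce PhytClust's file format.

```python
import csv
import io


def write_clusters_tsv(clusters: dict[str, int], handle) -> None:
    """Write a leaf -> cluster mapping as a two-column TSV (leaf, cluster)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["leaf", "cluster"])
    for leaf, cid in sorted(clusters.items()):  # stable, leaf-sorted output
        writer.writerow([leaf, cid])


buf = io.StringIO()
write_clusters_tsv({"leaf_B": 0, "leaf_A": 0, "leaf_C": 1}, buf)
print(buf.getvalue())
```

For a real file, pass a handle from `open(path, "w", newline="")` instead of the `StringIO` buffer.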
## Configuration classes

All importable from the package root:

```python
from phytclust import (
    PeakConfig,
    RuntimeConfig,
    PlotConfig,
    ScorePlotConfig,
    ClusterPlotConfig,
)
```

See the Configuration Reference for all fields and their defaults.
## Visualization helpers

### `plot_multiple_k`

Compare several k values side by side:

```python
from phytclust.viz.cluster import plot_multiple_k

plot_multiple_k(
    pc,                           # PhytClust object (must have run at least once)
    k_values=[3, 5, 8],           # which k values to plot
    results_dir="results/multi",  # output directory
    save=True,
)
```
## Exceptions

PhytClust defines a hierarchy of exceptions for clear error handling:

```python
from phytclust.exceptions import (
    PhytClustError,          # base class: catch this for "any PhytClust problem"
    ValidationError,         # bad input (tree format, missing leaves, etc.)
    ConfigurationError,      # invalid config values
    InvalidKError,           # k is out of range or infeasible
    ComputationError,        # DP or scoring failure
    InvalidClusteringError,  # result violates constraints
    MissingDPTableError,     # get_clusters() called before run()
    DataError,               # general data issues
    InvalidTreeError,        # tree can't be parsed or isn't rooted
)
```

In practice, you'll mostly encounter `InvalidKError` (e.g. asking for `k=100` on a 50-leaf tree) and `InvalidTreeError` (e.g. passing an unrooted tree without `--root-taxon`).
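Because every PhytClust exception derives from `PhytClustError`, order `except` clauses from specific to general. The sketch below is self-contained: it defines local stand-ins for two of the classes (the real ones live in `phytclust.exceptions`, possibly with intermediate parent classes) and a hypothetical `check_k` bound check that mimics the `k=100` on a 50-leaf tree case. PhytClust's own feasibility check may reject further k values, for example under `min_cluster_size`.

```python
class PhytClustError(Exception):
    """Local stand-in for phytclust.exceptions.PhytClustError."""


class InvalidKError(PhytClustError):
    """Local stand-in for phytclust.exceptions.InvalidKError."""


def check_k(k: int, n_leaves: int) -> None:
    """Hypothetical bound check: a tree with n leaves supports 1 <= k <= n."""
    if not 1 <= k <= n_leaves:
        raise InvalidKError(f"k={k} is out of range for a {n_leaves}-leaf tree")


def describe_attempt(k: int, n_leaves: int) -> str:
    try:
        check_k(k, n_leaves)
    except InvalidKError as exc:   # most specific handler first...
        return f"skipped: {exc}"
    except PhytClustError as exc:  # ...then the catch-all base class
        return f"failed: {exc}"
    return "ok"


print(describe_attempt(100, 50))  # skipped: k=100 is out of range for a 50-leaf tree
print(describe_attempt(5, 50))    # ok
```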
## Utilities

These are available for advanced use but aren't part of the main workflow.

### Tree metrics

```python
from phytclust.metrics.indices import (
    colless_index_calc,                 # tree balance (Colless index)
    normalized_colless,                 # Colless index normalized to [0, 1]
    calculate_internal_terminal_ratio,  # internal vs. terminal branch length ("stemmy" vs. "tippy")
    calculate_variance_branch_length,   # variance of branch lengths
)
```
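For intuition about what `colless_index_calc` measures: the Colless index of a binary tree sums, over all internal nodes, the absolute difference between the leaf counts of the two child subtrees. A minimal recursive sketch on nested 2-tuples follows, with a normalization by the caterpillar maximum (n-1)(n-2)/2. The helper names are hypothetical; PhytClust's functions operate on real tree objects and may normalize differently or treat polytomies in their own way.

```python
def colless(tree) -> tuple[int, int]:
    """Return (Colless index, leaf count) for a binary tree of nested 2-tuples.

    A leaf is a string; an internal node is a (left, right) tuple.
    """
    if isinstance(tree, str):
        return 0, 1
    left, right = tree
    c_left, n_left = colless(left)
    c_right, n_right = colless(right)
    # Each internal node adds the imbalance between its two subtrees.
    return c_left + c_right + abs(n_left - n_right), n_left + n_right


def colless_normalized(tree) -> float:
    """Scale by the caterpillar maximum (n-1)(n-2)/2 so the result lies in [0, 1]."""
    c, n = colless(tree)
    return c / ((n - 1) * (n - 2) / 2) if n > 2 else 0.0


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(colless(balanced), colless_normalized(balanced))        # (0, 4) 0.0
print(colless(caterpillar), colless_normalized(caterpillar))  # (3, 4) 1.0
```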
### Phylogenetic diversity selection

```python
from phytclust.selection.representatives import (
    maximize_pd,          # greedy selection maximizing phylogenetic diversity
    rank_terminal_nodes,  # rank leaves by distance metrics
)
```
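`maximize_pd` is described as a greedy selection maximizing phylogenetic diversity (PD). A common greedy scheme adds, at each step, the leaf whose root path contributes the most branch length not yet covered. The sketch below implements that scheme for rooted PD (the total length of the union of root-to-leaf paths) on a toy tree given as parent and edge-length maps. The names are hypothetical and this is not `maximize_pd`'s code; PhytClust may use a different PD convention or starting rule.

```python
def root_path(leaf: str, parent: dict[str, str]) -> list[str]:
    """Edges from `leaf` up to the root, each named by its child node."""
    edges = []
    node = leaf
    while node in parent:
        edges.append(node)
        node = parent[node]
    return edges


def greedy_max_pd(leaves, parent, length, k):
    """Greedily pick k leaves, each time adding the most uncovered branch length.

    Rooted PD: the total length of the union of root-to-leaf paths.
    Ties go to the leaf listed first in `leaves`.
    """
    chosen, covered, pd = [], set(), 0.0
    for _ in range(min(k, len(leaves))):
        best, best_gain = None, -1.0
        for leaf in leaves:
            if leaf in chosen:
                continue
            gain = sum(length[e] for e in root_path(leaf, parent) if e not in covered)
            if gain > best_gain:
                best, best_gain = leaf, gain
        chosen.append(best)
        covered.update(root_path(best, parent))
        pd += best_gain
    return chosen, pd


# Toy tree: root -> X -> {A, B}, root -> Y -> {C, D}
parent = {"X": "root", "Y": "root", "A": "X", "B": "X", "C": "Y", "D": "Y"}
length = {"X": 1.0, "Y": 1.0, "A": 2.0, "B": 0.5, "C": 1.0, "D": 1.5}
print(greedy_max_pd(["A", "B", "C", "D"], parent, length, k=2))  # (['A', 'D'], 5.5)
```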