Python API Reference¶

This page documents the public Python interface. For a guided walkthrough, see the Python Tutorial.

PhytClust¶

The main class. Import it from the package root:

from phytclust import PhytClust

Constructor¶

pc = PhytClust(
    tree,                          # Bio.Phylo tree, Newick string, or file path
    min_cluster_size=None,         # hard lower bound on cluster size
    outlier_size_threshold=None,   # clusters below this → outlier (-1)
    prefer_fewer_outliers=False,   # prefer solutions with fewer outlier groups
    optimize_polytomies=True,      # use native polytomy DP (vs. legacy dummy nodes)
    polytomy_mode="hard",          # "hard" or "soft"
    soft_polytomy_max_degree=18,   # safety limit for soft mode
    no_split_zero_length=False,    # block splitting at zero-length edges
    runtime_config=None,           # RuntimeConfig for plot/save defaults
    peak_config=None,              # default PeakConfig (can be overridden per run)
)

`run(**kwargs)` → dict¶

The unified entry point. What it computes depends on which arguments you pass:

Call pattern	Mode	What you get
`pc.run(k=5)`	exact-k	One cluster map for exactly k groups
`pc.run(top_n=3, max_k=120)`	global	Top n peaks from the score curve
`pc.run(by_resolution=True, num_bins=4)`	resolution	One peak per log bin

You can pass peak_config=PeakConfig(...) to override peak detection settings for a single run.

Return value — a dictionary with these keys:

Key	Type	Description
`mode`	str	`"k"`, `"global"`, or `"resolution"`
`k_values`	list[int]	All selected k values
`selected_k`	int or None	The top-ranked k (first element of `k_values`), or `None` if no peaks found
`clusters`	dict or list[dict]	Leaf → cluster ID mapping. A single dict for exact-k, a list for multi-k modes
`scores`	numpy array or None	Score vector over all k values (None in exact-k mode)

Backward-compatible aliases

result["ks"] and result["peaks"] are aliases of result["k_values"]. result["k"] is present in exact-k mode. Prefer k_values and selected_k in new code.

`get_clusters(k)` → dict¶

Compute (or return cached) cluster assignments for a specific k. Returns a dict mapping leaf names to cluster IDs.

clusters = pc.get_clusters(k=7)
# {"leaf_A": 0, "leaf_B": 0, "leaf_C": 1, ...}

`plot(**kwargs)`¶

Generate score curve and/or tree plots. Key arguments:

pc.plot(
    results_dir="results/",  # where to save PNGs
    save=True,               # write to disk (vs. interactive display)
    dpi=150,                 # PNG resolution
)

Uses the RuntimeConfig set on the object for styling.

`save(**kwargs)`¶

Write cluster assignments and metadata to disk:

pc.save(
    results_dir="results/",
    filename="phytclust_results.tsv",
)

Configuration classes¶

All importable from the package root:

from phytclust import (
    PeakConfig,
    RuntimeConfig,
    PlotConfig,
    ScorePlotConfig,
    ClusterPlotConfig,
)

See the Configuration Reference for all fields and their defaults.

Visualization helpers¶

`plot_multiple_k`¶

Compare several k values side by side:

from phytclust.viz.cluster import plot_multiple_k

plot_multiple_k(
    pc,                           # PhytClust object (must have run at least once)
    k_values=[3, 5, 8],          # which k values to plot
    results_dir="results/multi",  # output directory
    save=True,
)

Exceptions¶

PhytClust defines a hierarchy of exceptions for clear error handling:

from phytclust.exceptions import (
    PhytClustError,          # base — catch this for "any PhytClust problem"
    ValidationError,         # bad input (tree format, missing leaves, etc.)
    ConfigurationError,      # invalid config values
    InvalidKError,           # k is out of range or infeasible
    ComputationError,        # DP or scoring failure
    InvalidClusteringError,  # result violates constraints
    MissingDPTableError,     # get_clusters() called before run()
    DataError,               # general data issues
    InvalidTreeError,        # tree can't be parsed or isn't rooted
)

In practice, you'll mostly encounter InvalidKError (e.g. asking for k=100 on a 50-leaf tree) and InvalidTreeError (e.g. passing an unrooted tree without --root-taxon).

Utilities¶

These are available for advanced use but aren't part of the main workflow.

Tree metrics¶

from phytclust.metrics.indices import (
    colless_index_calc,              # tree balance (Colless index)
    normalized_colless,              # normalized to [0, 1]
    calculate_internal_terminal_ratio,  # branch-length stemmy/tippy ratio
    calculate_variance_branch_length,   # variance of branch lengths
)

Phylogenetic diversity selection¶

from phytclust.selection.representatives import (
    maximize_pd,           # greedy selection maximizing phylogenetic diversity
    rank_terminal_nodes,   # rank leaves by distance metrics
)