Add Clusters

Runs unsupervised clustering on paths and adds the cluster assignment as a new segment column. The resulting column can be used in any widget that accepts a segment column.

This is the programmatic equivalent of the Cluster Analysis widget. Use the widget for interactive exploration and add_clusters when you want to persist cluster labels in the eventstream for downstream analysis.

Usage

es.add_clusters(
    segment_name="cluster",
    features=[
        {"metric": "length"},
        {"metric": "duration"},
        {"metric": "event_count", "metric_args": {"event": "purchase"}},
    ],
    method="kmeans",
    n_clusters=4,
    scaler="minmax",
)
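
The call above uses k-means. As a rough sketch of the HDBSCAN variant, the wrapper below passes only parameters listed in the Parameters table. The function name `add_hdbscan_clusters`, the segment name `cluster_hdbscan`, and the value `10` are illustrative choices, not part of the library, and `es` is assumed to be an existing eventstream.

```python
def add_hdbscan_clusters(es, min_cluster_size=10):
    """Hypothetical wrapper: run add_clusters with HDBSCAN instead of k-means.

    `es` is assumed to be an eventstream exposing add_clusters as documented here.
    """
    return es.add_clusters(
        segment_name="cluster_hdbscan",      # illustrative column name
        features=[
            {"metric": "length"},
            {"metric": "duration"},
        ],
        method="hdbscan",                    # n_clusters is omitted: it is only required for k-means
        min_cluster_size=min_cluster_size,   # HDBSCAN-specific setting
        scaler="minmax",
    )
```

Typical usage would be `add_hdbscan_clusters(es)`, after which the new segment column should be usable wherever a segment column is accepted.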

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `segment_name` | `str` | required | Name of the new segment column that holds the cluster labels. |
| `features` | `list[dict]` | required | Metric configs used as clustering features. Each dict has a `metric` key and an optional `metric_args` key. See Path Metrics. |
| `method` | `str` | `"kmeans"` | Clustering algorithm: `"kmeans"` or `"hdbscan"`. |
| `scaler` | `str \| None` | `"minmax"` | Feature scaling applied before clustering: `"minmax"`, `"std"`, or `None` for no scaling. |
| `n_clusters` | `int \| None` | `None` | Number of clusters. Required when `method` is `"kmeans"`. |
| `min_cluster_size` | `int \| None` | `None` | Minimum cluster size for HDBSCAN. When left as `None`, a value of 5 is used. |
| `nmf_k` | `int \| None` | `None` | If set, applies non-negative matrix factorization (NMF) to reduce the features to `nmf_k` components before clustering. |
| `path_id_col` | `str \| None` | `None` | Override the path ID column. |
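
To illustrate why the `scaler` setting matters, here is a minimal standalone sketch of the two scaling options. It assumes `"minmax"` means rescaling each feature to the range [0, 1] and `"std"` means standardizing to zero mean and unit variance; the page does not spell out either definition, and the library's implementation may differ. The function names and sample values are made up for the example.

```python
from statistics import mean, pstdev


def minmax_scale(values):
    """Rescale values to [0, 1] (assumed meaning of scaler="minmax")."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - lo) / (hi - lo) for v in values]


def std_scale(values):
    """Standardize to zero mean, unit variance (assumed meaning of scaler="std")."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up per-path features on very different scales.
lengths = [3, 5, 10]                # events per path
durations = [30.0, 600.0, 7200.0]   # seconds per path

scaled_lengths = minmax_scale(lengths)
scaled_durations = minmax_scale(durations)
```

Without scaling, a distance-based method such as k-means would be dominated by `durations`, simply because its raw values are orders of magnitude larger than `lengths`.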

Cluster labels

K-means clusters are labeled cluster_0, cluster_1, etc. HDBSCAN assigns noise points the label noise.
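
The labeling scheme can be sketched as a small mapping from raw integer cluster ids to label strings. This is an illustration under assumptions, not the library's code: it assumes the underlying algorithm returns integer ids with `-1` marking noise (the convention used by common HDBSCAN implementations), and that non-noise HDBSCAN clusters follow the same `cluster_<n>` pattern as k-means, which this page does not state. `format_labels` is a hypothetical name.

```python
from collections import Counter


def format_labels(raw_labels):
    """Map integer cluster ids to label strings; -1 is assumed to mean noise."""
    return ["noise" if label == -1 else f"cluster_{label}" for label in raw_labels]


# Made-up raw output for six paths.
labels = format_labels([0, 1, 1, -1, 0, 1])

# Count paths per label, for example to see how many paths ended up as noise.
sizes = Counter(labels)
```

When analyzing HDBSCAN results downstream, it is usually worth checking the size of the `noise` group first, since those paths were not assigned to any cluster.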