Add Clusters
Runs unsupervised clustering on paths and adds the cluster assignment as a new segment column. The resulting column can be used in any widget that accepts a segment column.
This is the programmatic equivalent of the Cluster Analysis widget. Use the widget for interactive exploration and add_clusters when you want to persist cluster labels in the eventstream for downstream analysis.
Usage
es.add_clusters(
    segment_name="cluster",
    features=[
        {"metric": "length"},
        {"metric": "duration"},
        {"metric": "event_count", "metric_args": {"event": "purchase"}},
    ],
    method="kmeans",
    n_clusters=4,
    scaler="minmax",
)

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| segment_name | str | required | Name of the new segment column for cluster labels. |
| features | list[dict] | required | Metric configs used as clustering features. Each dict has a metric key and an optional metric_args key. See Path Metrics. |
| method | str | "kmeans" | Clustering algorithm: "kmeans" or "hdbscan". |
| scaler | str \| None | "minmax" | Feature scaling: "minmax", "std", or None. |
| n_clusters | int \| None | None | Number of clusters. Required for k-means. |
| min_cluster_size | int \| None | None | Minimum cluster size for HDBSCAN. If left as None, 5 is used. |
| nmf_k | int \| None | None | If set, applies NMF to reduce features to nmf_k components before clustering. |
| path_id_col | str \| None | None | Override the path ID column. |
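
The constraints in the table can be checked before the call is made. The sketch below is only an illustration: build_cluster_kwargs is a hypothetical helper, not part of the library. It validates the documented options (the two methods, the three scaler values, and n_clusters being required for k-means) and returns a dict that could be passed to add_clusters.

```python
from typing import Any, Dict, List, Optional

# Values taken from the parameter table above.
VALID_METHODS = ("kmeans", "hdbscan")
VALID_SCALERS = ("minmax", "std", None)


def build_cluster_kwargs(
    segment_name: str,
    features: List[Dict[str, Any]],
    method: str = "kmeans",
    scaler: Optional[str] = "minmax",
    n_clusters: Optional[int] = None,
    min_cluster_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and return keyword arguments."""
    if method not in VALID_METHODS:
        raise ValueError("method must be 'kmeans' or 'hdbscan'")
    if scaler not in VALID_SCALERS:
        raise ValueError("scaler must be 'minmax', 'std', or None")
    if method == "kmeans" and n_clusters is None:
        raise ValueError("n_clusters is required for k-means")
    for feature in features:
        if "metric" not in feature:
            raise ValueError("each feature dict needs a 'metric' key")

    kwargs: Dict[str, Any] = {
        "segment_name": segment_name,
        "features": features,
        "method": method,
        "scaler": scaler,
    }
    # Only forward optional settings that were explicitly provided.
    if n_clusters is not None:
        kwargs["n_clusters"] = n_clusters
    if min_cluster_size is not None:
        kwargs["min_cluster_size"] = min_cluster_size
    return kwargs


# An HDBSCAN configuration: no n_clusters, explicit min_cluster_size.
hdbscan_kwargs = build_cluster_kwargs(
    segment_name="cluster_hdbscan",
    features=[{"metric": "length"}, {"metric": "duration"}],
    method="hdbscan",
    min_cluster_size=10,
)
# With an existing eventstream `es`, the call would then be:
# es.add_clusters(**hdbscan_kwargs)
```

Whether add_clusters performs the same checks internally is not stated here, so treat the helper as a convenience for catching configuration mistakes early.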
Cluster labels
K-means clusters are labeled cluster_0, cluster_1, etc. HDBSCAN assigns noise points the label noise.
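
The naming scheme can be sketched as a small mapping from raw integer labels to segment values. This assumes the underlying algorithm returns integer labels with -1 marking noise, which is the convention in the common HDBSCAN implementations. The function name to_segment_label is hypothetical, and how add_clusters builds its labels internally is not documented here.

```python
from collections import Counter
from typing import List


def to_segment_label(raw_label: int) -> str:
    """Map a raw integer cluster label to the documented naming scheme."""
    # Assumption: negative labels (conventionally -1) mark noise points.
    if raw_label < 0:
        return "noise"
    return f"cluster_{raw_label}"


# Illustrative raw labels, one per path (made-up values).
raw_labels: List[int] = [0, 2, -1, 1, 0]
segment_values = [to_segment_label(label) for label in raw_labels]
print(segment_values)
# ['cluster_0', 'cluster_2', 'noise', 'cluster_1', 'cluster_0']

# Count paths per segment value, e.g. to see how many paths ended up as noise.
sizes = Counter(segment_values)
print(sizes["noise"])  # 1
```

When filtering or comparing segments downstream, noise is a separate segment value rather than a cluster, so it is usually worth checking its size before interpreting the other clusters.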