Add Clusters

Runs unsupervised clustering on paths and adds the cluster assignment as a new segment column. The resulting column can be used in any widget that accepts a segment column.

This is the programmatic equivalent of the Cluster Analysis widget. Use the widget for interactive exploration; use add_clusters when you want to persist cluster labels in the eventstream for downstream analysis.

Usage

es.add_clusters(
    segment_name="cluster",
    features=[
        {"metric": "length"},
        {"metric": "duration"},
        {"metric": "event_count", "metric_args": {"event": "purchase"}},
    ],
    method="kmeans",
    n_clusters=4,
    scaler="minmax",
)
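
For comparison, an HDBSCAN run might be configured as sketched below. It uses only parameters listed in the Parameters table. The segment name and the min_cluster_size value are illustrative choices, and es is assumed to be an existing eventstream that is not constructed here, so the call itself is left commented out.

```python
# Keyword arguments for an HDBSCAN run, limited to documented parameters.
hdbscan_kwargs = {
    "segment_name": "cluster_hdbscan",  # illustrative column name
    "features": [
        {"metric": "length"},
        {"metric": "duration"},
    ],
    "method": "hdbscan",
    "min_cluster_size": 10,  # illustrative value
    "scaler": "std",
}
# n_clusters is omitted: the table marks it as required for k-means only.

# With an existing eventstream `es` (assumed, not created here):
# es.add_clusters(**hdbscan_kwargs)
```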

Parameters

Parameter         Type          Default   Description
segment_name      str           required  Name of the new segment column for cluster labels.
features          list[dict]    required  Metric configs used as clustering features. Each dict has metric and optional metric_args. See Path Metrics.
method            str           "kmeans"  Clustering algorithm: "kmeans" or "hdbscan".
scaler            str | None    "minmax"  Feature scaling: "minmax", "std", or None.
n_clusters        int | None    None      Number of clusters. Required for k-means.
min_cluster_size  int | None    None      Minimum cluster size for HDBSCAN (default: 5).
nmf_k             int | None    None      If set, applies NMF to reduce features to nmf_k components before clustering.
path_id_col       str | None    None      Override the path ID column.
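
To illustrate why the scaler option matters, the sketch below applies the textbook definitions of min-max and standard scaling to two features with very different ranges. The helper names and sample values are hypothetical. It also assumes that "minmax" and "std" follow these usual definitions; the library's exact implementation (for example, population versus sample standard deviation) is not specified here.

```python
from statistics import mean, pstdev

def minmax_scale(values):
    """Rescale values linearly to the [0, 1] range (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature carries no signal
    return [(v - lo) / (hi - lo) for v in values]

def std_scale(values):
    """Center to zero mean and unit population std (hypothetical helper)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Made-up per-path features: path length in events, duration in seconds.
lengths = [2, 4, 6, 10]
durations = [30.0, 600.0, 1200.0, 3600.0]

scaled_lengths = minmax_scale(lengths)
scaled_durations = minmax_scale(durations)
standardized_lengths = std_scale(lengths)
```

Without scaling, the duration feature would dominate any distance-based clustering simply because its raw values are hundreds of times larger than the path lengths.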

Cluster labels

K-means clusters are labeled cluster_0, cluster_1, etc. HDBSCAN assigns noise points the label noise.
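
The naming scheme can be sketched as a mapping from raw integer cluster ids to label strings. This is a hypothetical helper, not the library's code. It assumes noise is marked with the id -1, as common HDBSCAN implementations do, and that non-noise HDBSCAN clusters follow the same cluster_N pattern as k-means, which the text above does not state.

```python
def to_segment_labels(raw_ids, noise_id=-1):
    """Map integer cluster ids to label strings (hypothetical helper)."""
    return [
        "noise" if cluster_id == noise_id else f"cluster_{cluster_id}"
        for cluster_id in raw_ids
    ]

# Made-up raw ids for five paths; -1 marks a noise point (assumed convention).
labels = to_segment_labels([0, 1, -1, 0, 2])
```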