Eventstream

Eventstream is the central object in hopscotch-analytics. It wraps your event data and exposes all widgets and data processors as methods.

Throughout this documentation, es refers to an Eventstream instance. All code examples assume you have created one as shown below.

Creating an Eventstream

By default, Eventstream expects columns named user_id, event, and timestamp. If your data uses different column names, pass a schema.

import pandas as pd
from hopscotch import Eventstream

df = pd.read_csv("events.csv")
es = Eventstream(df)

You can also pass a CSV path directly:

es = Eventstream("events.csv")

Expected data format

Each row in your DataFrame represents a single event. At minimum, you need a path identifier column, an event name column, and a timestamp column.

user_ideventtimestamp
user_1page_view2024-01-01 10:00:00
user_1add_to_cart2024-01-01 10:02:00
user_1purchase2024-01-01 10:05:00

Parameters

ParameterTypeDefaultDescription
dfDataFrame | strrequiredEvent data as a pandas DataFrame or a path to a CSV file.
schemadict | NoneNoneSchema configuration. See below.
prepareboolTrueWhen True, parses timestamps, casts categoricals, and sorts rows. Set to False if your DataFrame is already prepared.

Schema

The schema tells hopscotch which columns in your DataFrame correspond to paths, events, timestamps, and segments. Pass it as a dict to the schema parameter.

es = Eventstream(df, schema={
    "path_cols": ["user_id"],
    "event_cols": ["event"],
    "timestamp": "timestamp",
    "segment_cols": ["country", "plan"],
})
FieldDefaultDescription
path_cols["user_id"]Columns that identify a path. The first column is the primary path ID.
event_cols["event"]Columns that contain event names. The first column is the primary event column.
timestamp"timestamp"The timestamp column.
segment_cols[]Columns treated as segment dimensions, available in widgets and metrics.
custom_cols[]Extra columns to carry through without special treatment.

Sample dataset

hopscotch-analytics ships with a synthetic e-commerce dataset you can use to try the library without your own data. It contains six months of user sessions on a consumer electronics store, with several embedded behavioral patterns designed to showcase the analysis tools.

from hopscotch.datasets.ecom import load_ecom
from hopscotch import Eventstream

df = load_ecom()
es = Eventstream(df, schema={
    "path_cols": ["user_id", "session_id"],
    "segment_cols": [
        "platform",
        "acquisition_channel",
        "user_cohort",
        "user_age_segment",
    ],
})