DBTOKEN — Interactive Demo

Text Tokenization

Text Mode

How non-numeric text is encoded. `concept` = whole-string tokens; `auto` picks per-class by cardinality.

Class Overrides

Force the selected classes to use the Override Mode below, regardless of what `auto` would choose.

Override Mode

The text mode applied to every class selected above.

Numeric Tokenization

How numeric values become tokens.

Numeric Type

`discrete` = quantile bins (Q-tokens); `continuous` = float value scaled by per-group distribution.

Numeric Sequence

`factored` = numeric as its own token; `fused` = concept+value merged (e.g. `hemoglobin::Q3`).
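The difference between the two sequence layouts can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function name `emit_tokens` is hypothetical, and it assumes that in `factored` mode the concept token is followed by a separate numeric token. Only the `::` separator in fused mode comes from the example above.

```python
def emit_tokens(concept, numeric_token, numeric_sequence):
    """Return the token(s) for one numeric observation (hypothetical helper)."""
    if numeric_sequence == "fused":
        # Concept and value merged into a single token, e.g. hemoglobin::Q3.
        return [f"{concept}::{numeric_token}"]
    if numeric_sequence == "factored":
        # Assumption: the concept token comes first, the numeric token second.
        return [concept, numeric_token]
    raise ValueError(f"unknown numeric sequence: {numeric_sequence!r}")


fused = emit_tokens("hemoglobin", "Q3", "fused")
factored = emit_tokens("hemoglobin", "Q3", "factored")
```

Fused tokens keep sequences shorter but multiply the vocabulary (one token per concept and bin pair); factored tokens keep the vocabulary small at the cost of longer sequences.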

Bins

Number of quantile bins for discrete numerics (and time deltas in discrete mode).
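As a rough illustration of quantile binning, the sketch below fits bin edges with NumPy and maps a value to a Q-token. The helper names and the zero-based bin index are assumptions made for this example; the demo may number or fit its bins differently.

```python
import numpy as np


def fit_quantile_edges(values, n_bins):
    """Return the n_bins - 1 interior quantile edges of `values`."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(np.asarray(values, dtype=float), qs)


def to_q_token(value, edges):
    """Map a value to a 'Q<k>' token, with k in 0..n_bins-1 (assumed numbering)."""
    return f"Q{int(np.searchsorted(edges, value, side='right'))}"


values = list(range(1, 101))              # toy data: 1..100
edges = fit_quantile_edges(values, n_bins=4)
tokens = [to_q_token(v, edges) for v in (10, 60, 100)]
```

With four bins, each bin covers roughly a quarter of the fitted values, so the token carries rank information rather than the raw magnitude.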

Level Threshold

If a group has ≤ this many unique values, use categorical L-tokens instead of bins/scaling.
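The threshold check can be sketched as below. The function name `level_tokens` and the `L<i>` numbering by sorted value are assumptions for illustration; only the "≤ threshold unique values" rule comes from the description above.

```python
def level_tokens(values, level_threshold):
    """Return a value -> L-token map, or None if the group has too many levels."""
    unique_values = sorted(set(values))
    if len(unique_values) > level_threshold:
        # Too many distinct values: fall back to bins or scaling instead.
        return None
    return {v: f"L{i}" for i, v in enumerate(unique_values)}


few_levels = level_tokens([0, 1, 1, 2, 0], level_threshold=5)
many_levels = level_tokens(list(range(10)), level_threshold=5)
```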

Clipping

Percentile clipping before binning/fitting. Protects against outliers.

Bin Clip Min %

Lower percentile edge for quantile binning.

Bin Clip Max %

Upper percentile edge for quantile binning.
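Percentile clipping before binning might look like the following sketch. The function name is hypothetical; it uses `np.percentile` and `np.clip` and assumes the two settings are percentages in the range 0 to 100.

```python
import numpy as np


def clip_to_percentiles(values, clip_min_pct, clip_max_pct):
    """Clamp values to the [clip_min_pct, clip_max_pct] percentile range."""
    arr = np.asarray(values, dtype=float)
    lo, hi = np.percentile(arr, [clip_min_pct, clip_max_pct])
    return np.clip(arr, lo, hi)


values = list(range(101))                 # toy data: 0..100
clipped = clip_to_percentiles(values, clip_min_pct=1, clip_max_pct=99)
```

Values outside the chosen percentiles are pulled to the edge rather than dropped, so a single extreme measurement cannot stretch the bin edges.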

Milestones & Clinical

Calendar/shift-boundary tokens emitted on change (Kalman-filter style).

IP/OP state machine (admission/discharge)

Builds `state_transitions` from admission and discharge events: `HOSPITAL_ADMISSION` → inpatient, `HOSPITAL_DISCHARGE` → outpatient.
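A two-state machine of this kind can be sketched as follows. The event names are taken from the description above; the function name, the starting state of `outpatient`, and the other event names in the toy data are assumptions for illustration.

```python
STATE_TRANSITIONS = {
    "HOSPITAL_ADMISSION": "inpatient",
    "HOSPITAL_DISCHARGE": "outpatient",
}


def track_states(events, initial_state="outpatient"):
    """Pair each event with the patient state in effect after it."""
    state = initial_state  # assumption: patients start as outpatients
    result = []
    for event in events:
        # Events that are not admissions or discharges leave the state unchanged.
        state = STATE_TRANSITIONS.get(event, state)
        result.append((event, state))
    return result


states = track_states(
    ["LAB", "HOSPITAL_ADMISSION", "LAB", "HOSPITAL_DISCHARGE", "VISIT"]
)
```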

Milestone (Inpatient)

Milestone granularity during inpatient state.

Milestone (Outpatient)

Milestone granularity during outpatient state.

Shift Start Hour

Hour-of-day (0–23) for shift boundary. Only used when a milestone is 8hr/12hr.
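One way to locate shift boundaries from a start hour is sketched below. The helper `shift_start` is hypothetical, and it assumes shifts tile the day evenly from the start hour (true for 8 h and 12 h shifts); a milestone token would then be emitted whenever the returned shift start changes between consecutive events.

```python
from datetime import datetime, timedelta


def shift_start(ts, shift_start_hour, shift_hours):
    """Return the start time of the shift that contains `ts`."""
    anchor = ts.replace(hour=shift_start_hour, minute=0, second=0, microsecond=0)
    if ts < anchor:
        # Before today's first shift: the cycle began yesterday.
        anchor -= timedelta(days=1)
    shift = timedelta(hours=shift_hours)
    return anchor + ((ts - anchor) // shift) * shift


night = shift_start(datetime(2024, 1, 1, 3, 0), shift_start_hour=7, shift_hours=12)
day = shift_start(datetime(2024, 1, 1, 8, 0), shift_start_hour=7, shift_hours=12)
evening = shift_start(datetime(2024, 1, 1, 16, 30), shift_start_hour=7, shift_hours=8)
```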

Time Scales (sec)

Comma-separated thresholds, in seconds, that split the time-delta distribution into bands. For example, "86400" gives two bands: under and over 24 h. Leave blank for a single global distribution.

Scale Names

Optional comma-separated labels, one per band; the number of labels must equal len(thresholds)+1. Leave blank for automatic labels "0", "1", …
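The relationship between thresholds and band labels can be sketched as follows. The function names are hypothetical, and assigning a delta that equals a threshold to the upper band is an assumption; the demo may treat the boundary differently.

```python
from bisect import bisect_right


def parse_time_scales(scales_text, names_text=""):
    """Parse the two comma-separated fields into (thresholds, band names)."""
    thresholds = sorted(float(s) for s in scales_text.split(",") if s.strip())
    names = [s.strip() for s in names_text.split(",") if s.strip()]
    if not names:
        # Automatic labels "0", "1", ... when no names are given.
        names = [str(i) for i in range(len(thresholds) + 1)]
    if len(names) != len(thresholds) + 1:
        raise ValueError("need exactly len(thresholds) + 1 scale names")
    return thresholds, names


def band_for(delta_sec, thresholds, names):
    """Return the band label for one time delta, in seconds."""
    return names[bisect_right(thresholds, delta_sec)]


thresholds, names = parse_time_scales("86400", "short,long")
one_hour = band_for(3600, thresholds, names)
two_days = band_for(172800, thresholds, names)
blank = parse_time_scales("")
```

Leaving both fields blank yields no thresholds and a single band labelled "0", which corresponds to one global distribution.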

Birth Date Class

Class name for birth-date demographic rows.

Birth Date Text Value

Text-value marker for birth-date rows (blank = none).

Display

Patient ID

Which patient's sequence to encode & visualize.

Show rows

Max debug-table rows.

Show tokens

Max token chips in the synchronized stream.

⇄ Side-by-Side Comparison

Run two configs on the same patient and compare output side-by-side.

▶ Configure Panel B settings

Full configuration for comparison panel B. Unspecified fields inherit from panel A.

Panel B fields: Text Mode, Class Overrides, Override Mode, Numeric Type, Numeric Sequence, Bins, Level Threshold, Bin Clip Min %, Bin Clip Max %, IP/OP state machine, Milestone (Inpatient), Milestone (Outpatient), Shift Start Hour, Time Scales (sec), Scale Names, Birth Date Class, Birth Date Text Value.
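The inheritance rule for panel B (unspecified fields fall back to panel A) can be pictured as a dictionary merge. This is only a sketch: the function name, the config keys, and the use of `None` to mean "unspecified" are assumptions, not the demo's actual data model.

```python
def resolve_panel_b(panel_a, panel_b):
    """Fill every field left unspecified (None) in panel B from panel A."""
    overrides = {key: value for key, value in panel_b.items() if value is not None}
    return {**panel_a, **overrides}


panel_a = {"text_mode": "auto", "numeric_type": "discrete", "bins": 10}
panel_b = {"text_mode": None, "numeric_type": "continuous", "bins": None}
resolved = resolve_panel_b(panel_a, panel_b)
```

Changing a single field in panel B while inheriting the rest makes it easier to attribute any difference in the two token streams to that one setting.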

Configure parameters and click Run Tokenizer to see the output.
