Aggregation¶
Count¶
* | count()
event_id=1 | count()
Count with Field¶
Count non-null values of a specific field, or use unique=true for distinct counting:
* | count(computer_name)
* | count(computer_name, unique=true)
Sum / Avg / Max / Min¶
* | sum(bytes)
* | avg(response_time)
* | max(bytes)
* | min(response_time)
Percentile and Standard Deviation¶
* | percentile(response_time)
* | stdDev(response_time)
Returns p50, p75, p99 for percentile().
Median and MAD¶
* | median(response_time)
* | mad(response_time)
* | groupBy(computer_name, function=mad(latency)) | _mad > 50
median() returns the median value (_median). mad() returns both the median (_median) and the median absolute deviation (_mad), a robust measure of variability resistant to outliers. Unlike standard deviation, MAD is not skewed by extreme values, making it ideal for anomaly detection on noisy data.
Select First / Last¶
Return the value from the earliest or latest event in each group:
* | groupBy(user) | selectFirst(timestamp)
* | groupBy(user) | selectLast(timestamp)
Multiple Aggregations with Multi¶
* | multi(count(), avg(response_time), sum(bytes))
Count with Named Parameters¶
* | groupby(user) | multi(count(field=event_id, distinct=true, as=unique_events))
* | groupby(user) | multi(count(field=event_id, as=total))
Use distinct=true or unique=true for unique counts (uniqExact), and as= to name the output column.
Collect (groupArray)¶
* | groupby(user) | multi(collect(image))
Collects all values of a field into an array per group.
Top (Frequency Distribution)¶
* | groupby(user) | multi(top(field=event_id, percent=true, as=top_events))
Shows the top values with their frequency. Use percent=true to show percentages.
Skewness and Kurtosis¶
* | skewness(response_time)
* | kurtosis(response_time)
skewness() returns the population skewness (_skewness). kurtosis() returns the population excess kurtosis (_kurtosis). Both work on chained aggregation outputs:
* | groupby(computer_name, function=count()) | skewness(_count)
* | groupby(computer_name, function=count()) | kurtosis(_count)
Frequency¶
Build a frequency table with count, percentage, and cumulative percentage:
* | frequency(event_name)
* | frequency(status_code)
Returns value, _count, _percentage, and _cumulative_pct columns, sorted by count descending.
IQR (Interquartile Range)¶
* | iqr(response_time)
* | groupby(computer_name, function=count()) | iqr(_count)
Returns _q1 (25th percentile), _q3 (75th percentile), and _iqr (Q3 - Q1).
Head/Tail (Pareto Analysis)¶
Segment values into "head" and "tail" groups based on cumulative percentage (80/20 rule):
* | headTail(event_name)
* | headTail(computer_name, threshold=90)
Returns value, _count, _percentage, _cumulative_pct, and _segment (head or tail). Default threshold is 80%.
Modified Z-Score¶
Per-row modified z-score using median and MAD, robust to outliers:
* | modifiedZScore(response_time)
Returns _median, _mad, _modified_z for each row. Formula: 0.6745 * (x - median) / MAD.
MAD Outlier Detection¶
Modified z-score with an outlier flag:
* | madOutlier(response_time)
* | madOutlier(latency, threshold=2.5)
Returns _median, _mad, _modified_z, and _is_outlier (1 if |_modified_z| > threshold). Default threshold: 3.5. Filter outliers with:
* | madOutlier(response_time) | _is_outlier = 1
Analyze Fields¶
Compute statistics across all or selected fields:
* | analyzeFields()
* | analyzeFields(response_time, bytes, limit=100000)
Returns per-field stats: field_name, _events, _distinct_vals, _mean, _min, _max, _stdev. Default scan limit: 50,000 rows (max: 200,000, use limit=max for maximum).
Group By¶
* | groupBy(image)
* | groupBy(image, user)
The default aggregation function is count(). Use function= to specify a different aggregation:
* | groupBy(user, function=sum(bytes))
* | groupBy(user, function=avg(response_time))
Counting Groups¶
Piping count() after groupBy() counts the number of groups (not per-group counts). Combine with singleval() to display as a single metric:
* | groupBy(computer_name) | count() | singleval()
Distinct Count with groupBy¶
* | groupBy(computer_name, function=count(field=user, unique=true))
Multiple Aggregations with groupBy¶
Use function=multi(...) to compute multiple aggregations per group:
* | groupBy(computer_name, function=multi(count(computer_name), count(user, unique=true), sum(bytes)))