__ ___
___ ___ ____ __ __/ /_/ (_)__ __________
/ _ \/ _ \______/ __ \/ / / / __/ / / _ \/ ___/ ___/
/ __/ __/_____/ /_/ / /_/ / /_/ / / __/ / (__ )
\___/\___/ \____/\__,_/\__/_/_/\___/_/ /____/
Open-source framework to detect outliers in Elasticsearch events
Developed by NVISO Labs (https://blog.nviso.be - https://twitter.com/NVISO_Labs)
Detecting beaconing TLS connections using ee-outliers
Introduction
ee-outliers is a framework to detect outliers in events stored in an Elasticsearch cluster. The framework was developed to detect anomalies in security events, but it can just as well be used to detect outliers in other types of data.
The framework makes use of statistical models that are easily defined by the user in a configuration file. In case the models detect an outlier, the relevant Elasticsearch events are enriched with additional outlier fields. These fields can then be dashboarded and visualized using the tools of your choice (Kibana or Grafana for example).
The types of anomalies you can spot using ee-outliers are virtually limitless. A few examples of outliers we have detected ourselves using ee-outliers during threat hunting activities include:
- Detect beaconing (DNS, TLS, HTTP, etc.)
- Detect geographically improbable activity
- Detect obfuscated & suspicious command execution
- Detect fileless malware execution
- Detect malicious authentication events
- Detect processes with suspicious outbound connectivity
- Detect malicious persistence mechanisms (scheduled tasks, auto-runs, etc.)
- …
Check out the screenshots at the end of this readme for a few examples. Continue reading if you would like to get started with outlier detection in Elasticsearch yourself!
Core features
- Create your own custom outlier detection use cases specifically for your own needs
- Send automatic e-mail notifications when one of your outlier use cases hits
- Automatic tagging of asset fields to quickly spot the most interesting assets to investigate
- Fine-grained control over which historical events are checked for outliers
- ...and much more!
Requirements
- Docker to build and run ee-outliers
- Internet connectivity to build ee-outliers (internet is not required to run ee-outliers)
Getting started
Configuring ee-outliers
ee-outliers makes use of a single configuration file, which defines both the technical parameters (such as connectivity with your Elasticsearch cluster, logging, etc.) and the detection use cases.
An example configuration file with all required configuration sections and parameters, along with an explanation, can be found in defaults/outliers.conf. We recommend starting from this file when running ee-outliers yourself.
Continue reading for all the details on how to define your own outlier detection use cases.
Running in interactive mode
In this mode, ee-outliers will run once and finish. This is the ideal run mode to use when testing ee-outliers straight from the command line.
Running ee-outliers in interactive mode:
# Build the image
docker build -t "outliers-dev" .
# Run the image
docker run --network=sensor_network -v "$PWD/defaults:/mappedvolumes/config" -i outliers-dev:latest python3 outliers.py interactive --config /mappedvolumes/config/outliers.conf
Running in daemon mode
In this mode, ee-outliers will continuously run based on a cron schedule defined in the outliers configuration file.
Example from the default configuration file which will run ee-outliers at 00:00 each night:
[daemon]
schedule=0 0 * * *
Running ee-outliers in daemon mode:
# Build the image
docker build -t "outliers-dev" .
# Run the image
docker run --network=sensor_network -v "$PWD/defaults:/mappedvolumes/config" -d outliers-dev:latest python3 outliers.py daemon --config /mappedvolumes/config/outliers.conf
Customizing your Docker run parameters
The following modifications might need to be made to the above commands for your specific situation:
- The name of the Docker network through which the Elasticsearch cluster is reachable (--network)
- The mapped volumes so that your configuration file can be found (-v). By default, the default configuration file in /defaults is mapped to /mappedvolumes/config
- The path of the configuration file (--config)
Configuring outlier detection models
The detection use cases are configured in the configuration file passed to ee-outliers. This section discusses all the available detection mechanisms and the options they provide to the analyst.
The different types of detection models that can be configured are listed below.
- simplequery models: this model will simply run an Elasticsearch query and tag all the matching events as outliers. No additional statistical analysis is done. Example use case: tag all the events that contain the string "mimikatz" as outliers.
- metrics models: the metrics model looks for outliers based on a calculated metric of a specific field of events. These metrics include the length of a field, its entropy, and more. Example use case: tag all events that represent Windows processes that were launched using a high number of base64 encoded parameters in order to detect obfuscated fileless malware.
- terms models: the terms model looks for outliers by calculating rare combinations of certain field(s) in combination with other field(s). Example use case: tag all events that represent Windows network processes that are rarely observed across all reporting endpoints in order to detect C2 phone home activity.
- word2vec models (BETA): the word2vec model is the first Machine Learning model defined in ee-outliers. It allows the analyst to train a model based on a set of features that are expected to appear in the same context. After initial training, the model is then able to spot anomalies in unexpected combinations of the trained features. Example use case: train a model to learn which usernames, workstations and user roles are expected to appear together in order to alert on breached Windows accounts that are used to laterally move in the network.
General model parameters
es_query_filter
Each model starts with an Elasticsearch query which selects which events the model should consider for analysis. The best way of testing if the query is valid is by copy-pasting it from a working Kibana query.
trigger_on
Possible values: low, high.
This parameter defines if the outliers model should trigger whenever the calculated model value of the event is lower or higher than the decision boundary. For example, a model that should trigger on users that log into a statistically high number of workstations should trigger on high values, whereas a model that should detect processes that rarely communicate on the network should trigger on low values.
trigger_method and trigger_sensitivity
Possible trigger_method values:
- percentile: percentile. trigger_sensitivity ranges from 0-100.
- pct_of_max_value: percentage of maximum value. trigger_sensitivity ranges from 0-100.
- pct_of_median_value: percentage of median value. trigger_sensitivity ranges from 0-100.
- pct_of_avg_value: percentage of average value. trigger_sensitivity ranges from 0-100.
- mad: Median Absolute Deviation. trigger_sensitivity defines the total number of deviations and ranges from 0-Inf.
- madpos: same as mad, but the trigger value will always be positive. If the mad trigger value is negative, it results in 0.
- stdev: Standard Deviation. trigger_sensitivity defines the total number of deviations and ranges from 0-Inf.
- float: fixed value to trigger on. trigger_sensitivity defines the trigger value.
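To make the trigger methods more concrete, the sketch below computes a decision boundary for a list of metric values. It is not the ee-outliers implementation: the function names `decision_boundary` and `is_outlier`, the nearest-rank percentile, the use of the population standard deviation, and the rule "centre plus or minus sensitivity times deviation" for mad, madpos and stdev are all assumptions made for illustration.

```python
import math
import statistics


def decision_boundary(values, trigger_method, trigger_sensitivity, trigger_on="high"):
    """Compute a hypothetical decision boundary for a list of metric values."""
    # Assumption: deviation-based methods move up from the centre for "high"
    # and down from the centre for "low".
    direction = 1 if trigger_on == "high" else -1

    if trigger_method == "percentile":
        # Assumption: nearest-rank percentile over the sorted values.
        ordered = sorted(values)
        rank = max(1, math.ceil(trigger_sensitivity / 100 * len(ordered)))
        return ordered[rank - 1]
    if trigger_method == "pct_of_max_value":
        return max(values) * trigger_sensitivity / 100
    if trigger_method == "pct_of_median_value":
        return statistics.median(values) * trigger_sensitivity / 100
    if trigger_method == "pct_of_avg_value":
        return statistics.mean(values) * trigger_sensitivity / 100
    if trigger_method in ("mad", "madpos"):
        median = statistics.median(values)
        mad = statistics.median([abs(v - median) for v in values])
        boundary = median + direction * trigger_sensitivity * mad
        # madpos never returns a negative trigger value.
        return max(0.0, boundary) if trigger_method == "madpos" else boundary
    if trigger_method == "stdev":
        # Assumption: population standard deviation around the mean.
        mean = statistics.mean(values)
        return mean + direction * trigger_sensitivity * statistics.pstdev(values)
    if trigger_method == "float":
        return float(trigger_sensitivity)
    raise ValueError(f"unknown trigger_method: {trigger_method}")


def is_outlier(value, boundary, trigger_on):
    """Compare a value against the boundary in the direction given by trigger_on."""
    return value > boundary if trigger_on == "high" else value < boundary


values = [1, 2, 2, 3, 3, 3, 4, 50]
boundary = decision_boundary(values, "mad", 3, "high")
print(boundary, is_outlier(50, boundary, "high"))
```

With these sample values the median is 3 and the median absolute deviation is 1, so a sensitivity of 3 places the boundary at 6 and only the value 50 exceeds it.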
outlier_type
Freetext field which will be added to the outlier event as a new field named outliers.outlier_type.
For example: encoded commands
outlier_reason
Freetext field which will be added to the outlier event as a new field named outliers.reason.
For example: base64 encoded command line arguments
outlier_summary
Freetext field which will be added to the outlier event as a new field named outliers.summary.
For example: base64 encoded command line arguments for process {OsqueryFilter.name}
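The `{OsqueryFilter.name}` part of the summary example looks like a placeholder that is filled in with a field of the outlier event. The sketch below shows one way such a substitution could work; it assumes events are nested dictionaries and that unknown placeholders are left untouched. The helper names `lookup` and `render_summary` are hypothetical and not part of ee-outliers.

```python
import re

# Matches {some.dotted.field} placeholders in a summary template.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def lookup(event, dotted_path):
    """Walk a nested dict along a dotted path; return None if a key is missing."""
    current = event
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def render_summary(template, event):
    """Replace each placeholder with the matching event field value."""
    def replace(match):
        value = lookup(event, match.group(1))
        # Assumption: placeholders without a matching field stay as they are.
        return match.group(0) if value is None else str(value)

    return PLACEHOLDER.sub(replace, template)


event = {"OsqueryFilter": {"name": "powershell.exe"}}
template = "base64 encoded command line arguments for process {OsqueryFilter.name}"
print(render_summary(template, event))
```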
run_model
Switch to enable / disable running of the model
test_model
Switch to enable / disable testing of the model
should_notify
Switch to enable / disable notifications for the model
simplequery models
This model will simply run an Elasticsearch query and tag all the matching events as outliers. No additional statistical analysis is done. Example use case: tag all the events that represent hidden powershell processes as outliers.
Each simplequery model section in the configuration file should be prefixed by simplequery_.
Example model
##############################
# SIMPLEQUERY - POWERSHELL EXECUTION IN HIDDEN WINDOW
##############################
[simplequery_powershell_execution_hidden_window]
es_query_filter=tags:endpoint AND "powershell.exe" AND (OsqueryFilter.cmdline:"-W hidden" OR OsqueryFilter.cmdline:"-WindowStyle Hidden")
outlier_type=powershell
outlier_reason=powershell execution in hidden window
outlier_summary=powershell execution in hidden window
run_model=1
test_model=0
Parameters
All options shown in the example are required.
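As an illustration of the section-prefix convention, the sketch below uses Python's standard configparser module to collect all simplequery_ sections from a configuration text and check that the options from the example are present. How ee-outliers itself loads and validates its configuration is not described here; `load_models` and `REQUIRED_OPTIONS` are hypothetical names.

```python
import configparser

# Options shown in the simplequery example model.
REQUIRED_OPTIONS = (
    "es_query_filter", "outlier_type", "outlier_reason",
    "outlier_summary", "run_model", "test_model",
)

EXAMPLE_CONFIG = """
##############################
# SIMPLEQUERY - POWERSHELL EXECUTION IN HIDDEN WINDOW
##############################
[simplequery_powershell_execution_hidden_window]
es_query_filter=tags:endpoint AND "powershell.exe" AND (OsqueryFilter.cmdline:"-W hidden" OR OsqueryFilter.cmdline:"-WindowStyle Hidden")
outlier_type=powershell
outlier_reason=powershell execution in hidden window
outlier_summary=powershell execution in hidden window
run_model=1
test_model=0
"""


def load_models(config_text, prefix):
    """Return {section_name: options} for every section starting with prefix."""
    # Only "=" separates keys from values, so ":" inside queries is kept intact.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(config_text)
    models = {}
    for section in parser.sections():
        if not section.startswith(prefix):
            continue
        options = dict(parser[section])
        missing = [name for name in REQUIRED_OPTIONS if name not in options]
        if missing:
            raise ValueError(f"{section} is missing options: {missing}")
        models[section] = options
    return models


models = load_models(EXAMPLE_CONFIG, "simplequery_")
print(list(models))
```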
metrics models
The metrics model looks for outliers based on a calculated metric of a specific field of events. These metrics include the length of a field, its entropy, and more. Example use case: tag all events that represent Windows processes that were launched using a high number of base64 encoded parameters in order to detect obfuscated fileless malware.
Each metrics model section in the configuration file should be prefixed by metrics_.
Example model
##############################
# METRICS - COMMAND LINE ARGUMENTS CONTAINING URL
##############################
[metrics_cmdline_containing_url]
es_query_filter=tags:endpoint AND _exists_:OsqueryFilter.cmdline
aggregator=OsqueryFilter.name
target=OsqueryFilter.cmdline
metric=url_length
trigger_on=high
trigger_method=mad
trigger_sensitivity=3
outlier_reason=cmd line args containing URL
outlier_summary=cmd line args contains URL for process {OsqueryFilter.name}
outlier_type=command execution,command & control
run_model=1
test_model=0
should_notify=0
How it works
The metrics model looks for outliers based on a calculated metric of a specific field of events. These metrics include the following:
- numerical_value: use the numerical value of the target field as metric. Example: numerical_value("2") => 2
- length: use the target field length as metric. Example: length("outliers") => 8
- entropy: use the entropy of the field as metric. Example: entropy("houston") => 2.5216406363433186
- hex_encoded_length: calculate total length of hexadecimal encoded substrings in the target and use this as metric.
- base64_encoded_length: calculate total length of base64 encoded substrings in the target and use this as metric. Example: base64_encoded_length("houston we have a cHJvYmxlbQ==") => base64_decoded_string: problem, base64_encoded_length: 7
- url_length: extract all URLs from the target value and use their total length as metric. Example: url_length("why don't we go http://www.dance.com") => extracted_urls_length: 20, extracted_urls: http://www.dance.com
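The metrics listed above can be approximated in a few lines of Python. The sketch below is an illustration that reproduces the example values from this readme, not the code used by ee-outliers. The URL regular expression and the word-by-word base64 heuristic, which counts the decoded characters of each word that decodes to printable ASCII, are assumptions.

```python
import base64
import math
import re
from collections import Counter

# Assumption: a URL is "http://" or "https://" followed by non-whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def numerical_value(value):
    """Interpret the field value as a number."""
    return float(value)


def length(value):
    """Number of characters in the field value."""
    return len(value)


def entropy(value):
    """Shannon entropy in bits per character."""
    counts = Counter(value)
    total = len(value)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def url_length(value):
    """Total length of all URLs found in the field value."""
    return sum(len(url) for url in URL_PATTERN.findall(value))


def base64_encoded_length(value):
    """Total decoded length of words that are valid, printable base64."""
    total = 0
    for word in value.split():
        try:
            decoded = base64.b64decode(word, validate=True).decode("ascii")
        except ValueError:
            # Not valid base64, or the decoded bytes are not ASCII text.
            continue
        if decoded and decoded.isprintable():
            total += len(decoded)
    return total


print(length("outliers"))
print(entropy("houston"))
print(url_length("why don't we go http://www.dance.com"))
print(base64_encoded_length("houston we have a cHJvYmxlbQ=="))
```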
The metrics model works as follows:
- The model starts by taking into account all the events defined in the es_query_filter parameter. This should be a valid Elasticsearch query. The best way of testing if the query is valid is by copy-pasting it from a working Kibana query.
- The model then calculates the selected metric (url_length in the example) for each encountered value of the target field (OsqueryFilter.cmdline in the example). These values are then checked for outliers in buckets defined by the values of the aggregator field (OsqueryFilter.name in the example). Whether an event is an outlier is decided based on the trigger_method (MAD, or Median Absolute Deviation, in this case) and the trigger_sensitivity (in this case 3 deviations).
- Outlier events are tagged with a range of new fields, all prefixed with outliers.<outlier_field_name>.
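The flow described above can be sketched in plain Python: group events by the aggregator field, compute the metric on the target field, and flag values that lie more than a number of MADs above the bucket median. This is a simplified illustration under several assumptions: events are flat dictionaries, MAD is taken as the median absolute deviation, and the boundary is median plus sensitivity times MAD. The function `find_metric_outliers` and the sample events are hypothetical.

```python
import re
import statistics
from collections import defaultdict

URL_PATTERN = re.compile(r"https?://\S+")


def url_length(value):
    """Total length of all URLs found in the value (assumed URL pattern)."""
    return sum(len(url) for url in URL_PATTERN.findall(value))


def find_metric_outliers(events, aggregator, target, metric, sensitivity):
    """Return events whose metric exceeds median + sensitivity * MAD in their bucket."""
    # Step 1: bucket the events by the value of the aggregator field.
    buckets = defaultdict(list)
    for event in events:
        buckets[event[aggregator]].append(event)

    outliers = []
    for bucket_events in buckets.values():
        # Step 2: compute the metric for each target value in the bucket.
        values = [metric(event[target]) for event in bucket_events]
        median = statistics.median(values)
        mad = statistics.median([abs(v - median) for v in values])
        boundary = median + sensitivity * mad
        # Step 3: keep the events above the boundary (trigger_on=high).
        outliers.extend(e for e, v in zip(bucket_events, values) if v > boundary)
    return outliers


NAME, CMDLINE = "OsqueryFilter.name", "OsqueryFilter.cmdline"
events = [
    {NAME: "curl.exe", CMDLINE: "curl http://a.example/1"},
    {NAME: "curl.exe", CMDLINE: "curl http://bb.example/2"},
    {NAME: "curl.exe", CMDLINE: "curl http://a.example/3"},
    {NAME: "curl.exe", CMDLINE: "curl http://ccc.example/4"},
    {NAME: "curl.exe", CMDLINE: "curl http://evil.example/" + "A" * 80},
    {NAME: "notepad.exe", CMDLINE: "notepad report.txt"},
    {NAME: "notepad.exe", CMDLINE: "notepad notes.txt"},
]
flagged = find_metric_outliers(events, NAME, CMDLINE, url_length, sensitivity=3)
print([event[CMDLINE] for event in flagged])
```

Bucketing by process name means an unusually long URL is judged against other command lines of the same process rather than against all events at once.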
terms models
The terms model looks for outliers by calculating rare combinations of certain field(s) in combination with other field(s). Example use case: tag all events that represent Windows network processes that are rarely observed across all reporting endpoints in order to detect C2 phone home activity.
Each terms model section in the configuration file should be prefixed by terms_.
Example model
##############################
# TERMS - RARE PROCESSES WITH OUTBOUND CONNECTIVITY
##############################
[terms_rarely_seen_outbound_connections]
es_query_filter=tags:endpoint AND meta.command.name:"get_outbound_conns" AND -OsqueryFilter.remote_port.raw:0 AND -OsqueryFilter.remote_address.raw:127.0.0.1 AND -OsqueryFilter.remote_address.raw:"::1"
aggregator=OsqueryFilter.name
target=meta.hostname
target_count_method=across_aggregators
trigger_on=low
trigger_method=pct_of_max_value
trigger_sensitivity=5
outlier_type=outbound connection
outlier_reason=rare outbound connection
outlier_summary=rare outbound connection: {OsqueryFilter.name}
run_model=1
test_model=0
How it works
The terms model looks for outliers by calculating rare combinations of certain field(s) in combination with other field(s). It works as follows:
- The model starts by taking into account all the events defined in the es_query_filter parameter. This should be a valid Elasticsearch query. The best way of testing if the query is valid is by copy-pasting it from a working Kibana query.
- The model will then count all unique instances of the target field, for each individual aggregator field. In the example above, the OsqueryFilter.name field represents the process name. The target field meta.hostname represents the total number of hosts that are observed for that specific aggregator (meaning: how many hosts are observed to be running that process name which is communicating with the outside world?). Events where the communicating process is observed on fewer than 5 percent of all the observed hosts that contain communicating processes will be flagged as outliers.
- Outlier events are tagged with a range of new fields, all prefixed with outliers.<outlier_field_name>.
The target_count_method parameter can be used to define if the analysis should be performed across all values of the aggregator at the same time, or for each value of the aggregator separately.
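The counting step of the terms model can be sketched as follows. The sketch assumes flat event dictionaries, counts the unique target values (hosts) per aggregator value (process name), and compares every count against a percentage of the highest count, which is one possible reading of pct_of_max_value with target_count_method=across_aggregators. The function `find_rare_terms` and the sample data are hypothetical, not taken from ee-outliers.

```python
from collections import defaultdict


def find_rare_terms(events, aggregator, target, sensitivity_pct):
    """Return events whose aggregator value is seen with unusually few unique targets."""
    # Count unique target values (e.g. hostnames) per aggregator value (e.g. process).
    targets_per_aggregator = defaultdict(set)
    for event in events:
        targets_per_aggregator[event[aggregator]].add(event[target])
    counts = {name: len(targets) for name, targets in targets_per_aggregator.items()}
    if not counts:
        return []

    # Assumption: the boundary is a percentage of the highest count (trigger_on=low).
    boundary = max(counts.values()) * sensitivity_pct / 100
    rare = {name for name, count in counts.items() if count < boundary}
    return [event for event in events if event[aggregator] in rare]


NAME, HOST = "OsqueryFilter.name", "meta.hostname"
events = [{NAME: "chrome.exe", HOST: f"host-{i}"} for i in range(40)]
events.append({NAME: "evil.exe", HOST: "host-7"})

flagged = find_rare_terms(events, NAME, HOST, sensitivity_pct=5)
print([(event[NAME], event[HOST]) for event in flagged])
```

Here chrome.exe is seen on 40 hosts, so the boundary is 2 hosts. evil.exe is seen on a single host and is therefore flagged.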
Whitelisting
ee-outliers provides support for whitelisting of certain outliers. By whitelisting an outlier, you prevent it from being tagged and stored in Elasticsearch.
For events that have already been enriched and that match a whitelist later, the es_wipe_all_whitelisted_outliers flag can be used in order to remove them.
The whitelist will then be checked for hits periodically as part of the housekeeping work, as defined in the parameter housekeeping_interval_seconds.
Two different whitelists are defined in the configuration file:
Literals whitelist
This whitelist will only hit for outlier events that contain an exact whitelisted string as one of their event field values. The whitelist is checked against all the event fields, not only the outlier fields!
Example:
[whitelist_literals]
slack_connection=rare outbound connection: Slack.exe
Regular expression whitelist
This whitelist will hit for all outlier events where a regular expression matches one of their event field values. The whitelist is checked against all the event fields, not only the outlier fields!
Example:
[whitelist_regexps]
scheduled_task_user_specific_2=^.*rare scheduled task:.*-.*-.*-.*-.*$
autorun_user_specific=^.*rare autorun:.*-.*-.*-.*-.*$
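The two whitelist types can be illustrated with a short sketch that checks every field value of an event, first for an exact literal match and then for a regular expression match. It assumes events are (possibly nested) dictionaries and that values are compared as strings. The helpers `iter_values` and `is_whitelisted` are hypothetical and only mirror the behaviour described above.

```python
import re


def iter_values(event):
    """Yield every leaf value of a (possibly nested) event dict as a string."""
    for value in event.values():
        if isinstance(value, dict):
            yield from iter_values(value)
        else:
            yield str(value)


def is_whitelisted(event, literals, regexps):
    """True if any field value equals a literal or matches a regular expression."""
    compiled = [re.compile(pattern) for pattern in regexps]
    for value in iter_values(event):
        if value in literals:
            return True
        if any(pattern.search(value) for pattern in compiled):
            return True
    return False


literals = {"rare outbound connection: Slack.exe"}
regexps = [r"^.*rare autorun:.*-.*-.*-.*-.*$"]

slack_event = {"outliers": {"summary": "rare outbound connection: Slack.exe"}}
autorun_event = {"outliers": {"summary": "rare autorun: S-1-5-21-1234-5678"}}
other_event = {"outliers": {"summary": "rare autorun: updater.exe"}}

print(is_whitelisted(slack_event, literals, regexps))
print(is_whitelisted(autorun_event, literals, regexps))
print(is_whitelisted(other_event, literals, regexps))
```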
Developing and debugging ee-outliers
ee-outliers supports additional run modes (in addition to interactive and daemon) that can be useful for developing and debugging purposes.
Running in test mode
In this mode, ee-outliers will run all unit tests and finish, providing feedback on the test results.
Running outliers in tests mode:
# Build the image
docker build -t "outliers-dev" .
# Run the image
docker run -v "$PWD/defaults:/mappedvolumes/config" -i outliers-dev:latest python3 outliers.py tests --config /mappedvolumes/config/outliers.conf
Running in profiler mode
In this mode, ee-outliers will run a performance profile (py-spy) in order to give feedback on which methods are using more or less CPU and memory. This is especially useful for developers of ee-outliers.
Running outliers in profiler mode:
# Build the image
docker build -t "outliers-dev" .
# Run the image
docker run --cap-add SYS_PTRACE -t --network=sensor_network -v "$PWD/defaults:/mappedvolumes/config" -i outliers-dev:latest py-spy -- python3 outliers.py interactive --config /mappedvolumes/config/outliers.conf
Checking code style and PEP8 compliance
In this mode, flake8 is used to check potential issues with style and PEP8.
Running outliers in this mode:
# Build the image
docker build -t "outliers-dev" .
# Run the image
docker run -v "$PWD/defaults:/mappedvolumes/config" -i outliers-dev:latest flake8 /app
You can also provide additional arguments to flake8, for example to ignore certain checks (such as the one around long lines):
docker run -v "$PWD/defaults:/mappedvolumes/config" -i outliers-dev:latest flake8 /app "--ignore=E501"
Screenshots
Detecting beaconing TLS connections using ee-outliers
Configured use case to detect beaconing TLS connections
Detected outlier events are enriched with new fields in Elasticsearch
License
See the LICENSE file for details
Contact
You can reach out to the developers of ee-outliers by creating an issue on GitHub. For any other communication, you can reach out by sending us an e-mail at [email protected].
Thank you for using ee-outliers and we look forward to your feedback!