
# Configuration

> Learn about configuring the OpenHound framework.

<img noZoom src="https://mintcdn.com/specterops-bp-2735-release-notes/2djt2Sp9UeFPjBFr/assets/enterprise-AND-community-edition-pill-tag.svg?fit=max&auto=format&n=2djt2Sp9UeFPjBFr&q=85&s=a791748158fde5ff3b3b82b51497ab39" alt="Applies to BloodHound Enterprise and CE" width="482" height="45" data-path="assets/enterprise-AND-community-edition-pill-tag.svg" />

OpenHound uses [DLT configuration management](https://dlthub.com/docs/general-usage/credentials/setup), which lets you set parameters through configuration files, environment variables, or both.

The following sections cover common OpenHound parameters and how to configure them for your deployment.
You configure OpenHound in nearly the same way whether you run it as a containerized service or as a standalone CLI application.

## Configuration management

OpenHound and DLT use a TOML-based configuration layout that organizes settings into sections by component or feature. Top-level sections such as `[extract]`, `[normalize]`, and `[load]` define defaults for the corresponding phase of the collection/conversion pipeline.

The syntax supports nested sections, collector-specific configuration, and collector-specific overrides. For example, you can set the `[extract]` parallel worker count globally for all collectors, then raise or lower it for a specific collector.

Configuration precedence follows a global-to-specific order: a collector-specific override in `[sources.source.<collector>.extract]` takes priority over a global setting in `[extract]`, which in turn takes priority over the built-in default.

The following sample configuration sets global values for runtime, extract, normalize, and load, then overrides the extract worker count for the GitHub and Okta collectors.

```toml Example ~/.dlt/config.toml theme={null}
[runtime]
log_level = "INFO"
log_rotate_when = "midnight"
log_interval = 1

# HTTP retry/backoff for source requests; rides out API rate limits
# (e.g. GitHub during large collections) instead of failing the run.
request_max_attempts = 15
request_backoff_factor = 1.3
request_max_retry_delay = 900

# Default: logs are set to human readable text
# To switch to structured JSON instead, uncomment the line below (must be uppercase "JSON")
# log_format = "JSON"

[extract]
workers = 4

[sources.source.github.extract]
workers=2

[sources.source.okta.extract]
workers=12

[normalize]
workers = 4

[load]
delete_completed_jobs=true
truncate_staging_dataset=true
workers=2
```

## BloodHound Enterprise configuration parameters

The following parameters must be set in the `[destination.bloodhoundenterprise]` section of the configuration file (or via environment variables) to run OpenHound and schedule data collection for BloodHound Enterprise.

| Destination Option | Environment Variable                              | Description                                                      |
| ------------------ | ------------------------------------------------- | ---------------------------------------------------------------- |
| `token_key`        | DESTINATION\_\_BLOODHOUNDENTERPRISE\_\_TOKEN\_KEY | The API token key for authenticating with BloodHound Enterprise. |
| `token_id`         | DESTINATION\_\_BLOODHOUNDENTERPRISE\_\_TOKEN\_ID  | The API token ID for authenticating with BloodHound Enterprise.  |
| `url`              | DESTINATION\_\_BLOODHOUNDENTERPRISE\_\_URL        | The URL of the BloodHound Enterprise instance.                   |
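
A minimal sketch of the destination section follows. The URL and token strings are hypothetical placeholders. Because `token_key` is a credential, you may prefer to keep this section in DLT's `secrets.toml`, which DLT reads alongside `config.toml`, instead of a shared configuration file. You can also set the environment variables from the table.

```toml Example ~/.dlt/secrets.toml theme={null}
[destination.bloodhoundenterprise]
# Placeholder values: replace with your own instance URL and API token
url = "https://your-tenant.bloodhoundenterprise.io"
token_id = "<your-token-id>"
token_key = "<your-token-key>"
```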

## Collector-specific configuration parameters

Each collector may have additional required or optional configuration parameters that are specific to the data source being collected. These parameters can also be set in the configuration file or via environment variables.

For more information on collector-specific configuration, visit the configuration documentation page for each collector using the links below.

<CardGroup cols={3}>
  <Card title="GitHub" href="/openhound/collectors/github/collect-data#configure-openhound">
    View the configuration parameters for the GitHub collector.
  </Card>

  <Card title="Jamf" href="/openhound/collectors/jamf/collect-data#configure-openhound">
    View the configuration parameters for the Jamf collector.
  </Card>

  <Card title="Okta" href="/openhound/collectors/okta/collect-data#configure-openhound">
    View the configuration parameters for the Okta collector.
  </Card>
</CardGroup>

## Common configuration parameters

The following parameters are common to all OpenHound deployments and collectors.

### Logging modes

OpenHound automatically detects how it's running and selects a logging mode accordingly, with no configuration required:

| Mode        | Detected when                                                                                     | Console output                          | File output |
| ----------- | ------------------------------------------------------------------------------------------------- | --------------------------------------- | ----------- |
| `CLI`       | Running interactively in a terminal (TTY)                                                         | Rich, human-friendly formatted output   | Yes         |
| `CONTAINER` | `LOG_CONTAINER` is set, or `KUBERNETES_SERVICE_HOST` is present (running inside a Kubernetes pod) | Plain text or JSON streamed to `stdout` | Yes         |
| `SERVICE`   | None of the above (for example, running as a background/scheduled process)                        | None                                    | Yes         |

OpenHound checks for container indicators first, then falls back to a TTY check, and defaults to `SERVICE` mode otherwise.

<Note>
  To force `CONTAINER` mode outside of Kubernetes, set the `LOG_CONTAINER` environment variable to a truthy value (for example, `LOG_CONTAINER=true`).

  The OpenHound Helm chart and the example Docker Compose files set `LOG_CONTAINER` by default. Deployments outside Kubernetes, such as Docker Compose, depend on it because `KUBERNETES_SERVICE_HOST` is only present inside a Kubernetes pod.
</Note>

### Log format

By default, OpenHound writes logs as human-readable text. Set `log_format` to `JSON` to switch to structured JSON logging, which is useful for ingestion into log aggregation systems.

`JSON` is the only value you can set for this option. To keep the default text logging, omit `log_format` from your configuration or comment it out.

| Runtime Option | Environment Variable   | Description                                                |
| -------------- | ---------------------- | ---------------------------------------------------------- |
| `log_format`   | RUNTIME\_\_LOG\_FORMAT | The log output format. `JSON` is the only supported value. |

<Warning>
  The value must be uppercase `JSON`. DLT's internal logger performs a case-sensitive check for `"JSON"` to decide whether to emit its own structured logs.

  Any other casing (for example, `json` or `Json`) leaves DLT's internal log messages in text format even though OpenHound's own logs switch to JSON, resulting in mixed-format output.
</Warning>
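
The following fragment enables structured logging. It only sets the option described above, using the required uppercase value:

```toml Example ~/.dlt/config.toml with JSON logging theme={null}
[runtime]
# Structured JSON logs. Must be uppercase so DLT's internal logger also emits JSON.
log_format = "JSON"
```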

### Log level and rotation

OpenHound implements both time-based and size-based log rotation. When a log is rotated, a timestamp is appended to the filename (for example, `openhound.log.2026-02-19`) and rotated files are compressed using `gzip` to reduce disk usage.

By default, OpenHound maintains two types of log files:

* A global client log (`openhound.log`) that captures logs for the overall OpenHound service
* Collector-specific logs (`ext_collector_name.log`) that capture logs for individual collectors

You can set the following log options in the `[runtime]` section or through environment variables:

| Runtime Option     | Environment Variable          | Description                                                                                                               | Default Value           |
| ------------------ | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------- | ----------------------- |
| `log_level`        | RUNTIME\_\_LOG\_LEVEL         | Log level (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`). In `CLI` mode, this controls file logging only.              | INFO                    |
| `log_cli_level`    | RUNTIME\_\_LOG\_CLI\_LEVEL    | Console-only verbosity when running in `CLI` mode, independent of `log_level`, which controls file logging.               | ERROR                   |
| `log_rotate_when`  | RUNTIME\_\_LOG\_ROTATE\_WHEN  | Unit for time-based rotation: `S` (seconds), `H` (hours), `D` (days), or `midnight` (rotate at midnight).                 | midnight                |
| `log_interval`     | RUNTIME\_\_LOG\_INTERVAL      | Number of `log_rotate_when` units between rotations. Ignored when `log_rotate_when` is `midnight`.                        | 1                       |
| `log_max_bytes`    | RUNTIME\_\_LOG\_MAX\_BYTES    | Size-based rotation threshold. A log file is rotated once it exceeds this many bytes. `0` means rotate by time only.      | 5\_000\_000\_000 (5 GB) |
| `log_backup_count` | RUNTIME\_\_LOG\_BACKUP\_COUNT | Number of rotated files to keep before the oldest is deleted.                                                             | 14                      |

<Note>
  The code default for `log_cli_level` is `ERROR`, which keeps console output nearly silent. We recommend setting it to `WARNING` for day-to-day use so operators see actionable issues on screen without the noise of full `DEBUG`/`INFO` logging.

  Both example configurations shipped with OpenHound set `log_cli_level = "WARNING"`.
</Note>
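
As a sketch, the following fragment combines these options. It applies the recommended console level and switches to time-based rotation every six hours with a size cap. The rotation interval and the roughly 1 GB size limit are illustrative values, not recommendations:

```toml Example ~/.dlt/config.toml with logging and rotation settings theme={null}
[runtime]
log_level = "INFO"           # verbosity of the log files
log_cli_level = "WARNING"    # console verbosity in CLI mode (code default: ERROR)

# Time-based rotation every 6 hours (illustrative)
log_rotate_when = "H"
log_interval = 6

# Also rotate a log file once it exceeds ~1 GB; 0 would mean rotate by time only
log_max_bytes = 1000000000
log_backup_count = 14
```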

### HTTP request parameters

OpenHound uses these `[runtime]` parameters to control how it handles and retries failing HTTP requests to source APIs.

| Runtime Option            | Environment Variable                  | Description                                                                                                                                                                                | Default Value |
| ------------------------- | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------- |
| `http_show_error_body`    | RUNTIME\_\_HTTP\_SHOW\_ERROR\_BODY    | Includes the HTTP response body in raised exceptions/logs. Useful for diagnosing API errors from collectors (GitHub, Okta, Jamf). May expose sensitive response data in logs when enabled. | false         |
| `request_max_attempts`    | RUNTIME\_\_REQUEST\_MAX\_ATTEMPTS     | Maximum number of retry attempts for a failing HTTP request before OpenHound gives up and fails the pipeline.                                                                              | 5             |
| `request_backoff_factor`  | RUNTIME\_\_REQUEST\_BACKOFF\_FACTOR   | Multiplier for the exponential delay between retries. The delay for a given attempt is `backoff_factor * 2^(attempt - 1)` seconds.                                                         | 1             |
| `request_max_retry_delay` | RUNTIME\_\_REQUEST\_MAX\_RETRY\_DELAY | Upper bound, in seconds, on the computed exponential delay. Prevents the wait time from growing unbounded on later attempts.                                                               | 300           |
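
When a collector fails with an unclear API error, you can turn on `http_show_error_body` temporarily. The following fragment shows this. Because response bodies may contain sensitive data, treat it as a short-lived debugging setting:

```toml Example ~/.dlt/config.toml for diagnosing API errors theme={null}
[runtime]
# Include HTTP response bodies in raised exceptions and logs while debugging.
# May expose sensitive response data; set back to false when done.
http_show_error_body = true
```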

#### How retries mitigate API rate limits

OpenHound automatically retries requests that fail with `5xx` or `429` status codes, or when the connection is dropped or unreachable. If the API response includes a `Retry-After` header, OpenHound honors that value instead of the computed backoff delay.

The three retry parameters work together as a single exponential-backoff algorithm rather than as independent settings:

* `request_backoff_factor` sets the starting pace of the delay curve. The delay doubles on each subsequent retry attempt (standard exponential backoff).
* `request_max_retry_delay` caps how long any single wait can grow to, so retries don't stall indefinitely between attempts.
* `request_max_attempts` sets how many times OpenHound rides out this curve before it fails the collection run entirely.
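
The formula below expresses the wait before retry attempt $n$ when the response carries no `Retry-After` header. It is the exponential delay from the parameter table, capped at the maximum. Here $f$ is `request_backoff_factor`, $D_{\max}$ is `request_max_retry_delay`, and $n$ can go no higher than `request_max_attempts`. The second line works out attempts 10 and 15 for the tuned example values:

$$
d_n = \min\left(D_{\max},\; f \cdot 2^{\,n-1}\right)
$$

$$
d_{10} = \min(900,\; 1.3 \cdot 2^{9}) = 665.6\ \text{s}, \qquad d_{15} = \min(900,\; 1.3 \cdot 2^{14}) = \min(900,\; 21299.2) = 900\ \text{s}
$$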

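The following table compares the delay at selected attempts under the default values and under the tuned values shipped in OpenHound's example configurations. The default column's attempt 10 entry shows where the default curve reaches its cap. With the default `request_max_attempts = 5`, a default run never reaches that attempt: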

| Attempt | Default (`factor=1`, `max=300s`) | Tuned example (`factor=1.3`, `max=900s`) |
| ------- | -------------------------------- | ---------------------------------------- |
| 1       | 1s                               | 1.3s                                     |
| 5       | 16s                              | 20.8s                                    |
| 10      | 300s (capped)                    | 665.6s                                   |
| 15      | — (default stops at attempt 5)   | 900s (capped)                            |

<Note>
  The example values (`request_max_attempts = 15`, `request_backoff_factor = 1.3`, `request_max_retry_delay = 900`) in the shipped `config.toml` templates aren't arbitrary. They were added specifically to work around GitHub API rate-limiting issues encountered during large collections.

  GitHub's secondary/abuse rate limits can take several minutes to clear, and the DLT defaults (5 attempts, 300s cap) aren't enough to ride them out. Raising the attempt count and delay cap lets OpenHound patiently wait out GitHub's rate limits instead of failing the run.
</Note>

```toml Example ~/.dlt/config.toml with tuned retry settings theme={null}
[runtime]
request_max_attempts = 15
request_backoff_factor = 1.3
request_max_retry_delay = 900
```

If you collect from rate-limit-sensitive sources, particularly GitHub, start from these tuned values rather than the DLT defaults.

### Data writing parameters

The data writing parameters specify when and how in-memory data is written to disk during the collection and normalization phases.

The following parameters are configured under the `[data_writer]` section in the configuration file and define the default data writer behavior for the pipeline. Individual pipeline phases, such as extract and normalize, and individual sources can override these defaults in their own `data_writer` subsection, for example `[normalize.data_writer]` or `[sources.source.okta.data_writer]`.

| Data writer option | Environment Variable               | Description                                                                       | Default Value |
| ------------------ | ---------------------------------- | --------------------------------------------------------------------------------- | ------------- |
| `buffer_max_items` | DATA\_WRITER\_\_BUFFER\_MAX\_ITEMS | The maximum number of items to keep in memory before writing to disk.             | 5000          |
| `file_max_items`   | DATA\_WRITER\_\_FILE\_MAX\_ITEMS   | The maximum number of items to write to a single file before creating a new file. | None          |
| `file_max_bytes`   | DATA\_WRITER\_\_FILE\_MAX\_BYTES   | The maximum number of bytes to write to a single file before creating a new file. | None          |

```toml Example for ~/.dlt/config.toml with data writing overrides theme={null}
[data_writer]
file_max_items=1000

[normalize.data_writer]
file_max_items=100000

[sources.source.okta.data_writer]
file_max_items=50000
```

<Note>
  The `data_writer` parameters directly influence the performance and memory use of the collection/conversion pipeline. Nodes and edges are processed in batches whose size is determined by the `data_writer` parameters.

  Values that are too low reduce memory use but produce many small files and more overhead. Values that are too high increase memory use and can slow down the pipeline.

  We recommend experimenting with different values to find the optimal configuration, which typically depends on the size of your environment.
</Note>
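
As a starting point for experimentation in a larger environment with memory headroom, you might raise the in-memory buffer and bound output files by both item count and size. The following sketch assumes that situation. All numbers are illustrative, not tested recommendations:

```toml Example ~/.dlt/config.toml with larger data writer batches theme={null}
[data_writer]
# Illustrative values only; tune against your own environment
buffer_max_items = 10000      # items held in memory before writing to disk (default 5000)
file_max_items = 100000       # start a new file after this many items
file_max_bytes = 100000000    # start a new file after roughly 100 MB
```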

### Extract parameters

The extract phase is responsible for collecting data from the data source and generating intermediate (compressed) JSONL files. The extract phase is typically the most time-consuming phase of the pipeline, as it involves making API calls to the data source and processing the collected data.

The following parameters are configured under the `[extract]` section in the configuration file. The extract phase can also have its own [data writer](/openhound/configuration#data-writing-parameters) configuration by setting the `data_writer` parameter in the `[extract]` section, which overrides the global data writer settings.

| Extract option | Environment Variable | Description                                                       | Default Value |
| -------------- | -------------------- | ----------------------------------------------------------------- | ------------- |
| `workers`      | EXTRACT\_\_WORKERS   | The number of concurrent workers used during the extract (collection) phase | 5             |

```toml Example for ~/.dlt/config.toml parallel worker overrides theme={null}
[extract]
workers=5

[sources.source.okta.extract]
workers=10
```
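
The following sketch shows an extract-phase data writer override. It assumes the `[extract.data_writer]` table name, by analogy with the `[normalize.data_writer]` example in the data writing section. The values are illustrative:

```toml Example ~/.dlt/config.toml with an extract data writer override theme={null}
[data_writer]
file_max_items = 1000

# Applies only to files written during the extract phase
[extract.data_writer]
file_max_items = 20000
```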

### Normalize parameters

The normalize phase is responsible for converting data types and handling schema evolution. It standardizes column and table names to snake\_case and runs automatically between the extract and load phases.

The following parameters are configured under the `[normalize]` section in the configuration file. The normalization phase can also have its own [data writer](/openhound/configuration#data-writing-parameters) configuration by setting the `data_writer` parameter in the `[normalize]` section, which overrides the global data writer settings.

| Normalize option | Environment Variable       | Description                                                              | Default Value |
| ---------------- | -------------------------- | ------------------------------------------------------------------------ | ------------- |
| `workers`        | NORMALIZE\_\_WORKERS       | The number of concurrent workers used during the DLT normalization phase                           | 1             |
| `start_method`   | NORMALIZE\_\_START\_METHOD | The method used to start normalize worker subprocesses. Which methods are available depends on the OS | fork          |
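
The following sketch assumes that `start_method` accepts the names of Python's `multiprocessing` start methods (`fork`, `spawn`, `forkserver`). Under that assumption, you would switch to `spawn` on platforms where `fork` is unavailable, such as Windows. The worker count is illustrative:

```toml Example ~/.dlt/config.toml with normalize settings theme={null}
[normalize]
workers = 4
# Assumed to map to Python multiprocessing start methods;
# "spawn" starts each worker in a fresh interpreter instead of forking
start_method = "spawn"
```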

### Load parameters

The load phase is responsible for loading the converted OpenGraph files into the destination, which is either the local file system or BloodHound Enterprise.

The following parameters are configured under the `[load]` section in the configuration file.

| Load option                | Environment Variable               | Description                                                                                                     | Default Value |
| -------------------------- | ---------------------------------- | --------------------------------------------------------------------------------------------------------------- | ------------- |
| `delete_completed_jobs`    | LOAD\_\_DELETE\_COMPLETED\_JOBS    | Whether to delete completed load jobs after the pipeline finishes.                                              | false         |
| `truncate_staging_dataset` | LOAD\_\_TRUNCATE\_STAGING\_DATASET | Whether to truncate the staging dataset after loading data into the destination.                                | false         |
| `workers`                  | LOAD\_\_WORKERS                    | The number of concurrent workers used during the load phase, for example when uploading data to BloodHound Enterprise | 1       |
