# Welcome

`cfgcut` is a CLI for carving meaningful slices out of large network configuration files. It parses vendor-specific syntax into a common tree so you can match by hierarchy, anonymise sensitive fields, and feed the results into automation tooling.
## Installing

Until official releases are published, build from source using Rust 1.90 or newer:

```bash
cargo install --path crates/cfgcut
```

Python users can install the bindings after compiling the Rust core:

```bash
cargo build -p pycfgcut --release
```

Both commands place the binaries and extension module under the standard cargo target directory.
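If cargo's bin directory is on your `PATH` (a local-setup assumption), a quick smoke test confirms the CLI landed:

```bash
# Print the full usage text to verify the freshly built binary.
cfgcut --help
```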
## Quick start

- Grab a configuration file (the `tests/fixtures/` directory ships realistic examples).
- Call `cfgcut` with a match expression consisting of hierarchical regex segments.

```bash
cfgcut -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf
```

The example prints the entire `ge-0/0/0` subtree. Every segment is implicitly anchored, so `ge-0/0/0` will not accidentally match similarly named interfaces.

Add `-a` to anonymise usernames, secrets, ASNs, and IPv4 addresses, or `-q` to run in check-only mode where the exit status signals whether a match was found.
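Both flags compose with the match expression above; a quick sketch:

```bash
# Anonymise sensitive fields while carving out the same subtree.
cfgcut -a -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf

# Check-only mode: nothing is printed, the exit status alone signals a match.
cfgcut -q -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf && echo matched
```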
Continue to CLI Usage for the full command reference and matcher behaviour.
# CLI Usage

## Command reference

`cfgcut` accepts zero or more `-m/--match` expressions and a list of files or directories. Directories are expanded using glob semantics, so you can point the tool at an entire configuration dump. When no CLI patterns are supplied, `cfgcut` looks for an inline match block at the top of each file (see below).
| Option | Description |
|---|---|
| `-m, --match <MATCH>` | Hierarchical regex segments (anchored). Repeat the flag for multiple patterns; takes precedence over inline blocks. |
| `-c, --with-comments` | Include comment lines recognised by the active dialect. |
| `-q, --quiet` | Suppress stdout; rely on exit status to detect matches. |
| `-a, --anonymize` | Scramble usernames, secrets, ASNs, and IPv4 addresses deterministically. |
| `--tokens` | Emit newline-delimited JSON token records for every match. |
| `--tokens-out <PATH>` | Write token records to a file instead of stdout. |
| `--help` | Display the full usage text with examples. |
Combine flags as needed. For example, run a check that exits with status 0 only when a BGP neighbour exists:

```bash
cfgcut -q -m 'protocols||bgp||group CUSTOMERS||neighbor 198\.51\.100\.10' router.conf
```
## Match semantics

Configurations are parsed into a hierarchy. Use `||` to move down levels and place `|>>|` after a segment to include the entire subtree underneath that node.

- Every segment is wrapped with `^...$` automatically; `ge-.*` targets individual interfaces rather than matching a partial line.
- Matches print their ancestor context so output remains valid configuration. Without `|>>|`, only the matched line plus its parents are shown.
- Comment markers are normalised per dialect (for example `!` on IOS, `#` on Junos). Opt into printing them with `-c/--with-comments`.
Example: fetch every trunk interface from a Cisco IOS device while keeping parent context.

```bash
cfgcut -m 'interface .*||switchport trunk allowed vlan .*' tests/fixtures/cisco_ios/sample.conf
```

To grab an entire Junos subtree:

```bash
cfgcut -m 'interfaces||ae1|>>|' tests/fixtures/juniper_junos/sample.conf
```
## Inline match blocks

Fixtures can carry their own match list by starting with a comment that follows this pattern:

```text
{# [
  'hostname .*',
  "interfaces|>>|",
] #}
```

Whitespace is ignored and you can mix single and double quotes. The block must appear before any configuration lines; `cfgcut` strips it before parsing, so the comment never shows up in the output. If you also pass one or more `-m/--match` flags, the CLI values win and the tool emits a warning on stderr to highlight that the inline list was skipped.
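As an illustrative sketch (the `router.conf` name and contents here are hypothetical), a file carrying its own match list needs no `-m` flag at all:

```bash
# router.conf begins with an inline match block, then the configuration:
#
#   {# [ 'hostname .*' ] #}
#   hostname edge-router
#   ...
#
# With no -m flags, cfgcut reads the inline list and prints the hostname line.
cfgcut router.conf

# Passing -m overrides the inline block and emits a warning on stderr.
cfgcut -m 'interfaces|>>|' router.conf
```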
## Anonymisation and token output

Enabling `-a/--anonymize` replaces sensitive fields with stable placeholders that remain consistent within a single run. The original values are still available through the token stream produced by `--tokens` or `--tokens-out`.

Token payloads include the dialect, hierarchical path, kind, original value, anonymised value (when available), and source line. See Token Extraction Design Notes for the data model and ongoing work.
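A sketch of the workflow; the record in the comment is illustrative, since the exact field layout is still being settled in the design notes:

```bash
# Emit newline-delimited JSON token records alongside anonymised output.
cfgcut -a --tokens -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf

# Each record carries roughly these fields (values invented for illustration):
# {"dialect":"juniper_junos","path":["interfaces","ge-0/0/0"],"kind":"ip",
#  "value":"192.0.2.1","anonymized":"198.18.0.1","line":42}
```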
# CI Integration Notes

## Required steps

- Run `mise run check` on every pull request to cover `cargo fmt`, `cargo clippy`, `cargo nextest run` (plus doc tests), the dependency audit, and the docs build.
- Publish coverage artefacts with `mise run coverage` (requires `cargo-llvm-cov`); a local equivalent of both commands is sketched below.
- Cache the cargo home directory to speed up repeated lint/test runs.
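Assuming `mise` and the tools installed by the workflow below are available locally, failures can surface before a PR is even opened:

```bash
# Mirror the PR gate locally: lint, tests, audit, and docs, then coverage.
mise run check && mise run coverage
```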
## Recommended extras

- Nightly or scheduled job executing `mise run fuzz parser -- -runs=1000` and `mise run fuzz matcher -- -runs=1000` (after installing a nightly toolchain) to exercise the seed corpora without impacting PR latency.
- Upload crash artifacts from fuzz runs as CI artifacts for quick triage.
- Track execution time; fail the fuzz job only on crashes/timeouts to avoid flakiness.
- Optional benchmarking job (`mise run bench`) on a dedicated runner to spot large regressions.
## Future hooks
- Once token extraction ships, add integration tests that execute with the new flags and ensure they run as part of the PR suite.
- When additional dialect fixtures land, extend CI to validate that each dialect parser has coverage (unit + integration).
## Example GitHub Actions outline

```yaml
name: ci
on:
  pull_request:
  push:
  schedule:
    - cron: "0 3 * * *" # example cadence; the fuzz-smoke job below only fires on schedule events
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: "1.90" # quoted so YAML does not truncate it to the float 1.9
          components: clippy,rustfmt
      - uses: jdx/mise-action@v2
        with:
          version: latest
      - run: cargo install cargo-deny
      - run: cargo install mdbook --locked
      - run: cargo install cargo-llvm-cov
      - run: cargo install cargo-nextest --locked
      - run: mise run check
      - run: mise run coverage
  fuzz-smoke:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: "1.90"
      - uses: jdx/mise-action@v2
        with:
          version: latest
      - run: cargo install cargo-fuzz
      - run: rustup toolchain install nightly --profile minimal
      - run: mise run fuzz parser -- -runs=1000
      - run: mise run fuzz matcher -- -runs=1000
```
# Dialect Contribution Guidelines

This project expects every vendor/platform parser to live under `crates/cfgcut/src/dialect/` in its own module. Follow these practices when adding a new dialect:
## Parser structure

- Reuse the shared helpers in `dialect::shared` whenever possible (comment detection, match text extraction, hierarchy wiring).
- Keep regexes and parsing rules limited to the minimum needed for correctness; avoid vendor-specific behaviour in the matcher.
- Collect children and closing nodes the same way existing dialects do so the matcher’s anchoring logic behaves consistently.
## Comment and ignored text handling

- Define comment markers in the dialect module so `-c/--with-comments` works uniformly. Indentation should not affect comment detection.
- Exclude device-generated boilerplate (hashes, timestamps) during parsing to avoid noise in matches. Document any ignored text in module comments or tests.
## Testing expectations

- Add snippet-based unit tests covering: comment detection, hierarchy/parent assignment, and closing-brace emission (for brace dialects).
- Extend `crates/cfgcut/tests` with integration scenarios using fixtures under `tests/fixtures/<vendor_platform>/`. New fixtures should be minimal but realistic; see the test commands sketched after this list.
- Ensure new fixtures are referenced in `tests/fixtures/README.md` with source attribution.
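A sketch of the relevant invocations; `arista_eos` is a hypothetical module name standing in for your new dialect:

```bash
# Full unit + integration pass, as the review checklist expects.
cargo nextest run --workspace --all-targets

# nextest treats positional arguments as test-name filters, so naming test
# modules after the dialect keeps focused runs cheap while iterating.
cargo nextest run -p cfgcut arista_eos
```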
## Anonymizer and token extraction hooks
- Avoid building anonymization logic into dialect parsers. Use the shared anonymizer so token scrubbing remains consistent.
- When token extraction lands, surface dialect-specific token types via the shared trait rather than custom plumbing.
## Fuzzing & hardening

- Consider adding a seed corpus covering edge cases for the new dialect under `fuzz/corpus/` once fixtures exist.
- Run `mise run check` before opening a PR; it enforces formatting, clippy, tests, the dependency audit, and the doc build.
- Capture coverage with `mise run coverage` when touching parser/matcher logic.
## Review checklist

- `cargo fmt`, `cargo clippy -- -D warnings`, `cargo nextest run --workspace --all-targets`, `cargo test --doc`, and (if installed) `cargo deny check` all succeed.
- Documentation: update README examples if the dialect adds user-visible syntax, and note any new fixtures in `tests/fixtures/README.md`.
# Token Extraction Design Notes

## Objectives
- Reuse the anonymizer's deterministic token maps so anonymized output and extracted tokens stay in sync.
- Support both CLI and library workflows without changing existing stdout behaviour by default.
- Provide deterministic ordering of extracted tokens to keep diffs stable.
## Planned CLI surface

- Introduce a `--tokens` flag that prints newline-delimited JSON or key/value pairs describing matches.
- Allow `--tokens` to be combined with `--quiet` so automation can act on exit status and token payload without extra text (a sketch follows this list).
- Support writing tokens to a file (`--tokens-out <PATH>`) in addition to stdout for larger captures.
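A sketch of the intended automation flow once these flags land (`router.conf` and `tokens.ndjson` are hypothetical names; behaviour may shift as the design settles):

```bash
# Quiet match check that still captures token payloads for later processing.
cfgcut -q --tokens-out tokens.ndjson -m 'protocols||bgp|>>|' router.conf \
  && echo "BGP present; tokens written to tokens.ndjson"
```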
## Data model

- Describe each token with fields: `dialect`, `path` (hierarchical segments), `kind` (ip, asn, username, secret, literal), and `value`.
- Preserve a reference to the anonymized value when anonymization is active.
- Capture positional metadata (line/column) to help IDE integrations.
## Implementation sketch

- Extend `MatchAccumulator` to optionally record token spans using the anonymizer maps.
- Add a `TokenSink` trait so new dialects can surface dialect-specific token types without coupling to core logic.
- Ensure comment handling obeys the same switches as existing output (`-c/--with-comments`).
## Testing strategy

- Add table-driven unit tests per dialect to cover common command patterns (interface addresses, BGP ASNs, login commands).
- Mirror the anonymizer integration tests with `--tokens` enabled to confirm consistent mappings.
- Extend fuzz targets to emit token metadata behind a feature gate to catch malformed spans early.
## Open questions
- Should tokens be grouped per match expression or per line? (Default proposal: per line.)
- How should overlapping token classes (e.g. passwords containing IP-like strings) be prioritised?
- Do we need opt-in redaction for custom patterns supplied via config files?