Welcome

cfgcut is a CLI for carving meaningful slices out of large network configuration files. It parses vendor-specific syntax into a common tree so you can match by hierarchy, anonymise sensitive fields, and feed the results into automation tooling.

Installing

Until official releases are published, build from source using Rust 1.90 or newer:

cargo install --path crates/cfgcut

Python users can install the bindings after compiling the Rust core:

cargo build -p pycfgcut --release

cargo install puts the cfgcut binary in cargo's bin directory (typically ~/.cargo/bin), while cargo build leaves the Python extension module under the standard cargo target directory.
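
A quick smoke test (assuming cargo's bin directory is on your PATH):

cfgcut --help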

Quick start

  1. Grab a configuration file (the tests/fixtures/ directory ships realistic examples).
  2. Call cfgcut with a match expression consisting of hierarchical regex segments.
cfgcut -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf

The example prints the entire ge-0/0/0 subtree. Every segment is implicitly anchored, so ge-0/0/0 will not accidentally match similarly named interfaces.

Add -a to anonymise usernames, secrets, ASNs, and IPv4 addresses, or -q to run in check-only mode where the exit status signals whether a match was found.
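
Because -q communicates through the exit status, it drops straight into shell conditionals. A minimal sketch using the fixture from above:

cfgcut -q -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf \
  && echo "interface present" \
  || echo "interface missing"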

Continue to CLI Usage for the full command reference and matcher behaviour.

CLI Usage

Command reference

cfgcut accepts zero or more -m/--match expressions and a list of files or directories. Directories are expanded using glob semantics, so you can point the tool at an entire configuration dump. When no CLI patterns are supplied, cfgcut looks for an inline match block at the top of each file (see below).

Option                   Description
-m, --match <MATCH>      Hierarchical regex segments (anchored). Repeat the flag for multiple patterns; takes precedence over inline blocks.
-c, --with-comments      Include comment lines recognised by the active dialect.
-q, --quiet              Suppress stdout; rely on exit status to detect matches.
-a, --anonymize          Scramble usernames, secrets, ASNs, and IPv4 addresses deterministically.
--tokens                 Emit newline-delimited JSON token records for every match.
--tokens-out <PATH>      Write token records to a file instead of stdout.
--help                   Display the full usage text with examples.

Combine flags as needed. For example, run a check that exits with status 0 only when a BGP neighbour exists:

cfgcut -q -m 'protocols||bgp||group CUSTOMERS||neighbor 198\.51\.100\.10' router.conf

Match semantics

Configurations are parsed into a hierarchy. Use || to move down levels and place |>>| after a segment to include the entire subtree underneath that node.

  • Every segment is wrapped with ^...$ automatically, so a pattern like ge-.* must match an entire line (for example ge-0/0/0) rather than a fragment of a longer one.
  • Matches print their ancestor context so output remains valid configuration. Without |>>|, only the matched line plus its parents are shown.
  • Comment markers are normalised per dialect (for example ! on IOS, # on Junos). Opt into printing them with -c/--with-comments.

Example: fetch every trunk interface from a Cisco IOS device while keeping parent context.

cfgcut -m 'interface .*||switchport trunk allowed vlan .*' tests/fixtures/cisco_ios/sample.conf

To grab an entire Junos subtree:

cfgcut -m 'interfaces||ae1|>>|' tests/fixtures/juniper_junos/sample.conf
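
For contrast, dropping |>>| from the same expression prints only the matched ae1 line plus its interfaces parent, without any children:

cfgcut -m 'interfaces||ae1' tests/fixtures/juniper_junos/sample.conf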

Inline match blocks

Fixtures can carry their own match list by starting with a comment that follows this pattern:

{# [
'hostname .*',
"interfaces|>>|",
] #}

Whitespace is ignored and you can mix single or double quotes. The block must appear before any configuration lines; cfgcut strips it before parsing so the comment never shows up in the output. If you also pass one or more -m/--match flags, the CLI values win and the tool emits a warning on stderr to highlight that the inline list was skipped.
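
Putting this together, a minimal hypothetical Junos-style fixture carrying its own match list could look like:

{# [
'system|>>|',
] #}
system {
    host-name lab-router;
}
interfaces {
    ge-0/0/0 {
        description uplink;
    }
}

Running cfgcut against this file with no -m flags would print just the system subtree.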

Anonymisation and token output

Enabling -a/--anonymize replaces sensitive fields with stable placeholders that remain consistent within a single run. The original values are still available through the token stream produced by --tokens or --tokens-out.

Token payloads include the dialect, hierarchical path, kind, original value, anonymised value (when available), and source line. See Token Extraction Design Notes for the data model and ongoing work.
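
As a sketch, an anonymised token run and one record it might emit; the field names below are illustrative, not a stable schema:

cfgcut -a --tokens -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf

{"dialect":"juniper_junos","path":["interfaces","ge-0/0/0"],"kind":"ip","value":"192.0.2.1","anonymized":"198.18.0.5","line":14}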

CI Integration Notes

Required steps

  • Run mise run check on every pull request to cover cargo fmt, cargo clippy, cargo nextest run (plus doc tests), the dependency audit, and the docs build.
  • Publish coverage artefacts with mise run coverage (requires cargo-llvm-cov).
  • Cache the cargo home directory to speed up repeated lint/test runs (a cache step sketch follows the Actions outline below).
  • Run a nightly or scheduled job that executes mise run fuzz parser -- -runs=1000 and mise run fuzz matcher -- -runs=1000 (after installing a nightly toolchain) to exercise the seed corpora without impacting PR latency.
  • Upload crash artefacts from fuzz runs as CI artefacts for quick triage.
  • Track execution time; fail the fuzz job only on crashes/timeouts to avoid flakiness.
  • Optionally, run a benchmarking job (mise run bench) on a dedicated runner to spot large regressions.

Future hooks

  • Once token extraction ships, add integration tests that execute with the new flags and ensure they run as part of the PR suite.
  • When additional dialect fixtures land, extend CI to validate that each dialect parser has coverage (unit + integration).

Example GitHub Actions outline

name: ci

on:
  pull_request:
  push:
  schedule:
    - cron: "0 3 * * *" # nightly trigger for the fuzz-smoke job below

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: "1.90" # quoted so YAML keeps the trailing zero
          components: clippy,rustfmt
      - uses: jdx/mise-action@v2
        with:
          version: latest
      - run: cargo install cargo-deny
      - run: cargo install mdbook --locked
      - run: cargo install cargo-llvm-cov
      - run: cargo install cargo-nextest --locked
      - run: mise run check
      - run: mise run coverage

  fuzz-smoke:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: "1.90" # quoted so YAML keeps the trailing zero
      - uses: jdx/mise-action@v2
        with:
          version: latest
      - run: cargo install cargo-fuzz
      - run: rustup toolchain install nightly --profile minimal
      - run: mise run fuzz parser -- -runs=1000
      - run: mise run fuzz matcher -- -runs=1000
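
The outline leaves out the cargo cache step from the required list above; one way to add it to either job, using the stock actions/cache action (the paths shown are common defaults, adjust as needed):

      - uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}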

Dialect Contribution Guidelines

This project expects every vendor/platform parser to live under crates/cfgcut/src/dialect/ in its own module. Follow these practices when adding a new dialect:

Parser structure

  • Reuse the shared helpers in dialect::shared whenever possible (comment detection, match text extraction, hierarchy wiring).
  • Keep regexes and parsing rules limited to the minimum needed for correctness; avoid vendor-specific behaviour in the matcher.
  • Collect children and closing nodes the same way existing dialects do so the matcher’s anchoring logic behaves consistently.

Comment and ignored text handling

  • Define comment markers in the dialect module so -c/--with-comments works uniformly. Indentation should not affect comment detection (a sketch follows this list).
  • Exclude device-generated boilerplate (hashes, timestamps) during parsing to avoid noise in matches. Document any ignored text in module comments or tests.
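
A standalone sketch of indentation-insensitive comment detection; the real helpers live in dialect::shared, so every name below is hypothetical:

fn is_comment(line: &str, markers: &[&str]) -> bool {
    // Strip leading whitespace first so indented comments are still recognised.
    let trimmed = line.trim_start();
    markers.iter().any(|marker| trimmed.starts_with(marker))
}

fn main() {
    // IOS marks comments with `!`, Junos with `#` (see Match semantics).
    assert!(is_comment("  ! archived by operator", &["!"]));
    assert!(is_comment("# junos annotation", &["#"]));
    assert!(!is_comment("interface ge-0/0/0", &["!", "#"]));
}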

Testing expectations

  • Add snippet-based unit tests covering: comment detection, hierarchy/parent assignment, and closing-brace emission (for brace dialects).
  • Extend crates/cfgcut/tests with integration scenarios using fixtures under tests/fixtures/<vendor_platform>/. New fixtures should be minimal but realistic.
  • Ensure new fixtures are referenced in tests/fixtures/README.md with source attribution.

Anonymizer and token extraction hooks

  • Avoid building anonymization logic into dialect parsers. Use the shared anonymizer so token scrubbing remains consistent.
  • When token extraction lands, surface dialect-specific token types via the shared trait rather than custom plumbing.

Fuzzing & hardening

  • Consider adding a seed corpus covering edge cases for the new dialect under fuzz/corpus/ once fixtures exist.
  • Run mise run check before opening a PR; it enforces formatting, clippy, tests, the dependency audit, and doc build.
  • Capture coverage with mise run coverage when touching parser/matcher logic.

Review checklist

  • cargo fmt, cargo clippy -- -D warnings, cargo nextest run --workspace --all-targets, cargo test --doc, and (if installed) cargo deny check all succeed.
  • Documentation: update README examples if the dialect adds user-visible syntax, and note any new fixtures in tests/fixtures/README.md.

Token Extraction Design Notes

Objectives

  • Reuse the anonymizer's deterministic token maps so anonymized output and extracted tokens stay in sync.
  • Support both CLI and library workflows without changing existing stdout behaviour by default.
  • Provide deterministic ordering of extracted tokens to keep diffs stable.

Planned CLI surface

  • Introduce a --tokens flag that prints newline-delimited JSON or key/value pairs describing matches.
  • Allow --tokens to be combined with --quiet so automation can act on exit status and token payload without extra text.
  • Support writing tokens to a file (--tokens-out <PATH>) in addition to stdout for larger captures.
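
Combined, the planned flags would support a silent capture along these lines (hypothetical until the feature ships):

cfgcut --quiet --tokens-out tokens.jsonl -m 'protocols||bgp|>>|' router.conf

The exit status still signals whether anything matched, while tokens.jsonl holds the records for downstream tooling.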

Data model

  • Describe each token with fields: dialect, path (hierarchical segments), kind (ip, asn, username, secret, literal), and value.
  • Preserve a reference to the anonymized value when anonymization is active.
  • Capture positional metadata (line/column) to help IDE integrations.
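
The fields above map naturally onto a small Rust type; this is a design sketch with hypothetical names, not a shipped API:

// Hypothetical data model mirroring the notes above.
struct Token {
    dialect: String,
    path: Vec<String>,          // hierarchical segments leading to the match
    kind: TokenKind,
    value: String,              // original text from the configuration
    anonymized: Option<String>, // populated only when -a/--anonymize is active
    line: usize,                // positional metadata for IDE integrations
    column: usize,
}

enum TokenKind {
    Ip,
    Asn,
    Username,
    Secret,
    Literal,
}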

Implementation sketch

  • Extend MatchAccumulator to optionally record token spans using the anonymizer maps.
  • Add a TokenSink trait so new dialects can surface dialect-specific token types without coupling to core logic (see the sketch after this list).
  • Ensure comment handling obeys the same switches as existing output (-c/--with-comments).
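
One possible shape for that trait, reusing the hypothetical Token type sketched under the data model (again, not a final API):

trait TokenSink {
    // Invoked once per token a dialect discovers while parsing.
    fn record(&mut self, token: Token);
}

// Simplest sink: collect tokens into a Vec for later serialisation.
struct VecSink(Vec<Token>);

impl TokenSink for VecSink {
    fn record(&mut self, token: Token) {
        self.0.push(token);
    }
}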

Testing strategy

  • Add table-driven unit tests per dialect to cover common command patterns (interface addresses, BGP ASNs, login commands).
  • Mirror the anonymizer integration tests with --tokens enabled to confirm consistent mappings.
  • Extend fuzz targets to emit token metadata behind a feature gate to catch malformed spans early.

Open questions

  • Should tokens be grouped per match expression or per line? (Default proposal: per line.)
  • How should overlapping token classes (e.g. passwords containing IP-like strings) be prioritised?
  • Do we need opt-in redaction for custom patterns supplied via config files?