# Welcome

`cfgcut` is a CLI for carving meaningful slices out of large network configuration files. It parses vendor-specific syntax into a common tree so you can match by hierarchy, anonymise sensitive fields, and feed the results into automation tooling.
## Installing

Until official releases are published, build from source using Rust 1.90 or newer:

```bash
cargo install --path crates/cfgcut
```

Python users can install the bindings after compiling the Rust core:

```bash
cargo build -p pycfgcut --release
```

Both commands place the binaries and extension module under the standard cargo target directory.
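If cargo's bin directory is on your `PATH` (a local-setup assumption), a quick smoke test confirms the CLI landed:

```bash
# Print the full usage text to verify the freshly built binary.
cfgcut --help
```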
## Quick start

- Grab a configuration file (the `tests/fixtures/` directory ships realistic examples).
- Call `cfgcut` with a match expression consisting of hierarchical regex segments.

```bash
cfgcut -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf
```

The example prints the entire `ge-0/0/0` subtree. Every segment is implicitly anchored, so `ge-0/0/0` will not accidentally match similarly named interfaces.

Add `-a` to anonymise usernames, secrets, ASNs, and IPv4 addresses, or `-q` to run in check-only mode where the exit status signals whether a match was found.
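Both flags compose with the match expression above; a quick sketch:

```bash
# Anonymise sensitive fields while carving out the same subtree.
cfgcut -a -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf

# Check-only mode: nothing is printed, the exit status alone signals a match.
cfgcut -q -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf && echo matched
```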
Continue to CLI Usage for the full command reference and matcher behaviour.
# CLI Usage

## Command reference

`cfgcut` accepts zero or more `-m/--match` expressions and a list of files or directories. Directories are expanded using glob semantics, so you can point the tool at an entire configuration dump. When no CLI patterns are supplied, `cfgcut` looks for an inline match block at the top of each file (see below).
| Option | Description |
|---|---|
| `-m, --match <MATCH>` | Hierarchical regex segments (anchored). Repeat the flag for multiple patterns; takes precedence over inline blocks. |
| `-c, --with-comments` | Include comment lines recognised by the active dialect. |
| `-q, --quiet` | Suppress stdout; rely on exit status to detect matches. |
| `-a, --anonymize` | Scramble usernames, secrets, ASNs, and IPv4 addresses deterministically. |
| `--tokens` | Emit newline-delimited JSON token records for every match. |
| `--tokens-out <PATH>` | Write token records to a file instead of stdout. |
| `--help` | Display the full usage text with examples. |
Combine flags as needed. For example, run a check that exits with status 0 only when a BGP neighbour exists:

```bash
cfgcut -q -m 'protocols||bgp||group CUSTOMERS||neighbor 198\.51\.100\.10' router.conf
```
## Match semantics

Configurations are parsed into a hierarchy. Use `||` to move down levels and place `|>>|` after a segment to include the entire subtree underneath that node.

- Every segment is wrapped with `^...$` automatically; `ge-.*` targets individual interfaces rather than matching a partial line.
- Matches print their ancestor context so output remains valid configuration. Without `|>>|`, only the matched line plus its parents are shown.
- Comment markers are normalised per dialect (for example `!` on IOS, `#` on Junos). Opt into printing them with `-c/--with-comments`.
Example: fetch every trunk interface from a Cisco IOS device while keeping parent context.

```bash
cfgcut -m 'interface .*||switchport trunk allowed vlan .*' tests/fixtures/cisco_ios/sample.conf
```

To grab an entire Junos subtree:

```bash
cfgcut -m 'interfaces||ae1|>>|' tests/fixtures/juniper_junos/sample.conf
```
## Inline match blocks

Fixtures can carry their own match list by starting with a comment that follows this pattern:

```text
{# [
  'hostname .*',
  "interfaces|>>|",
] #}
```

Whitespace is ignored and you can mix single and double quotes. The block must appear before any configuration lines; `cfgcut` strips it before parsing, so the comment never shows up in the output. If you also pass one or more `-m/--match` flags, the CLI values win and the tool emits a warning on stderr to highlight that the inline list was skipped.
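As an illustrative sketch (the `router.conf` name and contents here are hypothetical), a file carrying its own match list needs no `-m` flag at all:

```bash
# router.conf begins with an inline match block, then the configuration:
#
#   {# [ 'hostname .*' ] #}
#   hostname edge-router
#   ...
#
# With no -m flags, cfgcut reads the inline list and prints the hostname line.
cfgcut router.conf

# Passing -m overrides the inline block and emits a warning on stderr.
cfgcut -m 'interfaces|>>|' router.conf
```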
## Anonymisation and token output

Enabling `-a/--anonymize` replaces sensitive fields with stable placeholders that remain consistent within a single run. The original values are still available through the token stream produced by `--tokens` or `--tokens-out`.

Token payloads include the dialect, hierarchical path, kind, original value, anonymised value (when available), and source line. See Token Extraction Design Notes for the data model and ongoing work.
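A sketch of the workflow; the record in the comment is illustrative, since the exact field layout is still being settled in the design notes:

```bash
# Emit newline-delimited JSON token records alongside anonymised output.
cfgcut -a --tokens -m 'interfaces||ge-0/0/0|>>|' tests/fixtures/juniper_junos/sample.conf

# Each record carries roughly these fields (values invented for illustration):
# {"dialect":"juniper_junos","path":["interfaces","ge-0/0/0"],"kind":"ip",
#  "value":"192.0.2.1","anonymized":"198.18.0.1","line":42}
```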
# CI Integration Notes

## Required steps

- Run `mise run check` on every pull request to cover `cargo fmt`, `cargo clippy`, `cargo nextest run` (plus doc tests), the dependency audit, and the docs build.
- Publish coverage artefacts with `mise run coverage` (requires `cargo-llvm-cov`); a local equivalent of both commands is sketched below.
- Cache the cargo home directory to speed up repeated lint/test runs.
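Assuming `mise` and the tools installed by the workflow below are available locally, failures can surface before a PR is even opened:

```bash
# Mirror the PR gate locally: lint, tests, audit, and docs, then coverage.
mise run check && mise run coverage
```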
## Recommended extras

- Nightly or scheduled job executing `mise run fuzz parser -- -runs=1000` and `mise run fuzz matcher -- -runs=1000` (after installing a nightly toolchain) to exercise the seed corpora without impacting PR latency.
- Upload crash artifacts from fuzz runs as CI artifacts for quick triage.
- Track execution time; fail the fuzz job only on crashes/timeouts to avoid flakiness.
- Optional benchmarking job (`mise run bench`) on a dedicated runner to spot large regressions.
## Future hooks
- Once token extraction ships, add integration tests that execute with the new flags and ensure they run as part of the PR suite.
- When additional dialect fixtures land, extend CI to validate that each dialect parser has coverage (unit + integration).
## Example GitHub Actions outline

```yaml
name: ci
on:
  pull_request:
  push:
  schedule:
    - cron: "0 3 * * *" # example cadence; the fuzz-smoke job below only fires on schedule events
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: "1.90" # quoted so YAML does not truncate it to the float 1.9
          components: clippy,rustfmt
      - uses: jdx/mise-action@v2
        with:
          version: latest
      - run: cargo install cargo-deny
      - run: cargo install mdbook --locked
      - run: cargo install cargo-llvm-cov
      - run: cargo install cargo-nextest --locked
      - run: mise run check
      - run: mise run coverage
  fuzz-smoke:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: "1.90"
      - uses: jdx/mise-action@v2
        with:
          version: latest
      - run: cargo install cargo-fuzz
      - run: rustup toolchain install nightly --profile minimal
      - run: mise run fuzz parser -- -runs=1000
      - run: mise run fuzz matcher -- -runs=1000
```
# Dialect Contribution Guidelines

This project expects every vendor/platform parser to live under `crates/cfgcut/src/dialect/` in its own module. Follow these practices when adding a new dialect:
## Parser structure

- Reuse the shared helpers in `dialect::shared` whenever possible (comment detection, match text extraction, hierarchy wiring).
- Keep regexes and parsing rules limited to the minimum needed for correctness; avoid vendor-specific behaviour in the matcher.
- Collect children and closing nodes the same way existing dialects do so the matcher’s anchoring logic behaves consistently.
## Comment and ignored text handling

- Define comment markers in the dialect module so `-c/--with-comments` works uniformly. Indentation should not affect comment detection.
- Exclude device-generated boilerplate (hashes, timestamps) during parsing to avoid noise in matches. Document any ignored text in module comments or tests.
## Testing expectations

- Add snippet-based unit tests covering: comment detection, hierarchy/parent assignment, and closing-brace emission (for brace dialects).
- Extend `crates/cfgcut/tests` with integration scenarios using fixtures under `tests/fixtures/<vendor_platform>/`. New fixtures should be minimal but realistic; see the test commands sketched after this list.
- Ensure new fixtures are referenced in `tests/fixtures/README.md` with source attribution.
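A sketch of the relevant invocations; `arista_eos` is a hypothetical module name standing in for your new dialect:

```bash
# Full unit + integration pass, as the review checklist expects.
cargo nextest run --workspace --all-targets

# nextest treats positional arguments as test-name filters, so naming test
# modules after the dialect keeps focused runs cheap while iterating.
cargo nextest run -p cfgcut arista_eos
```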
## Anonymizer and token extraction hooks
- Avoid building anonymization logic into dialect parsers. Use the shared anonymizer so token scrubbing remains consistent.
- When token extraction lands, surface dialect-specific token types via the shared trait rather than custom plumbing.
## Fuzzing & hardening

- Consider adding a seed corpus covering edge cases for the new dialect under `fuzz/corpus/` once fixtures exist.
- Run `mise run check` before opening a PR; it enforces formatting, clippy, tests, the dependency audit, and the doc build.
- Capture coverage with `mise run coverage` when touching parser/matcher logic.
## Review checklist

- `cargo fmt`, `cargo clippy -- -D warnings`, `cargo nextest run --workspace --all-targets`, `cargo test --doc`, and (if installed) `cargo deny check` all succeed.
- Documentation: update README examples if the dialect adds user-visible syntax, and note any new fixtures in `tests/fixtures/README.md`.
# Token Extraction Design Notes

## Objectives
- Reuse the anonymizer's deterministic token maps so anonymized output and extracted tokens stay in sync.
- Support both CLI and library workflows without changing existing stdout behaviour by default.
- Provide deterministic ordering of extracted tokens to keep diffs stable.
## Planned CLI surface

- Introduce a `--tokens` flag that prints newline-delimited JSON or key/value pairs describing matches.
- Allow `--tokens` to be combined with `--quiet` so automation can act on exit status and token payload without extra text (a sketch follows this list).
- Support writing tokens to a file (`--tokens-out <PATH>`) in addition to stdout for larger captures.
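A sketch of the intended automation flow once these flags land (`router.conf` and `tokens.ndjson` are hypothetical names; behaviour may shift as the design settles):

```bash
# Quiet match check that still captures token payloads for later processing.
cfgcut -q --tokens-out tokens.ndjson -m 'protocols||bgp|>>|' router.conf \
  && echo "BGP present; tokens written to tokens.ndjson"
```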
## Data model

- Describe each token with fields: `dialect`, `path` (hierarchical segments), `kind` (ip, asn, username, secret, literal), and `value`.
- Preserve a reference to the anonymized value when anonymization is active.
- Capture positional metadata (line/column) to help IDE integrations.
## Implementation sketch

- Extend `MatchAccumulator` to optionally record token spans using the anonymizer maps.
- Add a `TokenSink` trait so new dialects can surface dialect-specific token types without coupling to core logic.
- Ensure comment handling obeys the same switches as existing output (`-c/--with-comments`).
## Testing strategy

- Add table-driven unit tests per dialect to cover common command patterns (interface addresses, BGP ASNs, login commands).
- Mirror the anonymizer integration tests with `--tokens` enabled to confirm consistent mappings.
- Extend fuzz targets to emit token metadata behind a feature gate to catch malformed spans early.
## Open questions
- Should tokens be grouped per match expression or per line? (Default proposal: per line.)
- How should overlapping token classes (e.g. passwords containing IP-like strings) be prioritised?
- Do we need opt-in redaction for custom patterns supplied via config files?