A Rust CLI that calculates the file size of a web page when loaded with all its external resources.

Logo of a white weighing machine in a green nature colored circle

A high-performance Rust utility for calculating and injecting web page sizes into HTML files with advanced optimization and parallel processing.

Description

webweigh is a command-line tool that scans HTML files in a directory, calculates their total page size (including referenced assets like CSS, JavaScript, and images), and optionally injects this size information into specified HTML elements.

Key capabilities:

  • Scan mode: Calculate total website size without modifying files
  • Injection mode: Update HTML files with calculated page sizes
  • Analysis mode: Detailed breakdown of page size calculations for debugging
  • Advanced asset detection: Finds assets referenced in CSS, JS, and HTML
  • Smart comment handling: Ignores commented-out code (matches browser behavior)
  • Performance optimized: Parallel processing with intelligent caching

This is a Rust rewrite of the Python script originally used by solar.lowtech-website, offering significantly improved performance and new features.

Features

Core Functionality

  • Three Operation Modes: Scan-only, injection, or analysis mode for debugging
  • Page Analysis Mode: Detailed breakdown by asset type with size calculations
  • Comprehensive Asset Detection: CSS, JavaScript, images, fonts, and dynamic assets
  • Smart Comment Stripping: Ignores commented-out code in CSS and JavaScript
  • Advanced Asset Parsing: Detects dynamically loaded assets (script.src, fetch calls, etc.)
  • Nested Dependency Resolution: Follows asset imports and dependencies
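
As a rough illustration of nested dependency resolution, the sketch below sums a page and everything reachable from it, counting each file once. It is not webweigh's actual code: the `Asset` struct, the `page_weight` function, and the in-memory map standing in for the file system are all assumptions made for the example.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory description of one file: its size in bytes and the
/// paths it references (for example a stylesheet that imports another one).
struct Asset {
    size: u64,
    references: Vec<String>,
}

/// Sum the size of `entry` and everything reachable from it, counting each
/// file once even when it is referenced several times or in a cycle.
fn page_weight(entry: &str, assets: &HashMap<String, Asset>) -> u64 {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut pending: Vec<&str> = vec![entry];
    let mut total = 0;

    while let Some(path) = pending.pop() {
        if !seen.insert(path) {
            continue; // already counted
        }
        // Assumption: files missing from the map contribute nothing.
        if let Some(asset) = assets.get(path) {
            total += asset.size;
            pending.extend(asset.references.iter().map(String::as_str));
        }
    }
    total
}
```

The `seen` set is what keeps a font shared by two stylesheets, or a stylesheet that imports itself, from being counted twice.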

Performance & Optimization

  • Parallel Processing: Multi-threaded file processing with Rayon
  • Intelligent Caching: File content and metadata caching to eliminate redundant I/O
  • Optimized Regex Engine: Pre-filtering with early exit patterns for 3-5x faster parsing
  • Memory Efficient: Reusable buffers and zero-allocation utilities
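
The idea behind the caching bullet can be sketched as a lookup table that calls the expensive loader only once per path. This is a single-threaded simplification and not webweigh's implementation; `SizeCache`, `size_of`, and the `load` closure (which stands in for real file-system access) are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical cache keyed by path. `load` stands in for the real
/// file-system lookup so the sketch stays free of I/O.
struct SizeCache<F: Fn(&str) -> u64> {
    load: F,
    sizes: HashMap<String, u64>,
    misses: usize,
}

impl<F: Fn(&str) -> u64> SizeCache<F> {
    fn new(load: F) -> Self {
        SizeCache { load, sizes: HashMap::new(), misses: 0 }
    }

    /// Return the cached size, calling `load` only on the first request.
    fn size_of(&mut self, path: &str) -> u64 {
        if let Some(&size) = self.sizes.get(path) {
            return size;
        }
        self.misses += 1;
        let size = (self.load)(path);
        self.sizes.insert(path.to_string(), size);
        size
    }
}
```

A cache shared between worker threads would additionally need synchronization, which this sketch leaves out.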

User Experience

  • Flexible CSS Selectors: Target any HTML element for size injection
  • Human-Readable Output: IEC units (KiB, MiB, GiB) with precise formatting
  • Advanced Exclusion System: Regex patterns with contextual exceptions
  • Configurable Logging: Silent to trace verbosity levels
  • Base URL Handling: Strip URL prefixes for relative path calculation
  • Comprehensive Statistics: Detailed processing reports and error context
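
For readers unfamiliar with IEC units, the sketch below shows one way to turn a byte count into KiB, MiB, or GiB (powers of 1024). The function name `format_iec` and the two-decimal precision are assumptions; webweigh's real formatting may differ.

```rust
/// Format a byte count using IEC binary units (1 KiB = 1024 bytes).
/// Hypothetical helper; the two-decimal precision is an assumption.
fn format_iec(bytes: u64) -> String {
    const UNITS: [&str; 4] = ["KiB", "MiB", "GiB", "TiB"];
    if bytes < 1024 {
        return format!("{} B", bytes);
    }
    let mut value = bytes as f64 / 1024.0;
    let mut unit = 0;
    // Keep dividing by 1024 until the value fits the current unit.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    format!("{:.2} {}", value, UNITS[unit])
}
```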

Installation

Binary releases

Currently, I only build binaries for Linux x86. You can grab the executable from the releases page.

Cargo

cargo install --git https://codeberg.org/Pontoporeia/webweigh

From Source

See BUILDING.md for detailed build instructions.

git clone https://codeberg.org/Pontoporeia/webweigh.git
cd webweigh
cargo install --path .

Prerequisites

  • Rust 1.70+ (recommended)
  • Cargo package manager

Usage

Basic Usage

Scan mode (calculate sizes without modifying files):

webweigh --directory /path/to/website

Injection mode (update HTML files with calculated sizes):

webweigh --directory /path/to/website --selector ".page-size"

Common Examples

Scan a website (read-only analysis):

webweigh --directory ./dist

Update page sizes in elements with class "page-size":

webweigh --directory ./dist --selector ".page-size"

Process with exclusions and exceptions:

webweigh --directory ./dist --selector ".size" \
  --exclude "portfolio" "demo" \
  --except "portfolio/index.html"

Remove base URL prefix from asset paths:

webweigh --directory ./dist --selector ".size" \
  --base-url "https://example.com"
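
Conceptually, stripping the base URL turns an absolute reference into a site-relative path that can be looked up in the scanned directory. The sketch below is one plausible reading of that behavior, not the tool's actual code; `strip_base_url` and its path-boundary rule are assumptions.

```rust
/// Remove `base_url` from the front of `reference` when present, leaving a
/// site-relative path. References that do not start with the base are
/// returned unchanged. Hypothetical helper, not webweigh's API.
fn strip_base_url<'a>(reference: &'a str, base_url: &str) -> &'a str {
    let base = base_url.trim_end_matches('/');
    match reference.strip_prefix(base) {
        // Only accept a match at a path boundary, so that
        // "https://example.com" does not match "https://example.community".
        Some(rest) if rest.is_empty() || rest.starts_with('/') => {
            if rest.is_empty() { "/" } else { rest }
        }
        _ => reference,
    }
}
```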

Verbose processing with detailed logs:

webweigh --directory ./dist --selector "#size-info" -vv

Silent operation (no output):

webweigh --directory ./dist --selector "body" --silent

Analyze a specific page (detailed breakdown for debugging):

webweigh --directory ./dist --analyze-page "/"
webweigh --directory ./dist --analyze-page "/portfolio/" --base-url "https://example.com"

This will show:

  • Base HTML size
  • Assets grouped by type (Stylesheets, Scripts, Images, Fonts, etc.)
  • Size and percentage contribution of each asset type
  • Detailed list of all assets sorted by size

Perfect for debugging discrepancies between webweigh calculations and browser dev tools.
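
The per-type breakdown described above amounts to grouping assets by type, summing their sizes, and expressing each sum as a share of the total. A minimal sketch, assuming a hypothetical `breakdown` function that takes (type, size) pairs:

```rust
use std::collections::BTreeMap;

/// Group (asset type, size in bytes) pairs and return, per type, the summed
/// size and its share of the total in percent, largest group first.
fn breakdown(assets: &[(&str, u64)]) -> Vec<(String, u64, f64)> {
    let mut by_type: BTreeMap<&str, u64> = BTreeMap::new();
    for &(kind, size) in assets {
        *by_type.entry(kind).or_insert(0) += size;
    }
    let total: u64 = by_type.values().sum();
    let mut rows: Vec<(String, u64, f64)> = by_type
        .into_iter()
        .map(|(kind, size)| {
            let share = if total == 0 { 0.0 } else { size as f64 * 100.0 / total as f64 };
            (kind.to_string(), size, share)
        })
        .collect();
    // Stable sort: groups of equal size stay in alphabetical order.
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    rows
}
```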

Excludes & Exceptions

Exclude Examples:

# excludes entire directory
-e static/portfolio
-e static/portfolio templates

# regex: excludes .tmp files in static/
-e "static/.*\.tmp$" 

# regex: excludes all .html files in content/microblog/
-e "content/microblog/.*\.html$"

Exceptions Examples:

# excludes all HTML in content/microblog/ except index.html in that directory
-e "content/microblog/.*\.html$" --except index.html

# excludes content/temp/ directory except files ending with important.txt
-e content/temp --except "important\.txt$"
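
The interaction between excludes and exceptions can be summarized as: a path is skipped when it matches an exclude pattern, unless it also matches an exception. The sketch below illustrates that rule only. It treats every pattern as a plain substring, whereas webweigh also accepts regular expressions, and `is_excluded` is a hypothetical name.

```rust
/// Decide whether `path` should be skipped. A path is excluded when it matches
/// any exclude pattern, unless it also matches an exception pattern.
/// Simplification: patterns are plain substrings here, not regular expressions.
fn is_excluded(path: &str, excludes: &[&str], exceptions: &[&str]) -> bool {
    let excluded = excludes.iter().any(|pattern| path.contains(*pattern));
    let rescued = exceptions.iter().any(|pattern| path.contains(*pattern));
    excluded && !rescued
}
```

Note that an exception never pulls in a path that was not excluded in the first place; it only narrows the exclusion.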

Command Line Options

Usage: webweigh [OPTIONS] --directory <DIR>

Options:
  -d, --directory <DIR>          Directory to traverse (required)
      --base-url <URL>           Base URL prefix to remove from asset paths
      --selector <SELECTOR>      CSS selector for size injection (optional - enables scan-only mode if omitted)
      --analyze-page <PATH>      Analyze specific page with detailed breakdown (e.g., "/", "/portfolio/")
  -e, --exclude <PATTERN>...     Exclude paths/patterns (supports literal paths and regex)
      --except <PATTERN>...      Exception patterns within exclusion scope
  -v, --verbose...               Logging verbosity: -v (info), -vv (debug), -vvv (trace)
  -s, --silent                   Suppress all output
  -h, --help                     Show help information
  -V, --version                  Show version

Verbosity Levels

  • Default: Shows only the final statistics report
  • -v: Shows info-level logs (processing start, configuration)
  • -vv: Shows debug-level logs (detailed processing information)
  • -vvv: Shows trace-level logs (maximum detail)
  • --silent: No output at all
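
The levels above can be read as a simple mapping from flags to an output level. The sketch below encodes that mapping; the `Verbosity` enum and `verbosity` function are hypothetical, and two details are assumptions not stated in this README: that `--silent` wins over `-v`, and that more than three `-v` flags behave like `-vvv`.

```rust
/// Output levels described in the list above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Verbosity {
    Silent, // --silent: no output at all
    Report, // default: final statistics report only
    Info,   // -v
    Debug,  // -vv
    Trace,  // -vvv
}

/// Map the parsed flags to a level. Assumes `--silent` takes precedence
/// and that extra `-v` flags beyond three are treated as trace.
fn verbosity(silent: bool, v_count: u8) -> Verbosity {
    if silent {
        return Verbosity::Silent;
    }
    match v_count {
        0 => Verbosity::Report,
        1 => Verbosity::Info,
        2 => Verbosity::Debug,
        _ => Verbosity::Trace,
    }
}
```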

Supported Assets

Direct References:

  • HTML files (.html, .htm)
  • CSS stylesheets (<link rel="stylesheet">)
  • JavaScript files (<script src>)
  • Images (<img src>, CSS url())
  • Fonts (CSS @font-face, url())
  • Icons (<link rel="icon">)

Dynamic Assets (detected in JS/CSS):

  • Dynamic script loading (script.src = "...")
  • Dynamic imports (import(), require())
  • Fetch calls (fetch("/api/data.json"))
  • CSS imports (@import url(...))
  • Asset assignments (link.href = "...")

What's New in v0.3.0

Major Features

  • Page Analysis Mode: New --analyze-page flag provides detailed breakdown of page size calculations

    • View assets grouped by type (Stylesheets, Scripts, Images, Fonts, Icons)
    • See size and percentage contribution of each asset type
    • Complete asset list sorted by size for easy debugging
    • Perfect for investigating discrepancies with browser dev tools
  • Smart Comment Stripping: Major accuracy improvement

    • Ignores commented-out CSS (/* @import "file.css"; */)
    • Skips commented JavaScript (// import './module.js';)
    • Preserves comment-like text in strings and templates
    • Matches browser behavior: Only counts assets that are actually loaded
    • Significantly more accurate size calculations
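
The comment-stripping idea can be illustrated for CSS with a small state machine that drops `/* ... */` blocks but leaves quoted strings untouched. This is a sketch under simplifying assumptions (CSS only, no JavaScript or template literals) and is not the code shipped in v0.3.0; `strip_css_comments` is a hypothetical name.

```rust
/// Remove `/* ... */` comments from CSS while leaving quoted strings intact.
fn strip_css_comments(css: &str) -> String {
    let mut out = String::with_capacity(css.len());
    let mut chars = css.chars().peekable();
    let mut quote: Option<char> = None; // Some(q) while inside a string opened by q

    while let Some(c) = chars.next() {
        if let Some(q) = quote {
            out.push(c);
            if c == '\\' {
                // Copy the escaped character verbatim so \" does not end the string.
                if let Some(escaped) = chars.next() {
                    out.push(escaped);
                }
            } else if c == q {
                quote = None;
            }
        } else if c == '"' || c == '\'' {
            quote = Some(c);
            out.push(c);
        } else if c == '/' && chars.peek() == Some(&'*') {
            chars.next(); // consume '*'
            // Skip until the closing "*/" or the end of input.
            let mut prev = '\0';
            for d in chars.by_ref() {
                if prev == '*' && d == '/' {
                    break;
                }
                prev = d;
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Running asset detection on the stripped text means a commented-out `@import` is never counted, while a string that merely contains `/*` is preserved.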

Testing & Quality

  • Comprehensive Test Suite: 41 test cases including 12 new comment stripping tests
  • All Tests Passing: Verified behavior for edge cases and real-world scenarios
  • Better Accuracy: Calculations now match browser dev tools more closely

What's New in v0.2.0

Major Refactoring

  • Modular Architecture: Split monolithic code into focused modules
  • Performance Optimizations: 3-5x faster processing with intelligent caching
  • Enhanced Asset Detection: Comprehensive dynamic asset discovery

New Features

  • Scan-only Mode: Analyze websites without modifying files
  • Parallel Processing: Multi-threaded asset collection and processing
  • Advanced Caching: File content and metadata caching
  • Memory Optimization: Reduced allocations and memory reuse

Improvements

  • Better Error Handling: More descriptive error messages
  • Enhanced Exclusions: Improved regex vs literal pattern detection
  • Zero Warnings: Clean compilation with no warnings
  • Comprehensive Tests: 26 test cases covering all functionality

Dependencies

Core runtime dependencies:

  • clap: Command-line argument parsing
  • walkdir: Recursive directory traversal
  • scraper: HTML parsing and CSS selectors
  • regex: Pattern matching for asset discovery
  • rayon: Parallel processing
  • log + env_logger: Configurable logging
  • anyhow + thiserror: Error handling
  • once_cell: Static initialization
  • percent-encoding: URL decoding

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to contribute to webweigh.

For build instructions and development setup, see BUILDING.md.

Credits

Icon: Phosphor Icons

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

Copyright (C) 2025 Pontoporeia

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.