A high-performance Rust utility for calculating and injecting web page sizes into HTML files with advanced optimization and parallel processing.
Description
webweigh is a command-line tool that scans HTML files in a directory, calculates their total page size (including referenced assets like CSS, JavaScript, and images), and optionally injects this size information into specified HTML elements.
Key capabilities:
- Scan mode: Calculate total website size without modifying files
- Injection mode: Update HTML files with calculated page sizes
- Analysis mode: Detailed breakdown of page size calculations for debugging
- Advanced asset detection: Finds assets referenced in CSS, JS, and HTML
- Smart comment handling: Ignores commented-out code (matches browser behavior)
- Performance optimized: Parallel processing with intelligent caching
This is a Rust rewrite of the Python script originally used by solar.lowtech-website, offering significantly improved performance and new features.
Features
Core Functionality
- Triple Operation Modes: Scan-only, injection, or analysis mode for debugging
- Page Analysis Mode: Detailed breakdown by asset type with size calculations
- Comprehensive Asset Detection: CSS, JavaScript, images, fonts, and dynamic assets
- Smart Comment Stripping: Ignores commented-out code in CSS and JavaScript
- Advanced Asset Parsing: Detects dynamically loaded assets (script.src, fetch calls, etc.)
- Nested Dependency Resolution: Follows asset imports and dependencies
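Nested dependency resolution amounts to walking a reference graph and counting every reachable file once. The following is a minimal sketch of that idea, not webweigh's actual code: the function name `total_size` and the two lookup maps are hypothetical, and real input would come from parsing HTML, CSS, and JS rather than from prebuilt maps.

```rust
use std::collections::{HashMap, HashSet};

/// Sum the sizes of a page and everything reachable from it, counting each
/// file once even if it is referenced from several places or in a cycle.
fn total_size<'a>(
    page: &'a str,
    references: &HashMap<&'a str, Vec<&'a str>>, // file -> files it references
    sizes: &HashMap<&'a str, u64>,               // file -> size in bytes
) -> u64 {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut stack = vec![page];
    let mut total = 0;
    while let Some(file) = stack.pop() {
        if !seen.insert(file) {
            continue; // already counted
        }
        // Files with no known size contribute nothing.
        total += sizes.get(file).copied().unwrap_or(0);
        if let Some(children) = references.get(file) {
            stack.extend(children.iter().copied());
        }
    }
    total
}
```

The `seen` set is what keeps a stylesheet that imports itself indirectly from being counted twice or looping forever.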
Performance & Optimization
- Parallel Processing: Multi-threaded file processing with Rayon
- Intelligent Caching: File content and metadata caching to eliminate redundant I/O
- Optimized Regex Engine: Pre-filtering with early exit patterns for 3-5x faster parsing
- Memory Efficient: Reusable buffers and zero-allocation utilities
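As an illustration of the caching point above, here is a small sketch of a size cache that measures each file at most once. `SizeCache` and `get_or_measure` are hypothetical names, and the sketch is single-threaded; a cache shared across Rayon worker threads would additionally need synchronization, which is not shown.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical size cache: remembers each file's size so that a stylesheet
/// shared by many pages is only measured once.
struct SizeCache {
    sizes: HashMap<PathBuf, u64>,
}

impl SizeCache {
    fn new() -> Self {
        SizeCache { sizes: HashMap::new() }
    }

    /// Return the cached size, calling `measure` only on a cache miss.
    fn get_or_measure<F>(&mut self, path: &Path, measure: F) -> u64
    where
        F: FnOnce(&Path) -> u64,
    {
        if let Some(&size) = self.sizes.get(path) {
            return size;
        }
        let size = measure(path);
        self.sizes.insert(path.to_path_buf(), size);
        size
    }
}
```

In practice `measure` could be something like `|p| std::fs::metadata(p).map(|m| m.len()).unwrap_or(0)`; passing it as a closure keeps the sketch independent of the filesystem.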
User Experience
- Flexible CSS Selectors: Target any HTML element for size injection
- Human-Readable Output: IEC units (KiB, MiB, GiB) with precise formatting
- Advanced Exclusion System: Regex patterns with contextual exceptions
- Configurable Logging: Silent to trace verbosity levels
- Base URL Handling: Strip URL prefixes for relative path calculation
- Comprehensive Statistics: Detailed processing reports and error context
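The IEC units mentioned above are powers of 1024 rather than 1000. A minimal sketch of such a formatter follows; the name `format_iec` and the two-decimal precision are assumptions, and webweigh's exact output format may differ.

```rust
/// Format a byte count using IEC binary units (1 KiB = 1024 bytes).
/// The two-decimal precision is an assumption of this sketch.
fn format_iec(bytes: u64) -> String {
    const UNITS: [&str; 4] = ["B", "KiB", "MiB", "GiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Divide by 1024 until the value fits the current unit or units run out.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} B", bytes)
    } else {
        format!("{:.2} {}", value, UNITS[unit])
    }
}
```

For example, `format_iec(1536)` yields `1.50 KiB`.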
Installation
Binary releases
Currently, I only build for Linux x86. You can grab the executable from the releases page.
Cargo
cargo install --git https://codeberg.org/Pontoporeia/webweigh
From Source
See BUILDING.md for detailed build instructions.
git clone https://codeberg.org/Pontoporeia/webweigh.git
cd webweigh
cargo install --path .
Prerequisites
- Rust 1.70+ (recommended)
- Cargo package manager
Usage
Basic Usage
Scan mode (calculate sizes without modifying files):
webweigh --directory /path/to/website
Injection mode (update HTML files with calculated sizes):
webweigh --directory /path/to/website --selector ".page-size"
Common Examples
Scan a website (read-only analysis):
webweigh --directory ./dist
Update page sizes in elements with class "page-size":
webweigh --directory ./dist --selector ".page-size"
Process with exclusions and exceptions:
webweigh --directory ./dist --selector ".size" \
--exclude "portfolio" "demo" \
--except "portfolio/index.html"
Remove base URL prefix from asset paths:
webweigh --directory ./dist --selector ".size" \
--base-url "https://example.com"
Verbose processing with detailed logs:
webweigh --directory ./dist --selector "#size-info" -vv
Silent operation (no output):
webweigh --directory ./dist --selector "body" --silent
Analyze a specific page (detailed breakdown for debugging):
webweigh --directory ./dist --analyze-page "/"
webweigh --directory ./dist --analyze-page "/portfolio/" --base-url "https://example.com"
This will show:
- Base HTML size
- Assets grouped by type (Stylesheets, Scripts, Images, Fonts, etc.)
- Size and percentage contribution of each asset type
- Detailed list of all assets sorted by size
Perfect for debugging discrepancies between webweigh calculations and browser dev tools.
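The per-type breakdown described above can be sketched as a group-and-sort step. This is an illustration only: the `Asset` type and `breakdown` function are hypothetical, and the real report layout is not reproduced here.

```rust
use std::collections::BTreeMap;

/// One asset as it might appear in an analysis report (hypothetical type).
struct Asset<'a> {
    kind: &'a str, // e.g. "Stylesheets", "Scripts", "Images"
    bytes: u64,
}

/// Group assets by kind and return (kind, total bytes, percent of page total),
/// sorted by size, largest first. `html_bytes` is the base HTML size.
fn breakdown(html_bytes: u64, assets: &[Asset<'_>]) -> Vec<(String, u64, f64)> {
    let mut totals: BTreeMap<&str, u64> = BTreeMap::new();
    totals.insert("HTML", html_bytes);
    for asset in assets {
        *totals.entry(asset.kind).or_insert(0) += asset.bytes;
    }
    let page_total: u64 = totals.values().sum();
    let mut rows: Vec<(String, u64, f64)> = totals
        .into_iter()
        .map(|(kind, bytes)| {
            let pct = if page_total == 0 {
                0.0
            } else {
                bytes as f64 * 100.0 / page_total as f64
            };
            (kind.to_string(), bytes, pct)
        })
        .collect();
    // Stable sort: kinds with equal totals keep their alphabetical order.
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    rows
}
```

Comparing these per-type totals against the browser's network panel is usually the quickest way to see which category accounts for a mismatch.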
Excludes & Exceptions
Exclude Examples:
# excludes entire directory
-e static/portfolio
-e static/portfolio templates
# regex: excludes .tmp files in static/
-e "static/.*\.tmp$"
# regex: excludes all .html files in content/microblog/
-e "content/microblog/.*\.html$"
Exceptions Examples:
# excludes all HTML in content/microblog/ except index.html in that directory
-e "content/microblog/.*\.html$" --except index.html
# excludes content/temp/ directory except files ending with important.txt
-e content/temp --except "important\.txt$"
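The interaction between the two flags can be read as "excluded if any exclude pattern matches, unless an exception also matches". The sketch below shows only that decision; it is a simplification that treats every pattern as a literal substring, whereas webweigh also accepts regular expressions, and the function name `is_excluded` is hypothetical.

```rust
/// Decide whether `path` should be skipped.
/// Simplification: patterns are treated as literal substrings; regular
/// expressions, which the real tool also accepts, are not modeled here.
fn is_excluded(path: &str, excludes: &[&str], excepts: &[&str]) -> bool {
    let excluded = excludes.iter().any(|pat| path.contains(*pat));
    let excepted = excepts.iter().any(|pat| path.contains(*pat));
    // An exception only matters for paths that were excluded in the first place.
    excluded && !excepted
}
```

With `excludes = ["portfolio", "demo"]` and `excepts = ["portfolio/index.html"]`, everything under `portfolio` is skipped except its `index.html`, while `demo` is skipped entirely.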
Command Line Options
Usage: webweigh [OPTIONS] --directory <DIR>
Options:
-d, --directory <DIR> Directory to traverse (required)
--base-url <URL> Base URL prefix to remove from asset paths
--selector <SELECTOR> CSS selector for size injection (optional - enables scan-only mode if omitted)
--analyze-page <PATH> Analyze specific page with detailed breakdown (e.g., "/", "/portfolio/")
-e, --exclude <PATTERN>... Exclude paths/patterns (supports literal paths and regex)
--except <PATTERN>... Exception patterns within exclusion scope
-v, --verbose... Logging verbosity: -v (info), -vv (debug), -vvv (trace)
-s, --silent Suppress all output
-h, --help Show help information
-V, --version Show version
Verbosity Levels
- Default: Shows only the final statistics report
- -v: Shows info-level logs (processing start, configuration)
- -vv: Shows debug-level logs (detailed processing information)
- -vvv: Shows trace-level logs (maximum detail)
- --silent: No output at all
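The mapping from flags to verbosity can be summarized in a few lines. This is a sketch under two assumptions that the text above does not state: that `--silent` takes precedence over `-v`, and that more than three `-v` flags behave like `-vvv`. The `Verbosity` enum and function name are hypothetical.

```rust
/// Verbosity levels as listed above (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Verbosity {
    Silent,
    Default,
    Info,
    Debug,
    Trace,
}

/// Map the number of -v flags and the --silent flag to a level.
/// Assumption: --silent wins over any -v flags.
fn verbosity(verbose_count: u8, silent: bool) -> Verbosity {
    if silent {
        return Verbosity::Silent;
    }
    match verbose_count {
        0 => Verbosity::Default,
        1 => Verbosity::Info,
        2 => Verbosity::Debug,
        _ => Verbosity::Trace, // assumption: extra -v flags saturate at trace
    }
}
```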
Supported Assets
Direct References:
- HTML files (.html, .htm)
- CSS stylesheets (<link rel="stylesheet">)
- JavaScript files (<script src>)
- Images (<img src>, CSS url())
- Fonts (CSS @font-face, url())
- Icons (<link rel="icon">)
Dynamic Assets (detected in JS/CSS):
- Dynamic script loading (script.src = "...")
- Dynamic imports (import(), require())
- Fetch calls (fetch("/api/data.json"))
- CSS imports (@import url(...))
- Asset assignments (link.href = "...")
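To give a feel for how references such as CSS url() can be discovered, here is a small standard-library sketch. It is not webweigh's parser (which, per the dependency list, relies on the regex crate): the function name `css_urls` is hypothetical, skipping `data:` URIs is an assumption, and comments are not handled at this stage.

```rust
/// Collect the targets of `url(...)` references in a CSS string.
/// Surrounding quotes are removed; `data:` URIs are skipped because they are
/// inline rather than separate files (an assumption of this sketch).
fn css_urls(css: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut rest = css;
    while let Some(start) = rest.find("url(") {
        let after = &rest[start + 4..];
        // Stop if the reference is never closed.
        let Some(end) = after.find(')') else { break };
        let target = after[..end]
            .trim()
            .trim_matches(|c| c == '"' || c == '\'');
        if !target.is_empty() && !target.starts_with("data:") {
            found.push(target.to_string());
        }
        rest = &after[end + 1..];
    }
    found
}
```

The same scan covers plain `url()` values, `@import url(...)`, and `@font-face` sources, since all three share the `url(` token.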
What's New in v0.3.0
Major Features
- Page Analysis Mode: New --analyze-page flag provides detailed breakdown of page size calculations
  - View assets grouped by type (Stylesheets, Scripts, Images, Fonts, Icons)
  - See size and percentage contribution of each asset type
  - Complete asset list sorted by size for easy debugging
  - Perfect for investigating discrepancies with browser dev tools
- Smart Comment Stripping: Revolutionary accuracy improvement
  - Ignores commented-out CSS (/* @import "file.css"; */)
  - Skips commented JavaScript (// import './module.js';)
  - Preserves comment-like text in strings and templates
  - Matches browser behavior: Only counts assets that are actually loaded
  - Significantly more accurate size calculations
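The comment-stripping behavior described above has to distinguish real comments from comment-like text inside strings. The sketch below shows one way to do that for CSS only; the function name `strip_css_comments` is hypothetical, JavaScript line comments and template literals are not covered, and webweigh's own implementation may work differently.

```rust
/// Remove `/* ... */` comments from CSS while leaving quoted strings intact.
fn strip_css_comments(css: &str) -> String {
    let mut out = String::with_capacity(css.len());
    let mut chars = css.chars().peekable();
    // Some(q) while inside a string opened by quote character q.
    let mut quote: Option<char> = None;

    while let Some(c) = chars.next() {
        match quote {
            Some(q) => {
                out.push(c);
                if c == '\\' {
                    // Keep the escaped character so an escaped quote
                    // does not end the string.
                    if let Some(next) = chars.next() {
                        out.push(next);
                    }
                } else if c == q {
                    quote = None;
                }
            }
            None if c == '"' || c == '\'' => {
                quote = Some(c);
                out.push(c);
            }
            None if c == '/' && chars.peek() == Some(&'*') => {
                chars.next(); // consume '*'
                let mut prev = '\0';
                // Skip everything up to and including the closing "*/".
                for d in chars.by_ref() {
                    if prev == '*' && d == '/' {
                        break;
                    }
                    prev = d;
                }
            }
            None => out.push(c),
        }
    }
    out
}
```

Running asset detection on the stripped text means a commented-out `@import` no longer contributes to the page size, while a `content: "/* ... */"` string is left untouched.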
Testing & Quality
- Comprehensive Test Suite: 41 test cases including 12 new comment stripping tests
- All Tests Passing: Verified behavior for edge cases and real-world scenarios
- Better Accuracy: Calculations now match browser dev tools more closely
What's New in v0.2.0
Major Refactoring
- Modular Architecture: Split monolithic code into focused modules
- Performance Optimizations: 3-5x faster processing with intelligent caching
- Enhanced Asset Detection: Comprehensive dynamic asset discovery
New Features
- Scan-only Mode: Analyze websites without modifying files
- Parallel Processing: Multi-threaded asset collection and processing
- Advanced Caching: File content and metadata caching
- Memory Optimization: Reduced allocations and memory reuse
Improvements
- Better Error Handling: More descriptive error messages
- Enhanced Exclusions: Improved regex vs literal pattern detection
- Zero Warnings: Clean compilation with no warnings
- Comprehensive Tests: 26 test cases covering all functionality
Dependencies
Core runtime dependencies:
- clap: Command-line argument parsing
- walkdir: Recursive directory traversal
- scraper: HTML parsing and CSS selectors
- regex: Pattern matching for asset discovery
- rayon: Parallel processing
- log + env_logger: Configurable logging
- anyhow + thiserror: Error handling
- once_cell: Static initialization
- percent-encoding: URL decoding
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to contribute to webweigh.
For build instructions and development setup, see BUILDING.md.
Credits
Icon: Phosphor Icons
License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
Copyright (C) 2025 Pontoporeia
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.