In my previous article I described how I reorganized several terabytes of personal data and bundled large collections of photos into RAR archives.
This approach dramatically reduced the number of small files in my dataset and made backups much faster. However, once you start relying on archives for long-term storage, a new question naturally arises:
How do you regularly verify that all archive files are still intact?
Hard drives age, SSDs can fail, and silent corruption (bitrot) is always a theoretical possibility when storing data over many years.
Fortunately, the RAR toolchain includes built-in commands to test archives without extracting them. Running a test manually is easy:
rar t archive.rar
But doing this manually for hundreds or thousands of archives is obviously impractical.
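Like most Unix tools, `rar` signals success or failure through its exit status, which is what makes the check scriptable in the first place. A minimal sketch (it assumes `rar` is installed and that a file named `archive.rar` exists in the current directory):

```shell
#!/usr/bin/env bash
# Branch on the exit status of the archive test.
# "archive.rar" is a placeholder name; the tool prints nothing here
# because we silence its output and rely only on the exit code.
if rar t archive.rar >/dev/null 2>&1; then
    echo "archive.rar is intact"
else
    echo "archive.rar is damaged or missing"
fi
```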
So I wrote a small Bash script that recursively scans a storage tree and verifies every RAR archive it finds.
What the Script Does
- Recursively scans a storage directory
- Detects both single-volume and multi-volume RAR archives
- Tests only the first volume of each multipart set (unrar then follows the remaining volumes automatically)
- Performs a fast header scan first
- Runs a full CRC verification
- Logs all results to a logfile
- Writes failing archives into a separate error list
- Runs several tests in parallel for speed
- Displays a progress indicator and ETA
This makes it practical to periodically verify a large archive dataset without manually touching every file.
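The trickiest part of the list above is deciding which files count as "first volumes". Here is a minimal, standalone sketch of that detection logic, using hypothetical sample filenames (the numeric comparison with `10#` strips leading zeros, so `part1`, `part01`, and `part001` are all recognized):

```shell
#!/usr/bin/env bash
# Decide whether a filename is a first volume worth testing.
# Returns 0 (test it) or 1 (skip it).
is_first_volume() {
    local base=$1
    # Modern multipart naming: archive.part01.rar, archive.part2.rar, ...
    if [[ "$base" =~ \.part([0-9]+)\.rar$ ]]; then
        # Compare numerically so leading zeros don't matter
        (( 10#${BASH_REMATCH[1]} == 1 ))
        return
    fi
    # Old-style continuation volumes: archive.r00, archive.r01, ...
    [[ "$base" =~ \.r[0-9][0-9]$ ]] && return 1
    # Plain archive, or the .rar first volume of an old-style set
    return 0
}

# Hypothetical sample names to illustrate the decision
for f in photos.rar photos.part01.rar photos.part02.rar photos.r00; do
    if is_first_volume "$f"; then
        echo "test: $f"
    else
        echo "skip: $f"
    fi
done
# → test: photos.rar
#   test: photos.part01.rar
#   skip: photos.part02.rar
#   skip: photos.r00
```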
The Script
Below is the Bash script I currently use to verify my RAR archives.
#!/usr/bin/env bash
# ==========================================
# Configuration
# ==========================================
ERRORFILE="fehlerhafte_rars.txt"
LOGFILE="rar_test.log"
LISTFILE="$(mktemp)"
# Number of parallel jobs (adjust depending on storage speed)
THREADS=${THREADS:-4}
: > "$ERRORFILE"
: > "$LOGFILE"
echo "Searching for RAR archives..."
# ==========================================
# Build list of archives to test
# Only first volumes are included
# ==========================================
find /media/raphael/MASTER/daten/ -type f \( -iname "*.rar" -o -iname "*.r[0-9][0-9]" \) | while IFS= read -r f
do
base=$(basename "$f")
# Detect modern multipart archives (.part01.rar)
if [[ "$base" =~ \.part([0-9]+)\.rar$ ]]; then
part="${BASH_REMATCH[1]}"
# Only keep the first volume
[[ "$part" != "01" && "$part" != "1" ]] && continue
echo "$f"
continue
fi
# Skip old multipart volumes (.r00 .r01 ...)
if [[ "$base" =~ \.r[0-9][0-9]$ ]]; then
continue
fi
# Normal archive or first volume of an old multipart set
echo "$f"
done | sort -u > "$LISTFILE"
TOTAL=$(wc -l < "$LISTFILE")
echo "Found archives: $TOTAL"
echo "Parallel jobs: $THREADS"
echo
START=$(date +%s)
COUNT=0
# ==========================================
# Function: test archive
# ==========================================
test_archive() {
local file="$1"
# Fast header scan (quickly checks archive structure)
if ! unrar lt -idq "$file" >/dev/null 2>&1; then
echo "$(date '+%F %T') | HEADER_ERROR | $file" >> "$LOGFILE"
echo "$(date '+%F %T') | $file" >> "$ERRORFILE"
return
fi
# Full CRC verification
if unrar t -idq "$file" >/dev/null 2>&1; then
echo "$(date '+%F %T') | OK | $file" >> "$LOGFILE"
else
echo "$(date '+%F %T') | CRC_ERROR | $file" >> "$LOGFILE"
echo "$(date '+%F %T') | $file" >> "$ERRORFILE"
fi
}
export -f test_archive
export ERRORFILE LOGFILE
# ==========================================
# Main loop with parallel execution
# ==========================================
while IFS= read -r file
do
((COUNT++))
test_archive "$file" &
# Limit number of parallel jobs
if (( $(jobs -r | wc -l) >= THREADS )); then
wait -n
fi
# Progress and ETA calculation
now=$(date +%s)
runtime=$((now-START))
avg=$((runtime/COUNT))
remain=$((TOTAL-COUNT))
eta=$((remain*avg))
printf "\rProgress: %d/%d | ETA %02d:%02d:%02d" \
"$COUNT" "$TOTAL" \
$((eta/3600)) $((eta%3600/60)) $((eta%60))
done < "$LISTFILE"
wait
echo
echo
echo "Finished."
echo "Error list: $ERRORFILE"
echo "Logfile: $LOGFILE"
How It Works
The script performs two verification stages:
- Header scan: quickly verifies the archive structure
- Full CRC test: verifies the actual file contents
If an archive fails either step, it is written to a separate error file.
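The error list also makes it easy to re-check only the failures later, for example after replacing a suspect disk. A small sketch (the `sed` expression assumes the `date | path` line format the script writes; the function name `failed_paths` is my own):

```shell
#!/usr/bin/env bash
# Re-check archives recorded in the error file.
# Each line looks like: "2024-01-02 03:04:05 | /path/to/archive.rar"
ERRORFILE="fehlerhafte_rars.txt"

failed_paths() {
    # Everything after the first "| " is the path (paths may contain spaces)
    sed 's/^[^|]*| //' "$ERRORFILE"
}

# Re-run the full CRC test on each previously failing archive
if [ -s "$ERRORFILE" ]; then
    failed_paths | while IFS= read -r f; do
        unrar t -idq "$f" >/dev/null 2>&1 && echo "now OK: $f" || echo "still broken: $f"
    done
fi
```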
Because the script runs multiple verification jobs in parallel, it can test large datasets much faster than a sequential scan.
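If you would rather not manage background jobs and `wait -n` by hand, the same parallelism can be sketched with GNU `xargs -P`. This is a design alternative, not what the script above uses, and the simple `OK`/`BAD` reporting drops the logging and ETA features:

```shell
#!/usr/bin/env bash
# Parallel verification via xargs instead of manual job control.
# -print0/-0 keep filenames with spaces or newlines safe;
# -P runs up to $THREADS tests at once.
THREADS=${THREADS:-4}

verify() {
    unrar t -idq "$1" >/dev/null 2>&1 && echo "OK  $1" || echo "BAD $1"
}
# Export so the bash -c child processes spawned by xargs can see it
export -f verify

find /media/raphael/MASTER/daten/ -type f -iname "*.rar" -print0 2>/dev/null |
    xargs -0 -r -n1 -P "$THREADS" bash -c 'verify "$1"' _
```

The trade-off: `xargs` keeps exactly `$THREADS` workers busy with no bookkeeping in the script, but progress reporting and per-file logging become harder to bolt on.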
Why Archive Verification Matters
Long-term storage always carries some risk. Periodic archive verification helps detect problems early before backups become unusable.
Especially for datasets stored for many years, like family photos or personal archives, running a periodic integrity check is simply good practice.
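To make the check truly periodic, you can hand it to cron. The script path and log location below are placeholders of my own choosing, so adjust them to wherever you keep the script:

```
# Run the RAR verification on the 1st of every month at 03:00
# (script path and log file are hypothetical placeholders)
0 3 1 * * /home/raphael/bin/check_rars.sh >> /home/raphael/check_rars.cron.log 2>&1
```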
And of course… automating such tasks with Bash is half the fun.
