UnderstandTech Platform Logging Guide
Overview
The UnderstandTech platform deployed on NVIDIA DGX systems generates logs from multiple containerized services. This guide teaches you how to manage logs for the UnderstandTech platform effectively. You'll learn two complementary approaches: viewing real-time logs for immediate troubleshooting (through Portainer's web interface or Docker commands), and automatically archiving logs to disk for long-term analysis and compliance. The archival system we'll set up captures daily snapshots of all container logs, compresses them to save space, and automatically removes old archives after a configurable retention period.
Target Audience: System administrators and DevOps engineers managing UnderstandTech deployments
Prerequisites:
Root or sudo access to the DGX host system
UnderstandTech stack deployed and running
Basic familiarity with Linux command line
Table of Contents
Log Architecture
Real-Time Log Access
Automated Log Archival
Log Analysis and Search
1. Log Architecture
Container Logging Strategy
Every container writes its output to stdout and stderr, and Docker uses what's called a "logging driver" to determine how that output is captured and stored. The UnderstandTech platform uses the JSON file logging driver, which is Docker's default. This driver writes logs to JSON-formatted files located in /var/lib/docker/containers/<container-id>/, with one log file per container. Each line in these files contains the log message along with metadata like timestamps and which stream (stdout or stderr) it came from.
The Log Rotation Problem
Without any intervention, Docker would keep appending to these log files forever, eventually filling up your disk. To prevent this, the UnderstandTech stack is configured with automatic log rotation. Look at any service definition in the compose file and you'll see this configuration:
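A minimal sketch of that logging block, using the rotation values described below (the surrounding service definition is abbreviated):

```yaml
services:
  ut-api:
    # ...image, ports, etc.
    logging:
      driver: json-file
      options:
        max-size: "50m"   # rotate when the current file reaches 50MB
        max-file: "5"     # keep at most 5 rotated files per container
```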
This means Docker will automatically start a new log file when the current one reaches 50MB, keeping only the 5 most recent files. With these settings, each container can accumulate up to 250MB of logs (5 files × 50MB) before Docker starts deleting the oldest file to make room for new logs. This automatic rotation happens entirely within Docker - you don't have to do anything to enable it.
Service Breakdown
The platform consists of multiple services, each generating distinct log streams:
| Service | Container | Log Content | Volume |
| --- | --- | --- | --- |
| Caddy | ut-caddy | HTTP requests, TLS certificates, reverse proxy events | Low |
| Frontend | ut-frontend | Healthchecks | Low |
| API | ut-api | FastAPI requests, authentication, business logic | High |
| API-Customer | ut-api-customer | REST API requests, integrations | Medium |
| Workers | understandtech-workers-* | Background job processing, RQ task execution | Medium |
| Workers-Customer | understand-tech-workers-customer-* | REST API worker-specific background tasks | Medium |
| LLM | ut-llm | Model inference, GPU operations, Ollama events | High |
| MongoDB | ut-mongodb | Database operations, queries, authentication | Medium |
| Redis | ut-redis | Cache operations, queue management | Low |
Log Retention Strategy
The UnderstandTech platform uses a two-tier approach to logging. The first tier is Docker's automatic rotation, which gives you immediate access to recent logs through docker logs commands or Portainer's web interface. These are the "active logs" that are always available for troubleshooting issues that are happening right now or recently happened.
The second tier is the automated archival system we'll set up with the ut-logs-archive script. Every day at 2 AM (configurable), this script captures the previous 24 hours of logs from all containers, compresses them with gzip, and stores them in an organized directory structure. These become your "historical logs" - a permanent record that survives container restarts and provides the long retention needed for compliance, trend analysis, or investigating issues that happened days or weeks ago.
Active Logs (Docker):
Location: /var/lib/docker/containers/<container-id>/
Retention: Last 250MB per container (5 files × 50MB)
Access: Real-time via docker logs or Portainer
Archived Logs:
Location: /var/log/understandtech/YYYY-MM/YYYY-MM-DD/
Retention: Configurable (default: 365 days)
Format: Gzip-compressed daily snapshots
Access: Manual extraction and analysis
2. Real-Time Log Access
Using Docker CLI
View recent logs from a specific service:
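For example (container names from the service table above; all flags are standard `docker logs` options):

```shell
# Last 100 lines from the API service
docker logs --tail 100 ut-api

# Stream new lines as they arrive (Ctrl+C to stop)
docker logs -f --tail 50 ut-api

# Past hour with timestamps, errors only
docker logs --since 1h --timestamps ut-llm 2>&1 | grep -iE "error|exception"
```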
Using Portainer Web Interface
Portainer provides a browser-based log viewer on the DGX host, accessible at https://localhost:9443
Steps:
Open Portainer in Firefox
Navigate to Containers in the left sidebar
Click on the container name (e.g., ut-api)
Select Logs from the top menu
Use the interface controls:
Auto-refresh: Enable to stream logs in real-time
Timestamps: Toggle timestamp display
Lines: Adjust number of lines shown (100-2000)
Search: Filter logs with keyword search
Download: Export current view as text file
Quick Diagnostic Commands
Check service health:
Follow all logs with container name prefix:
Search for errors across all services:
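Sketches of those three diagnostics (the `docker compose` form assumes you run it from the stack's compose directory):

```shell
# Check service health: state of every ut-* container
docker ps --filter "name=ut-" --format "table {{.Names}}\t{{.Status}}"

# Follow all logs, prefixed with the container name
docker compose logs -f --tail 20

# Search for errors across all services in the last hour
for c in $(docker ps --format '{{.Names}}' | grep '^ut-'); do
  docker logs --since 1h "$c" 2>&1 | grep -iE 'error|fatal|exception' | sed "s/^/[$c] /"
done
```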
3. Automated Log Archival
The ut-logs-archive.sh Script
The ut-logs-archive.sh script provides automated daily log archival with compression and configurable retention policies. The script is designed around a simple concept: once per day, capture the last 24 hours of logs from every UnderstandTech container, compress them, and store them in an organized archive. It then cleans up any archives that have exceeded your retention period.
The script has several command modes. Running it with no arguments performs an immediate archive (useful for testing). The --install command sets up a cron job so the archival happens automatically each night. The --status command shows you statistics about your archive (how much space it's using, how many days are archived, etc.), and --cleanup lets you manually trigger the deletion of old archives if needed.
Installation
Download and make executable:
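A sketch, assuming the script ships with your deployment bundle; the source path below is hypothetical, so substitute wherever the script was actually delivered:

```shell
# Hypothetical source location -- adjust to where the script was delivered
sudo cp /opt/understandtech/scripts/ut-logs-archive.sh /usr/local/bin/ut-logs-archive.sh
sudo chmod +x /usr/local/bin/ut-logs-archive.sh
```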
Configure script variables:
Now let's look at the configuration section at the top of the script. Open it in any text editor (VS Code, vim, nano) and you'll see several variables you can customize:
ARCHIVE_DIR
The ARCHIVE_DIR setting determines where your compressed log archives will be saved. The default location (/var/log/understandtech) follows Linux filesystem conventions, which is good for consistency. Make sure this directory is on a filesystem with enough space - for a typical deployment with default settings, expect about 1-2GB per month of archives.
RETENTION_DAYS
The RETENTION_DAYS setting controls how long archives are kept before automatic deletion. The default of 365 days (one year) balances storage costs with the needs of troubleshooting and compliance. If you're in a regulated industry that requires longer retention, you might increase this to 2555 days (seven years) or more. The script will handle the extra storage just fine, as long as you have the disk space.
CONTAINER_PREFIX
The container prefix settings tell the script which containers to archive. By default, it captures anything starting with ut- (all your main services) plus the worker containers (which have a slightly different naming pattern). You shouldn't need to change these unless you've customized your container names.
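Putting the three settings together, the top of the script might look like this (the exact worker-prefix variable name is an assumption; check your copy of the script):

```shell
# Where compressed daily archives are written
ARCHIVE_DIR="/var/log/understandtech"

# Days to keep archives before automatic deletion (365 = one year)
RETENTION_DAYS=365

# Container name patterns to archive: main services plus workers
CONTAINER_PREFIX="ut-"
WORKER_PREFIX="understandtech-workers-"   # assumed variable name, matching the worker containers
```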
Create archive directory
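With the default ARCHIVE_DIR this is simply:

```shell
sudo mkdir -p /var/log/understandtech
```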
Test manual archive:
Expected output:
Install automated daily cron job:
This creates a cron entry that runs daily at 2:00 AM, exporting the previous 24 hours of logs from all ut-* containers.
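The test run and installation look like this (assuming the script was installed to /usr/local/bin); the crontab line shown is a sketch of what --install would add, with an illustrative redirect target:

```shell
# One-off archive of the last 24 hours; check results under /var/log/understandtech
sudo /usr/local/bin/ut-logs-archive.sh

# Install the nightly 2:00 AM job
sudo /usr/local/bin/ut-logs-archive.sh --install

# The resulting root crontab entry would resemble:
# 0 2 * * * /usr/local/bin/ut-logs-archive.sh >> /var/log/understandtech/cron.log 2>&1
```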
Script Commands:
./ut-logs-archive.sh
Run manual archive (exports last 24h)
./ut-logs-archive.sh --install
Install daily cron job
./ut-logs-archive.sh --uninstall
Remove cron job
./ut-logs-archive.sh --status
Show archive statistics and cron status
./ut-logs-archive.sh --cleanup
Manually remove archives older than retention period
./ut-logs-archive.sh --help
Display usage information
Archive Directory Structure
When you set up automated archival, the script creates a hierarchical directory structure organized by month and day. Each day's logs live in their own directory, named with the date in YYYY-MM-DD format. This organization makes it easy to find logs from a specific time period: just navigate to the right date folder.
Inside each day's directory, you'll find one compressed file per container. The filenames match the container names (like ut-api.log.gz or ut-workers-1.log.gz), so you can quickly identify which service's logs you're looking at. The gzip compression typically reduces log file sizes by 90% or more, meaning your one-year retention policy won't fill up your disk.
Monitoring Archive Health
View archive status:
Output example:
Check cron execution log:
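On Ubuntu-based DGX hosts, cron executions land in syslog, so either of these works (the `cron` unit name for journalctl is an assumption; some systems use `crond`):

```shell
# Grep syslog for the archive job
sudo grep CRON /var/log/syslog | grep ut-logs-archive

# Or query journald
journalctl -u cron --since "yesterday" | grep ut-logs-archive
```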
Automatic Cleanup
The script automatically removes archives older than RETENTION_DAYS (default: 365) during each run. Manual cleanup can be triggered by running:
This prompts for confirmation before deletion.
4. Log Analysis and Search
Accessing Archived Logs
View a specific archived log:
Search for specific patterns:
Extract logs for offline analysis:
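Sketches of all three, using an illustrative date of 2025-01-14:

```shell
# View a specific archived log without extracting it
zcat /var/log/understandtech/2025-01/2025-01-14/ut-api.log.gz | less

# Search compressed archives in place
zgrep -i "authentication failed" /var/log/understandtech/2025-01/*/ut-api.log.gz

# Extract a day's logs for offline analysis
mkdir -p ~/log-analysis
cp /var/log/understandtech/2025-01/2025-01-14/*.log.gz ~/log-analysis/
gunzip ~/log-analysis/*.log.gz
```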
Common Search Patterns
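A few patterns that come up often in practice (illustrative; exact wording depends on each service's log format):

```shell
# Failed authentication attempts in the API logs
zgrep -i "authentication" /var/log/understandtech/*/*/ut-api.log.gz | grep -i "fail"

# Timeouts or slow inference in the LLM service
zgrep -iE "timeout|slow" /var/log/understandtech/*/*/ut-llm.log.gz

# MongoDB connection problems
zgrep -iE "connection (refused|reset|timed out)" /var/log/understandtech/*/*/ut-mongodb.log.gz
```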
Advanced Analysis
Count errors by service:
Timeline of events:
Export to CSV for spreadsheet analysis:
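Sketches of all three; the date is illustrative, and the CSV export assumes each line starts with an ISO-8601 timestamp:

```shell
# Count errors by service for one day
for f in /var/log/understandtech/2025-01/2025-01-14/*.log.gz; do
  printf '%s: %s\n' "$(basename "$f" .log.gz)" "$(gzip -dc "$f" | grep -ci error)"
done

# Timeline of events: merge error lines from all services, sorted by timestamp
gzip -dc /var/log/understandtech/2025-01/2025-01-14/*.log.gz | grep -i error | sort

# Export to CSV (timestamp,message) for spreadsheet analysis
gzip -dc /var/log/understandtech/2025-01/2025-01-14/ut-api.log.gz \
  | grep -i error \
  | awk '{ts=$1; $1=""; sub(/^ /,""); gsub(/"/,"\"\""); print ts ",\"" $0 "\""}' \
  > errors.csv
```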