UnderstandTech Platform Logging Guide

Overview

The UnderstandTech platform deployed on NVIDIA DGX systems generates logs from multiple containerized services. This guide teaches you how to manage logs for the UnderstandTech platform effectively. You'll learn two complementary approaches: viewing real-time logs for immediate troubleshooting (through Portainer's web interface or Docker commands), and automatically archiving logs to disk for long-term analysis and compliance. The archival system we'll set up captures daily snapshots of all container logs, compresses them to save space, and automatically removes old archives after a configurable retention period.

Target Audience: System administrators and DevOps engineers managing UnderstandTech deployments

Prerequisites:

  • Root or sudo access to the DGX host system

  • UnderstandTech stack deployed and running

  • Basic familiarity with Linux command line


Table of Contents

  1. Log Architecture

  2. Real-Time Log Access

  3. Automated Log Archival

  4. Log Analysis and Search


1. Log Architecture

Container Logging Strategy

Docker uses what's called a "logging driver" to determine how each container's stdout and stderr output is captured and stored. The UnderstandTech platform uses the JSON file logging driver, which is Docker's default. This driver writes logs to JSON-formatted files located in /var/lib/docker/containers/<container-id>/, with one log file per container. Each line in these files contains the log message along with metadata such as the timestamp and the stream (stdout or stderr) it came from.

The Log Rotation Problem

Without any intervention, Docker would keep appending to these log files forever, eventually filling up your disk. To prevent this, the UnderstandTech stack is configured with automatic log rotation. Look at any service definition in the compose file and you'll see this configuration:
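The exact lines in the UnderstandTech compose file aren't reproduced here, but with the 50MB / 5-file values described below, the logging block attached to each service looks roughly like this:

    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "5"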

This means Docker will automatically start a new log file when the current one reaches 50MB, keeping only the 5 most recent files. With these settings, each container can accumulate up to 250MB of logs (5 files × 50MB) before Docker starts deleting the oldest file to make room for new logs. This automatic rotation happens entirely within Docker - you don't have to do anything to enable it.

Service Breakdown

The platform consists of multiple services, each generating distinct log streams:

Service            Container Name                        Log Focus                                                Volume
Caddy              ut-caddy                              HTTP requests, TLS certificates, reverse proxy events    Low
Frontend           ut-frontend                           Healthchecks                                             Low
API                ut-api                                FastAPI requests, authentication, business logic         High
API-Customer       ut-api-customer                       REST API requests, integrations                          Medium
Workers            understandtech-workers-*              Background job processing, RQ task execution             Medium
Workers-Customer   understand-tech-workers-customer-*    REST API worker-specific background tasks                Medium
LLM                ut-llm                                Model inference, GPU operations, Ollama events            High
MongoDB            ut-mongodb                            Database operations, queries, authentication             Medium
Redis              ut-redis                              Cache operations, queue management                       Low

Log Retention Strategy

The UnderstandTech platform uses a two-tier approach to logging. The first tier is Docker's automatic rotation, which gives you immediate access to recent logs through docker logs commands or Portainer's web interface. These are the "active logs" that are always available for troubleshooting issues that are happening right now or recently happened.

The second tier is the automated archival system we'll set up with the ut-logs-archive script. Every day at 2 AM (configurable), this script captures the previous 24 hours of logs from all containers, compresses them with gzip, and stores them in an organized directory structure. These become your "historical logs" - a permanent record that survives container restarts and provides the long retention needed for compliance, trend analysis, or investigating issues that happened days or weeks ago.

Active Logs (Docker):

  • Location: /var/lib/docker/containers/<container-id>/

  • Retention: Last 250MB per container (5 files × 50MB)

  • Access: Real-time via docker logs or Portainer

Archived Logs:

  • Location: /var/log/understandtech/YYYY-MM/YYYY-MM-DD/

  • Retention: Configurable (default: 365 days)

  • Format: Gzip-compressed daily snapshots

  • Access: Manual extraction and analysis


2. Real-Time Log Access

Using Docker CLI

View recent logs from a specific service:
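For example, to tail the API service (replace ut-api with any container name from the table above; docker logs works the same way for all of them):

    # Show the last 200 lines from the API container, then keep streaming new output
    docker logs --tail 200 -f ut-api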

Using Portainer Web Interface

Portainer provides a browser-based log viewer, accessible on the DGX host at https://localhost:9443

Steps:

  1. Open Portainer in Firefox

  2. Navigate to Containers in the left sidebar

  3. Click on the container name (e.g., ut-api)

  4. Select Logs from the top menu

  5. Use the interface controls:

    1. Auto-refresh: Enable to stream logs in real-time

    2. Timestamps: Toggle timestamp display

    3. Lines: Adjust number of lines shown (100-2000)

    4. Search: Filter logs with keyword search

    5. Download: Export current view as text file


Quick Diagnostic Commands

Check service health:
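A quick check using plain Docker (the worker containers use a longer name prefix than ut-, so this lists every container and lets you scan the STATUS column, which includes healthcheck state):

    docker ps --format "table {{.Names}}\t{{.Status}}"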

Follow all logs with container name prefix:
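Assuming you run it from the directory containing the stack's compose file, docker compose prefixes every line with its service name:

    docker compose logs -f --tail 50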

Search for errors across all services:
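The keywords here are only a starting point; adjust them to the services you care about:

    docker compose logs --no-color | grep -iE "error|exception|traceback"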


3. Automated Log Archival

The ut-logs-archive.sh Script

The ut-logs-archive.sh script provides automated daily log archival with compression and configurable retention policies. The script is designed around a simple concept: once per day, capture the last 24 hours of logs from every UnderstandTech container, compress them, and store them in an organized archive. It then cleans up any archives that have exceeded your retention period.

The script has several command modes. Running it with no arguments performs an immediate archive (useful for testing). The --install command sets up a cron job so the archival happens automatically each night. The --status command shows you statistics about your archive (how much space it's using, how many days are archived, etc.), and --cleanup lets you manually trigger the deletion of old archives if needed.

Installation

Step 1: Download and make executable:
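The distribution point for the script isn't covered here; once ut-logs-archive.sh is on the DGX host, make it executable:

    # Assumes the script sits in the current working directory
    chmod +x ./ut-logs-archive.sh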

Step 2: Configure script variables:

Now let's look at the configuration section at the top of the script. Open it with any text editor (VS Code, vim, nano) and you'll see several variables you can customize:
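A minimal sketch of that section, using the defaults described below. The exact variable names and layout in the script may differ, and the two prefix values are assumptions based on the container names used in this guide:

    # Where compressed daily archives are written
    ARCHIVE_DIR="/var/log/understandtech"

    # How many days of archives to keep before automatic deletion
    RETENTION_DAYS=365

    # Which container names to archive (main services and workers use different prefixes)
    CONTAINER_PREFIX="ut-"
    WORKER_PREFIX="understandtech-workers"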

ARCHIVE_DIR

The ARCHIVE_DIR setting determines where your compressed log archives will be saved. The default location (/var/log/understandtech) follows Linux filesystem conventions, which is good for consistency. Make sure this directory is on a filesystem with enough space - for a typical deployment with default settings, expect about 1-2GB per month of archives.

RETENTION_DAYS

The RETENTION_DAYS setting controls how long archives are kept before automatic deletion. The default of 365 days (one year) balances storage costs with the needs of troubleshooting and compliance. If you're in a regulated industry that requires longer retention, you might increase this to 2555 days (seven years) or more. The script will handle the extra storage just fine, as long as you have the disk space.

CONTAINER_PREFIX

The container prefix settings tell the script which containers to archive. By default, it captures anything starting with ut- (all your main services) plus the worker containers (which have a slightly different naming pattern). You shouldn't need to change these unless you've customized your container names.

Step 3: Create the archive directory:
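Using the default ARCHIVE_DIR (adjust the path if you changed it in step 2):

    sudo mkdir -p /var/log/understandtech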

Step 4: Test a manual archive:
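Run the script once by hand; root privileges are typically needed both to read Docker's log files and to write under /var/log:

    sudo ./ut-logs-archive.sh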

Expected output:

Step 5: Install the automated daily cron job:
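    # Writes the daily 2:00 AM entry into the crontab
    sudo ./ut-logs-archive.sh --install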

This creates a cron entry that runs daily at 2:00 AM, exporting the previous 24 hours of logs from all ut-* containers.
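The resulting crontab line looks roughly like the sketch below; the exact entry written by --install (and the script path it points to) may differ on your system:

    # Illustrative only
    0 2 * * * /path/to/ut-logs-archive.sh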

Script Commands:

Command                             Purpose
./ut-logs-archive.sh                Run manual archive (exports last 24h)
./ut-logs-archive.sh --install      Install daily cron job
./ut-logs-archive.sh --uninstall    Remove cron job
./ut-logs-archive.sh --status       Show archive statistics and cron status
./ut-logs-archive.sh --cleanup      Manually remove archives older than retention period
./ut-logs-archive.sh --help         Display usage information

Archive Directory Structure

When you set up automated archival, the script creates a hierarchical directory structure organized by month and day. Each day's logs live in their own directory, named with the date in YYYY-MM-DD format. This organization makes it incredibly easy to find logs from a specific time period - you just navigate to the right date folder.

Inside each day's directory, you'll find one compressed file per container. The filenames match the container names (like ut-api.log.gz or ut-workers-1.log.gz), so you can quickly identify which service's logs you're looking at. The gzip compression typically reduces log file sizes by 90% or more, meaning your one-year retention policy won't fill up your disk.
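As an illustration (the dates are examples), a populated archive looks like this:

    /var/log/understandtech/
    ├── 2025-01/
    │   ├── 2025-01-14/
    │   │   ├── ut-api.log.gz
    │   │   ├── ut-llm.log.gz
    │   │   └── ...
    │   └── 2025-01-15/
    │       └── ...
    └── 2025-02/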

Monitoring Archive Health

View archive status:
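    # Prints archive size, number of archived days, and whether the cron job is installed
    ./ut-logs-archive.sh --status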

Output example:

Check cron execution log:
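Where the job's output ends up depends on how the host handles cron logging; on an Ubuntu-based DGX OS, syslog is a reasonable first place to look:

    # Confirm the 2:00 AM job actually ran (path assumes rsyslog; use journalctl on journal-only setups)
    grep CRON /var/log/syslog | grep ut-logs-archive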

Automatic Cleanup

The script automatically removes archives older than RETENTION_DAYS (default: 365) during each run. Manual cleanup can be triggered by running:
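    ./ut-logs-archive.sh --cleanup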

This prompts for confirmation before deletion.


Accessing Archived Logs

View a specific archived log:
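For example (the date path is illustrative):

    zless /var/log/understandtech/2025-01/2025-01-15/ut-api.log.gz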

Search for specific patterns:
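zgrep searches inside the compressed files directly, so there is no need to decompress first (pattern and path are examples):

    zgrep -i "authentication" /var/log/understandtech/2025-01/2025-01-15/ut-api.log.gz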

Extract logs for offline analysis:
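One way to do it, copying a day's archives to a scratch directory so the originals stay compressed in place:

    mkdir -p /tmp/ut-logs-2025-01-15
    cp /var/log/understandtech/2025-01/2025-01-15/*.log.gz /tmp/ut-logs-2025-01-15/
    gunzip /tmp/ut-logs-2025-01-15/*.log.gz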

Common Search Patterns
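Log formats differ per service, so treat these zgrep patterns as starting points rather than exact matches (run them from inside a day's archive directory):

    # Errors and warnings in the API service
    zgrep -iE "error|warn" ut-api.log.gz

    # Authentication and token activity
    zgrep -iE "login|auth|token" ut-api.log.gz

    # GPU or model problems in the LLM service
    zgrep -iE "cuda|out of memory" ut-llm.log.gz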

Advanced Analysis

Count errors by service:
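A small shell loop over one day's archives (the date path is an example; the "error" keyword is a simplification of whatever each service actually logs):

    for f in /var/log/understandtech/2025-01/2025-01-15/*.log.gz; do
        printf "%s\t%s\n" "$(basename "$f" .log.gz)" "$(zgrep -ci "error" "$f")"
    done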

Timeline of events:
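Merging matching lines from every service for one day gives a rough timeline, assuming each service starts its lines with a sortable timestamp (not guaranteed for all of them):

    zgrep -ih "timeout" /var/log/understandtech/2025-01/2025-01-15/*.log.gz | sort | less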

Export to CSV for spreadsheet analysis:
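A very rough sketch that wraps each matching line in a two-column CSV (service, raw message); the quoting is naive and will break on messages that themselves contain double quotes:

    for f in /var/log/understandtech/2025-01/2025-01-15/*.log.gz; do
        svc=$(basename "$f" .log.gz)
        zgrep -i "error" "$f" | sed "s/^/\"$svc\",\"/; s/$/\"/"
    done > errors-2025-01-15.csv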

