AWS Deployment Guide for Understand Tech

Infrastructure Overview

This section describes what components will be created when deploying Understand Tech in your AWS account. The solution relies on a cloud-native microservices architecture, leveraging managed services for scalability, reliability, and enterprise-grade security. It notably uses AWS Fargate services for application logic, GPU-powered Amazon EC2 instances for AI processing, and integrated storage solutions such as Amazon S3, Amazon ElastiCache, and Amazon Elastic File System.

Understand Tech Architecture Diagram

The main components of the stack are:

  • Compute layer orchestrated with Amazon ECS Services

    • UT-Frontend: This service delivers the web-based user interface through ECS Fargate, providing the no-code assistant creation interface, analytics dashboards, and administrative controls.

    • UT-API: This service provides the core backend functionality including user authentication, request routing, business logic, and integration with external services.

    • UT-Worker: This service manages asynchronous tasks including document processing, knowledge base updates, model training, and batch operations.

    • UT-MongoDB: This service provides the primary data storage solution, storing user data, conversation histories, assistant configurations, and application metadata with built-in replication and backup capabilities.

    • UT-LLM: This service runs on GPU-powered Amazon EC2 instances specifically optimized for large language model inference. This component handles all AI model processing, including natural language understanding and response generation. The service is deployed with auto-scaling capabilities to handle varying computational demands while maintaining cost efficiency.

  • Storage infrastructure

    • Shared File Systems: Amazon EFS provides scalable, shared storage for AI models, training data, and temporary processing files. EFS enables multiple services to access the same data simultaneously while providing automatic scaling and high availability across multiple availability zones. Understand Tech uses 4 EFS file systems with bursting throughput mode and general purpose performance mode.

    • Object Storage : Amazon S3 acts as the primary object storage solution for user-uploaded documents, processed knowledge bases, backup files, and static assets.

    • Cache: Amazon ElastiCache delivers high-performance caching and session management. This component accelerates response times by caching frequently accessed data, user sessions, and intermediate processing results.

  • The Networking Layer is implemented with Amazon CloudFront for the Content Delivery Network (CDN), Application Load Balancer to distribute traffic across the multiple services, Amazon VPC for network isolation, and Security Groups for virtual firewalls.

  • Permissions are managed by AWS Identity and Access Management (IAM), and secrets are stored in AWS Secrets Manager.

Cost of the deployed resources is discussed in the Pricing section at the end of this blog.

Prerequisites for the deployment

Please refer to the GitHub repository for the full list of deployment prerequisites. You must have:

  • Terraform installed (at least version 1.12)

  • AWS credentials to deploy resources. This role will later be referred to as the `deployment_role_arn`. Here is a policy with the minimum set of permissions needed to deploy the required resources:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:*",
        "cloudfront:*",
        "logs:*",
        "ecs:*",
        "ec2:*",
        "elasticache:*",
        "elasticfilesystem:*",
        "elasticloadbalancing:*",
        "iam:*",
        "s3:*",
        "secretsmanager:*"
      ],
      "Resource": "*"
    }
  ]
}
  • A VPC with public and private subnets across multiple Availability Zones to host these resources (have a look at this guide if you need help creating the VPC)

You will also need GitHub credentials to retrieve the Docker images from the Understand Tech registry. Contact the Understand Tech team if you do not already have them.
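If you manage the deployment role yourself, the minimum policy above can be validated and attached from the CLI. A sketch, assuming the role already exists; the `ut-deploy-policy` and `ut-deployment-role` names are placeholders, not names from the stack:

```shell
# Save the minimum deployment policy to a local file and check it parses.
cat > deploy-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:*", "cloudfront:*", "logs:*", "ecs:*", "ec2:*",
        "elasticache:*", "elasticfilesystem:*", "elasticloadbalancing:*",
        "iam:*", "s3:*", "secretsmanager:*"
      ],
      "Resource": "*"
    }
  ]
}
EOF
python3 -m json.tool deploy-policy.json > /dev/null && echo "policy JSON is valid"
# The following require AWS credentials (names are placeholders):
# aws iam create-policy --policy-name ut-deploy-policy --policy-document file://deploy-policy.json
# aws iam attach-role-policy --role-name ut-deployment-role --policy-arn "arn:aws:iam::<ACCOUNT_ID>:policy/ut-deploy-policy"
```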

Deploy the resources with Terraform

Start by cloning the GitHub repository and configuring your environment:

$ git clone https://github.com/understand-tech/ut-customer-infra-aws.git

Then, edit terraform.tfvars with the required values:

  • aws_account_id : The AWS Account ID used for the deployment

  • aws_region : The AWS region used for the deployment

  • deployment_role_arn : ARN of the IAM role for Terraform deployment

  • admin_role_arn : ARN of the IAM role for administrators. This is the role you will use later to edit secrets in Secrets Manager. Here is a policy with the minimum set of permissions for this role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:PutSecretValue",
        "secretsmanager:UpdateSecret",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": [
        "arn:aws:secretsmanager:<REGION_ID>:<ACCOUNT_ID>:secret:ut-github-container-registry-credentials*",
        "arn:aws:secretsmanager:<REGION_ID>:<ACCOUNT_ID>:secret:ut-api-secret_manual*",
        "arn:aws:secretsmanager:<REGION_ID>:<ACCOUNT_ID>:secret:ut-mongodb-password*"
      ]
    }
  ]
}

  • vpc_id : VPC ID where resources will be deployed

  • private_subnets_ids : List of the private subnet IDs within the VPC

  • public_subnets_ids : List of the public subnet IDs within the VPC

  • enable_cognito : Whether an Amazon Cognito user pool should be deployed as part of the Terraform deployment. This is especially useful when you do not already have your own Identity Provider set up. In this blog, we will use a Cognito user pool for authentication, so set this variable to true
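Putting these together, a filled-in terraform.tfvars might look like the following sketch (every ID and ARN below is a placeholder to replace with your own values):

```hcl
aws_account_id      = "123456789012"
aws_region          = "us-east-1"
deployment_role_arn = "arn:aws:iam::123456789012:role/ut-deployment-role"
admin_role_arn      = "arn:aws:iam::123456789012:role/ut-admin-role"
vpc_id              = "vpc-0123456789abcdef0"
private_subnets_ids = ["subnet-0a1b2c3d4e5f6a7b8", "subnet-0b2c3d4e5f6a7b8c9"]
public_subnets_ids  = ["subnet-0c3d4e5f6a7b8c9d0", "subnet-0d4e5f6a7b8c9d0e1"]
enable_cognito      = true
```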

Once values are set, you can launch the Understand Tech deployment with the following commands:

$ cd ut-customer-infra-aws/terraform
$ terraform init
$ terraform plan
$ terraform apply

This automatically deploys the AWS resources mentioned in the Infrastructure Overview section.

Post-deployment configuration

There are a few post-deployment configurations you need to go through. Terraform automatically deploys two secrets in AWS Secrets Manager that you need to edit:

  • ut-github-container-registry-credentials : Secret containing credentials to pull images from Understand Tech private repository

  • ut-api-secret_manual : Secret that you will need to edit to configure the integration with your Identity Provider and third-party tools

Editing the ut-github-container-registry-credentials secret

Go to the AWS Secrets Manager Console, locate the ut-github-container-registry-credentials secret, and set the username and password values provided by the Understand Tech team to pull their private images.

ut-github-container-registry-credentials location in the Secrets Manager console
Setting value for the ut-github-container-registry-credentials secret

Editing the ut-api-secret_manual secret

In the AWS Secrets Manager Console, locate the ut-api-secret_manual secret and set the required values to connect the Understand Tech application to the Cognito identity provider. Understand Tech also supports third-party identity providers, so have a look at their documentation if you want to use a solution other than Cognito.

Setting value for ut-api-secret_manual

As indicated in the README, this is the minimum list of expected key/value pairs in this secret (provide the entire JSON by selecting the Plaintext option when editing the secret):

{
    "GPU_VM_API_TOKEN":"your_value",
    "OPENID_CLIENT_ID":"your_value",
    "OPENID_CLIENT_SECRET":"your_value",
    "OPENID_SECRET_KEY":"your_value",
    "OPENID_FRONTEND_REDIRECT_URI":"your_value",
    "server_metadata_url":"your_value",
    "token_endpoint":"your_value",
    "jwks_endpoint":"your_value",
    "expected_issuer":"your_value",
    "openid_scope":"openid profile email",
    "OLLAMA_MODEL_LLM":"qwen3:32b-q4_K_M",
    "OLLAMA_MODEL_CONDENSE_LLM":"qwen3:32b-q4_K_M",
    "OLLAMA_HOST_PORT":11434,
    "MODELS_DIR":"/app/models/",
    "EMBED_URL":"https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.f32.gguf",
    "EMBED_MODEL_NAME":"nomic-embed-text-v1.5.f32.gguf",
    "RERANKER_HF_PATH":"BAAI/bge-reranker-v2-m3",
    "RERANKER_MODEL_NAME":"bge-reranker-v2-m3"
}
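Rather than typing the JSON directly in the console, you can draft it locally first. A sketch that generates the two random tokens, fills an abbreviated version of the template above (add the remaining model keys from the full template), and checks that the result parses; secret.json is a hypothetical local file, and the commented aws command is an alternative to the console edit:

```shell
# Draft a starter secret.json; the "your_value" entries still need the
# real values from your Identity Provider.
cat > secret.json <<EOF
{
  "GPU_VM_API_TOKEN": "$(openssl rand -hex 32)",
  "OPENID_CLIENT_ID": "your_value",
  "OPENID_CLIENT_SECRET": "your_value",
  "OPENID_SECRET_KEY": "$(openssl rand -hex 32)",
  "OPENID_FRONTEND_REDIRECT_URI": "your_value",
  "server_metadata_url": "your_value",
  "token_endpoint": "your_value",
  "jwks_endpoint": "your_value",
  "expected_issuer": "your_value",
  "openid_scope": "openid profile email"
}
EOF
python3 -m json.tool secret.json > /dev/null && echo "secret.json is valid JSON"
# Push it from the CLI instead of the console (requires the admin role):
# aws secretsmanager put-secret-value --secret-id ut-api-secret_manual --secret-string file://secret.json
```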

Here is some help to fill the required fields when deploying the solution with an Amazon Cognito User Pool.

  • GPU_VM_API_TOKEN : Random string that you can generate with the command openssl rand -hex 32

  • OPENID_SECRET_KEY : Random string that you can generate with the command openssl rand -hex 32

  • OPENID_FRONTEND_REDIRECT_URI : https://<YOUR_DOMAIN>/en/login/openid-auth

  • server_metadata_url : https://cognito-idp.us-east-1.amazonaws.com/<COGNITO_USER_POOL_ID>/.well-known/openid-configuration

  • token_endpoint : You get it by opening the server_metadata_url in your browser

  • jwks_endpoint : https://cognito-idp.us-east-1.amazonaws.com/<COGNITO_USER_POOL_ID>/.well-known/jwks.json

  • expected_issuer : https://cognito-idp.us-east-1.amazonaws.com/<COGNITO_USER_POOL_ID>
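The three Cognito URLs above all derive from the user pool ID, so a small shell snippet can print them for you; REGION and POOL_ID are placeholders, and the commented curl shows one way to read token_endpoint without a browser:

```shell
REGION="us-east-1"
POOL_ID="us-east-1_EXAMPLE"   # placeholder user pool ID
ISSUER="https://cognito-idp.${REGION}.amazonaws.com/${POOL_ID}"
echo "server_metadata_url: ${ISSUER}/.well-known/openid-configuration"
echo "jwks_endpoint:       ${ISSUER}/.well-known/jwks.json"
echo "expected_issuer:     ${ISSUER}"
# token_endpoint is published inside the metadata document:
# curl -s "${ISSUER}/.well-known/openid-configuration" | python3 -c 'import sys, json; print(json.load(sys.stdin)["token_endpoint"])'
```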

Note that you can also specify each model that Understand Tech will use once deployed in your VPC (the LLM to answer queries, the embedding model, and the reranker). By default Understand Tech uses Qwen 3 32B (as of release 1.0.3), but you can select your preferred model following these recommendations from the Understand Tech team:

  • Select advanced reasoning models with "thinking" mode support for multi-step problem solving

  • Assess response accuracy and quality with benchmarks and real-world enterprise use cases

  • Select models with extensive language coverage to make Understand Tech usable in any language

  • Watch for resource efficiency of considered models to optimize performance-to-compute ratios

  • Use industry-specific models if this makes sense for your use case

Restarting the ECS tasks

Once you have set the required values in the two secrets, you need to restart the ECS tasks within each service. Go to the Amazon ECS console and select ut-cluster. From the Services view, select ut-api-service and force a new deployment to restart all tasks within this service.

Restart all ECS Tasks

Repeat this operation for all the other services. This replaces all existing ECS tasks with new ones that pick up the latest values from Secrets Manager.
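The same restart can be scripted. The sketch below is a dry run by default (AWS_CLI=echo just prints the commands); set AWS_CLI=aws to execute them for real. Only ut-cluster and ut-api-service are confirmed names from this deployment; the other service names are assumptions following the same naming scheme:

```shell
AWS_CLI=${AWS_CLI:-echo}   # dry run by default; set AWS_CLI=aws to really restart
# Force a new deployment on each service so tasks restart with the
# latest Secrets Manager values.
for svc in ut-frontend-service ut-api-service ut-worker-service ut-mongodb-service ut-llm-service; do
  $AWS_CLI ecs update-service --cluster ut-cluster --service "$svc" --force-new-deployment
done
```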

Test the deployment

To test that your application is successfully deployed, go to the CloudFront console and copy the domain name of the Understand Tech distribution you deployed with Terraform.

retrieve cloudfront distribution link

Open this link in your browser. You should now see the welcome page of Understand Tech. Click on OpenID Connect to log in with the Identity Provider you configured.

Welcome Page Understand Tech

Once logged in, you will reach the main menu of Understand Tech.

Main menu Understand Tech

Congratulations, the deployment in your AWS account is complete 🥳 You can refer to the Understand Tech tutorial to learn how to use the platform.

Clean-up your account

You will be charged for the resources deployed in your AWS account. If you want to remove them, you can do so with Terraform by running the following command:

$ terraform destroy

Understand Tech offers a compelling, enterprise-ready platform for deploying AI assistants trained on your own data, either using a fully managed SaaS option or a self-hosted, secure PaaS deployment within your organization’s AWS environment. With its intuitive no-code interface, multi-model flexibility, and robust security controls, Understand Tech bridges the gap between advanced AI capabilities and ease of use—even for non-technical teams.

Whether you’re driven by regulatory needs, data residency concerns, or simply want to keep your company’s intelligence in-house, Understand Tech’s PaaS solution provides the ideal balance of control, security, and agility. With flexible licensing and transparent pricing, it's now easier than ever to transform your business with conversational AI while keeping your infrastructure and data exactly where you want them.
