71 changes: 35 additions & 36 deletions server/scripts/update-workflow-agents.ts
```diff
@@ -5,46 +5,45 @@ import { AgentCreationSource } from "@/db/schema"
 import { countWorkflowAgents } from "./count-workflow-agents"
 
 export const updateWorkflowAgents = async () => {
-    console.log("🔄 Updating workflow agents from DIRECT to WORKFLOW...")
-
-    const existingCount = await countWorkflowAgents()
-    if (existingCount === 0) {
-        console.log("✅ No workflow agents to update.")
-        return
-    }
-
-    const result = await db
-        .update(agents)
-        .set({
-            creation_source: AgentCreationSource.WORKFLOW,
-            updatedAt: new Date()
-        })
-        .where(
-            and(
-                eq(agents.creation_source, AgentCreationSource.DIRECT),
-                eq(agents.isPublic, false),
-                eq(agents.appIntegrations, sql`'[]'::jsonb`),
-                eq(agents.allowWebSearch, false),
-                eq(agents.isRagOn, false),
-                eq(agents.docIds, sql`'[]'::jsonb`),
-                isNull(agents.deletedAt)
-            )
-        )
-
-    console.log(`✅ Updated agents ${existingCount} from DIRECT to WORKFLOW`)
-    return result
+  console.log("🔄 Updating workflow agents from DIRECT to WORKFLOW...")
+
+  const existingCount = await countWorkflowAgents()
+  if (existingCount === 0) {
+    console.log("✅ No workflow agents to update.")
+    return
+  }
+
+  const result = await db
+    .update(agents)
+    .set({
+      creation_source: AgentCreationSource.WORKFLOW,
+      updatedAt: new Date(),
+    })
+    .where(
+      and(
+        eq(agents.creation_source, AgentCreationSource.DIRECT),
+        eq(agents.isPublic, false),
+        eq(agents.appIntegrations, sql`'[]'::jsonb`),
+        eq(agents.allowWebSearch, false),
+        eq(agents.isRagOn, false),
+        eq(agents.docIds, sql`'[]'::jsonb`),
+        isNull(agents.deletedAt),
+      ),
+    )
+
+  console.log(`✅ Updated agents ${existingCount} from DIRECT to WORKFLOW`)
+  return result
 }
 
 // Run if this file is executed directly
 if (require.main === module) {
-    updateWorkflowAgents()
-        .then(() => {
-            console.log("🎉 successfully updated workflow agents")
-            process.exit(0)
-        })
-        .catch((error) => {
-            console.error("💥 Script failed:", error)
-            process.exit(1)
-        })
+  updateWorkflowAgents()
+    .then(() => {
+      console.log("🎉 successfully updated workflow agents")
+      process.exit(0)
+    })
+    .catch((error) => {
+      console.error("💥 Script failed:", error)
+      process.exit(1)
+    })
 }
```
40 changes: 40 additions & 0 deletions server/scripts/vespaDataReceive.sh
@@ -0,0 +1,40 @@
```bash
#!/bin/bash
set -e
set -o pipefail

# ------------------------------------------------------------
# STEP 7: Retrieve and Decrypt Vespa Dump
# ------------------------------------------------------------

# ---------- Option 1 — using AWS S3 ----------
# ⚠️ Replace with your actual bucket name and path
aws s3 cp s3://your-bucket-name/dumps/dump.json.gz.enc .
```
medium

Hardcoding values like the S3 bucket name and file paths makes the script less reusable and harder to maintain. It's a best practice to define these as variables at the top of the script. This allows for easier configuration across different environments.

⚠️ Potential issue | 🟠 Major

Replace placeholder bucket name.

The S3 bucket name your-bucket-name is a placeholder that must be replaced with the actual bucket name before running the script.

Consider using environment variables for configuration:

```diff
-aws s3 cp s3://your-bucket-name/dumps/dump.json.gz.enc .
+# Read bucket name from environment variable
+BUCKET_NAME="${VESPA_BACKUP_BUCKET:?Error: VESPA_BACKUP_BUCKET environment variable not set}"
+aws s3 cp "s3://${BUCKET_NAME}/dumps/dump.json.gz.enc" .
```
🤖 Prompt for AI Agents: In server/scripts/vespaDataReceive.sh around line 11, the aws s3 cp command uses the placeholder bucket name "your-bucket-name"; replace this with the real S3 bucket or, better, read the bucket name from an environment variable (e.g., S3_BUCKET) and use that variable in the command, adding a guard to fail with a clear message if the env var is not set.
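The `${VAR:?message}` guard suggested above can be exercised on its own. A minimal sketch, assuming the reviewer's proposed `VESPA_BACKUP_BUCKET` variable name (it does not exist in the current script):

```shell
# ${VAR:?msg} aborts with a diagnostic when VAR is unset or empty, so a
# missing bucket name fails fast instead of producing a malformed S3 URI.
require_bucket() {
  BUCKET_NAME="${VESPA_BACKUP_BUCKET:?Error: VESPA_BACKUP_BUCKET environment variable not set}"
  echo "s3://${BUCKET_NAME}/dumps/dump.json.gz.enc"
}

# With the variable set, the guard passes and the full S3 URI is produced.
uri=$(VESPA_BACKUP_BUCKET="my-backups" require_bucket)
echo "$uri"   # s3://my-backups/dumps/dump.json.gz.enc

# Without it, the expansion aborts the (sub)shell with the message after ':?'.
if (unset VESPA_BACKUP_BUCKET; require_bucket) 2>/dev/null; then
  guard_status="missed"
else
  guard_status="rejected"
fi
echo "guard ${guard_status} the missing variable"
```

Because the guard runs at expansion time, it also protects any later command that reuses `BUCKET_NAME`.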


```bash
# Decrypt AES-256 encrypted dump (you’ll be prompted for password)
openssl enc -d -aes-256-cbc -pbkdf2 -salt \
  -in dump.json.gz.enc \
  -out dump.json.gz


# ---------- Option 2 — using GPG ----------
# Uncomment these lines if you used GPG encryption instead of OpenSSL

# yum install -y pinentry || apt install -y pinentry
# gpgconf --kill gpg-agent
# export GPG_TTY=$(tty)
# echo $GPG_TTY
# gpg --import my-private-key.asc
# gpg --list-secret-keys
# gpg --output dump.json.gz --decrypt dump.json.gz.gpg


# ------------------------------------------------------------
# STEP 8: Decompress and Feed into Vespa
# ------------------------------------------------------------
gunzip dump.json.gz
vespa-feed-client dump.json

# ------------------------------------------------------------
# Done 🎉
# ------------------------------------------------------------
echo "✅ Vespa data restored successfully!"
```
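The script's comments note that the password prompt can be automated with `-pass`. A minimal round-trip sketch using OpenSSL's `-pass env:NAME` source; the file names and `DUMP_PASSWORD` variable are throwaway stand-ins, not part of the script above:

```shell
# Round trip: compress, encrypt, decrypt, decompress, all non-interactively.
# The decrypt side must repeat -pbkdf2 so the same key derivation is used.
workdir=$(mktemp -d)
printf 'hello vespa' > "$workdir/dump.json"
gzip "$workdir/dump.json"                  # -> dump.json.gz

export DUMP_PASSWORD='example-passphrase'  # assumption: supplied securely by the caller
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in "$workdir/dump.json.gz" \
  -out "$workdir/dump.json.gz.enc" \
  -pass env:DUMP_PASSWORD

openssl enc -d -aes-256-cbc -pbkdf2 \
  -in "$workdir/dump.json.gz.enc" \
  -out "$workdir/restored.json.gz" \
  -pass env:DUMP_PASSWORD

gunzip "$workdir/restored.json.gz"
restored=$(cat "$workdir/restored.json")
echo "$restored"   # hello vespa
rm -rf "$workdir"
```

Passing the passphrase via the environment keeps it out of the command line (and hence out of `ps` output), unlike `-pass pass:...`.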

medium

The script leaves behind the downloaded and decompressed dump files (dump.json.gz.enc and dump.json). These files can be very large. It's a good practice to clean them up to conserve disk space. You could add a cleanup step at the end of the script, or use a trap to ensure cleanup happens even on failure.

Suggested change

```diff
-echo "✅ Vespa data restored successfully!"
+echo "✅ Vespa data restored successfully!"
+echo "🧹 Cleaning up temporary files..."
+rm -f dump.json.gz.enc dump.json
```
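The trap-based variant mentioned in the comment can be sketched as follows; `cleanup_demo` and the `false` failure are illustrative stand-ins, not code from the script:

```shell
# An EXIT trap runs whether the body succeeds or fails, so the dump files are
# removed even when a later step (here simulated with `false`) errors out.
cleanup_demo() (
  trap 'rm -f dump.json.gz.enc dump.json' EXIT
  touch dump.json.gz.enc dump.json   # stand-ins for the real downloaded files
  false                              # simulate e.g. vespa-feed-client failing
)

cleanup_demo || echo "restore step failed, but cleanup still ran"
[ ! -e dump.json.gz.enc ] && [ ! -e dump.json ] && echo "no dump files left behind"
```

Using a subshell body `( ... )` scopes the trap to the function, so it cannot clobber an EXIT trap set by the enclosing script.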

66 changes: 66 additions & 0 deletions server/scripts/vespaDataSend.sh
@@ -0,0 +1,66 @@
```bash
#!/bin/bash
set -e
set -o pipefail

# ------------------------------------------------------------
# STEP 0: AWS Configuration (Non-interactive)
# ------------------------------------------------------------
# ⚠️ Replace with your real credentials
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
```
Comment on lines +9 to +10

⚠️ Potential issue | 🔴 Critical

CRITICAL: Remove hardcoded AWS credentials.

Hardcoded AWS credentials in version control is a critical security vulnerability, even if these appear to be example values. Credentials should never be committed to the repository.

Remove the hardcoded credentials and use environment variables or AWS credential profiles instead:

```diff
-# ⚠️ Replace with your real credentials
-export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
-export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
-export AWS_DEFAULT_REGION="ap-south-1"
-export AWS_DEFAULT_OUTPUT="json"
+# Use AWS credentials from environment or AWS config
+# Ensure AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set
+if [ -z "$AWS_ACCESS_KEY_ID" ] || [ -z "$AWS_SECRET_ACCESS_KEY" ]; then
+  echo "Error: AWS credentials not configured. Please set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY"
+  echo "Or configure AWS CLI with: aws configure"
+  exit 1
+fi
+
+export AWS_DEFAULT_REGION="${AWS_DEFAULT_REGION:-ap-south-1}"
+export AWS_DEFAULT_OUTPUT="${AWS_DEFAULT_OUTPUT:-json}"
```

Additionally, scan the repository for any committed credentials using tools like git-secrets or trufflehog.

🤖 Prompt for AI Agents: In server/scripts/vespaDataSend.sh around lines 9-10, the script contains hardcoded AWS credentials; remove these two export lines immediately and replace them with references to external configuration (e.g., expect AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be provided via environment variables, AWS CLI named profiles, or mounted credential files/EC2/ECS/IAM role credentials), update any documentation or CI pipeline to set those environment variables securely, and run a repository secrets scan (git-secrets, trufflehog, or similar) to detect and purge any other committed credentials.

```bash
export AWS_DEFAULT_REGION="ap-south-1"
export AWS_DEFAULT_OUTPUT="json"
```
Comment on lines +8 to +12

critical

Hardcoding AWS credentials in a script, even as examples, is a significant security risk. It encourages a bad practice that can lead to accidentally committing real credentials. The script should rely on the standard AWS CLI credential chain (e.g., IAM roles, environment variables, or the ~/.aws/credentials file). Please remove these export statements.

Suggested change

```diff
-# ⚠️ Replace with your real credentials
-export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
-export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
-export AWS_DEFAULT_REGION="ap-south-1"
-export AWS_DEFAULT_OUTPUT="json"
+# ⚠️ Ensure your AWS credentials are configured in your environment
+# (e.g., via `aws configure` or environment variables).
```


```bash
# AWS performance tuning (optional)
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 64MB
aws configure set default.s3.max_queue_size 100
aws configure set default.s3.multipart_upload_threshold 64MB
aws configure set default.s3.multipart_max_attempts 5
```
Comment on lines +14 to +20

high

Modifying the user's global AWS configuration with aws configure set can have unintended side effects on other operations outside of this script. It's safer to use environment variables for these settings to scope them only to the current script execution (e.g., export AWS_MAX_CONCURRENT_REQUESTS=20).

Note that some of these settings, like max_queue_size and multipart_max_attempts, cannot be set via environment variables and must be in the AWS config file. Also, multipart_upload_threshold appears to be a duplicate of multipart_threshold.
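One way to keep such tuning scoped to a single run, as the comment suggests, is to point the CLI at a throwaway config file via the standard `AWS_CONFIG_FILE` environment variable instead of mutating `~/.aws/config`. A sketch, with the setting values taken from the script above and the file layout following the AWS CLI's documented nested-setting format:

```shell
# Write the S3 tuning into a private config file; only processes that inherit
# AWS_CONFIG_FILE see it, and the user's ~/.aws/config is never touched.
scoped_config=$(mktemp)
cat > "$scoped_config" <<'EOF'
[default]
s3 =
    max_concurrent_requests = 20
    multipart_threshold = 64MB
    multipart_chunksize = 64MB
    max_queue_size = 100
EOF
export AWS_CONFIG_FILE="$scoped_config"

# Subsequent `aws s3 cp ...` invocations in this script pick up the tuning
# from $AWS_CONFIG_FILE; other shells and scripts are unaffected.
echo "scoped AWS config written to $scoped_config"
```

This also covers settings like `max_queue_size` that, as noted above, have no environment-variable equivalent.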


```bash
aws sts get-caller-identity

# ------------------------------------------------------------
# STEP 1: Start Vespa container (optional if already running)
# ------------------------------------------------------------
# docker run -d --name vespa-testing \
#   -e VESPA_IGNORE_NOT_ENOUGH_MEMORY=true \
#   -p 8181:8080 \
#   -p 19171:19071 \
#   -p 2224:22 \
#   vespaengine/vespa:latest

# ------------------------------------------------------------
# STEP 2: Export Vespa data
# ------------------------------------------------------------
vespa visit --content-cluster my_content --make-feed > dump.json

# ------------------------------------------------------------
# STEP 3: Compress dump file
# ------------------------------------------------------------
apt install -y pigz || yum install -y pigz
pigz -9 dump.json   # creates dump.json.gz

# ------------------------------------------------------------
# STEP 4: Encrypt dump file (AES-256)
# ------------------------------------------------------------
# ⚠️ You’ll be prompted for password — can automate with -pass if needed
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in dump.json.gz \
  -out dump.json.gz.enc

# ------------------------------------------------------------
# STEP 5: Upload to AWS S3
# ------------------------------------------------------------
aws s3 cp dump.json.gz.enc s3://your-bucket-name/dumps/
```

medium

The S3 bucket name is hardcoded. It's better to use a variable defined at the top of the script for better configurability and maintainability.


⚠️ Potential issue | 🟠 Major

Replace placeholder S3 bucket name.

The S3 bucket name your-bucket-name is a placeholder that must be replaced. Use an environment variable for configuration.

Apply this diff:

```diff
-aws s3 cp dump.json.gz.enc s3://your-bucket-name/dumps/
+BUCKET_NAME="${VESPA_BACKUP_BUCKET:?Error: VESPA_BACKUP_BUCKET not set}"
+TIMESTAMP=$(date +%Y-%m-%d-%H%M%S)
+aws s3 cp dump.json.gz.enc "s3://${BUCKET_NAME}/dumps/dump-${TIMESTAMP}.json.gz.enc"
```
🤖 Prompt for AI Agents: In server/scripts/vespaDataSend.sh around line 56, the S3 bucket name literal "your-bucket-name" is a placeholder and should be replaced with a configurable environment variable; update the aws s3 cp command to use an environment variable (e.g. "$S3_BUCKET") instead of the hardcoded name, and add a brief check near the top of the script to ensure S3_BUCKET is set (exit with an error message if not) so the script fails fast when configuration is missing.


```bash
# Optional: show progress bar (Linux only)
# aws s3 cp dump.json.gz.enc s3://your-bucket-name/dumps/ --expected-size $(stat -c%s dump.json.gz.enc)

# ------------------------------------------------------------
# STEP 6: (Optional) Transfer over SSH
# ------------------------------------------------------------
# rsync -avzP --inplace --partial --append -e "ssh -p 2224" dump.json.gz.enc root@192.168.1.6:/home/root/

echo "✅ Vespa dump, compression, encryption, and upload completed successfully!"
```
157 changes: 157 additions & 0 deletions server/scripts/vespaMigration.js
@@ -0,0 +1,157 @@
```javascript
/* #!/bin/bash

// ------------------------------------------------------------
// STEP 1: Start Vespa container for dump creation
// ------------------------------------------------------------

//docker run -d --name vespa-testing \
//-e VESPA_IGNORE_NOT_ENOUGH_MEMORY=true \
//-p 8181:8080 \
//-p 19171:19071 \
//-p 2224:22 \
//vespaengine/vespa:latest

// ------------------------------------------------------------
// STEP 2: Export Vespa data
// ------------------------------------------------------------

"vespa visit --content-cluster my_content --make-feed > dump.json"

// ------------------------------------------------------------
// STEP 3: Compress dump file
// ------------------------------------------------------------

"apt install -y pigz"
//# or yum install pigz

// pigz is parallel gzip (much faster)
// pigz -9 (1.15 hr, ~280 GB) or -7 (1 hr, ~320 GB)
"pigz -9 dump.json"
// creates dump.json.gz

// if pigz is not available, fallback to gzip
//gzip -9 dump.json
//(gzip -9 -c dump.json > dump.json.gz)

// ------------------------------------------------------------
// STEP 4: Encrypt dump file (OpenSSL password-based)
// ------------------------------------------------------------

"openssl enc -aes-256-cbc -pbkdf2 -salt \
-in dump.json.gz \
-out dump.json.gz.enc"

// Strong AES-256 encryption, password will be prompted
// dump.json.gz.enc → safe to transfer/upload

// ------------------------------------------------------------
// OPTIONAL: GPG-based encryption (if using keypair)
// ------------------------------------------------------------

//gpg --full-generate-key
//gpg --list-keys
//gpg --output dump.json.gz.gpg --encrypt --recipient <A1B2C3D4E5F6G7H8> dump.json.gz

// ------------------------------------------------------------
// STEP 5: Set up SSH access for container-to-host transfer
// ------------------------------------------------------------

// ssh-keygen -t rsa -b 4096 -C "mohd.shoaib@juspay.in"
// cat ~/.ssh/id_rsa.pub

// On your **remote machine (container)**:
//docker exec -it --user root vespa-testing /bin/bash

//apt-get or yum update
//apt-get or yum install -y openssh-client (openssh first)
//ssh-keygen -A
//yum install -y openssh-server
//mkdir -p /var/run/sshd
///usr/sbin/sshd

//mkdir -p ~/.ssh
//chmod 700 ~/.ssh
//echo "PASTE_YOUR_PUBLIC_KEY_HERE" >> ~/.ssh/authorized_keys
//chmod 600 ~/.ssh/authorized_keys

//yum install -y rsync

// ------------------------------------------------------------
// STEP 6: Test SSH + Transfer dump or key
// ------------------------------------------------------------

//ssh -p 2224 root@192.168.1.6 - testing

//yum install -y rsync

//gpg --export-secret-keys --armor BF4AF7E7E3955EF3A436A4ED7C59556BFC58DFAF > my-private-key.asc

//rsync -avzP --inplace --partial --append -e "ssh -p 2224" my-private-key.asc root@192.168.1.6:/home/

"brew install awscli"

"aws configure"

"AWS Access Key ID [None]: ****************"
"AWS Secret Access Key [None]: ********************"
"Default region name [None]: ap-south-1"
"Default output format [None]: json"

//AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
//AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
//Default region name [None]: ap-south-1
//Default output format [None]: json


// For fast file transfer
"aws configure set default.s3.max_concurrent_requests 20"
"aws configure set default.s3.multipart_threshold 64MB"
//Check your identity:
"aws sts get-caller-identity"

// for Making transfers faster (optional)
"aws configure set default.s3.multipart_chunksize 64MB"
"aws configure set default.s3.max_queue_size 100"
"aws configure set default.s3.multipart_upload_threshold 64MB"
"aws configure set default.s3.multipart_max_attempts 5"

"aws s3 cp dump.json.gz.enc s3://your-bucket-name/dumps/"

//Optional (show progress bar):
"aws s3 cp dump.json.gz.enc s3://xyne-vespa-backups/2025-10-13/ --expected-size $(stat -c%s dump.json.gz.enc"

//rsync -avzP --inplace --partial --append -e "ssh -p 2224" dump.json.gz.gpg root@192.168.1.6:/home/root/

// ------------------------------------------------------------
// STEP 7: On the new machine
// ------------------------------------------------------------

// Option 1 — using AWS S3
"aws s3 cp s3://your-bucket-name/dumps/dump.json.gz.enc "

"openssl enc -d -aes-256-cbc -pbkdf2 -salt \
-in dump.json.gz.enc \
-out dump.json.gz"

// Option 2 — if using GPG
//yum install -y pinentry
//gpgconf --kill gpg-agent
//export GPG_TTY=$(tty)
//echo $GPG_TTY

//gpg --import my-private-key.asc
//gpg --list-secret-keys
//gpg --output dump.json.gz --decrypt dump.json.gz.gpg

// ------------------------------------------------------------
// STEP 8: Decompress and feed into Vespa
// ------------------------------------------------------------

"gunzip dump.json.gz"

"vespa-feed-client dump.json"

// ------------------------------------------------------------
// Done 🎉
// ------------------------------------------------------------
*/
```
Comment on lines +1 to +157

high

This file appears to be a collection of notes and scratchpad commands, and it's entirely commented out. It also contains example AWS credentials (lines 100-101), which is a security risk even when commented, as they can be flagged by security scanners and promote bad practices. It's best to remove this file from the pull request. If these are important notes, they should be moved to a more appropriate place like a README or a wiki, with any sensitive examples removed.

Comment on lines +1 to +157

🛠️ Refactor suggestion | 🟠 Major

Convert to appropriate file format for documentation.

This file is a .js file but contains shell script documentation wrapped in multi-line comments. The mixed syntax (JavaScript comments, shell comments, quoted strings) makes it confusing and non-executable.

Consider one of these options:

Option 1 (Recommended): Convert to Markdown

Rename to vespaMigration.md and format as proper documentation:

````markdown
# Vespa Data Migration Guide

## Step 1: Start Vespa Container
```bash
docker run -d --name vespa-testing \
  -e VESPA_IGNORE_NOT_ENOUGH_MEMORY=true \
  -p 8181:8080 \
  ...
```

## Step 2: Export Vespa Data
```bash
vespa visit --content-cluster my_content --make-feed > dump.json
```
...
````

Option 2: Convert to executable shell script

Rename to vespaMigration.sh and format as a proper bash script with functions or clear sections.

🤖 Prompt for AI Agents: In server/scripts/vespaMigration.js lines 1-157: the file contains shell commands wrapped in JavaScript block comments and quoted strings, making it non-executable and confusing; convert this to a proper documentation or script file — either rename and reformat as Markdown (vespaMigration.md) with fenced bash code blocks and headings for each step, or rename to a shell script (vespaMigration.sh) and remove JS comment markers/quotes, add a shebang, make commands valid bash (uncomment, group into functions or sections), and ensure executable permissions; pick one option and update filenames, references, and git commit accordingly.

Binary file added server/vespa/app.zip
Binary file not shown.