|
| 1 | +# Reference Table Schema Evolution - Debug Summary |
| 2 | + |
| 3 | +## Context |
| 4 | + |
| 5 | +We are debugging why `TestAccReferenceTable_SchemaEvolution` fails in Terraform when updating a cloud file table's schema, even though the same operation works when calling the API directly. |
| 6 | + |
| 7 | +### The Problem |
| 8 | + |
| 9 | +- **Terraform test**: Fails when updating schema from 3 fields (a, b, c) to 4 fields (a, b, c, d) for a cloud file table |
| 10 | +- **Direct API test**: Same operation succeeds when done manually |
| 11 | +- **Root cause**: Schema updates for cloud file tables are asynchronous - the file sync must complete before the schema is updated in the backend |
| 12 | + |
| 13 | +## What We've Done |
| 14 | + |
| 15 | +### 1. Identified the Asynchronous Nature |
| 16 | +- Schema updates for cloud files happen asynchronously after file sync completes |
| 17 | +- The API returns 200 OK immediately, but the schema update happens later |
| 18 | +- In manual testing, we confirmed schema updates take ~5-10 seconds after the API call |
| 19 | + |
| 20 | +### 2. Added Retry Logic to Terraform Provider |
| 21 | +**File**: `datadog/fwprovider/resource_datadog_reference_table.go` |
| 22 | + |
| 23 | +- Added retry logic in `Update()` method to wait for schema updates (lines 415-470) |
| 24 | +- Added 3-second initial wait before first check (line 424) |
| 25 | +- Added pre-update check to ensure table is ready (status DONE/ERROR) before updating schema (lines 363-401) |
| 26 | +- Improved error messages to include status and file path for debugging |
| 27 | +- Retry configuration: 10 attempts, 5-second intervals (50 seconds total) |
| 28 | + |
| 29 | +### 3. Updated Test Configuration |
| 30 | +**File**: `datadog/tests/resource_datadog_reference_table_test.go` |
| 31 | + |
| 32 | +- Test creates table with `test.csv` (3 fields: a, b, c) |
| 33 | +- Test updates table with `test2.csv` (4 fields: a, b, c, d) |
| 34 | +- Added wait step between create and update (currently commented out for debugging) |
| 35 | +- Test expects 3 fields initially, then 4 fields after update |
| 36 | + |
| 37 | +### 4. Key Code Changes |
| 38 | + |
| 39 | +#### Pre-Update Wait Logic |
| 40 | +```go |
| 41 | +// If we're updating schema for a cloud file table, ensure the table is ready (not processing) |
| 42 | +// This prevents race conditions where we try to update while the initial sync is still running |
| 43 | +isUpdatingSchema := planState.Schema != nil |
| 44 | +if isUpdatingSchema && currentState.Source.ValueString() != "LOCAL_FILE" { |
| 45 | + // Check current status - if still processing, wait for it to complete |
| 46 | + currentResp, _, err := r.Api.GetTable(r.Auth, id) |
| 47 | + if err == nil && currentResp.Data != nil { |
| 48 | + attrs := currentResp.Data.GetAttributes() |
| 49 | + if status, ok := attrs.GetStatusOk(); ok && status != nil { |
| 50 | + statusStr := string(*status) |
| 51 | + if statusStr != "DONE" && statusStr != "ERROR" { |
| 52 | + // Wait for table to be ready before updating |
| 53 | + // ... retry logic ... |
| 54 | + } |
| 55 | + } |
| 56 | + } |
| 57 | +} |
| 58 | +``` |
| 59 | + |
| 60 | +#### Post-Update Retry Logic |
| 61 | +```go |
| 62 | +// Wait 3 seconds before first check (matches manual API test timing) |
| 63 | +if isUpdatingSchema && expectedFieldCount > 0 { |
| 64 | + time.Sleep(3 * time.Second) |
| 65 | +} |
| 66 | + |
| 67 | +// Retry until schema matches expected field count |
| 68 | +retryErr := utils.Retry(5*time.Second, 10, func() error { |
| 69 | + resp, httpResp, err = r.Api.GetTable(r.Auth, id) |
| 70 | + // Check schema field count matches expected |
| 71 | + // Check for file processing errors |
| 72 | + // Return retryable error if schema doesn't match yet |
| 73 | +}) |
| 74 | +``` |
| 75 | + |
| 76 | +## What Remains to Be Fixed |
| 77 | + |
| 78 | +### 1. Test Still Failing |
| 79 | +- The test currently fails, but we've been debugging API 500 errors (likely transient) |
| 80 | +- Need to verify the retry logic works correctly once API is stable |
| 81 | +- May need to adjust retry count/interval based on actual API timing |
| 82 | + |
| 83 | +### 2. Wait Step in Test |
| 84 | +- The wait step between create and update is currently commented out |
| 85 | +- Should be re-enabled once we confirm the pre-update wait logic works |
| 86 | +- Location: `datadog/tests/resource_datadog_reference_table_test.go` lines 88-95 |
| 87 | + |
| 88 | +### 3. Potential Issues to Investigate |
| 89 | +- **Race condition**: Even with pre-update wait, there might be a race between file sync completion and schema update |
| 90 | +- **Error handling**: Need to verify error messages are helpful when schema doesn't update |
| 91 | +- **Timing**: May need to increase retry count if schema updates take longer than 50 seconds |
| 92 | + |
| 93 | +## Exact Commands |
| 94 | + |
| 95 | +### Terraform Test Command |
| 96 | + |
| 97 | +```bash |
| 98 | +cd /Users/guillaume.brizolier/go/src/github.com/DataDog/terraform-provider-datadog |
| 99 | + |
| 100 | +dd-auth --domain dd.datad0g.com -- sh -c ' |
| 101 | +export DD_TEST_CLIENT_API_KEY=$DD_API_KEY |
| 102 | +export DD_TEST_CLIENT_APP_KEY=$DD_APP_KEY |
| 103 | +export DD_TEST_SITE_URL=https://dd.datad0g.com/ |
| 104 | +export DD_TEST_ORG=yB5yjZ |
| 105 | +export TF_ACC=true |
| 106 | +go test -v -run TestAccReferenceTable_SchemaEvolution ./datadog/tests/ -timeout 30m |
| 107 | +' |
| 108 | +``` |
| 109 | + |
| 110 | +### Manual API Test Commands |
| 111 | + |
| 112 | +#### 1. Create Table with test.csv (3 fields: a, b, c) |
| 113 | + |
| 114 | +```bash |
| 115 | +cd /Users/guillaume.brizolier/go/src/github.com/DataDog/terraform-provider-datadog |
| 116 | + |
| 117 | +dd-auth --domain dd.datad0g.com -- sh -c ' |
| 118 | +export DD_API_KEY |
| 119 | +export DD_APP_KEY |
| 120 | +TABLE_NAME="test_schema_evolution_$(date +%s)" |
| 121 | +
|
| 122 | +curl -X POST "https://dd.datad0g.com/api/v2/reference-tables/tables" \ |
| 123 | + -H "DD-API-KEY: $DD_API_KEY" \ |
| 124 | + -H "DD-APPLICATION-KEY: $DD_APP_KEY" \ |
| 125 | + -H "Content-Type: application/json" \ |
| 126 | + -d "{ |
| 127 | + \"data\": { |
| 128 | + \"type\": \"reference_table\", |
| 129 | + \"attributes\": { |
| 130 | + \"table_name\": \"$TABLE_NAME\", |
| 131 | + \"source\": \"S3\", |
| 132 | + \"file_metadata\": { |
| 133 | + \"sync_enabled\": true, |
| 134 | + \"access_details\": { |
| 135 | + \"aws_detail\": { |
| 136 | + \"aws_account_id\": \"924305315327\", |
| 137 | + \"aws_bucket_name\": \"dd-reference-tables-dev-staging\", |
| 138 | + \"file_path\": \"test.csv\" |
| 139 | + } |
| 140 | + } |
| 141 | + }, |
| 142 | + \"schema\": { |
| 143 | + \"primary_keys\": [\"a\"], |
| 144 | + \"fields\": [ |
| 145 | + {\"name\": \"a\", \"type\": \"STRING\"}, |
| 146 | + {\"name\": \"b\", \"type\": \"STRING\"}, |
| 147 | + {\"name\": \"c\", \"type\": \"STRING\"} |
| 148 | + ] |
| 149 | + } |
| 150 | + } |
| 151 | + } |
| 152 | + }" |
| 153 | +' |
| 154 | +``` |
| 155 | + |
| 156 | +**Save the table ID from the response** (e.g., `TABLE_ID="abc123..."`) |
| 157 | + |
| 158 | +#### 2. Wait for Initial Sync (3 seconds) |
| 159 | + |
| 160 | +```bash |
| 161 | +sleep 3 |
| 162 | +``` |
| 163 | + |
| 164 | +#### 3. Update Table with test2.csv (4 fields: a, b, c, d) |
| 165 | + |
| 166 | +```bash |
| 167 | +dd-auth --domain dd.datad0g.com -- sh -c ' |
| 168 | +export DD_API_KEY |
| 169 | +export DD_APP_KEY |
| 170 | +TABLE_ID="<TABLE_ID_FROM_STEP_1>" |
| 171 | +
|
| 172 | +curl -X PATCH "https://dd.datad0g.com/api/v2/reference-tables/tables/$TABLE_ID" \ |
| 173 | + -H "DD-API-KEY: $DD_API_KEY" \ |
| 174 | + -H "DD-APPLICATION-KEY: $DD_APP_KEY" \ |
| 175 | + -H "Content-Type: application/json" \ |
| 176 | + -d "{ |
| 177 | + \"data\": { |
| 178 | + \"type\": \"reference_table\", |
| 179 | + \"attributes\": { |
| 180 | + \"file_metadata\": { |
| 181 | + \"sync_enabled\": true, |
| 182 | + \"access_details\": { |
| 183 | + \"aws_detail\": { |
| 184 | + \"aws_account_id\": \"924305315327\", |
| 185 | + \"aws_bucket_name\": \"dd-reference-tables-dev-staging\", |
| 186 | + \"file_path\": \"test2.csv\" |
| 187 | + } |
| 188 | + } |
| 189 | + }, |
| 190 | + \"schema\": { |
| 191 | + \"primary_keys\": [\"a\"], |
| 192 | + \"fields\": [ |
| 193 | + {\"name\": \"a\", \"type\": \"STRING\"}, |
| 194 | + {\"name\": \"b\", \"type\": \"STRING\"}, |
| 195 | + {\"name\": \"c\", \"type\": \"STRING\"}, |
| 196 | + {\"name\": \"d\", \"type\": \"STRING\"} |
| 197 | + ] |
| 198 | + } |
| 199 | + } |
| 200 | + } |
| 201 | + }" |
| 202 | +' |
| 203 | +``` |
| 204 | + |
| 205 | +#### 4. Check Schema Immediately (should still show 3 fields) |
| 206 | + |
| 207 | +```bash |
| 208 | +dd-auth --domain dd.datad0g.com -- sh -c ' |
| 209 | +export DD_API_KEY |
| 210 | +export DD_APP_KEY |
| 211 | +TABLE_ID="<TABLE_ID_FROM_STEP_1>" |
| 212 | +
|
| 213 | +curl -X GET "https://dd.datad0g.com/api/v2/reference-tables/tables/$TABLE_ID" \ |
| 214 | + -H "DD-API-KEY: $DD_API_KEY" \ |
| 215 | + -H "DD-APPLICATION-KEY: $DD_APP_KEY" | jq ".data.attributes.schema.fields | length" |
| 216 | +' |
| 217 | +``` |
| 218 | + |
| 219 | +**Expected**: `3` (schema not updated yet) |
| 220 | + |
| 221 | +#### 5. Wait and Check Again (should show 4 fields) |
| 222 | + |
| 223 | +```bash |
| 224 | +sleep 5 |
| 225 | + |
| 226 | +dd-auth --domain dd.datad0g.com -- sh -c ' |
| 227 | +export DD_API_KEY |
| 228 | +export DD_APP_KEY |
| 229 | +TABLE_ID="<TABLE_ID_FROM_STEP_1>" |
| 230 | +
|
| 231 | +curl -X GET "https://dd.datad0g.com/api/v2/reference-tables/tables/$TABLE_ID" \ |
| 232 | + -H "DD-API-KEY: $DD_API_KEY" \ |
| 233 | + -H "DD-APPLICATION-KEY: $DD_APP_KEY" | jq ".data.attributes.schema.fields | length" |
| 234 | +' |
| 235 | +``` |
| 236 | + |
| 237 | +**Expected**: `4` (schema updated after file sync completes) |
| 238 | + |
| 239 | +## Test Files |
| 240 | + |
| 241 | +### Test Configuration |
| 242 | +- **File**: `datadog/tests/resource_datadog_reference_table_test.go` |
| 243 | +- **Test function**: `TestAccReferenceTable_SchemaEvolution` (line 61) |
| 244 | +- **Initial config**: `testAccCheckDatadogReferenceTableSchemaInitial` (uses `test.csv` with 3 fields) |
| 245 | +- **Update config**: `testAccCheckDatadogReferenceTableSchemaAddFields` (uses `test2.csv` with 4 fields) |
| 246 | + |
| 247 | +### Test Data Files (in S3 bucket `dd-reference-tables-dev-staging`) |
| 248 | +- **test.csv**: Contains columns `a, b, c` with sample data |
| 249 | +- **test2.csv**: Contains columns `a, b, c, d` with sample data |
| 250 | + |
| 251 | +## Environment Variables Required |
| 252 | + |
| 253 | +```bash |
| 254 | +export DD_TEST_CLIENT_API_KEY=<api_key> |
| 255 | +export DD_TEST_CLIENT_APP_KEY=<app_key> |
| 256 | +export DD_TEST_SITE_URL=https://dd.datad0g.com/ |
| 257 | +export DD_TEST_ORG=yB5yjZ # Public org ID for staging |
| 258 | +export TF_ACC=true # Required to run acceptance tests |
| 259 | +``` |
| 260 | + |
| 261 | +## Key Insights |
| 262 | + |
| 263 | +1. **Asynchronous Schema Updates**: Schema updates for cloud files are asynchronous and happen after file sync completes |
| 264 | +2. **Timing Matters**: Need to wait 3-5 seconds after update API call before schema reflects changes |
| 265 | +3. **Pre-Update Check**: Table must be in DONE/ERROR status before attempting schema update |
| 266 | +4. **Retry Logic**: Terraform provider needs retry logic to wait for async schema updates |
| 267 | +5. **Error Detection**: File processing errors (like "more columns than schema") indicate schema hasn't updated yet |
| 268 | + |
| 269 | +## Next Steps |
| 270 | + |
| 271 | +1. Re-run Terraform test once API is stable (currently seeing 500 errors) |
| 272 | +2. Verify pre-update wait logic prevents race conditions |
| 273 | +3. Re-enable wait step in test if needed |
| 274 | +4. Adjust retry timing if schema updates take longer than expected |
| 275 | +5. Add more detailed logging if issues persist |
| 276 | + |
| 277 | + |
0 commit comments