Releases: data-dot-all/dataall
v2.2.0
What's Changed
This time there are no warnings.
New features 🆕
- Enabling S3 bucket share by @anushka-singh in #848
- Support For External IDP and External User Pool Provider by @TejasRGitHub in #897
- Added support for GitHub using AWS CodeStarSourceConnection by @asifma in #834
- BYO VPC in MLStudio by @noah-paige in #894
- New share views by @dlpzx in #885
Enhancements 🥇
- Design better module dependency handling by @maryamkhidir and @dlpzx in #852
- Move feature toggle checker to base by @dbalintx in #833
- Add callback and dependency matrix useclient by @noah-paige in #855
- Update EnvironmentCreateForm.js to combine commands for policy creation and bootstrapping by @mourya-33 in #868
- Add Quicksight Validation on Submit Share by @noah-paige in #873
- Add additional checks for dataset importing by @nikpodsh and @dlpzx in #883
- Add SCP error handling in Quicksight identity region checks by @dlpzx in #896
- Update CodeBuild images to Linux2 standard5.0 (node16 to node18) + Update Docker images to use AmazonLinux:2023 (node18 and Python3.9) by @dlpzx in #889 and by @noah-paige in #907
- Changed name button + title to be consistent in UI by @grashopper42 in #888
- Upgrade DDK and Resolve Data.all Pipelines by @noah-paige in #866
- KMS explosion fix (policy optimization) by @anushka-singh in #882
Fixes 🪲
- Add the cloudformation:ContinueUpdateRollback permission to the pivotRole, for administration of linked environment accounts. by @rbernotas in #850
- Fix Module Enabled Pipelines by @noah-paige in #874
- Add Athena:UpdateWorkGroup permissions to CDK Exec Policy by @noah-paige in #892
- Add Pagination to Return Full List Cognito Groups by @noah-paige in #891
- Remove unnecessary MANAGE_ORGANIZATIONS check by @dlpzx in #887
- Fix S3DatasetClient upload data by @noah-paige in #909
- Fix Migration Script for New Deployment by @noah-paige in #908
- Create frontend config role regardless of custom auth or not in backend by @noah-paige in #913
- Fix permissions on share workflows by @dlpzx in #914
Documentation 📚
- Documentation for Setting up External Idp by @TejasRGitHub in #903
Dependencies
- Upgrade Athena engine version to v3 by @dlpzx in #886
- Bump axios from 0.26.1 to 1.6.0 in /frontend by @dependabot in #867
- Bump certifi from 2022.12.7 to 2023.7.22 in /deploy/custom_resources/custom_authorizer by @dependabot in #910
- Bump urllib3 from 1.26.15 to 1.26.18 in /deploy/custom_resources/custom_authorizer by @dependabot in #911
- Bump requests from 2.29.0 to 2.31.0 in /deploy/custom_resources/custom_authorizer by @dependabot in #912
New Contributors 👨💻 👩💻
- @grashopper42 made their first contribution in #888
Full Changelog: v2.1.0...v2.2.0
v2.1.0
What's Changed
- Using cdk.json parameter enable_update_dataall_stacks_in_cicd_pipeline --> automatically updates the environments and dataset stacks in the CICD pipeline
- Waiting for overnight update stack task --> same as the above, but it runs at a daily schedule.
- Updating environments in Environment > Stack tab > click on Update button --> manual update
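As a rough illustration of the first option, the flag could be switched on with a small script like the one below. It assumes that cdk.json keeps its deployment settings in a list under context.DeploymentEnvironments, with one entry per envname. That layout and the helper names are assumptions made for this sketch, so check them against your own cdk.json before using it.

```python
import json
from pathlib import Path

FLAG = "enable_update_dataall_stacks_in_cicd_pipeline"


def enable_stack_updates(cdk_config: dict, envname: str) -> dict:
    """Return a copy of the parsed cdk.json with the flag set for one environment.

    Assumption: deployment environments live in
    cdk_config["context"]["DeploymentEnvironments"], each with an "envname" key.
    """
    updated = json.loads(json.dumps(cdk_config))  # deep copy of plain JSON data
    environments = updated.get("context", {}).get("DeploymentEnvironments", [])
    matches = [env for env in environments if env.get("envname") == envname]
    if not matches:
        raise KeyError(f"no deployment environment named {envname!r}")
    for env in matches:
        env[FLAG] = True
    return updated


def update_cdk_json(path: str, envname: str) -> None:
    """Rewrite the cdk.json file at `path` with the flag enabled for `envname`."""
    file = Path(path)
    config = json.loads(file.read_text())
    file.write_text(json.dumps(enable_stack_updates(config, envname), indent=2) + "\n")
```

The change still has to be committed and deployed through the CICD pipeline before it takes effect.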
Governance 🏛️
- Update to Governance Model by @NickCorbett in #736
- Update CONTRIBUTING.md by @NickCorbett in #838
- Create .gitvote.yml by @NickCorbett in #836
New features 🆕
- Re-authorization workflows by @noah-paige in #787
- Email Notification on Share Requests by @TejasRGitHub in #818
- Handle pre-filtering of tables for multiple buckets databases by @anushka-singh in #811
- Limit pivot role S3 permissions by @dlpzx in #780
- Limit pivot role KMS permissions by @dlpzx in #830
Enhancements 🥇
- Fix shell=true semgrep issues by @dlpzx in #760
- Add global flag to replace and avoid scanning issues on incomplete-sanitization by @dlpzx in #762
- Allow to submit a share when you are both an approver and a requester by @zsaltys in #793
- Redirect upon creating a share request by @zsaltys in #799
- Add frontend and backend feature flags by @zsaltys in #817
- Make hosted_zone_id optional by @lorchda in #812
- Add configurable session timeout to Cognito by @manjulaK in #786
- Modularization of notifications, refactor from core to modules by @dlpzx in #822
- Add Additional Error Messages for KMS Key lookup on imported dataset by @noah-paige in #748
- Handle Environment Import of IAM service roles by @noah-paige in #749
- Add condition when there are no public subnets by @lorchda in #794
- Check other share exists before clean up by @noah-paige in #769
- Configure Pytests on Feature Flags by @noah-paige in #764
Fixes 🪲
- Update Lambda runtime from node14 to node16 or node18 and from python3.7 to python3.8 by @nikpodsh in #782
- Build Compliant Names for Opensearch Resources by @noah-paige and made it generic by @dlpzx in #750
- Fix Git branch name length, truncate to 100 chars by @dlpzx in #775
- Fix CodeBuild policy length by @noah-paige in #774
- Fix naming of MLSTUDIO module by @noah-paige in #756
- Fix cdk exec policy for bootstrapping linked accounts (#763) by @noah-paige in #768
- Fix external forks for CDK nag by @dlpzx in #767 and in #758
- Fix path of patch_ssm() for pytest fixture by @noah-paige in #772
- Add Update Permissions to Lambda by @noah-paige in #835
Dependencies
- Add resolutions for yarn.lock pinned packages by @dlpzx in #757
- Upgrade babel to non-vulnerable version 7.23.2 by @dlpzx in #816
- Bump werkzeug from 2.2.3 to 3.0.1 in /tests by @dependabot in #831
- Bump werkzeug from 2.3.3 to 3.0.1 in /backend/dataall/base/cdkproxy by @dependabot in #832
- Bump react-devtools-core from 4.28.0 to 4.28.4 in /frontend by @dependabot in #824
Documentation 📚
- Update architecture diagrams, add region info in deployment pre-requisites and new features of 2.1.0 by @dlpzx, @TejasRGitHub, @lorchda and @noah-paige in #821
New Contributors 👨💻 👩💻
Special thanks to the new contributors!
- @manjulaK made their first contribution in #786
- @zsaltys made their first contribution in #793
- @anushka-singh made their first contribution in #811
- @TejasRGitHub made their first contribution in #818
- @lorchda made their first contribution in #794
Full Changelog: v2.0.0...v2.1.0
v2.0.0
What's Changed
Major version upgrade ☀️
Data.all v2 is a modular version of data.all that allows customers to easily configure and customize data.all to their needs. In a single config file, the different modules can be configured, enabled or disabled. New features and customizations to the modules can now be added to the source code, as well as completely new modules.
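To make the idea concrete, here is a minimal sketch of how a single configuration file could drive module enablement. The layout (a top-level "modules" object whose entries carry an "active" flag), the module names, and the helper active_modules are illustrative assumptions, not a description of data.all's actual loader.

```python
import json

# Hypothetical configuration: the module names and the "active" key are assumptions.
EXAMPLE_CONFIG = """
{
  "modules": {
    "datasets": {"active": true},
    "worksheets": {"active": true},
    "dashboards": {"active": false}
  }
}
"""


def active_modules(raw_config: str) -> list[str]:
    """Return the sorted names of modules whose "active" flag is true."""
    modules = json.loads(raw_config).get("modules", {})
    return sorted(name for name, settings in modules.items() if settings.get("active"))
```

With the example above, only the entries flagged as active would be loaded; disabling a module is then a one-line change in the configuration rather than a code change.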
In this release we have carried out a deep refactoring of the backend and frontend packages and the resulting code shows significant differences with the v1.6.2 structure. Refer to the following PRs and issues for more details on the design changes.
- Generic description and motivation
- Backend layout and plug-in architecture
- Frontend layout
- Frontend plug-in architecture
Upgrading from v1.6.2 to v2 does NOT include any breaking changes. Despite the magnitude of the code changes, there are no changes to the architecture diagram or to existing resources. Pre-existing datasets, environments, shares or any other resources are not affected by the upgrade.
Enhancements and fixes 🪲
- Update auth-at-edge semantic version to latest 2.1.7 by @wolfit in #710
- Update PR template to add security questions by @jorgeig-space in #673
- Add and refine Explicit CDK Execution Policy - Linking Envs by @noah-paige in #667 and in #648
- Fix Dataset Profiling Glue Job by @noah-paige in #649 and in #701
- Fix migration script for v1.2 upgrade by @dlpzx in #651
- Fix delete environment validation on Consumption roles by @noah-paige in #693
- Fix dataset pagination by @noah-paige in #700
- Fix canary user password creation by @dbalintx in #718
- Fix npm version in VPC facing architecture by @dlpzx in #724
Documentation 📚
- Updated GitHub pages in #654 by @dlpzx and @maryamkhidir
Contributors
- data.all V2 contributors: @nikpodsh, @dbalintx, @dlpzx, @itsmo-amzn, @maryamkhidir, @noah-paige and @AmrSaber
- @jorgeig-space made their first contribution in #673
- @wolfit made their first contribution in #710
Full Changelog: v1.6.2...v2.0.0
v1.6.2
What's Changed
- Add missing KMS keys for canaries by @dlpzx in #619
- Allow restricted nacls backend VPC by @noah-paige in #626
- Fix cloudfront stack in case custom domain is given by @dbalintx in #607
- resolve unnecessary dependency in git_release role by @dlpzx in #623
- get prefix list ids for dbmigration for infra region by @dlpzx in #624
- Handle External ID SSM v1.6.1> by @noah-paige in #630
Upgrading from <v1.6.0 to v1.6.2
The externalID used to secure the pivotRole(s) in linked environments will be moved from AWS Secrets Manager to AWS Systems Manager Parameter Store as part of this upgrade.
If you have enable_pivot_role_auto_create set to true in your cdk.json, you do not have to perform the manual steps listed below and can simply upgrade to v1.6.2. If not, please continue with the manual steps below.
In order to retain the same externalID and not have to update the pivotRole(s) of each linked environment, follow the steps below:
1. In your data.all deployment account, navigate to AWS Secrets Manager and retrieve the secret value of the external ID (named dataall-externalId-{envname}) --> keep this value somewhere for later reference
2. Upgrade code from existing version to v1.6.2 and commit latest code changes to deploy via CodePipeline
3. Once the CodePipeline execution is complete, navigate to SSM Parameter Store in the deployment account and find the externalID parameter (named /dataall/{envname}/pivotRole/externalId) --> edit the existing value with the one retained from Step 1
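The first and last steps above can also be scripted. The sketch below assumes boto3 clients created in the deployment account and assumes the secret stores the external ID as a plain string; if your secret holds JSON, the value has to be extracted from it first. The function names are hypothetical.

```python
def read_external_id(secrets_client, envname: str) -> str:
    """Step 1: read the current external ID from AWS Secrets Manager.

    secrets_client is a boto3 Secrets Manager client,
    e.g. boto3.client("secretsmanager").
    Assumption: the secret value is the external ID itself, as a plain string.
    """
    response = secrets_client.get_secret_value(SecretId=f"dataall-externalId-{envname}")
    return response["SecretString"]


def restore_external_id(ssm_client, envname: str, external_id: str) -> None:
    """Step 3: overwrite the SSM parameter created by the upgrade with the saved value.

    ssm_client is a boto3 SSM client, e.g. boto3.client("ssm").
    """
    ssm_client.put_parameter(
        Name=f"/dataall/{envname}/pivotRole/externalId",
        Value=external_id,
        Overwrite=True,  # the parameter already exists after the pipeline run
    )
```

Call read_external_id before upgrading and keep the result, then call restore_external_id only after the CodePipeline execution has completed.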

Full Changelog: v1.6.1...v1.6.2
v2.0.0-beta1
Beta pre-release of version 2.0.0, focused on the refactor to modularize data.all. This version includes a modularized backend but not yet a modularized front-end, which will be published with the final release.
Known issues affecting deployment
In the deployment guide, run step 8 before step 5, then continue from step 5. This is needed because data.all uses the cdk look up roles in CDK synth, which requires bootstrapping the accounts before running cdk synth locally. Documentation will be updated for the final release.
Known issues
- #556 Request for share is being sent for invalid environment (CREATE_FAILED)
- #540 OpenSearch stack failed during backend deploy due to length of policy name
- #534 Catalog Search along with filters
- #533 Profile Job run fails
- #428 Prefix crawling is crawling complete bucket instead of specific folder
- #374 Error in Monitoring tab in Admin Settings
- #338 Import of Dashboard / Dataset - Environment selection drop-down list is limited to 5 environments
- #288 Can't Paginate to view all Folders
- #625 CDK execution role (custom template) throws S3 access denied error for pivotRole auto-created nested stack
- Denied share requests show the wrong message to the asking user: approved instead of denied (no effect on actual sharing)
- Logging of approvals for sharing shows AWSResourceNotFound for some approvals
- After creating a dataset, a user may temporarily be unable to upload data using the UPLOAD button because of a CORS error that disappears after some time
What's Changed
- sync modularization main (frontend) with main by @AmrSaber in #395
- modularization: backend Plugin architecture by @nikpodsh in #359
- frontend: simplify dev dockerfile by @AmrSaber in #396
- frontend: make styling consistent and remove dead code by @AmrSaber in #394
- ignore styling commits in git blame by @AmrSaber in #405
- modularization: Dataset Modularization pt.1 by @nikpodsh in #413
- modularization: Datasets modularization pt.2 by @nikpodsh in #432
- modularization: Datasets modularization pt.3 by @nikpodsh in #440
- modularization: Datasets modularization pt.4 by @nikpodsh in #441
- modularization: Datasets modularization pt.5 by @nikpodsh in #442
- modularization: Module dependencies by @nikpodsh in #447
- modularization: Worksheets modularization by @dbalintx in #449
- modularization: Merge from main V1.5.2 to the modularization branch by @nikpodsh in #463
- modularization: MLstudio modularization by @dlpzx in #486
- modularization: testing extensions by @dlpzx in #518
- modularization: Disable and skip module test directories for modules that are inactive by @nikpodsh in #522
- modularization: Dataset Sharing modularization by @nikpodsh in #488
- modularization: Datapipelines modularization by @dbalintx in #457
- Dashboard modularization by @nikpodsh in #537
- feat: Redshift removal by @dbalintx in #551
- Fix methods without permissions by @nikpodsh in #547
- feat:increase memory limit for local frontend container by @dbalintx in #568
- Change the pipeline for the modularization by @nikpodsh in #545
- Add generic way to toggle data.all features by @blitzmohit in #538
- Refactoring of aws calls by @nikpodsh in #550
- Migrate to a new permission checker by @nikpodsh in #569
- Core modularization by @nikpodsh in #592
- Merge main into modularization-main branch by @nikpodsh in #595
- Resolve modularization inconsistencies by @dbalintx in #605
- Modularization-main bugfixes by @dbalintx in #604
- Frontend - Module Enablement by @itsmo-amzn in #602
- Fixing of linting error by @nikpodsh in #627
- Fix assume role for the fresh account by @nikpodsh in #628
New Contributors
- @blitzmohit made their first contribution in #538
Full Changelog: v1.6.1...v2.0.0-beta1
v1.6.1
What's Changed
Manual actions required
ONLY if you are upgrading!
In the first run the CodePipeline will fail in the CDK Synth stage if no additional changes are done:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::111111111111:assumed-role/SOME ROLE/... is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::222222222222:role/cdk-hnb659fds-lookup-role-22222222222-eu-west-1
CodeBuild needs additional permissions to assume the IAM role in the CDK Synth stage. Since we cannot update this CodeBuild stage without running it, the permissions need to be added manually.
Upgrading from V1.6.0 to v1.6.1
The role that needs to be updated is named <PREFIX>-<GITBRANCH>-codebuild-baseline-role. Its name also appears in the error message in the CodeBuild logs.
1. Go to the IAM role (<PREFIX>-<GITBRANCH>-codebuild-baseline-role) and click on Add permissions > Create inline policy
2. Update the policy: switch to the JSON editor and copy in the policy below. The policy of the CodeBuild execution role needs to include the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::*:role/cdk-hnb659fds-lookup-role*"
    }
  ]
}
3. After the pipeline has successfully run, go back to the IAM role and remove the manually added policy. The policy is now added as part of infrastructure as code.
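If you prefer to script these console steps, a minimal sketch with boto3 could look like the following. The inline policy name is a hypothetical choice, the helper names are made up for this example, and iam_client is expected to be boto3.client("iam") with permissions to edit the role.

```python
import json

# Hypothetical name for the temporary inline policy.
POLICY_NAME = "temporary-cdk-lookup-assume-role"

# Same policy document as shown in the steps above.
LOOKUP_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::*:role/cdk-hnb659fds-lookup-role*",
        }
    ],
}


def baseline_role_name(prefix: str, git_branch: str) -> str:
    """Build the <PREFIX>-<GITBRANCH>-codebuild-baseline-role name."""
    return f"{prefix}-{git_branch}-codebuild-baseline-role"


def add_temporary_policy(iam_client, role_name: str) -> None:
    """Attach the inline policy before retrying the pipeline."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=POLICY_NAME,
        PolicyDocument=json.dumps(LOOKUP_ROLE_POLICY),
    )


def remove_temporary_policy(iam_client, role_name: str) -> None:
    """Remove the inline policy once the pipeline has run successfully."""
    iam_client.delete_role_policy(RoleName=role_name, PolicyName=POLICY_NAME)
```

Using a dedicated policy name makes the later cleanup unambiguous, since only the manually added policy is deleted.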
Upgrading from <V1.6.0 to v1.6.1
The error points at a different role: one created by CDK, which looks like the following in the CodeBuild logs:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts:::111111111111:assumed-role/dataall-sbx8-cicd-stack-dataallsbx8cdkpipelinePipe-HMXY7D9OX4FM/AWSCodeBuild-30c50765-4529-4d20-99ce-88f82139a82c is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::22222222222:role/cdk-hnb659fds-lookup-role-22222222222-eu-west-1
Find this role and update it as explained in the "Upgrading from V1.6.0 to v1.6.1" section.

Once that is done, retry the CodeBuild Synth stage. In this case you do NOT need to clean up the manually added policies, as this role will be deleted.
Full Changelog: v1.6.0...v1.6.1
v1.6.0
What's Changed
New features
- Add share reason in share requests by @noah-paige in #498
- Import KMS key in imported datasets by @dlpzx in #515 and #572. Support for pre-existing imported datasets in #578
Security
- Fine-grained NACLs for backend VPC creation by @noah-paige in #543 and in #573
- Implement security response headers in Cloudfront distributions by @nikpodsh in #529
- Sanitize the string to avoid a connection string injection by @nikpodsh in #532
- Restrict KMS keys' policies by @noah-paige in #524
- Limit dataset IAM role permissions by @dlpzx in #497
- Limit environment IAM roles permissions by @dlpzx in #515
- Limit pivot role (IAM role) permissions by @dlpzx in #535 --> it will only be automatically applied to dataallPivotRole-cdk. Migrate to auto-created dataallPivotRole-cdk released in V1.4.0 or manually update the dataallPivotRole roles in your environments.
- Move parameters from Secrets Manager to SSM by @dlpzx in #455
- Disable profiling results from "secret" and "official" datasets by @dlpzx in #482
- CDK execution role policy template by @mourya-33 in #562
Bug-fixes
- Fix deletion of imported Glue database by @dlpzx in #512
- Removed unused resources and consolidate KMS keys in environment stack by @noah-paige in #524
- Fix urllib3 dependencies for glue profiling job by @noah-paige in #513
- Add cookiecutter config and environment variable for datapipelines stacks by @dbalintx in #582
- v1.6.0 backwards compatibility changes by @dlpzx in #567
- Add Glue Resource Policy Permissions for cross account share requests by @noah-paige in #579
⚠️ ⚠️ ⚠️ Important ⚠️ ⚠️ ⚠️
Breaking changes
- ⚠️ IMPORTANT: It is necessary to upgrade to version >V1.5.0 before upgrading to V1.6 to avoid deletion of resources due to the removal of custom resources.
- ⚠️ IMPORTANT: requires an update of environments and then datasets after upgrading. Either using cdk.json parameter enable_update_dataall_stacks_in_cicd_pipeline, waiting for overnight update stack task, or manually updating first environments and then datasets. If the environment stack is not updated, Dataset creation and other functionalities will fail.
- ⚠️ IMPORTANT: Because of the implementation of #529 the CloudFront distribution will be recreated. This means that the url used in the CloudFront distribution will be new. You can directly use the new url. In case you are using a custom domain with an SSL certificate, before upgrading to v1.6, you should remove the CNAMEs (for both frontend and userguide) from the old distributions as mentioned in #603
- ⚠️ IMPORTANT: Additional EC2 permissions are needed in the CDK Synth CodeBuild because of the implementation of #543 --> this can be avoided by upgrading to v1.5.6 before upgrading to v1.6.0 or manually adding the necessary permissions and retrying the pipeline run. Check the PR for more details.
- Developing locally requires using a role ending in -graphql-role, -awsworker-role or ecs-tasks-role to work with the more restrictive pivotRole trust policy implemented in #535.
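For local development, a quick pre-flight check of the role name can save a confusing AccessDenied error. The sketch below only mirrors the naming rule stated above; the actual trust policy may contain further conditions, and the function names are hypothetical. It expects an IAM role ARN or a bare role name, not an assumed-role session ARN.

```python
# Suffixes taken from the note above on the pivotRole trust policy.
ALLOWED_ROLE_SUFFIXES = ("-graphql-role", "-awsworker-role", "ecs-tasks-role")


def role_name_from_arn(role_arn: str) -> str:
    """Return the role name: the last path segment of an IAM role ARN, or the input itself."""
    return role_arn.rsplit("/", 1)[-1]


def matches_local_dev_role_naming(role_arn: str) -> bool:
    """Check whether the role name ends in one of the suffixes listed above."""
    return role_name_from_arn(role_arn).endswith(ALLOWED_ROLE_SUFFIXES)
```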
New Contributors 🚀
- @mourya-33 made their first contribution in #562
Full Changelog: v1.5.6...v1.6.0
v1.5.6
What's Changed
Bug Fixes
- Resolve dataset share checks when deleting dataset by @noah-paige in #554
Enhancements
- Limiting read-only access to root file systems in ECS by @dbalintx in #523
- Optimized docker image size by @srinivasreddych in #549
- Update import dataset documentation by @marjet26 in #546
- Added ec2:DescribePrefix permissions to CDKSynth by @dlpzx in #566
Package updates
- Bump tough-cookie from 4.1.2 to 4.1.3 in /frontend by @dependabot in #558
- Bump semver from 5.7.1 to 5.7.2 in /frontend by @dependabot in #564
New Contributors
Welcome to the project 🎉
- @marjet26 made their first contribution in #546
- @srinivasreddych made their first contribution in #549
Full Changelog: v1.5.5...v1.5.6
v1.5.5
What's Changed
- hotfix: dynamic SQL generation by @chamcca in #514
- dependabot: upgrade fast-xml-parser, aws-amplify, react-scripts, override react-redux to non-vulnerable version by @dlpzx in #521
- dependabot: resolve nth-check in sub-dependencies by @dlpzx in #525
New Contributors
Full Changelog: v1.5.4...v1.5.5
v1.5.4
What's Changed
- Update CDK Version to v2.77.0 to fix vulnerability with CDK Pipeline role in CDK Pipelines construct by @gmuslia in #484
- Safe removal of consumption roles and teams with open share requests by @dlpzx in #485
- Fix typo that destroys storage locations by @dlpzx in #481
Full Changelog: v1.5.3...v1.5.4