## v0.27.0
- Added `mlflow` to known packages (#1895). The `mlflow` package has been incorporated into the project and is now recognized as a known package. This integration includes modifications to the use of `mlflow` in the context of UC Shared Clusters, providing recommendations to modify or rewrite certain functionality related to `sparkContext`, `_conf`, and RDD APIs. Additionally, the artifact storage system of `mlflow` in Databricks and DBFS has undergone changes. The `known.json` file has also been updated with several new packages, such as `alembic`, `aniso8601`, `cloudpickle`, `docker`, `entrypoints`, `flask`, `graphene`, `graphql-core`, `graphql-relay`, `gunicorn`, `html5lib`, `isort`, `jinja2`, `markdown`, `markupsafe`, `mccabe`, `opentelemetry-api`, `opentelemetry-sdk`, `opentelemetry-semantic-conventions`, `packaging`, `pyarrow`, `pyasn1`, `pygments`, `pyrsistent`, `python-dateutil`, `pytz`, `pyyaml`, `regex`, `requests`, and more. These packages are now acknowledged and incorporated into the project's functionality.
- Added
`tensorflow` to known packages (#1897). In this release, we are excited to announce the addition of the `tensorflow` package to our known packages list. TensorFlow is a popular open-source library for machine learning and artificial intelligence applications. This package includes several components such as `tensorflow`, `tensorboard`, `tensorboard-data-server`, and `tensorflow-io-gcs-filesystem`, which enable training, evaluation, and deployment of machine learning models, visualization of model metrics and logs, and access to Google Cloud Storage filesystems. Additionally, we have included other packages such as `gast`, `grpcio`, `h5py`, `keras`, `libclang`, `mdurl`, `namex`, `opt-einsum`, `optree`, `pygments`, `rich`, `rsa`, `termcolor`, `pyasn1_modules`, `sympy`, and `threadpoolctl`. These packages provide functionality required for different use cases, such as parsing abstract syntax trees, efficient RPC communication, handling HDF5 files, and managing thread pools. This release aims to enhance the functionality and capabilities of our platform by incorporating these powerful libraries and tools.
- Added
`torch` to known packages (#1896). In this release, the `known.json` file has been updated to include several new packages and their respective modules: `torch`, `functorch`, `mpmath`, `networkx`, `sympy`, and `isympy`. The addition of these packages and modules ensures that they are recognized and available for use, preventing issues with missing dependencies or version conflicts. Furthermore, the `_analyze_dist_info` method in the `known.py` file has been improved to handle recursion errors during package analysis: a try-except block has been added to the loop that analyzes the distribution info folder, which logs the error and moves on to the next file if a `RecursionError` occurs. This enhancement increases the robustness of the package analysis process.
- Added more known libraries (#1894). In this release, the
`known` library has been enhanced with the addition of several new packages, bringing improved functionality and versatility to the software. Key additions include `contourpy` for drawing contours on 2D grids, `cycler` for creating cyclic iterators, `docker-pycreds` for managing Docker credentials, `filelock` for platform-independent file locking, `fonttools` for manipulating fonts, and `frozendict` for providing immutable dictionaries. Additional libraries like `fsspec` for accessing various file systems, `gitdb` and `gitpython` for working with git repositories, `google-auth` for Google authentication, `html5lib` for parsing and rendering HTML documents, and `huggingface-hub` for working with the Hugging Face model hub have been incorporated. Furthermore, the release includes `idna`, `kiwisolver`, `lxml`, `matplotlib`, `mypy`, `peewee`, `protobuf`, `psutil`, `pyparsing`, `regex`, `requests`, `safetensors`, `sniffio`, `smmap`, `tokenizers`, `tomli`, `tqdm`, `transformers`, `types-pyyaml`, `types-requests`, `typing_extensions`, `tzdata`, `umap`, `unicorn`, `unidecode`, `urllib3`, `wandb`, `waterbear`, `wordcloud`, `xgboost`, and `yfinance` for expanded capabilities. The `zipp` and `zingg` libraries have also been included for module name transformations and data mastering, respectively. Overall, these additions are expected to significantly enhance the software's functionality.
- Added more value inference for
`dbutils.notebook.run(...)` (#1860). In this release, the handling of `dbutils.notebook.run(...)` in `graph.py` has been significantly updated to enhance value inference. The change introduces new methods for handling `NotebookRunCall` and `SysPathChange` objects, and refactors the `get_notebook_path` method into `get_notebook_paths`. This new method returns a tuple of a boolean and a list of strings, indicating whether any nodes could not be resolved and providing the list of inferred paths. A new private method, `_get_notebook_paths`, has been added to retrieve notebook paths from a list of nodes. Furthermore, the `load_dependency` method in `loaders.py` has been updated to detect the language of a notebook based on the file path, in addition to its content. The `Notebook` class now includes a new attribute, `SUPPORTED_EXTENSION_LANGUAGES`, which maps file extensions to their corresponding languages. In the `databricks.labs.ucx` project, more value inference has been added to the linter, including new methods and enhanced functionality for `dbutils.notebook.run(...)`. Several tests have been added or updated to cover various scenarios and ensure the linter handles dynamic values appropriately. A new test file for the `NotebookLoader` class in the `databricks.labs.ucx.source_code.notebooks.loaders` module has been added, with a new class, `NotebookLoaderForTesting`, that overrides the `detect_language` method to make it a class method. This allows for more robust testing of the `NotebookLoader` class. Overall, these changes improve the accuracy and reliability of value inference for `dbutils.notebook.run(...)` and enhance the testing and usability of the related classes and methods.
- Added nightly workflow to use industry solution accelerators for parser validation (#1883). A nightly workflow has been added to validate the parser using industry solution accelerators, which can be triggered locally with the
`make solacc` command. This workflow involves a new Makefile target, `solacc`, which runs a Python script located at `tests/integration/source_code/solacc.py`. The workflow runs on the latest Ubuntu, installs Python 3.10 and hatch 1.9.4 using pip, and checks out the code with a fetch depth of 0. It runs daily at 7am via a cron schedule, and can also be triggered locally. The purpose of this workflow is to ensure parser compatibility with various industry solutions, improving overall software quality and robustness.
- Complete support for pip install command (#1853). In this release, we've made significant enhancements to support the
`pip install` command in our open-source library. The `register_library` method in the `DependencyResolver`, `NotebookResolver`, and `LocalFileResolver` classes has been modified to accept a variable number of libraries instead of just one, allowing for more efficient dependency management. Additionally, the `resolve_import` method has been introduced in the `NotebookResolver` and `LocalFileResolver` classes for improved import resolution. Moreover, the `_split` static method has been implemented for better handling of pip command code and egg packages. The library now also supports the resolution of imports in notebooks and local files. These changes provide a solid foundation for full `pip install` command support, improving overall robustness and functionality. Furthermore, extensive updates to tests, including workflow linter and job DLT task linter modifications, ensure the reliability of the library when working with Jupyter notebooks and pip-installable libraries.
- Infer simple f-string values when computing values during linting (#1876). This commit enhances the linter by adding support for inferring simple f-string values during linting, addressing issue #1871 and progressing #1205. The new functionality works for simple f-strings but does not yet support nested f-strings. It introduces the `InferredValue` class and updates the `visit_call`, `visit_const`, and `_check_str_constant` methods for better linter feedback. Additionally, it includes modifications to a unit test file and adjustments to error locations in code. The commit also presents an example of simple f-string handling, acknowledging current limitations while providing a solid foundation for future development. Co-authored by Eric Vergnaud.
- Propagate widget parameters and data security mode to `CurrentSessionState` (#1872). In this release, the `spark_version_compatibility` function in `crawlers.py` has been refactored into `runtime_version_tuple`, returning a tuple of integers instead of a string. The function now handles custom runtimes and DLT, and raises a `ValueError` if the version components cannot be converted to integers. Additionally, the `CurrentSessionState` class has been updated to propagate named parameters from jobs and to check for DBFS paths passed as both named and positional parameters. New attributes, including `spark_conf`, `named_parameters`, and `data_security_mode`, have been added to the class, all defaulting to `None`. The `WorkflowTaskContainer` class has also been modified to take an additional `job` parameter in its constructor and to expose new attributes for `named_parameters`, `spark_conf`, `runtime_version`, and `data_security_mode`. The `_register_cluster_info` and `_lint_task` methods in `WorkflowLinter` have also been updated to use the new `CurrentSessionState` attributes when linting a task. Multiple unit tests now construct a `Job` object and pass it to the `WorkflowTaskContainer` constructor; the tests cover various library types, such as jar files, PyPI libraries, Python wheels, and requirements files, and ensure that the `WorkflowTaskContainer` object can extract the relevant information from a `Job` object and store it for later use.
- Support inferred values when linting DBFS mounts (#1868). This commit adds value inference and enhances the consistency of advice messages when linting Databricks File System (DBFS) mounts, addressing issue #1205. It improves the precision of deprecated file system path calls and updates the handling of default DBFS references, making the code more robust and future-proof. The linter can now detect DBFS paths in various formats, including string constants and variables.
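As a rough illustration of what detecting paths in "string constants and variables" means (a hypothetical detector, not the ucx linter itself): because an assignment keeps the string literal in the syntax tree, a path that reaches a call through a variable is still visible to a plain AST walk:

```python
import ast

# Common prefixes that indicate a DBFS reference (illustrative list)
DBFS_PREFIXES = ("dbfs:/", "/dbfs/", "/mnt/")

def find_dbfs_refs(source: str) -> list[str]:
    """Collect string constants that look like DBFS references; constants
    bound to variables are found too, since the literal stays in the tree."""
    return [
        node.value
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and node.value.startswith(DBFS_PREFIXES)
    ]

# The path flows through a variable, yet is still detected:
code = 'path = "/mnt/things/data"\ndf = spark.read.parquet(path)'
refs = find_dbfs_refs(code)
```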
The test suite has been updated to include new cases and provide clearer deprecation warnings. This commit also refines the way advice is generated for deprecated file system path calls and renames `Advisory` to `Deprecation` in some places, providing more accurate and helpful feedback to developers.
- Support inferred values when linting spark.sql (#1870). In this release, we have added support for inferring the values of table names when linting PySpark code, improving the accuracy and usefulness of the PySpark linter. This feature includes the ability to handle inferred values in Spark SQL code and updates to the test suite to reflect the updated linting behavior. The
`QueryMatcher` class in `pyspark.py` has been updated to infer the value of the table-name argument in a `Call` node, and an advisory message is generated if the value cannot be inferred. The updated tests also cover deprecation advice for direct filesystem references such as `s3://bucket/path`, a table `old.things` that has been migrated to `brand.new.stuff` in the Unity Catalog, and a loop that computes table names programmatically within SQL queries, demonstrating the linter's flexibility and adaptability.
- Support inferred values when linting sys path (#1866). In this release, the library's linting system has been enhanced with support for inferring values used in the system path. The
`DependencyGraph` class in `graph.py` has been updated to handle new node types, including `SysPathChange`, `NotebookRunCall`, `ImportSource`, and `UnresolvedPath`. The `UnresolvedPath` node is added for paths that cannot be resolved during linting, and new test helpers have been introduced in `conftest.py`, such as `DependencyResolver`, `Whitelist`, `PythonLibraryResolver`, `NotebookResolver`, and `ImportFileResolver`. Additionally, the library now recognizes inferred values, including absolute paths added to the system path via `sys.path.append`. New tests have been added to ensure the correct behavior of the `DependencyResolver` class. This release also introduces a new test file, `sys-path-with-fstring.py`, which demonstrates the use of Python's f-string syntax to append values to the system path, and a new `BaseImportResolver` has been added to the `DependencyResolver` class hierarchy to resolve imports more flexibly and robustly.
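To make the `sys-path-with-fstring.py` scenario concrete, here is a self-contained sketch (invented helper and paths; the real resolution lives in the dependency graph, not a standalone function) of resolving a `sys.path.append` whose argument is an f-string over constant bindings:

```python
import ast

def infer_sys_path_appends(source: str) -> list[str]:
    """Find sys.path.append(...) calls and statically resolve the argument
    when it is a string constant or a simple f-string over constant names."""
    tree = ast.parse(source)
    # names bound to string constants by simple assignments
    consts = {
        target.id: node.value.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant)
        for target in node.targets
        if isinstance(target, ast.Name)
    }
    resolved: list[str] = []
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "append"
                and ast.unparse(node.func.value) == "sys.path"):
            continue
        arg = node.args[0]
        if isinstance(arg, ast.Constant):
            resolved.append(str(arg.value))
        elif isinstance(arg, ast.JoinedStr):
            parts: list[str] = []
            for value in arg.values:
                if isinstance(value, ast.Constant):
                    parts.append(str(value.value))
                elif (isinstance(value, ast.FormattedValue)
                      and isinstance(value.value, ast.Name)
                      and value.value.id in consts):
                    parts.append(str(consts[value.value.id]))
                else:
                    break  # unresolvable placeholder: skip this append
            else:
                resolved.append("".join(parts))
    return resolved

# An f-string path is resolved to an absolute path (names are illustrative):
notebook = 'import sys\nenv = "prod"\nsys.path.append(f"/Workspace/libs/{env}")'
paths = infer_sys_path_appends(notebook)
```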
Contributors: @nfx, @ericvergnaud, @JCZuurmond, @asnare