
Commit d0b1fd8 (1 parent: e398588)

revert contributing.md back to current state on main

File tree: 1 file changed (+23 / -31 lines)

CONTRIBUTING.md: 23 additions & 31 deletions
@@ -2,42 +2,34 @@

By submitting a contribution to this repository, you certify that:

-1. **You have the right to submit the contribution.**
+1. **You have the right to submit the contribution.

   You created the code/content yourself, or you have the right to submit it under the project's license.

-2. **You grant us a license to use your contribution.**
+2. **You grant us a license to use your contribution.

   You agree that your contribution will be licensed under the same terms as the rest of this project, and you grant the project maintainers the right to use, modify, and distribute your contribution as part of the project.

-3. **You are not submitting confidential or proprietary information.**
+3. **You are not submitting confidential or proprietary information.

   Your contribution does not include anything you don’t have permission to share publicly.

If you are contributing on behalf of an organization, you confirm that you have the authority to do so. You agree to confirm these terms in your pull request. Any request that does not explicitly accept the terms will be assumed to have accepted them.

-#### Contributor Requirements:
-- **Library Attribution**: Every contribution must include a direct link and license information for all Python libraries used in the implementation.
-
-- **Documentation**: Every data source connector must include comprehensive usage documentation in the README file, including:
-  - Configuration options and parameters
-  - Example code snippets
-  - Connection requirements
-  - Troubleshooting guidance
-
-- **Demo Notebook**: Every source connector must provide a Databricks notebook that demonstrates practical usage of the connector with real-world examples.
-
-- **Industry Tagging**: Contributors must identify and tag the relevant industries that could benefit from the connector (e.g., IoT, Manufacturing, Healthcare, Finance, etc.).
-
-- **Contribution Process**: All contributions must follow the established contribution process, including:
-  - Creating feature branches
-  - Writing comprehensive tests
-  - Submitting pull requests with detailed descriptions
-  - Responding to code review feedback
-  - Ensuring all CI/CD checks pass
-
-#### Code Review Process
-
-Code review takes place in GitHub pull requests. See [this article](https://help.github.com/articles/about-pull-requests/) if you're not familiar with GitHub Pull Requests.
-
-Once you open a pull request, project maintainers will review your code and respond to your pull request with any feedback they might have. The process at this point is as follows:
-
-1. An approval is required from one of the project maintainers.
-2. Your change will be merged into the project's `main` branch.
+### Best Practices
+1. Put the data source into a subfolder of the repo root, e.g. `$/zipdcm`. The folder name is the shortname of your data source.
+2. Every data source should live under the `dbx` package name. We may consider another sub-domain, e.g. `dbx.pds.<datasource short name>`.
+3. Each connector lives somewhat independently; one connector doesn't break another.
+4. Each connector supports `Python 3.12`.
+5. Each connector should have inline Python docs to help IDEs and AI assistants with connector usage (see the connector sketch after the diff).
+6. Error and exception handling is critical. Start with validating input arguments. When raising exceptions, please include context, except where the context could be sensitive (e.g. secrets).
+7. Meeting the style guidelines (`black`, `isort`, ...) is required.
+8. Add a Makefile with standard actions, e.g. dev, style, check, test, ... (see the Makefile sketch after the diff).
+9. Every data source should have unit tests (see the test sketch after the diff).
+10. Every data source should have an integration test; include open data set file examples, a downloader, or a setup script.
+11. Every data source should have a `<data source name>-demo.ipynb` demo notebook.
+12. Every data source has a README.md.
+13. Every data source has a LICENSE.md file. Please ensure legal signs off on this. Sub-components should be open source in the best case; in the worst case, provide a downloader for the proprietary component. Do not package proprietary components into this repo. Ask if in doubt.
+14. Every data source provides BYOL (bring your own lineage); this will distinguish these data sources from data sources for any other platform.
+15. The main README should summarize each connector's capabilities, perhaps with a check mark system (e.g. :check:Read :check:Write :check:Readstream :check:Writestream).
+16. Support `pip install databricks-python-data-source[<shortname>]`, where the user selects individual connectors to install, to avoid pulling in a mass of dependencies their use case doesn't need (see the packaging sketch after the diff).
+17. Support installing from GitHub.
+18. Support running after creating a Git folder in Databricks.
+19. In the connector README.md and demo notebook, please document compute requirements and other environment requirements.
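A minimal sketch of what items 5 and 6 ask for, assuming the PySpark Python Data Source API (`pyspark.sql.datasource`, available in Spark 4.0+ and recent Databricks runtimes). The `zipdcm` shortname, the two-column schema, and the `path` option are illustrative assumptions, not a real connector:

```python
from pyspark.sql.datasource import DataSource, DataSourceReader


class ZipDcmDataSource(DataSource):
    """Example connector over zipped DICOM archives (illustrative only).

    Inline docstrings like this are what item 5 asks for: they let IDEs
    and AI assistants surface options and behavior to users.

    Options:
        path: location of the input archive (required).
    """

    @classmethod
    def name(cls):
        # the shortname users pass to spark.read.format(...)
        return "zipdcm"

    def schema(self):
        # fixed schema for the sketch; a real connector may infer it
        return "file string, size long"

    def reader(self, schema):
        return ZipDcmReader(self.options)


class ZipDcmReader(DataSourceReader):
    def __init__(self, options):
        # item 6: validate input arguments first, and fail with context
        # (the path is not a secret, so it is safe to name in the error)
        path = options.get("path")
        if not path:
            raise ValueError(
                "zipdcm: required option 'path' is missing; "
                "pass .option('path', ...) on the reader"
            )
        self.path = path

    def read(self, partition):
        # a real reader would open the archive at self.path; this
        # placeholder yields one row matching the declared schema
        yield (self.path, 0)
```

Registration and a quick smoke test, again assuming a runtime with Python data source support:

```python
spark.dataSource.register(ZipDcmDataSource)
df = spark.read.format("zipdcm").option("path", "/tmp/example.zip").load()
df.show()
```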

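For items 7 and 8, one plausible Makefile skeleton. The `src`/`tests` layout and the `[dev]` extra are assumptions about the connector's packaging; recipe lines must be tab-indented:

```makefile
.PHONY: dev style check test

dev:    ## editable install plus dev tooling
	pip install -e ".[dev]"

style:  ## auto-format (item 7)
	black src tests
	isort src tests

check:  ## verify formatting without modifying files
	black --check src tests
	isort --check-only src tests

test:   ## run the unit tests (item 9)
	pytest tests
```

Separating `style` (mutating) from `check` (read-only) lets CI run `make check test` without dirtying the working tree.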
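For item 9, a sketch of unit tests for the example connector above, assuming `pytest`, a local PySpark 4.0+ session, and a hypothetical `zipdcm_source` module containing the classes:

```python
import pytest
from pyspark.sql import SparkSession

from zipdcm_source import ZipDcmDataSource  # hypothetical module name


@pytest.fixture(scope="session")
def spark():
    # local session; Python data sources need Spark 4.0+
    session = SparkSession.builder.master("local[1]").getOrCreate()
    session.dataSource.register(ZipDcmDataSource)
    return session


def test_missing_path_raises(spark):
    # Spark may wrap the connector's ValueError, so assert broadly
    with pytest.raises(Exception):
        spark.read.format("zipdcm").load().collect()


def test_declared_schema(spark):
    df = spark.read.format("zipdcm").option("path", "/tmp/example.zip").load()
    assert df.columns == ["file", "size"]
```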
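For items 16 and 17, per-connector installs can be wired up as optional dependencies in `pyproject.toml`; the `pydicom` pin below is a placeholder for whatever the connector actually needs:

```toml
[project]
name = "databricks-python-data-source"
version = "0.1.0"

[project.optional-dependencies]
# one extra per connector shortname, so users only pull what they use
zipdcm = ["pydicom>=2.4"]
```

Users then select individual connectors, from PyPI or straight from GitHub (`<org>/<repo>` is a placeholder):

```bash
pip install "databricks-python-data-source[zipdcm]"
pip install "databricks-python-data-source[zipdcm] @ git+https://github.com/<org>/<repo>.git"
```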