I got tired of manually creating training datasets, so I built this tool. Transform your PDFs and documents into fine-tuning data automatically. This application simplifies the process of dataset generation, allowing you to focus on training your machine learning models instead of preparing data.
To get started, you will need to download the application. Follow the steps below to download and install the ai-dataset-generator.
- Visit the Releases Page: Go to the Releases page to find the latest version of the application.
- Download the Application: Look for the latest version of the ai-dataset-generator. Click on the appropriate download link for your operating system (Windows, macOS, or Linux).
- Run the Application: Locate the downloaded file on your computer. Double-click to run it. Follow any prompts that appear to complete the installation.
Before downloading, ensure your system meets these basic requirements:
- Operating System: Windows 10 or later, macOS 10.14 or later, Linux (Ubuntu 20.04 or later)
- Memory: At least 4 GB of RAM
- Disk Space: Minimum of 100 MB available
- Dependencies:
  - Python 3.7+
  - Required libraries: pandas, PyPDF2, nltk
Make sure your operating system and dependencies are up to date for the best experience.
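To confirm the dependencies listed above before running the tool, a small check like the following may help. This is a minimal sketch, not part of the application: the function names `missing_libraries` and `python_is_supported` are hypothetical, and the script only tests whether each library can be found, not which version is installed.

```python
import importlib.util
import sys

# Minimum version and library names are taken from the requirements list above.
MIN_PYTHON = (3, 7)
REQUIRED_LIBRARIES = ("pandas", "PyPDF2", "nltk")


def missing_libraries(names=REQUIRED_LIBRARIES):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python %d.%d or later is required." % MIN_PYTHON)
    missing = missing_libraries()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries were found.")
```

If any libraries are reported missing, they can usually be installed with `pip install pandas PyPDF2 nltk`.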
The ai-dataset-generator offers the following features:
- PDF to Dataset Conversion: Easily convert PDF files into structured datasets suitable for model training.
- Customizable Output: Choose the format you need for your dataset. The tool supports CSV and JSON formats.
- Batch Processing: Generate multiple datasets at once by uploading several files.
- User-Friendly Interface: Navigate a simple, clear interface designed for all users, regardless of technical background.
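To give a rough idea of how the CSV and JSON formats differ, the sketch below serializes the same records both ways using only the Python standard library. It is an illustration, not the application's actual output schema: the field names `source` and `text`, the sample records, and the helper names `to_csv` and `to_json` are all assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical records: the field names "source" and "text" are illustrative only.
records = [
    {"source": "report.pdf", "text": "First extracted passage."},
    {"source": "report.pdf", "text": "Second extracted passage."},
]


def to_csv(rows):
    """Serialize a non-empty list of dicts to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize a list of dicts to a JSON array."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(to_csv(records))
    print(to_json(records))
```

CSV tends to be convenient for flat, tabular records, while JSON can also hold nested fields, so the better choice depends on what your training pipeline expects.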
If you encounter any issues while using the application, consider the following solutions:
- Installation Problems: Ensure that you have sufficient disk space and the correct version of your operating system. Run the installation as an administrator if necessary.
- File Not Found: If the application cannot find your PDF files, double-check the file paths. Ensure that your files are accessible and not corrupted.
- Performance Issues: If the application runs slowly, ensure no other heavy programs are running on your computer. Closing unnecessary applications may help.
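For the File Not Found case, it can help to check the files yourself before handing them to the application. The following is a minimal sketch, assuming a hypothetical helper named `check_pdf`; it verifies that a path exists, can be opened, and starts with the `%PDF-` marker that PDF files carry. It does not detect every kind of corruption.

```python
from pathlib import Path


def check_pdf(path):
    """Return a short problem description for `path`, or None if it looks usable."""
    pdf = Path(path)
    if not pdf.is_file():
        return "file not found"
    try:
        with pdf.open("rb") as handle:
            header = handle.read(1024)
    except OSError as error:
        return "cannot be read: %s" % error
    # A PDF file is expected to carry the "%PDF-" marker near its start.
    if b"%PDF-" not in header:
        return "does not look like a PDF (missing %PDF- header)"
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_pdfs.py file1.pdf file2.pdf ...
    for name in sys.argv[1:]:
        problem = check_pdf(name)
        print("%s: %s" % (name, problem or "OK"))
```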
This application is licensed under the MIT License. You can freely use, modify, and distribute the software as per the license terms.
We welcome contributions from the community! If you'd like to contribute to the ai-dataset-generator, please follow these steps:
- Fork the repository.
- Create your feature branch (`git checkout -b feature/NewFeature`).
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature/NewFeature`).
- Open a pull request.
Feel free to reach out via issues if you have questions or suggestions for improvements.
If you need further assistance, check the issues page on GitHub or contact us through the repository. We aim to respond promptly and help resolve any challenges you may face.
We hope the ai-dataset-generator makes your data preparation easier and faster. Don't forget to visit the Releases page to stay updated on new versions and features. Happy dataset generating!