 
CAMeL Tools is a suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Please use GitHub Issues to report bugs or to ask for help using CAMeL Tools.
You will need Python 3.8 to 3.12 (64-bit) and the Rust compiler installed.
On Linux and macOS, you will also need to install some additional dependencies, primarily CMake and Boost.
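Before installing, you can check these prerequisites from Python. The sketch below is a hypothetical helper, not part of CAMeL Tools. It only encodes the requirements stated above (Python 3.8 to 3.12, 64-bit, a Rust compiler, and CMake on Linux/macOS) and assumes `rustc` and `cmake` are on your PATH. It cannot detect Boost, so it does not try. It also flags Apple silicon Macs, which may need `CMAKE_OSX_ARCHITECTURES=arm64` when installing with pip.

```python
"""Hypothetical pre-install check for the CAMeL Tools prerequisites (not part of CAMeL Tools)."""
import platform
import shutil
import struct
import sys

SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 12)


def python_version_ok(version=sys.version_info[:2]):
    """Return True if the (major, minor) version lies within 3.8 to 3.12."""
    return SUPPORTED_MIN <= tuple(version) <= SUPPORTED_MAX


def is_64bit():
    """Return True if this interpreter is 64-bit (pointer size of 8 bytes)."""
    return struct.calcsize("P") * 8 == 64


def missing_tools(tools=("rustc", "cmake")):
    """Return the build tools that cannot be found on PATH (CMake only matters on Linux/macOS)."""
    return [tool for tool in tools if shutil.which(tool) is None]


def is_apple_silicon(system=None, machine=None):
    """Return True on arm64 macOS, where CMAKE_OSX_ARCHITECTURES=arm64 may be needed."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("64-bit Python:", is_64bit())
    print("Missing build tools:", missing_tools() or "none")
    if is_apple_silicon():
        print("Apple silicon detected: prefix pip with CMAKE_OSX_ARCHITECTURES=arm64")
```

Run it with the same interpreter you plan to install CAMeL Tools into, since the version and bitness checks describe the running Python.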
On Ubuntu/Debian you can install these dependencies by running:
sudo apt-get install cmake libboost-all-dev

On macOS you can install them using Homebrew by running:

brew install cmake boost

Then install CAMeL Tools using pip:

pip install camel-tools
# or run the following if you already have camel_tools installed
pip install camel-tools --upgrade

On Apple silicon Macs you may have to run the following instead:
CMAKE_OSX_ARCHITECTURES=arm64 pip install camel-tools
# or run the following if you already have camel_tools installed
CMAKE_OSX_ARCHITECTURES=arm64 pip install camel-tools --upgrade

Alternatively, you can install CAMeL Tools from source:

# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools
# Install from source
pip install .
# or run the following if you already have camel_tools installed
pip install --upgrade .

To install the datasets required by CAMeL Tools components, run one of the following:
# To install all datasets
camel_data -i all
# or just the datasets for morphology and MLE disambiguation only
camel_data -i light
# or just the default datasets for each component
camel_data -i defaults

See Available Packages for a list of all available datasets.
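If you provision environments from a script (for example in CI), the same commands can be run from Python. This is a minimal sketch using only the standard library: `install_camel_data`, `build_command`, and `build_env` are hypothetical helpers, not part of CAMeL Tools, and they simply shell out to the `camel_data -i` command shown above. The optional `data_dir` argument sets the CAMELTOOLS_DATA environment variable for that one call, which assumes camel_data honors the variable as described below.

```python
"""Hypothetical helpers that run the camel_data command from a Python script."""
import os
import shutil
import subprocess


def build_command(package="light"):
    """Return the argv for `camel_data -i <package>` (e.g. all, light, defaults)."""
    return ["camel_data", "-i", package]


def build_env(data_dir=None):
    """Copy the current environment, optionally pointing CAMELTOOLS_DATA elsewhere."""
    env = dict(os.environ)
    if data_dir is not None:
        # Assumption: camel_data installs into the directory named by CAMELTOOLS_DATA.
        env["CAMELTOOLS_DATA"] = os.fspath(data_dir)
    return env


def install_camel_data(package="light", data_dir=None):
    """Run camel_data; raise if it is missing or exits with a non-zero status."""
    if shutil.which("camel_data") is None:
        raise RuntimeError("camel_data not found on PATH; install camel-tools first")
    subprocess.run(build_command(package), env=build_env(data_dir), check=True)
```

For example, `install_camel_data("defaults", data_dir="/path/to/camel_tools_data")` would install the default datasets into that directory. Passing the variable per call avoids changing your shell profile, at the cost of having to pass the same location to every later process that loads the data.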
By default, data is stored in ~/.camel_tools.
To install the data in a different location instead, set the CAMELTOOLS_DATA environment variable to the desired path.
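Add the following to your .bashrc, .zshrc, .profile, etc.:

export CAMELTOOLS_DATA=/path/to/camel_tools_data

Note: CAMeL Tools has been tested on Windows 10. The Dialect Identification component is not available on Windows at this time.
On Windows, install CAMeL Tools with pip as follows. The -f (--find-links) option tells pip to also look for packages at the given URL, here the PyTorch wheel index: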
pip install camel-tools -f https://download.pytorch.org/whl/torch_stable.html
# or run the following if you already have camel_tools installed
pip install --upgrade -f https://download.pytorch.org/whl/torch_stable.html camel-tools

Alternatively, you can install CAMeL Tools from source:

# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools
# Install from source
pip install -f https://download.pytorch.org/whl/torch_stable.html .
# or run the following if you already have camel_tools installed
pip install --upgrade -f https://download.pytorch.org/whl/torch_stable.html .

To install the data packages required by CAMeL Tools components, run one of the following commands:
# To install all datasets
camel_data -i all
# or just the datasets for morphology and MLE disambiguation only
camel_data -i light
# or just the default datasets for each component
camel_data -i defaults

See Available Packages for a list of all available datasets.
By default, data is stored in C:\Users\your_user_name\AppData\Roaming\camel_tools.
To install the data in a different location instead, set the CAMELTOOLS_DATA environment variable to the desired path. On Windows 10, you can do so as follows:
- Press the Windows button and type env.
- Click on Edit the system environment variables (Control panel).
- Click on the Environment Variables... button.
- Click on the New... button under the User variables panel.
- Type CAMELTOOLS_DATA in the Variable name input box and the desired data path in the Variable value input box. Alternatively, you can browse for the data directory by clicking the Browse Directory... button.
- Click OK in all open windows.
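The lookup rules described above (use CAMELTOOLS_DATA if it is set, otherwise ~/.camel_tools on Linux and macOS or the camel_tools folder under AppData\Roaming on Windows) can be sketched as a small function, for example to check where a script will look for data. This mirrors the documented defaults only: `resolve_data_dir` is a hypothetical helper, not CAMeL Tools' own implementation, and it assumes the Windows default is the folder named by the APPDATA environment variable (normally C:\Users\your_user_name\AppData\Roaming).

```python
"""Hypothetical sketch of how the CAMeL Tools data directory is chosen (not the library's code)."""
import os
from pathlib import Path


def resolve_data_dir(environ=None, platform_name=None, home=None):
    """Return the directory CAMeL Tools data is expected in, per the documented rules."""
    environ = os.environ if environ is None else environ
    platform_name = os.name if platform_name is None else platform_name

    # A non-empty CAMELTOOLS_DATA always takes precedence.
    override = environ.get("CAMELTOOLS_DATA")
    if override:
        return Path(override)

    if platform_name == "nt":
        # Windows default (assumed): %APPDATA%\camel_tools
        return Path(environ["APPDATA"]) / "camel_tools"

    # Linux/macOS default: ~/.camel_tools
    home = Path.home() if home is None else Path(home)
    return home / ".camel_tools"


if __name__ == "__main__":
    data_dir = resolve_data_dir()
    print(data_dir, "(exists)" if data_dir.is_dir() else "(not found)")
```

After setting the variable, open a new terminal before running this check, since already-running shells keep their old environment.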
To get started, you can follow along with the Guided Tour for a quick overview of the components provided by CAMeL Tools.
The full online documentation covers both the command-line tools and the Python API.
Alternatively, you can build your own local copy of the documentation as follows:
# Install dependencies
pip install sphinx myst-parser sphinx-rtd-theme
# Go to docs subdirectory
cd docs
# Build HTML docs
make html

This should compile all the HTML documentation into docs/build/html.
If you find CAMeL Tools useful in your research, please cite our paper:
@inproceedings{obeid-etal-2020-camel,
   title = "{CAM}e{L} Tools: An Open Source Python Toolkit for {A}rabic Natural Language Processing",
   author = "Obeid, Ossama  and
      Zalmout, Nasser  and
      Khalifa, Salam  and
      Taji, Dima  and
      Oudah, Mai  and
      Alhafni, Bashar  and
      Inoue, Go  and
      Eryani, Fadhl  and
      Erdmann, Alexander  and
      Habash, Nizar",
   booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
   month = may,
   year = "2020",
   address = "Marseille, France",
   publisher = "European Language Resources Association",
   url = "https://www.aclweb.org/anthology/2020.lrec-1.868",
   pages = "7022--7032",
   abstract = "We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, Dialect Identification, Named Entity Recognition and Sentiment Analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.",
   language = "English",
   ISBN = "979-10-95546-34-4",
}

CAMeL Tools is available under the MIT license. See the LICENSE file for more info.
If you would like to contribute to CAMeL Tools, please read the CONTRIBUTE.rst file.