{COLLECTOR DESCRIPTION}
- Scroll to the top of my repository and click on the "Clone or download" button.
- Decide whether you want to clone the project using HTTPS or an SSH key and do the following:
  - HTTPS: click on the clipboard icon to the right of the URL to copy it.
  - SSH key: first click on 'Use SSH', then click on the same icon as above.
- Open the 'Terminal'
- Change the current working directory to the location where you want the cloned directory
- Type 'git clone', and then paste the URL you copied earlier.
- Press 'Enter' to create your local clone.
These steps come from the following guide, which also covers the process in more detail: Cloning a Repository
- You need Python installed on your machine.
- Create a virtual environment:
  python -m venv venv
- Activate the virtual environment (for Unix/Linux):
  source venv/bin/activate
- Activate the virtual environment (for Windows):
  venv\Scripts\activate
- Install dependencies:
  pip3 install -r requirements.txt
- Install submodules:
  - Common lib:
    pip3 install ./libs/moonshot
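Before installing dependencies, it can help to confirm that the virtual environment is actually active. The following is a minimal sketch, not part of the project: the function name `in_virtualenv` is hypothetical, and the check relies only on the standard library's `sys.prefix` and `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run the activation command first.")
```

Running this with the interpreter you plan to use for `pip3 install` shows whether packages would land in `venv` or in the system installation.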
- Does it need an account?
- Which are the mandatory parameters?
- entrypoint.sh:
  - Uncomment the line:
    export HOUSTON_PROFILE="dev"
  - Comment the command line arguments used in production, e.g.:
    EXECUTION_ARGS=(--videosUrl "$VIDEO_URLS" --startDate "$START_DATE" --endDate "$END_DATE" --output-queue --input-queue --save-state) # Prod
  - Uncomment the command line arguments used in the dev environment and fill them in with your own values, e.g.:
    EXECUTION_ARGS=(--campaign 12345 --videosUrl "https://www.tiktok.com/@classicobeachcluboficial/video/7490754740565970182" --startDate "2025-01-23" --endDate "2025-01-23") # TEST
- main.py:
  - Uncomment the line:
    # mq_handler = None # Test
  - Comment the line:
    mq_handler = outpututil.create_outputhandler_from_args(args=raw_args, campaign_id=campaign_id) # Prod
- helpers.py:
  - In the method HelperMethods::output_data:
    - Comment the line:
      mq_handler.write_data(object_data)
    - Uncomment the line:
      pprint(object_data, indent=4)
- collector.py:
  - Comment the metric publishers, e.g.:
    self.mq_handler.write_metric(name=f"comments_video_id_{video.id}",unit=APPLICATION_NAME.upper(),int_value=int(total_collected_posts)) # Prod
- Run entrypoint.sh:
  sh entrypoint.sh
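The local-run steps above amount to two things: passing the dev arguments on the command line, and printing collected data instead of sending it to the queue handler. The sketch below illustrates that idea in isolation. It is an assumption-laden illustration, not the collector's actual code: `parse_args` and this standalone `output_data` are hypothetical, and only the argument names, `write_data`, and `pprint(object_data, indent=4)` are taken from the steps above.

```python
import argparse
from datetime import date
from pprint import pprint


def parse_args(argv):
    """Parse the dev-mode arguments shown in the entrypoint.sh example (hypothetical parser)."""
    parser = argparse.ArgumentParser(description="Hypothetical collector CLI")
    parser.add_argument("--campaign", type=int, required=True)
    parser.add_argument("--videosUrl", required=True)
    # date.fromisoformat parses ISO 8601 dates such as 2025-01-23;
    # argparse reports a usage error if the value cannot be parsed.
    parser.add_argument("--startDate", type=date.fromisoformat, required=True)
    parser.add_argument("--endDate", type=date.fromisoformat, required=True)
    return parser.parse_args(argv)


def output_data(object_data, mq_handler=None):
    """Write to the queue handler when one exists, otherwise pretty-print."""
    if mq_handler is not None:
        mq_handler.write_data(object_data)  # Prod path
    else:
        pprint(object_data, indent=4)  # Dev path: inspect the data on stdout


if __name__ == "__main__":
    args = parse_args([
        "--campaign", "12345",
        "--videosUrl", "https://www.tiktok.com/@classicobeachcluboficial/video/7490754740565970182",
        "--startDate", "2025-01-23",
        "--endDate", "2025-01-23",
    ])
    # With mq_handler left as None, the data is printed instead of published.
    output_data({"campaign": args.campaign, "url": args.videosUrl})
```

Branching on `mq_handler is None` is one possible design that would remove the need to comment and uncomment lines by hand; whether it suits this collector depends on how `mq_handler` is used elsewhere.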
- What are the collector capabilities?
- Can it collect data from single posts?
- Can it collect data from accounts?
- Can it collect data from hashtags?
- What are the collector arguments coming from Houston?
- Are there multiple collection modes, such as collecting from post URLs or from user IDs? What is the description of each mode?
- Streaming mode? What are the particularities of the streaming mode for this collector?
- What are the output fields sent to S3? What is the description of each field?
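- Testing frameworks: pytest and unittest
- Run the tests and generate the coverage reports:
  coverage run -m pytest -v
  coverage report
  coverage html
- Open the HTML coverage report:
  - Windows:
    cmd /c start "" htmlcov/index.html
  - macOS:
    open htmlcov/index.html
  - Linux:
    xdg-open htmlcov/index.html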
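For reference, a test module that both pytest and unittest can run might look like the sketch below. The helper `extract_video_id` is hypothetical and exists only to give the tests something to exercise; it is not taken from the collector. The URL is the one used in the dev arguments above.

```python
import unittest
from urllib.parse import urlparse


def extract_video_id(url):
    """Return the path segment that follows 'video', or None if absent (hypothetical helper)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "video" in parts:
        idx = parts.index("video")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None


class ExtractVideoIdTest(unittest.TestCase):
    """unittest-style test case; pytest collects TestCase subclasses as well."""

    def test_returns_id_from_video_url(self):
        url = "https://www.tiktok.com/@classicobeachcluboficial/video/7490754740565970182"
        self.assertEqual(extract_video_id(url), "7490754740565970182")

    def test_returns_none_without_video_segment(self):
        self.assertIsNone(extract_video_id("https://www.tiktok.com/@classicobeachcluboficial"))
```

Saved as a file whose name starts with `test_`, this would be picked up by `coverage run -m pytest -v` and show up in the coverage report.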
- What are the known issues?
- What are the next steps?