lavaraja commented Mar 9, 2025

Issue #, if available: #83

Description of changes:

In some cases, excessive TorchServe logging contributes to CloudWatch Logs costs. This change lets users control logging verbosity, which can reduce costs while keeping the option to raise verbosity for debugging when needed.

Changes:

  • Added a new function, configure_logging(), to set TorchServe log levels dynamically
  • Used the TS_LOG_LEVEL environment variable to control logging verbosity
  • Supported log levels: OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE. TS_LOG_LEVEL takes an integer value, which is mapped to a log level as follows:
    log_levels = {
        '0': 'off',
        '10': 'fatal',
        '20': 'error',
        '30': 'warn',
        '40': 'info',
        '50': 'debug',
        '60': 'trace'
    }
  • Modified the log4j2.xml file with a sed command based on TS_LOG_LEVEL
  • Handled potential errors during log configuration gracefully
  • Called configure_logging() before starting TorchServe
  • Goal: reduce excessive logging and the associated CloudWatch costs
  • Kept the default logging configuration if TS_LOG_LEVEL is not set or is invalid
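The level lookup and the log4j2.xml rewrite described above can be sketched in pure Python. This is an illustration, not the PR's code: the PR edits the file with sed, whereas this sketch swaps in a regular-expression substitution. The names `resolve_log_level` and `apply_log_level`, and the assumption that levels are set through `level="..."` attributes on `<Logger>` and `<Root>` elements, are hypothetical.

```python
import re
from typing import Mapping, Optional

# Integer-string -> log4j2 level name, as listed in the PR description.
LOG_LEVELS = {
    "0": "off",
    "10": "fatal",
    "20": "error",
    "30": "warn",
    "40": "info",
    "50": "debug",
    "60": "trace",
}

# Assumed layout: levels live in level="..." attributes on <Logger>/<Root>.
# <Loggers> is not matched because \b requires a boundary right after the name.
_LEVEL_ATTR = re.compile(r'(<(?:Logger|Root)\b[^>]*?\blevel=")[^"]*(")')


def resolve_log_level(environ: Mapping[str, str]) -> Optional[str]:
    """Return the log4j2 level for TS_LOG_LEVEL, or None if unset or invalid."""
    raw = environ.get("TS_LOG_LEVEL")
    if raw is None:
        return None  # not set: caller keeps the default configuration
    return LOG_LEVELS.get(raw.strip())  # unknown codes also yield None


def apply_log_level(xml_text: str, level: str) -> str:
    """Rewrite every Logger/Root level attribute to the given level."""
    return _LEVEL_ATTR.sub(lambda m: m.group(1) + level + m.group(2), xml_text)


if __name__ == "__main__":
    sample = '<Root level="info">'
    print(apply_log_level(sample, resolve_log_level({"TS_LOG_LEVEL": "20"})))
    # prints <Root level="error">
```

A caller would read log4j2.xml, call `apply_log_level` only when `resolve_log_level` returns a level, and otherwise leave the file untouched, which matches the "keep the default logging" behaviour listed above.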

Tests:

  • I've added unit tests in test_log_config.py to cover various scenarios including valid log levels, invalid log levels, and error conditions. All tests are passing.
% python -m unittest discover -v                                               
test_invalid_log_level (test_log_config.TestLogConfig.test_invalid_log_level) ... Current script path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/serving.py
log4j2.xml path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/etc/log4j2.xml
ok
test_log4j2_file_not_found (test_log_config.TestLogConfig.test_log4j2_file_not_found) ... Current script path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/serving.py
log4j2.xml path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/etc/log4j2.xml
ok
test_no_log_level_set (test_log_config.TestLogConfig.test_no_log_level_set) ... ok
test_subprocess_error (test_log_config.TestLogConfig.test_subprocess_error) ... Current script path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/serving.py
log4j2.xml path: /Users/xxxxxxx/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/etc/log4j2.xml
ok
test_valid_log_level (test_log_config.TestLogConfig.test_valid_log_level) ... ok

----------------------------------------------------------------------
Ran 5 tests in 0.002s

OK
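The five scenarios in the test run above can be exercised without a real container by mocking `subprocess.run`, so sed never actually executes. This sketch is hypothetical: the `configure_logging(log4j2_path)` signature, its True/False return convention, and the exact sed expression are assumptions, not the code in serving.py or test_log_config.py. It does follow the PR's sed-based approach, and the test names mirror the ones in the output.

```python
import os
import subprocess
import unittest
from unittest import mock

LOG_LEVELS = {"0": "off", "10": "fatal", "20": "error", "30": "warn",
              "40": "info", "50": "debug", "60": "trace"}


def configure_logging(log4j2_path: str) -> bool:
    """Hypothetical sketch: rewrite level attributes in log4j2.xml with sed.

    Returns True if the file was rewritten, False if the defaults were kept.
    """
    level = LOG_LEVELS.get(os.environ.get("TS_LOG_LEVEL", "").strip())
    if level is None:
        return False  # unset or invalid: keep the default configuration
    if not os.path.isfile(log4j2_path):
        return False  # missing file: start TorchServe with its defaults
    try:
        # GNU sed in-place edit of every level="..." attribute;
        # BSD/macOS sed would need `-i ''` instead of `-i`.
        subprocess.run(
            ["sed", "-i", f's/level="[a-zA-Z]*"/level="{level}"/g', log4j2_path],
            check=True,
        )
    except subprocess.CalledProcessError:
        return False  # sed failed: log configuration errors are not fatal
    return True


class TestConfigureLogging(unittest.TestCase):
    @mock.patch("os.path.isfile", return_value=True)
    @mock.patch("subprocess.run")
    def test_valid_log_level(self, run, _isfile):
        with mock.patch.dict(os.environ, {"TS_LOG_LEVEL": "20"}):
            self.assertTrue(configure_logging("log4j2.xml"))
        cmd = run.call_args.args[0]
        self.assertEqual(cmd[0], "sed")
        self.assertIn('level="error"', cmd[2])

    @mock.patch("subprocess.run")
    def test_invalid_log_level(self, run):
        with mock.patch.dict(os.environ, {"TS_LOG_LEVEL": "25"}):
            self.assertFalse(configure_logging("log4j2.xml"))
        run.assert_not_called()

    @mock.patch("subprocess.run")
    def test_no_log_level_set(self, run):
        with mock.patch.dict(os.environ):  # restored on exit
            os.environ.pop("TS_LOG_LEVEL", None)
            self.assertFalse(configure_logging("log4j2.xml"))
        run.assert_not_called()

    @mock.patch("os.path.isfile", return_value=False)
    @mock.patch("subprocess.run")
    def test_log4j2_file_not_found(self, run, _isfile):
        with mock.patch.dict(os.environ, {"TS_LOG_LEVEL": "40"}):
            self.assertFalse(configure_logging("missing.xml"))
        run.assert_not_called()

    @mock.patch("os.path.isfile", return_value=True)
    @mock.patch("subprocess.run",
                side_effect=subprocess.CalledProcessError(1, "sed"))
    def test_subprocess_error(self, _run, _isfile):
        with mock.patch.dict(os.environ, {"TS_LOG_LEVEL": "50"}):
            self.assertFalse(configure_logging("log4j2.xml"))


if __name__ == "__main__":
    unittest.main(verbosity=2)
```

Mocking at the `subprocess.run` boundary keeps the tests independent of which sed variant is installed on the machine running them.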

Steps to test on the PyTorch container:

  • Extend the existing PyTorch inference container:
    FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1-cpu-py310
    RUN pip install git+https://github.com/lavaraja/sagemaker-pytorch-inference-toolkit.git
  • Build the image.
    docker build .
  • Tag the image.
    docker tag <image_id> 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1-cpu-py310-extended
  • Use the image above and pass TS_LOG_LEVEL as an environment variable to the model class. I used this sample example for testing locally.
  • Modify pytorch_script_mode_local_model_inference.py to use the custom-built container as image_uri:
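    from sagemaker.pytorch import PyTorchModel

    model = PyTorchModel(
        role=role,
        model_data=model_dir,
        # framework_version and py_version are not needed when image_uri is set
        # framework_version='2.1',
        # py_version='py310',
        image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1-cpu-py310-extended",
        entry_point='inference.py',
        # log_level is one of the integer codes listed above, e.g. '20' for error
        env={'TS_LOG_LEVEL': log_level}
    )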
  • Run python pytorch_script_mode_local_model_inference.py to start the container locally and run inference.
  • Observe that the TS_LOG_LEVEL setting takes effect when TorchServe starts. The same logs are emitted to the customer's CloudWatch Logs when the model is deployed on SageMaker.
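Because TS_LOG_LEVEL takes the integer codes rather than level names, a small client-side helper can translate a name before it is passed as `env` to `PyTorchModel`. This is a sketch: `ts_log_level_env` and `TS_LOG_LEVEL_CODES` are hypothetical names, not part of the toolkit or the SageMaker Python SDK. Raising on an unknown name is a design choice, since the container would otherwise silently keep its default level.

```python
from typing import Dict

# Level name -> TS_LOG_LEVEL integer string, inverted from the PR's mapping.
TS_LOG_LEVEL_CODES = {
    "off": "0", "fatal": "10", "error": "20", "warn": "30",
    "info": "40", "debug": "50", "trace": "60",
}


def ts_log_level_env(level_name: str) -> Dict[str, str]:
    """Build an env dict for PyTorchModel(env=...) from a level name."""
    code = TS_LOG_LEVEL_CODES.get(level_name.strip().lower())
    if code is None:
        # Fail fast on the client instead of silently getting default logging.
        raise ValueError(
            f"unknown log level {level_name!r}; expected one of "
            f"{', '.join(TS_LOG_LEVEL_CODES)}"
        )
    return {"TS_LOG_LEVEL": code}


if __name__ == "__main__":
    print(ts_log_level_env("ERROR"))  # prints {'TS_LOG_LEVEL': '20'}
```

For example, passing `env=ts_log_level_env("error")` in the PyTorchModel call above is equivalent to `env={'TS_LOG_LEVEL': '20'}`.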

test_output_with_diff_loglevels.log

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

lavaraja added 9 commits March 2, 2025 16:47
- Added a new function 'configure_logging()' to dynamically set log levels
- Utilize TS_LOGLEVEL environment variable to control logging verbosity
- Support log levels: OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE
- Modify log4j2.xml file using sed command based on TS_LOGLEVEL
- Handle potential errors during log configuration gracefully
- Call configure_logging() before starting TorchServe
- Aim to reduce excessive logging and associated CloudWatch costs
- Maintain default logging if TS_LOGLEVEL is not set or invalid
Added missing import.
updating the log4j2 path.
using absolute path to prevent file not found errors.
lavaraja changed the title from "Add functionality to configure TorchServe logging levels using the TS_LOGLEVEL environment variable." to "Add functionality to configure TorchServe logging levels using the TS_LOG_LEVEL environment variable." on Mar 9, 2025

lavaraja commented Mar 9, 2025

Updated the variable name from TS_LOGLEVEL to TS_LOG_LEVEL.
