Skip to content

Logging Not Full Downloaded When Log is Large #36

@engmtcdrm

Description

@engmtcdrm

If you have a piece of code that takes less time to run than to download the log, the SASStudioOperator will not finish downloading the log. This is due to how the method _run_job_and_wait works. It will continue to loop until the status of the job changes out of running or pending (let's ignore the unknown logic for a moment). While it is looping it will call the method stream_log which will download the log for Airflow. However, if the job completes before the log can be downloaded, the log will be incomplete.

The SAS code to run to replicate this is rather simple, it spits out a ton of NOTEs to the log intentionally trying to see how far the log can be pushed before breaking. If the uncommented 500000 does not work to see the issue, just change it to a higher number and try again.

%macro test_log_lines();
    /* %do i = 1 %to 250000; */
    %do i = 1 %to 500000;
    /* %do i = 1 %to 1000000; */
    /* %do i = 1 %to 2000000; */
    /* %do i=1 %to 40000000; */
    /* %do i=1 %to 80000000; */
        %put NOTE: hi mom &i.!;

        /* data _null_;
            sleep(1);
        run; */
    %end;
%mend test_log_lines;

%test_log_lines;

The solution to this is to have a second check after the while loop on line 347 to check if the num_log_lines < job['logStatistics']['lineCount']. This logic should be a while loop replace the logic on line 379 because that will only grab the last 99999 lines of the log, not ALL the rest.

Lastly, here is a screen shot of the log from the code I ran above in Airflow UI. You can see how the log abruptly stops.
image

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions