Status: Open
Labels: bug
Description
This is a migration of whamcloud/lustre-collector#65
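Lustre prints signed 64-bit (s64) values in places where lustre-collector parses them as unsigned (u64), so parsing fails with an error whenever Lustre prints a negative value. For example, the minimum `req_waittime` below is -20 usecs: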
```
[root@node1 ~]# lctl get_param \*.*.ldlm_canceld.stats
ldlm.services.ldlm_canceld.stats=
snapshot_time 1690228679.404172099 secs.nsecs
start_time 1690208535.652492047 secs.nsecs
elapsed_time 20143.751680052 secs.nsecs
req_waittime 96 samples [usecs] -20 43536 54561 1897709649
req_qdepth 96 samples [reqs] 0 0 0 0
req_active 96 samples [reqs] 1 2 103 117
req_timeout 96 samples [secs] 15 15 1440 21600
reqbuf_avail 199 samples [bufs] 63 64 12688 809008
ldlm_cancel 96 samples [usecs] 5 235 3891 285769
```
```
Jul 24 19:57:39 node1 emf-stats-agent[667612]: INFO emf_stats_agent: Stats collection is enabled
Jul 24 19:57:39 node1 emf-stats-agent[667612]: Error: LustreCollectorError(CombineEasyError(Errors { position: 11397, errors: [Unexpected(Token('-')), Expected(Static("whitespace")), Expected(Static("digit")), Message(Static("While parsing ldlm_canceld.stats"))] }))
Jul 24 19:57:39 node1 systemd[1]: emf-stats-agent.service: Main process exited, code=exited, status=1/FAILURE
```
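A minimal Rust sketch of one fix direction, assuming the numeric fields after `[unit]` (min, max, sum and sum of squares, when present) are parsed as signed `i64` instead of `u64`. `StatLine` and `parse_stat_line` are hypothetical names. lustre-collector appears to use the combine parser crate (judging by the `CombineEasyError` above); this standard-library version only illustrates the idea and is not the project's parser.

```rust
/// One parsed line of an lctl `*.stats` file, e.g.
/// `req_waittime 96 samples [usecs] -20 43536 54561 1897709649`.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct StatLine {
    pub name: String,
    /// Sample count; kept unsigned since only the value fields are reported negative.
    pub samples: u64,
    pub unit: String,
    /// Whatever follows `[unit]` (min, max, sum, sumsq when present),
    /// parsed as i64 so a negative min such as `-20` is accepted.
    pub values: Vec<i64>,
}

/// Parses a single stats line; returns a description of the first problem found.
pub fn parse_stat_line(line: &str) -> Result<StatLine, String> {
    let mut parts = line.split_whitespace();
    let name = parts.next().ok_or("missing name")?.to_string();
    let samples = parts
        .next()
        .ok_or("missing sample count")?
        .parse::<u64>()
        .map_err(|e| format!("bad sample count: {e}"))?;
    if parts.next() != Some("samples") {
        return Err("expected literal `samples`".to_string());
    }
    // The unit is wrapped in brackets, e.g. `[usecs]`.
    let unit = parts
        .next()
        .and_then(|u| u.strip_prefix('[')?.strip_suffix(']'))
        .ok_or("missing [unit]")?
        .to_string();
    // Signed parse: this is the step that fails today when done as u64.
    let values = parts
        .map(|v| v.parse::<i64>().map_err(|e| format!("bad value {v:?}: {e}")))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(StatLine { name, samples, unit, values })
}
```

With this sketch, `parse_stat_line("req_waittime 96 samples [usecs] -20 43536 54561 1897709649")` yields `values == [-20, 43536, 54561, 1897709649]`, whereas `"-20".parse::<u64>()` returns an error, which corresponds to the `Unexpected(Token('-'))` failure in the log.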
This has been seen on a live system:
```
ldlm.services.ldlm_canceld.stats=
snapshot_time 1714662722.986857642 secs.nsecs
req_waittime 101358239600 samples [usecs] -36 1855805 5530720965329 21563670935443407
req_qdepth 101358239600 samples [reqs] 0 1164 1893537183 7828947261
req_active 101358239600 samples [reqs] 1 23 152657095033 281200017837
req_timeout 101358239600 samples [secs] 1 218 6892805470378 468801155450006
reqbuf_avail 210996467581 samples [bufs] 0 155 13398749465845 850967459690601
ldlm_cancel 101358239600 samples [usecs] 1 211241571 1436530054018 108103493490971286
```
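Switching these fields from `u64` to `i64` roughly halves the positive range, so it is worth checking that live values like the ones above still fit. The sketch below (hypothetical helpers `numeric_fields_i64` and `largest_value`) parses the live-system lines as `i64`. The largest value, the `ldlm_cancel` sum of squares 108103493490971286, is about 85 times smaller than `i64::MAX` (9223372036854775807). This is a point-in-time check on one system only; cumulative sums of squares keep growing, so it does not show that the counters can never exceed the `i64` range.

```rust
/// The `req_*`, `reqbuf_avail` and `ldlm_cancel` lines from the live system above.
pub const LIVE_STATS: &str = "\
req_waittime 101358239600 samples [usecs] -36 1855805 5530720965329 21563670935443407
req_qdepth 101358239600 samples [reqs] 0 1164 1893537183 7828947261
req_active 101358239600 samples [reqs] 1 23 152657095033 281200017837
req_timeout 101358239600 samples [secs] 1 218 6892805470378 468801155450006
reqbuf_avail 210996467581 samples [bufs] 0 155 13398749465845 850967459690601
ldlm_cancel 101358239600 samples [usecs] 1 211241571 1436530054018 108103493490971286";

/// Returns the numeric fields after the `[unit]` token of a stats line,
/// or None if any of them does not fit in an i64.
pub fn numeric_fields_i64(line: &str) -> Option<Vec<i64>> {
    line.split_whitespace()
        .skip_while(|t| !t.starts_with('[')) // skip name, count and `samples`
        .skip(1) // skip the `[unit]` token itself
        .map(|t| t.parse::<i64>().ok())
        .collect()
}

/// Largest numeric field across all lines, or None if any field overflows i64.
pub fn largest_value(text: &str) -> Option<i64> {
    let mut largest = i64::MIN;
    for line in text.lines() {
        let values = numeric_fields_i64(line)?;
        largest = values.into_iter().fold(largest, i64::max);
    }
    Some(largest)
}
```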
Related Lustre Ticket: LU-9683
Related Lustre Ticket: LU-17853
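The underlying cause is probably the use of ktime_get_real() in ptlrpc. ktime_get_real() reads the wall clock, which can move backwards because of leap seconds and NTP adjustments, so a wait time computed from two such readings can come out negative.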
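The same effect can be illustrated in user space. In Rust, `std::time::SystemTime` is a wall clock, and `SystemTime::duration_since` returns an error when the "later" reading is actually earlier, as happens after a backwards clock step. The hypothetical `signed_elapsed_usecs` below turns that case into a negative microsecond count, mirroring how a wait time such as the `-20` usecs above can arise. This is an analogy only, not Lustre's ptlrpc code.

```rust
use std::time::{Duration, SystemTime};

/// Signed elapsed time in microseconds between two wall-clock readings.
/// If `end` is earlier than `start` (the clock was stepped back between the
/// two readings), the result is negative instead of an error.
/// Values beyond the i64 range are clamped.
pub fn signed_elapsed_usecs(start: SystemTime, end: SystemTime) -> i64 {
    match end.duration_since(start) {
        Ok(d) => micros_i64(d),
        // SystemTimeError::duration() is how far `end` lies *before* `start`.
        Err(e) => -micros_i64(e.duration()),
    }
}

/// Converts a Duration to whole microseconds, clamped to i64::MAX.
fn micros_i64(d: Duration) -> i64 {
    d.as_micros().min(i64::MAX as u128) as i64
}
```

For measuring intervals, a monotonic clock (`std::time::Instant` in Rust, `ktime_get()` in the kernel) does not go backwards; whether that is the change adopted under LU-17853 is not stated here.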