One of our many tasks in VisualCron runs a PowerShell command to check the lockout status of various critical accounts on our network.
This process runs well except when it doesn't. I'll explain:
Roughly every 7 days, almost as if it were scheduled, this PowerShell task stops returning proper results for anywhere from a few minutes to an hour.
Our PowerShell command is very simple:
Get-ADUser {USERVAR(LOCK_username)} -Properties LockedOut | Select-Object LockedOut
The job containing this task has no trigger of its own. Instead, a different job sets the variable above and then starts this job. We do this because we check multiple accounts: each account gets a one-task job that sets the variable, and the "broken" job then runs and acts on it. It does a number of things, such as sending alerts and mitigating over-alerting, and this setup avoids duplicating those tasks for every account.
The "parent" jobs are staggered 30 seconds apart and run all day, every day, each checking a different account for lockout. A full cycle takes about 3 1/2 minutes and then repeats.
When it's successful it returns standard output with no error output:
LockedOut
---------
False
The results and format in VisualCron match what we get when we run the same command manually in PowerShell.
The PowerShell command returns False if the account is not locked out and True if it is. We check for True and, if found, take additional steps in VisualCron.
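For comparison, here is a minimal sketch of the same check returning a bare True/False instead of a formatted table, so that any exception leaves a trace rather than an empty result. The function name Get-LockoutState and the -Lookup parameter are hypothetical and exist only for illustration; they are not part of our task or of VisualCron. Only Get-ADUser and the LockedOut property come from the command above.

```powershell
# Sketch only. Get-LockoutState and -Lookup are illustrative names, not an existing API.
function Get-LockoutState {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$UserName,
        # Injectable lookup so the function can be tried without a domain.
        # The default is the same AD query the task uses.
        [scriptblock]$Lookup = {
            param($u)
            Get-ADUser -Identity $u -Properties LockedOut -ErrorAction Stop
        }
    )
    try {
        $account = & $Lookup $UserName
        # Emit a plain boolean rather than a table with a header row.
        return [bool]$account.LockedOut
    }
    catch {
        # Surface the reason for the failure on the error stream.
        Write-Error "Lockout check failed for ${UserName}: $($_.Exception.Message)"
        return $null
    }
}

# In the task this would be called as (VisualCron substitutes the token first):
#   Get-LockoutState -UserName '{USERVAR(LOCK_username)}'
```

With this shape the check in VisualCron would compare against a single line containing True or False, and a failed lookup would show up in the error output instead of looking like a missing result. It would not by itself explain the "TaskProcessResult is null" error.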
When the job fails it returns standard output:
10:49:31: Server->Execute path: C:\Program Files (x86)\VisualCron\\PowerShell\TaskPowerShell.exe
10:49:31: Server->Executing Task process
10:49:51: Server->Executing Task process exited with exit code: 0
10:49:51: Server->Waiting for completion and result
and the error output is:
Exception in Task: 10:49:51: Server->TaskProcessResult is null
[edit] I should note that we set this task to time out after 20 seconds, which is why there are 20 seconds between executing and the exit code. The job routinely takes about 5 seconds when it's working and needs to finish before the next account check. We've also increased the timeout and found that we get the same result even when it is set to minutes. [edit]
Supporting this, a second PowerShell task is called when the first one fails or errors, and it hits exactly the same failure during the failure period. So the problem is not limited to this one task.
The second task is:
Unlock-ADAccount -Identity {USERVAR(LOCK_username)}
If left unattended, the PowerShell command simply starts working again after a few minutes to an hour.
Manual testing DURING the problem time (today): PowerShell run manually with the same command as the task returns the expected results in a timely manner (2-4 seconds). Manual testing covered accounts used in the jobs as well as accounts the jobs do not check. During manual testing the VisualCron task continued to fail.
Other Info:
The account used to check account status and unlock accounts has the proper rights and is not itself locked out. Other machines on our network can run these commands directly in PowerShell without error. All other jobs in VisualCron continue to operate without error. However, none of them use PowerShell except one, and that job only runs monthly and has never fallen within one of this job's failure periods.
I have created another job that runs NON-AD PowerShell tasks while this job is failing, but I can't test it until the problem occurs again, and only if someone notices it (the failure is time agnostic and sometimes happens in the wee hours of the morning).
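As a rough idea of what such a non-AD test task could contain, here is a small sketch that touches no AD cmdlets and only reports information about the PowerShell host itself. The variable name and the choice of fields are assumptions for illustration, not the contents of the actual test job.

```powershell
# Sketch of a non-AD health check. If this also fails during the outage,
# the problem is more likely in how the task host runs PowerShell than in AD.
$report = [pscustomobject]@{
    Timestamp = (Get-Date).ToString('o')              # ISO 8601 time of the run
    PSVersion = $PSVersionTable.PSVersion.ToString()  # engine version in use
    Computer  = [Environment]::MachineName            # machine running the task
    RunAs     = [Environment]::UserName               # account the task runs under
    ProcessId = $PID                                   # process hosting this script
}

# Render as text so it shows up in the task's standard output.
$report | Format-List | Out-String
```

If this task returns normally while the Get-ADUser task fails, that would point toward the AD call; if both fail with the same null result, it would point toward the task host.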
Any ideas? Known issues? Suggestions?
:)
Edited by user, 2018-03-20T15:43:04Z