I'll be opening a case with Support, but wanted to post this to see if anyone else has had similar issues. I upgraded to 11.1 when the issue first cropped up, and to 11.1.3 today.
I have 3 Azure servers running VisualCron (1 for each time zone in which we have customers). Two weeks ago, every job started failing at the Execute step on my PDT server (200+ failures over a weekend), so my Azure admin restored the server, which seemed to help: the jobs ran successfully for about 48 hours, then they all started failing again. I wound up moving those jobs to my CDT server on April 4, and this week that server started throwing random failures, mostly at the Execute step, but now one job is failing at a file copy step. I've confirmed that I don't have any jobs overlapping, and I've checked Event Viewer but haven't found any breadcrumbs to help me get to the root of the problem.
Another interesting twist today: a daily job that has a 19-minute timeout on the job and no timeouts on the tasks sent a notification that it had timed out after 9 minutes, and another notification arrived when it hit 19 minutes. That job had a 9-minute timeout for a while, but it was updated to 19 minutes over a year ago.
Anyone have thoughts, tips, tricks, words of wisdom to share?