We have a set of jobs that run, one for each hour, that encrypts a file and uploads it to an FTP server. They are identical other than the filename for each hour changes, and they are triggered by a file event trigger that runs the particular job for the file generated for that hour.
They have been running fine, until we decided to change from a file copy task for the upload to an FTP task. I made the change by cloning (not copying) an FTP task from one job to the other; modifying the filename; then inactivating the file copy task, and later deleting it. No changes were made to the file trigger or to any other task.
Now, between the file event trigger, and the encrypt task, there is something odd occurring that did not happen before.
On some runs of the job, I see this message, as if the file event trigger was triggered twice by the file:
"Job was not started because it was already running (JobName)". When this message is present, the job runs normally with no errors - however, looking at recent logs prior to the change, this was not happening before the change to these jobs.
However, some runs do not contain the above message - and what happens on those runs is what is the problem. It looks like the encrypt task (first task in the job) tries to run twice:
1/15/2009 12:45:50 PM Debug Task (63817) was added to processlist: Encrypt 11 file
1/15/2009 12:45:50 PM Debug Task (63818) was added to processlist: Encrypt 11 file
1/15/2009 12:45:50 PM Info Task started: Encrypt 11 file
1/15/2009 12:45:50 PM Info Task started: Encrypt 11 file
1/15/2009 12:46:04 PM Err Unhandled exception in Task: The process cannot access the file '\\servername\m$\foldername\filename.txt' because it is being used by another process.
1/15/2009 12:46:04 PM Err Encryption failed on file path: \\\servername\m$\foldername\filename.txt - encryption aborted
1/15/2009 12:46:04 PM Info Task completed: Encrypt 11 file
1/15/2009 12:46:04 PM Debug Setting previous task in TaskProcessCompleted: Encrypt 11 file (da90c225-ec24-452c-826f-0a2d967c2405) in job: JobName
1/15/2009 12:46:04 PM Debug Process status - About to SendTaskProcess (63818)
1/15/2009 12:46:04 PM Debug Process status - About to RemoveTaskProcess (63818)
1/15/2009 12:46:04 PM Debug Task (63818) was removed from processlist: Encrypt 11 file
1/15/2009 12:46:04 PM Debug Running Email Notification: 'Encryption Failure'
1/15/2009 12:46:04 PM Info Task completed: Encrypt 11 file
1/15/2009 12:46:04 PM Debug Setting previous task in TaskProcessCompleted: Encrypt 11 file (da90c225-ec24-452c-826f-0a2d967c2405) in job: JobName
1/15/2009 12:46:04 PM Debug Process status - About to SendTaskProcess (63817)
1/15/2009 12:46:04 PM Debug Process status - About to RemoveTaskProcess (63817)
1/15/2009 12:46:04 PM Debug Task (63817) was removed from processlist: Encrypt 11 file
When this happens, the first encrypt task runs successfully, and the job completes and does everything that it should do. However, we get alerted because of the error from the second encryption task and have to investigate each instance. (The log has this path in it: \\\servername\m$\foldername\filename.txt - I'm not sure where it is picking up the third slash, but that is not in the task configuration.)
Looking at the file trigger in the log, something different happens now than before.
Prior to the change, this is the sequence for the file trigger, when it worked normally:
1) objectchanged
2) enter lock
3) before counter check
4) in counter check
5) result object created
6) after file release
7) in fire trigger
8) before sending result
9) after sending result
10) exit lock
Now something different happens in the instance where the "Job was not started because it was already running (JobName)" happens; and something else when the encryption task tries to run twice:
This is when the encryption task tries to run twice:
1) objectchanged
2) enter lock
3) before countercheck
4) in countercheck
5) result object created
6) objectchanged <-
7) enter lock
8) before counter check
9) in counter check
10) result object created
11) after file release
12) in fire trigger
13) after file release
14) in fire trigger
15) before sending result
16) after sending result
17) exit lock
18) before sending result
19) after sending result
20) exit lock
21) after sending result
This is when the "Job was not started because it was already running (JobName)" log message occurs:
1) objectchanged
2) enter lock
3) before countercheck
4) in countercheck
5) result object created
6) objectchanged <-
7) enter lock
8) before counter check
9) in counter check
10) result object created
11) after file release
12) in fire trigger
13) before sending result <-
14) after sending result
15) exit lock
16) after file release
17) in fire trigger
18) before sending result
19) after sending result
20) exit lock
In other words, since the change, it appears that every one of these jobs is triggered twice - however, sometimes the trigger works one way, leading to the encryption task to run twice, causing an error alert for us - as if VC did not successfully suppress the 2nd job run; and other times the trigger works differently, and logs the "Job was not started because it was already running (JobName)" message, but we have no apparent problems with the job run.
Edited by user
2009-01-16T08:22:17Z
|
Reason: Not specified