Improving Azure Functions Blob Trigger Performance and Reliability - Part 4: Periodically Checking for Unprocessed Blobs

In the this final part of this series we wrap up by briefly discussing some ways to check for blobs that have not been processed correctly.

When using Azure Functions, a timer trigger can be used to automatically periodically execute a function based on a CRON expression. The following code is and example of a timer-triggered function:

public static class CheckBlobs
{
    [FunctionName("CheckBlobs")]
    public static void Run(
        [TimerTrigger("0 */5 * * * *")]TimerInfo myTimer, 
        ILogger log)
    {
        log.LogInformation($"C# Timer trigger function executed at: {DateTime.Now}");

        // Blob checking logic here
    }
}

There are a number of ways to check for unprocessed blobs depending on the solution you are building, some examples:

  • Check that an output blob exists for every input blob
  • Use a database to keep track of blobs that were uploaded and compare this to actual output blobs
  • If blobs are deleted after they have been processed, check there are no blobs in the container
  • Etc.

Some things to bear in mind if implementing this kind of checking include:

  • How often/when to run the function?
  • How long after a blob is written should you give it to be processed normally?
  • Will running this function interfere with any other processing in the system?
  • What if a blob is due to be processed (e.g. message sitting in a queue but not yet processed)? Could this create false positives or cause duplication of processing?
  • How long does the checking function take to execute? Will it take too long as the number of blobs increases and will the function be terminated by the runtime?
  • How/who do you notify of missed blobs (email, SMS, create ticket in CRM/bug system, etc.)?
  • Do you try to perform auto-retry of processing? Again, could this cause duplication, errors, etc.?

You could also use logging/Application Insights to provide you with information, or write every incoming blob name to a database and update that record when a blob has been processed, this way unprocessed blobs can be found with a simple “IF NOT PROCESSED” query.

SHARE:

Add comment

Loading