Improving Azure Functions Blob Trigger Performance and Reliability - Part 2: Processing Delays and Missed Blobs

This is the second part of a series or articles.

When you add a new blob, your blob-triggered function may not be triggered immediately: “If the blob container being monitored contains more than 10,000 blobs, the Functions runtime scans log files to watch for new or changed blobs. This process can result in delays. A function might not get triggered until several minutes or longer after the blob is created.” [Microsoft]

Also when scanning log files to find new blobs that need processing, there’s “no guarantee that all events are captured. Under some conditions, logs may be missed.” [Microsoft]

This means that it is possible for some new blobs to be missed and not processed.

Using a Storage Queue to Trigger Processing of New Blobs

One alternative to reduce the likelihood of missed blobs and also improve the responsiveness of blob processing is to use a slightly more  complex (but still relatively straight forward) approach.

Essentially this alternative approach has the following workflow:

  1. New blob written to blob storage
  2. Write message to storage queue containing new blob path
  3. Queue-triggered function gets message from step 2
  4. Blob processing occurs

(Note that this alternative approach may not suit all situations depending on how new blobs are making their way into blob storage – who or whatever is writing the blob in step 1 also needs to be able to write a queue message.)

Blob Writing

This approach requires that when a blob is written, a queue message is also written.

As a simple example, this could be from client code as follows:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using System.IO;
using System.Threading.Tasks;

namespace AddNewBlob
{
    class Program
    {
        static async Task Main(string[] args)
        {
            CloudStorageAccount storageAccount = CloudStorageAccount.DevelopmentStorageAccount;
            CloudBlobClient cloudBlobClient = storageAccount.CreateCloudBlobClient();
            CloudBlobContainer cloudBlobContainer = cloudBlobClient.GetContainerReference("food-in");
            CloudBlockBlob cloudBlockBlob = cloudBlobContainer.GetBlockBlobReference("recipe1.txt");

            await WriteBlob();
            await WriteMessage();

            async Task WriteBlob()
            {
                using (var stream = await cloudBlockBlob.OpenWriteAsync())
                using (var sw = new StreamWriter(stream))
                {
                    await sw.WriteLineAsync("carrot");
                    await sw.WriteLineAsync("steak");
                    await sw.WriteLineAsync("apple");
                }
            }

            async Task WriteMessage()
            {
                var queueClient = storageAccount.CreateCloudQueueClient();
                var queue = queueClient.GetQueueReference("food-in");
                await queue.AddMessageAsync(new Microsoft.WindowsAzure.Storage.Queue.CloudQueueMessage("recipe1.txt"));
            }
        }

        
    }
}

Or perhaps the blob data comes in via an HTTP-triggered function as follows:

public static class AddRecipe
{
    [FunctionName("AddRecipe")]
    [return: Queue("food-in")]
    public static async Task<string> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req,            
        ILogger log)
    {
        log.LogInformation("C# HTTP trigger function processed a request.");
        
        string ingredients = await new StreamReader(req.Body).ReadToEndAsync();

        // validation/error code omitted for demo purposes

        var blobName = Guid.NewGuid().ToString();

        await WriteBlob(); // ensure blob is written *before* function returns and add message to the queue

        return blobName; // write to queue


        async Task WriteBlob()
        {
            var account = CloudStorageAccount.DevelopmentStorageAccount; // In real app load this from secure config location
            var blobClient = account.CreateCloudBlobClient();
            var blobContainer = blobClient.GetContainerReference("food-in");
            var cloudBlockBlob = blobContainer.GetBlockBlobReference(blobName);
            await cloudBlockBlob.UploadTextAsync(ingredients);
        }
    }
}

Notice in the preceding code , the writing of the blob is being done explicitly in code to ensure that the queue message isn’t added until the blob is definitely available to be processed by the next function in the chain. (See this related GitHub issue).

More Reliable Blob Processing

The next function is where the actual processing of the new blob is carried out, it is however triggered from a queue rather than relying on a blob trigger:

public static class ProcessFoodBlobs
{
    private static readonly string[] _meats = { "steak", "chicken", "venison" };      

    [FunctionName("ProcessFoodBlobs")]
    public static void Run(
        [QueueTrigger("food-in")]string newBlobPath, 
        [Blob("food-in/{queueTrigger}")] string foods,
        [Blob("food-out/{queueTrigger}.vegetarian")] out string vegetarian,
        [Blob("food-out/{queueTrigger}.nonvegetarian")] out string nonVegetarian,
        ILogger log)
    {
        vegetarian = null;
        nonVegetarian = null;

        string[] foodLines = foods.Split(new[] {"\r\n", "\n"  }, StringSplitOptions.RemoveEmptyEntries);


        foreach (var food in foodLines)
        {
            var isMeat = _meats.Contains(food);

            if (isMeat)
            {
                nonVegetarian += food + Environment.NewLine;
            }
            else
            {
                vegetarian += food + Environment.NewLine;
            }
        }    
    }
}

In the preceding code we’re making use of automatic input blob binding.

Summary

This approach may offer some benefits at the cost of some additional complexity if you have a lot of blobs being written/stored/processed. It also has some other considerations to bear in mind such as what happens if the blob is deleted or changed before the message is picked up off the queue? As with all things you should consider your own requirements and ensure you do thorough testing which includes performance/load/stress testing.

SHARE:

Comments (2) -

  • Sebastian

    2/22/2019 7:56:32 AM | Reply

    Would be great to see in a number 3 of the series how to use EventGrid (StorageV2) to write blob create events to a queue which triggers the function. Setup is so easy and can be done in minutes. No need to modify the blob producer  logic.

    • Jason Roberts

      2/22/2019 9:28:48 AM | Reply

      Hi Sebastian - you read my mind!! That was the planned 3rd part Smile

Pingbacks and trackbacks (1)+

Add comment

Loading