Improving Azure Functions Blob Trigger Performance and Reliability - Part 1: Memory Usage

This is the first part of a series of articles.

When creating blob-triggered Azure Functions there are some memory usage considerations to bear in mind.

“The consumption plan limits a function app on one virtual machine (VM) to 1.5 GB of memory. Memory is used by each concurrently executing function instance and by the Functions runtime itself.” [Microsoft]

A blob-triggered function can execute concurrently and internally uses a queue: “the maximum number of concurrent function invocations is controlled by the queues configuration in host.json. The default settings limit concurrency to 24 invocations. This limit applies separately to each function that uses a blob trigger.” [Microsoft]

So, if you have 1 blob-triggered function in a Function App, with the default concurrency setting of 24, you could have a maximum of 24 (1 * 24) concurrently executing function invocations. (The documentation describes this as per-VM concurrency, so with 2 VMs you could have 48 (2 * 1 * 24) concurrently executing function invocations.)

If you had 3 blob-triggered functions in a Function App (assuming 1 VM) then you could have 72 (3 * 24) concurrently executing function invocations.

Because the consumption plan “limits a function app on one virtual machine (VM) to 1.5 GB of memory”, if you are processing blobs that are non-trivial in size then you may need to consider overall memory usage.
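The arithmetic above can be captured in a small helper. This is only a rough sketch: the class and method names are made up for illustration, 1.5 GB is taken as 1536 MB, and the per-invocation figure ignores the memory used by the Functions runtime itself, so the real budget will be lower.

```csharp
public static class ConcurrencyBudget
{
    // Default blob trigger concurrency per function, from the documentation quoted above.
    public const int DefaultConcurrencyPerFunction = 24;

    // Consumption plan limit per VM (1.5 GB), expressed in MB. Assumption: 1 GB = 1024 MB.
    public const double MemoryLimitMb = 1536;

    // Upper bound on concurrently executing invocations across all VMs.
    public static int MaxConcurrentInvocations(int blobTriggeredFunctions, int vms = 1) =>
        vms * blobTriggeredFunctions * DefaultConcurrencyPerFunction;

    // Rough share of one VM's memory available to each invocation at full concurrency.
    // Ignores the memory used by the Functions runtime itself.
    public static double ApproxMbPerInvocation(int blobTriggeredFunctions) =>
        MemoryLimitMb / (blobTriggeredFunctions * DefaultConcurrencyPerFunction);
}
```

Calling ConcurrencyBudget.MaxConcurrentInvocations(3) gives 72, and ConcurrencyBudget.ApproxMbPerInvocation(3) gives a little over 21 MB per invocation, which shows how quickly the budget shrinks as blob-triggered functions are added.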

OutOfMemoryException When Using Azure Functions Blob Trigger

As an example, suppose the following function exists:

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class BlobPerformanceAndReliability
{
    [FunctionName("BlobPerformanceAndReliability")]
    public static void Run(
        // Binding to string reads the entire blob into memory before Run is called
        [BlobTrigger("big-blobs/{name}")] string blob,
        string name,
        // Output blob path uses the same name as the triggering blob
        [Blob("big-blobs-out/{name}")] out string foundData,
        ILogger log)
    {
        log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {blob.Length} Bytes");

        // Code to find and output a specific line
        foundData = "This line will never be reached if out of memory";
    }
}

The preceding function is triggered by blobs in the big-blobs container. The omitted code towards the end of the function would find a specific line of text in the blob and output it to big-blobs-out.

We can create a large file (approx. 1.8 GB) with the following code in a console app:

using System.IO;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var sw = new StreamWriter(@"c:\temp\bigblob.txt"))
            {
                for (int i = 0; i < 40_000_000; i++)
                {
                    sw.WriteLine("Some line we are not interested in processing");
                }
                sw.WriteLine("Data: 42");
            }
        }
    }
}

The last line of the file will be “Data: 42”.

If we run the function app locally and upload this big file to the Azure Storage Emulator, the function will trigger and fail with the following error: “System.Private.CoreLib: Exception while executing function: BlobPerformanceAndReliability. Microsoft.Azure.WebJobs.Host: One or more errors occurred. (Exception binding parameter 'blob') (Exception binding parameter 'name'). Exception binding parameter 'blob'. System.Private.CoreLib: Exception of type 'System.OutOfMemoryException' was thrown.”

The reason for this is that when you bind a blob trigger or blob input to string or byte[], the entire blob is read into memory. If the blob is too big (and/or other concurrently executing function invocations are also processing big files), the function app will exceed the memory available to the Functions runtime.
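To get a feel for the numbers, the size of the generated test file and of the equivalent in-memory string can be estimated as follows. This is a back-of-the-envelope sketch with made-up helper names. It assumes the console app runs on Windows (so each WriteLine appends "\r\n") and that the text is plain ASCII written as UTF-8, so one character takes one byte on disk. A .NET string stores each character as a 2-byte UTF-16 code unit, so the in-memory size is roughly double the file size.

```csharp
public static class BlobMemoryEstimate
{
    // Approximate size on disk of the generated file.
    // Assumes "\r\n" line endings and one byte per character (ASCII text as UTF-8).
    public static long FileSizeBytes(long repeatedLines, string repeatedLine, string lastLine)
    {
        const int newlineLength = 2; // "\r\n"
        return repeatedLines * (repeatedLine.Length + newlineLength)
               + lastLine.Length + newlineLength;
    }

    // Approximate size of the character data if the whole file were held in a .NET string:
    // each char is a 2-byte UTF-16 code unit.
    public static long StringSizeBytes(long charCount) => charCount * sizeof(char);
}
```

For the file created above, BlobMemoryEstimate.FileSizeBytes(40_000_000, "Some line we are not interested in processing", "Data: 42") comes to 1,880,000,010 bytes, and holding that many characters in a string would need about 3.76 GB. Both figures are well beyond the 1.5 GB limit, before any concurrent invocations are taken into account.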

Processing Large Blobs with Azure Functions

Instead of binding to string or byte[], you can bind to a Stream. This will not load the entire blob into memory and will allow you to instead process it incrementally.

The function can be re-written as follows:

using System;
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class BlobPerformanceAndReliability
{
    [FunctionName("BlobPerformanceAndReliability")]
    public static void Run(
        // Binding to Stream lets the blob be read incrementally
        [BlobTrigger("big-blobs/{name}")] Stream blob,
        string name,
        [Blob("big-blobs-out/{name}")] out string foundData,
        ILogger log)
    {
        log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {blob.Length} Bytes");

        foundData = null; // Don't write an output blob by default

        using (var sr = new StreamReader(blob))
        {
            string line;

            // ReadLine returns null at the end of the stream;
            // only one line is held in memory at a time
            while ((line = sr.ReadLine()) != null)
            {
                if (line.StartsWith("Data", StringComparison.Ordinal))
                {
                    foundData = line;
                    break;
                }
            }
        }
    }
}

If you’re not familiar with using streams in .NET, check out my Working with Files and Streams in C# Pluralsight course.

If we force the same blob to be reprocessed with this new function code, there will be no error and the output blob containing “Data: 42” will be seen in the big-blobs-out container.
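The line-by-line search can also be tried out without Azure by pulling it into a plain method that accepts any Stream. The following is a minimal sketch of that idea rather than part of the function app: LineFinder and FindFirstLine are hypothetical names, and the reader is told to leave the stream open so that the caller stays in control of it.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineFinder
{
    // Returns the first line that starts with the given prefix, or null if there is none.
    // The stream is read line by line, so only one line is held in memory at a time.
    public static string FindFirstLine(Stream input, string prefix)
    {
        // leaveOpen: true so that disposing the reader does not close the caller's stream
        using (var reader = new StreamReader(input, Encoding.UTF8, true, 4096, leaveOpen: true))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith(prefix, StringComparison.Ordinal))
                {
                    return line;
                }
            }
        }

        return null;
    }
}
```

Passing a MemoryStream or FileStream containing a few filler lines followed by “Data: 42” and calling LineFinder.FindFirstLine(stream, "Data") should return “Data: 42”. If no line matches, the method returns null, which mirrors the function leaving foundData unset.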

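Another thing to bear in mind when processing large files is that function execution is subject to a timeout. On the consumption plan the default is 5 minutes, which can be raised to a maximum of 10 minutes with the functionTimeout setting in host.json.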

In the next part of this series we’ll look at how to improve the responsiveness of function execution when new blobs are written and also improve the reliability and reduce the chances of blobs being missed.
