pashov.net/code|February 2016

Monthly Archives: February 2016

.NET Coding

Download large files with Web Api as a relay


Consider the following scenario: you have file storage in the cloud (Google Docs, OpenStack Object Storage, etc.) and you need to let your app's consumers download those files through your app. The main challenge is memory. If you load each file into memory and then return it in the response, a few concurrent downloads of large files will soon bring your app down with an OutOfMemoryException. So you need to turn your app into a relay: a pipe through which bytes flow from the file storage to the client without any of the content being buffered. You are probably already thinking of streams, and you are right.

Change the buffer policy

By default, Web API buffers the content you are trying to download, and that needs to change. You can either implement IHostBufferPolicySelector yourself or, more simply, inherit from its web-host implementation, WebHostBufferPolicySelector (namespace System.Web.Http.WebHost), whose methods are virtual:


// Shape of the framework class as seen in metadata; method bodies omitted.
public class WebHostBufferPolicySelector : IHostBufferPolicySelector
{
    public virtual bool UseBufferedInputStream(object hostContext);
    public virtual bool UseBufferedOutputStream(HttpResponseMessage response);
}

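You only need to override one of the two methods. Which one depends on whether you want unbuffered uploads (UseBufferedInputStream governs the request body) or unbuffered downloads (UseBufferedOutputStream governs the response body). In our case we need UseBufferedOutputStream. A possible implementation: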

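using System;
using System.Net.Http;
using System.Web.Http.WebHost;

// Streams (does not buffer) the response for any route whose path contains "download".
public class NoBufferPolicy : WebHostBufferPolicySelector
{
    public override bool UseBufferedOutputStream(HttpResponseMessage response)
    {
        var request = response.RequestMessage;
        if (request != null &&
            request.RequestUri.LocalPath.IndexOf("download", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            return false; // write the body straight through to the client
        }

        // Everything else keeps Web API's default buffering behavior.
        return base.UseBufferedOutputStream(response);
    }
}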

Now responses are unbuffered whenever the request path contains "download" and buffered in every other case. Next, register the class in GlobalConfiguration, for example in Application_Start:

 GlobalConfiguration.Configuration.Services.Replace(typeof(IHostBufferPolicySelector), new NoBufferPolicy());

Back to the controller

In the controller, start by requesting the file from the storage provider, reading only the response headers:

var client = new HttpClient();
var responseWithHeadersOnly = await client.GetAsync(requestUrl, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false);

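Two things are worth mentioning here. The first is the second parameter of GetAsync, HttpCompletionOption.ResponseHeadersRead. It makes the call complete as soon as the response headers have been read, instead of waiting until the whole body has been downloaded and buffered. The second is ConfigureAwait(false), which tells the await not to resume on the captured ASP.NET synchronization context. Why that matters in a web app is covered in the "Async/Await done right in the context of a web app" post.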

This is the point to check that the response is OK (for example with responseWithHeadersOnly.IsSuccessStatusCode). Once we have a good response, we get hold of the content stream without loading it into memory:

Stream streamToReadFrom = await responseWithHeadersOnly.Content.ReadAsStreamAsync();

Once we have read the headers, made sure everything is fine (no 503 Service Unavailable, no server errors, etc.) and have control over the stream, it is time to start streaming to the client. We will use a push approach here, as opposed to the traditional pull approach. Luckily, Web API gives us the tool for that: PushStreamContent. Web API does not pull the body from a stream you hand it. Instead, PushStreamContent hands your delegate the outgoing response stream, and you push bytes into it chunk by chunk. It has a couple of overloads, but they all take a delegate whose parameters are the output Stream, the HttpContent and a TransportContext. The delegate can return void or a Task. Let's see some code:

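// Namespaces: System.IO, System.Net.Http, System.Net.Http.Headers
var client = new HttpClient();
var responseWithHeadersOnly = await client.GetAsync(requestUrl, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false);
// (check responseWithHeadersOnly.IsSuccessStatusCode before going further)
var streamToReadFrom = await responseWithHeadersOnly.Content.ReadAsStreamAsync().ConfigureAwait(false);

var result = Request.CreateResponse();

result.Content = new PushStreamContent((stream, content, context) =>
{
    // Runs after the action has returned, when Web API starts writing the response body.
    var buffer = new byte[65536];
    int bytesRead;
    try
    {
        // Relay 64 KB at a time from the storage provider to the client.
        while ((bytesRead = streamToReadFrom.Read(buffer, 0, buffer.Length)) > 0)
        {
            stream.Write(buffer, 0, bytesRead);
        }
    }
    finally
    {
        // Closing the outgoing stream is what tells Web API the response is complete.
        stream.Close();
        streamToReadFrom.Close();
        responseWithHeadersOnly.Dispose();
        client.Dispose();
    }
});

result.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment");
result.Content.Headers.ContentDisposition.FileName = filename;
result.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
return result;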

Each pass through the while loop reads up to 64 KB from the incoming stream and writes it to the outgoing stream. At most one buffer's worth of the file is held in memory per download, however large the file is.
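The copy loop itself can be tried outside Web API. Below is a minimal console sketch (assumes .NET 6 or later for top-level statements). Relay is a hypothetical helper name, and two MemoryStreams stand in for the storage-provider stream and the client's response stream. It relays 200,000 bytes through a 64 KB buffer and counts the writes.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: copies source to destination one fixed-size buffer at a time,
// like the PushStreamContent callback. At most bufferSize bytes are in memory at once.
// Returns how many Write calls were made.
int Relay(Stream source, Stream destination, int bufferSize = 65536)
{
    var buffer = new byte[bufferSize];
    int bytesRead;
    int writes = 0;
    // Read may return fewer bytes than requested (common with network streams),
    // so always write exactly bytesRead bytes.
    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        destination.Write(buffer, 0, bytesRead);
        writes++;
    }
    return writes;
}

// MemoryStreams stand in for the storage-provider stream and the client response stream.
var payload = new byte[200_000];
new Random(1).NextBytes(payload);

using var incoming = new MemoryStream(payload);
using var outgoing = new MemoryStream();
int writeCount = Relay(incoming, outgoing);
bool identical = outgoing.ToArray().SequenceEqual(payload);

Console.WriteLine($"{writeCount} writes, {outgoing.Length} bytes relayed, identical: {identical}");
```

With a 65,536-byte buffer, the 200,000-byte payload goes out in four writes: three full buffers and a final 3,392-byte chunk. Memory use is bounded by the buffer size, not by the file size.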

The flow is as follows. A request comes in (1). You read the headers of the requested file from the file storage provider (2). The action returns the response (the return result; line, (3)). The file starts streaming to the client (stream.Write(buffer, 0, bytesRead);, (4)). Finally, the whole file has been sent (stream.Close();, (5)). The exchange ends when you close the outgoing stream, not when you return from the method.
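The ordering can be reproduced with a toy console simulation (assumes .NET 6 or later). No Web API is involved. HandleRequest is a hypothetical stand-in for the controller action, the calling code plays the host, and a MemoryStream stands in for the response stream. The action only registers the push callback and returns. The body bytes are written later, when the host invokes the callback.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

var events = new List<string>();

// Hypothetical stand-in for the controller action: like PushStreamContent, it only
// registers the callback that will write the body, then returns.
Action<Stream> HandleRequest()
{
    events.Add("(1) request comes in");
    events.Add("(2) headers read from storage");
    Action<Stream> onStreamAvailable = stream =>
    {
        events.Add("(4) body bytes written");
        stream.Write(new byte[] { 1, 2, 3 }, 0, 3);
        stream.Close();
        events.Add("(5) outgoing stream closed, response complete");
    };
    events.Add("(3) action returns");
    return onStreamAvailable;
}

// Hypothetical host: runs the action first, and only later hands the callback a response stream.
Action<Stream> callback = HandleRequest();
callback(new MemoryStream());

Console.WriteLine(string.Join(Environment.NewLine, events));
```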

That is all it takes to turn your Web API into a working, scalable file download relay. You can find the source code HERE, under the LargeFilesWebApi folder.

.NET

Async/Await done right in the context of a web app


Since .NET 4.5 we have had a handy way to write asynchronous code: the async and await keywords. However, judging by my peers' code, I feel they are widely misunderstood.

First things first: what is async/await good for in a web app?

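Making your I/O-bound methods async makes your app much more scalable. While an awaited I/O operation (a database query, an HTTP call, a file read or write) is in flight, no thread is blocked waiting for it. The request thread is released to do other work, most often serving other requests. CPU-bound work is different. In a web app, wrapping a long computation in Task.Run only moves it to another thread-pool thread, so nothing is freed up. As a very rough rule of thumb (it depends on many things), a server that can handle 100 concurrent users synchronously should cope with at least double that using async methods.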
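Here is a small console sketch of the "release the thread" idea (assumes .NET 6 or later, for top-level statements and ThreadPool.ThreadCount). Task.Delay stands in for real I/O. That is an assumption, but like true async I/O, it holds no thread while it waits. SimulatedIoCallAsync is a hypothetical name. The sketch makes a thousand concurrent 200 ms calls and measures each call's latency, the total time and the thread-pool size.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Assumption: Task.Delay stands in for an I/O-bound call (database query, HTTP request).
// Like real async I/O, it does not hold a thread while it waits.
async Task<long> SimulatedIoCallAsync(int milliseconds)
{
    var sw = Stopwatch.StartNew();
    await Task.Delay(milliseconds);
    return sw.ElapsedMilliseconds;
}

const int calls = 1000;
const int delayMs = 200;

var total = Stopwatch.StartNew();
long[] latencies = await Task.WhenAll(
    Enumerable.Range(0, calls).Select(_ => SimulatedIoCallAsync(delayMs)));
total.Stop();

// Each call still takes its full ~200 ms: async does not make a slow job faster...
long fastestCall = latencies.Min();
// ...but the waits overlap without parking one thread each, so the pool stays small.
int poolThreads = ThreadPool.ThreadCount;

Console.WriteLine($"fastest call: {fastestCall} ms, all {calls} calls: {total.ElapsedMilliseconds} ms, pool threads: {poolThreads}");
```

Exact timings and thread counts vary by machine. The shape of the result does not vary: per-call latency is unchanged, while total time and thread usage stay far below what one blocked thread per call would need.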

When is it a bad idea to use async/await?

If I make a request that triggers a very slow job, I will wait a long time for the job to finish and the response to come back, whether the methods are synchronous or asynchronous. Async methods do not return faster; you have to await them at the end of the day. Their benefit is that they free up the thread while the slow job runs. So the response time stays the same, but in the meantime the released thread could have served another request before coming back to continue the first one after the await. In other words, async/await is about using resources efficiently, not about returning immediately. Returning a result to the client immediately while doing the job in the background is a whole different topic. For example, when you ask OpenStack to provision a virtual machine, you get 202 Accepted immediately, even though provisioning takes 10 minutes. That is not the async/await pattern at all but rather background workers, queues, etc.

Good practice

If you decide to use async methods, go async all the way and don't mix synchronous and asynchronous code. Blocking on asynchronous code is a very common cause of deadlocks. The code below deadlocks:

[HttpGet]
[Route("api/files")]
public IHttpActionResult Files()
{
    // .Result blocks the request thread, which still holds the ASP.NET request context.
    var resultFromLongComputation = this.TestAsync().Result;
    return Ok(resultFromLongComputation);
}

public async Task<int> TestAsync()
{
    await Task.Delay(2000);
    return 42;
}

So we have an async method that awaits a long job. When execution reaches the await, TestAsync immediately returns an incomplete Task to the Files method. Files then blocks synchronously, waiting for the result (.Result forces a synchronous wait for the Task to complete). The request thread is now stuck, and it still holds the request context. When the delay elapses, the rest of TestAsync wants to resume on that captured context so it can finish its work and return the result. However, that context is held by the Files method, which is blocked waiting for TestAsync. Each side waits for the other, so neither ever completes.

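How do you prevent it?
  1. One way is to go async all the way and await the call instead of blocking on it:
[HttpGet]
[Route("api/files")]
public async Task<IHttpActionResult> Files()
{
    var resultFromLongComputation = await this.TestAsync();
    return Ok(resultFromLongComputation);
}
  2. Another way is to call ConfigureAwait(false) on the awaited task. This stops the continuation from waiting for the original context to become available, and it runs on a thread-pool thread instead. For the fix to hold, every await in TestAsync (and in any async method it calls) needs it. It also still blocks the request thread for the whole wait, so the first option is the better fix.
public async Task<int> TestAsync()
{
    await Task.Delay(2000).ConfigureAwait(false);
    return 42;
}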

The same deadlock happens in UI apps (Windows Forms and WPF), but not in console apps. A console app has no request or UI context, so the continuation after an await simply runs on a thread-pool thread. This is confusing if you try things out in a test console app: code that works fine there can deadlock once it moves into a web or UI app.
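To see the console-app behavior, here is a minimal sketch (assumes .NET 6 or later, top-level statements) that runs the blocking call from the deadlock example above. The same .Result call that hangs in Web API completes normally here, because there is no SynchronizationContext for the continuation to wait on.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// The same TestAsync as in the deadlocking Web API example.
async Task<int> TestAsync()
{
    await Task.Delay(2000);
    return 42;
}

// A console app has no SynchronizationContext...
bool noContext = SynchronizationContext.Current == null;

// ...so blocking with .Result does not deadlock here: the code after the await in
// TestAsync resumes on a thread-pool thread instead of waiting for this blocked thread.
int result = TestAsync().Result;

Console.WriteLine($"SynchronizationContext is null: {noContext}, result: {result}");
```

This should print "SynchronizationContext is null: True, result: 42" after about two seconds. It is a demonstration of why console tests can hide the problem, not an endorsement of .Result.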

Some words on the Synchronization Context

By default, awaiting a task captures the current SynchronizationContext. When the task finishes, the continuation (the compiler-generated code that wraps everything after the await) is posted back to that same SynchronizationContext. SynchronizationContext.Current is null in a console app and non-null in a UI app or a classic ASP.NET app. That is where all the hassle with contexts, and with mixing synchronous and asynchronous code, comes from.

Further Reading:

Stephen Cleary's intro to async/await

Async/Await FAQ on the pfxteam MSDN blog