D3D11 Compute Shader - Part 1

The GPU has become a general-purpose processor, or at least it is becoming more and more general. The existence of GPGPU APIs such as DirectCompute, CUDA, and OpenCL is evidence of this. It's time to start learning Compute Shaders (CS), in this case DirectCompute from D3D11.

Past GPGPU Coders...
Believe it or not, GPGPU existed before Compute Shaders arrived. However, you had to structure everything in terms of graphics: to launch a GPGPU computation you had to render geometry and, basically, use Pixel Shaders to do the computation.

While this style of GPGPU coding still works today, we can do much better! Compute Shaders let us program the GPU much like we write regular code. The first benefit is that you don't need to care about the graphics pipeline; you just dispatch your Compute Shaders and that's it. In addition, Compute Shaders bypass graphics pipeline stages such as primitive assembly and rasterization, so they have the potential to run faster than GPGPU via Pixel Shaders, at least in theory.
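To make "just dispatch" a bit more concrete: Dispatch takes the number of thread groups in x, y and z, not the number of elements. Here is a minimal sketch of how the group count could be computed. ThreadGroupCount is a hypothetical helper of mine (it is not part of D3D11), and the 64 threads per group in the usage note is only an assumed value that would have to match the [numthreads] attribute in the shader.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a D3D11 API): number of thread groups needed to
// cover elementCount items when every group handles threadsPerGroup items.
// Rounds up so that a partially filled last group is still launched.
inline std::uint32_t ThreadGroupCount(std::uint32_t elementCount,
                                      std::uint32_t threadsPerGroup)
{
    assert(threadsPerGroup > 0);
    return elementCount / threadsPerGroup
         + (elementCount % threadsPerGroup != 0 ? 1u : 0u);
}
```

For example, 1000 elements with an assumed 64 threads per group gives ThreadGroupCount(1000, 64) == 16, so the call would be pContext->Dispatch(16, 1, 1). Because the last group is only partially used, the shader should skip thread IDs that are past the end of the data.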

Setting Up Simple Framework
In order to start learning Compute Shaders, we need a framework, a simple one that allows us to focus on writing Compute Shaders and learning their performance characteristics. A good place to start is BasicCompute11 from the DirectX SDK.

I'd start from that sample. However, we need a little bit more. We need to upgrade to VS2012+ so that we can potentially use VSGD (Visual Studio Graphics Debugger) to profile our application. In addition, since we want to learn the performance characteristics of Compute Shaders, we need to be able to time them. There are a couple of references on how to do this:
  1. Nathan Reed: GPU Profiling 101 - http://www.reedbeta.com/blog/2011/10/12/gpu-profiling-101/
  2. MJP: Profiling in DX11 with Queries - http://mynameismjp.wordpress.com/2011/10/13/profiling-in-dx11-with-queries/
  3. OpenVIDIA: Events: Basic Profiling and Synchronization - http://openvidia.sourceforge.net/index.php/DirectCompute#Events:_Basic_Profiling_.26_Synchronization
I prefer doing it via D3D11 queries, specifically D3D11_QUERY_TIMESTAMP_DISJOINT and D3D11_QUERY_TIMESTAMP. Don't forget to wait for the query data to become available before calculating the elapsed time of the compute shader. Basically, here's how I profile compute shaders:
void RunComputeShader(...)
{
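    // The disjoint query brackets the whole measured region
    pContext->Begin(g_pQueryDisjoint);

    // Do some CS init, i.e. setting shader, resources, constant buffers

    // Timestamp queries only use End(); one before and one after the dispatch
    pContext->End(g_pQueryBeginCS);
    pContext->Dispatch(x, y, z);
    pContext->End(g_pQueryEndCS);

    // Do some CS cleanup, i.e. unbinding shader and resources

    pContext->End(g_pQueryDisjoint);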


    //********************************************************************
    // Collect time stamps
    //********************************************************************

    // Wait for data to become available
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT tsDisjoint;
    while (pContext->GetData(g_pQueryDisjoint, &tsDisjoint, sizeof(tsDisjoint), 0) == S_FALSE) {}
    if (tsDisjoint.Disjoint)
        return;

    UINT64 beginCSTimeStamp;
    UINT64 endCSTimeStamp;

    while (pContext->GetData(g_pQueryBeginCS, &beginCSTimeStamp, sizeof(UINT64), 0) == S_FALSE) {}
    while (pContext->GetData(g_pQueryEndCS, &endCSTimeStamp, sizeof(UINT64), 0) == S_FALSE) {}

    // Convert to real time
    float computeShaderElapsed = float(endCSTimeStamp - beginCSTimeStamp) / float(tsDisjoint.Frequency) * 1000.0f;
    printf("Compute shader done in %f ms\n", computeShaderElapsed);
}
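The conversion at the end of that function can also be pulled out into a small standalone helper. This is only a sketch: TimestampsToMilliseconds is a hypothetical name of mine, and returning a negative value for an unusable measurement is my own convention, not something D3D11 prescribes. It does the math in double, because a 64-bit tick delta can exceed what a float represents exactly.

```cpp
#include <cstdint>

// Hypothetical helper (not a D3D11 API): convert two GPU timestamps to
// milliseconds. frequencyHz and disjoint are meant to come from the
// Frequency and Disjoint fields of D3D11_QUERY_DATA_TIMESTAMP_DISJOINT.
// Returns a negative value if the measurement cannot be trusted.
inline double TimestampsToMilliseconds(std::uint64_t beginTicks,
                                       std::uint64_t endTicks,
                                       std::uint64_t frequencyHz,
                                       bool disjoint)
{
    if (disjoint || frequencyHz == 0 || endTicks < beginTicks)
        return -1.0;

    // ticks / (ticks per second) = seconds; times 1000 = milliseconds
    return static_cast<double>(endTicks - beginTicks) * 1000.0
         / static_cast<double>(frequencyHz);
}
```

With the variables from the function above, the call would look like TimestampsToMilliseconds(beginCSTimeStamp, endCSTimeStamp, tsDisjoint.Frequency, tsDisjoint.Disjoint != 0).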
For completeness, here's how I create and destroy the queries:
    // create
    D3D11_QUERY_DESC queryDisjointDesc;
    queryDisjointDesc.Query     = D3D11_QUERY_TIMESTAMP_DISJOINT;
    queryDisjointDesc.MiscFlags = 0;

    if (FAILED(g_pDevice->CreateQuery(&queryDisjointDesc, &g_pQueryDisjoint)))
    {
        printf("Could not create timestamp disjoint query!\n");
        exit(-1);
    }


    D3D11_QUERY_DESC queryDesc;
    queryDesc.Query     = D3D11_QUERY_TIMESTAMP;
    queryDesc.MiscFlags = 0;

    if (FAILED(g_pDevice->CreateQuery(&queryDesc, &g_pQueryBeginCS)))
    {
        printf("Could not create begin-CS timestamp query!\n");
        exit(-1);
    }

    if (FAILED(g_pDevice->CreateQuery(&queryDesc, &g_pQueryEndCS)))
    {
        printf("Could not create end-CS timestamp query!\n");
        exit(-1);
    }
    // destroy
    SAFE_RELEASE( g_pQueryDisjoint );
    SAFE_RELEASE( g_pQueryBeginCS );
    SAFE_RELEASE( g_pQueryEndCS );    
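SAFE_RELEASE is not a D3D11 function; it is a helper macro that the SDK samples define themselves. If your project doesn't have it, a minimal equivalent could look like the sketch below. SafeRelease and FakeQuery are hypothetical names of mine, and FakeQuery exists only so the snippet compiles without d3d11.h; in real code the pointer would be an ID3D11Query*.

```cpp
// Hypothetical stand-in for a COM interface, used only to keep this sketch
// self-contained. Real D3D11 objects expose Release() through IUnknown.
struct FakeQuery
{
    unsigned long refCount = 1;
    unsigned long Release() { return --refCount; }
};

// Release the object if the pointer is set, then null the pointer so that
// a second call on the same pointer does nothing.
template <typename T>
void SafeRelease(T*& p)
{
    if (p)
    {
        p->Release();
        p = nullptr;
    }
}
```

Nulling the pointer is the important part: calling SafeRelease twice on the same pointer then becomes harmless instead of releasing the object twice.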

That will allow us to start plunging into the world of Compute Shaders!
