03. Metal Shading Language

A key (and possibly the most critical) part of Metal is writing functions that are executed on the GPU. Apple provides a specialized language for this task, known as Metal Shading Language. In this episode, I'll break down its most important aspects, giving you what you need to start using and understanding it.

Documentation

Let's begin with the documentation, as it's very well written and available here. You'll likely refer to it often, especially for specific details, while the implementation of certain algorithms is a topic for another discussion.

C++14

Although Metal Shading Language (MSL) is based on C++14, it comes with some significant limitations (see section 1.5.4 of the specification):

Not supported: lambda expressions, dynamic_cast, type identification, new and delete, noexcept, goto, register, thread_local storage attributes, virtual functions, derived classes, and exception handling.
Use only Metal's own standard library instead of C++'s.
Pointers: function arguments that are pointers must be qualified with device, constant, threadgroup, or threadgroup_imageblock address attributes.
Avoid naming your Metal function main.

However, MSL does support:

Overloading (except for graphics and kernel functions).
Templates—though often hated, they become incredibly useful for vector math across different types.
Preprocessor Directives—these are straightforward and supported even in C.
Function pointers (introduced in Metal 2.3)—ideal for dynamically selecting methods, such as choosing a blending mode by its identifier.
Recursive calls (in kernels, starting from Metal 2.4)—though useful, exercise caution when using recursion on both the CPU and GPU.
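
To make the point about templates concrete, here is a minimal sketch in plain C++14 that runs on the CPU. It is not MSL and not from any Apple sample: Vec2 and mixValues are hypothetical names, and Vec2 only stands in for the built-in float2. In a real shader you would reach for the built-in vector types and mix() instead.

```cpp
#include <cassert>

// Hypothetical 2-component vector, standing in for MSL's built-in float2.
struct Vec2 {
    float x, y;
};

inline Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
inline Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
inline Vec2 operator*(Vec2 a, float s) { return {a.x * s, a.y * s}; }

// One template covers every type that supports +, - and scaling by a float.
template <typename T>
T mixValues(T a, T b, float t) {
    assert(t >= 0.0f && t <= 1.0f); // this sketch interpolates only, it never extrapolates
    return a + (b - a) * t;
}
```

With these definitions, mixValues(0.0f, 10.0f, 0.25f) and mixValues(Vec2{0, 2}, Vec2{4, 6}, 0.5f) share one implementation, which is the kind of reuse that makes templates worth having for vector math.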

Compilation

The documentation includes a fairly extensive section on compilation parameters. However, in most cases, you can simply rely on Xcode by adding a .metal file (or multiple files for better organization) to your project. All the functions within those files will be accessible from the default library without extra effort.

So, what is a library? Your Metal code is compiled into a library, either static or dynamic (more details here). In your CPU-side code, this library is accessed through an MTLLibrary, from which you create instances of your MTLFunction objects and then bundle them into pipelines.

Functions

MSL includes a variety of function types, and as the language evolves, new ones continue to emerge. Here, I'll cover just a few of the basics to get you started:

Kernels

A kernel is a compute function that processes many data elements in parallel. By splitting the workload across many GPU threads, it speeds up tasks such as image processing or simulations. For example:

// (1)
inline static float rand(float2 pos) {
    return fract(sin(dot(pos, float2(12.9898, 78.233))) * 43758.5453123);
}

kernel                                                      // (2)
void krnRandom(                                             
    texture2d<float, access::write> out [[ texture(0) ]],   // (3)
    constant float &seed [[ buffer(0) ]],                   // (4)
    constant float &threshold [[ buffer(1) ]],              // (5)
    uint2 gid [[thread_position_in_grid]])                  // (6)
{
    uint2 size(out.get_width(), out.get_height());          // (7)
    if (any(gid >= size)) {                                 // (8)
        return;
    }
    float value = rand(float2(gid) + seed);                 // (9)
    out.write(step(threshold, value), gid);                 // (10)
}

This kernel generates a random pattern of white and black pixels. The numbers below match the numbered comments in the code:

(1) This hash-based helper is a common approach to noise generation in shaders.
(2) The kernel keyword (or the [[kernel]] attribute) designates the function as a compute kernel that you can link with an MTLFunction.
(3) A texture bound at index 0 (attribute [[texture(0)]]) is given write access and is associated with the float type. The actual pixel format can vary, but for normalized formats like .rgba8unorm, working with float values is convenient.
(4) The seed parameter lives in a buffer bound at index 0 (attribute [[buffer(0)]]) and is accessed as constant. This is mandatory when you pass parameters as raw bytes without creating a buffer object, and useful whenever no writing is required.
(5) Another parameter, threshold, is bound at index 1.
(6) The thread index within the grid is specified by [[thread_position_in_grid]]. The grid can be 1D, 2D, or 3D, depending on what best fits the task.
(7) The kernel builds a vector holding the size of the output texture.
(8) It checks whether the thread index is outside the texture's bounds and exits early if so. Although the Apple4 family supports non-uniform thread group dispatching, this check is still needed for simulators.
(9) The kernel generates a random value at the given point.
(10) It writes 0 or 1 to the texture at the corresponding position, depending on whether the random value reaches the threshold.
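
If the bounds check in step (8) feels abstract, the following CPU-side C++ sketch may help. It is an emulation written for illustration, not Metal code: hashRand, groupCount, and runRandomKernel are hypothetical names, the grid is rounded up to whole thread groups the way a uniform dispatch would be, and the sine results will not match the GPU bit for bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU stand-in for the shader's rand(): fract(sin(dot(pos, k)) * 43758.5453123).
inline float hashRand(float x, float y) {
    float v = std::sin(x * 12.9898f + y * 78.233f) * 43758.5453123f;
    float f = v - std::floor(v);                     // fract()
    return std::min(f, std::nextafter(1.0f, 0.0f));  // keep the result strictly below 1
}

// Number of thread groups needed to cover `size` threads (rounded up).
inline std::uint32_t groupCount(std::uint32_t size, std::uint32_t threadsPerGroup) {
    assert(threadsPerGroup > 0);
    return (size + threadsPerGroup - 1) / threadsPerGroup;
}

// Runs the "kernel" once per grid position and returns how many threads
// fell outside the texture and exited early.
inline std::uint32_t runRandomKernel(std::vector<float> &out,
                                     std::uint32_t width, std::uint32_t height,
                                     float seed, float threshold,
                                     std::uint32_t threadsPerGroup) {
    out.assign(static_cast<std::size_t>(width) * height, -1.0f);
    std::uint32_t gridW = groupCount(width, threadsPerGroup) * threadsPerGroup;
    std::uint32_t gridH = groupCount(height, threadsPerGroup) * threadsPerGroup;
    std::uint32_t skipped = 0;
    for (std::uint32_t y = 0; y < gridH; ++y) {
        for (std::uint32_t x = 0; x < gridW; ++x) {
            if (x >= width || y >= height) {         // same bounds check as step (8)
                ++skipped;
                continue;
            }
            float value = hashRand(float(x) + seed, float(y) + seed);
            // step(threshold, value): 0 below the threshold, 1 otherwise
            out[static_cast<std::size_t>(y) * width + x] = (value < threshold) ? 0.0f : 1.0f;
        }
    }
    return skipped;
}
```

For a 10x6 texture and 8x8 thread groups, the grid grows to 16x8, so 68 of the 128 threads have nothing to write. Without the early return they would write outside the texture.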

Graphics Functions

Before diving into the shaders themselves, let's briefly go over the render pipeline at a very high level:

You provide some data or, at a minimum, invoke the vertex shader the required number of times for your geometry.
The vertex shader calculates the positions of vertices in clip-space coordinates, which the GPU then maps onto the viewport.
Metal on the GPU assembles primitives (triangles, lines, points) from these vertices and rasterizes them—determining whether a given pixel is inside the geometry or not.
The fragment shader is called for every pixel (or more precisely, fragment, as there can be multiple outputs) to compute the color for these points.

In essence, the vertex shader runs for every vertex in your geometry, while the fragment shader runs for every pixel within the geometry in the viewport.

Let's walk through a simple example (and by the way, if you'd like to experiment with Metal or OpenGL shaders, tools like KodeLife are great for that):

// (1)
typedef struct {
    float4 position [[position]];                           // (2)
    float4 color;                                           // (3)
} ColorInOut;

vertex                                                      // (4)
ColorInOut vshSimpleQuad(
    unsigned int vid [[vertex_id]]                          // (5)
) {
    ColorInOut out;

    constexpr float3 vertices[] = {                         // (6)
        float3(0, 0, 1),
        float3(1, 0, 1),
        float3(0, 1, 1),
        float3(1, 1, 1)
    };

    float3 vCoord = vertices[vid % 4];                      // (7)
    vCoord.xy = vCoord.xy * 2 - 1;                          // (8)
    out.color = float4(vertices[vid % 4], 1);               // (9)
    out.position = float4(vCoord, 1.0);                     // (10)

    return out;
}

fragment                                                    // (11)
float4 fshSimpleQuad(
    ColorInOut in [[stage_in]]                              // (12)
) {
    return in.color;                                        // (13)
}

The numbers below match the numbered comments in the code:

(1) This is the structure that the vertex shader fills in and passes on to the fragment shader. Its values are interpolated between vertices.
(2) The [[position]] attribute marks the vertex position, given in clip-space coordinates (with w = 1, as here, these are the same as normalized device coordinates).
(3) A color value is assigned to the vertex, and it is interpolated across the surface by the time the fragment shader reads it (which is very useful).
(4) The vertex keyword (or the [[vertex]] attribute) marks the function as a vertex shader.
(5) Vertex shaders can take many parameters, but for this simple example the vertex index alone is sufficient, so no additional data needs to be passed.
(6) Instead of using buffers, we store the vertex values directly in the shader, which is convenient when the geometry is simple.
(7) We then look up the value for the current vertex. This example assumes a triangle strip primitive.
(8) Since normalized device coordinates span [-1, 1] on x and y, we remap the values from [0, 1]; otherwise the geometry would cover only a quarter of the viewport.
(9) The color for the vertex is assigned here.
(10) The position of the vertex is set at this point.
(11) The fragment keyword (or the [[fragment]] attribute) indicates that the function is a fragment shader.
(12) The input value (attribute [[stage_in]]) from our structure is read for this specific fragment, already interpolated.
(13) Finally, the color of the pixel is set from the interpolated input.
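
The "quarter of the viewport" remark in step (8) can be checked with a little arithmetic. The C++ sketch below does it on the CPU; it is my own illustration, not Metal code, and Point2D, quadCorner, triangleArea, and stripArea are hypothetical names. It sums the area of the two triangles of the strip with and without the remap, keeping in mind that the visible x/y range from -1 to 1 has an area of 4.

```cpp
#include <cassert>
#include <cmath>

struct Point2D {
    float x, y;
};

// Mirrors steps (6) to (8): pick a corner of the unit square by vertex index
// and, optionally, remap it from [0, 1] to [-1, 1].
inline Point2D quadCorner(unsigned vid, bool remap) {
    static const Point2D corners[4] = {{0.0f, 0.0f}, {1.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 1.0f}};
    Point2D c = corners[vid % 4];
    if (remap) {
        c.x = c.x * 2.0f - 1.0f;
        c.y = c.y * 2.0f - 1.0f;
    }
    return c;
}

inline float triangleArea(Point2D a, Point2D b, Point2D c) {
    return std::fabs((b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y)) * 0.5f;
}

// Triangle strip: vertices (0, 1, 2) and (1, 2, 3) form the two triangles.
inline float stripArea(bool remap) {
    return triangleArea(quadCorner(0, remap), quadCorner(1, remap), quadCorner(2, remap)) +
           triangleArea(quadCorner(1, remap), quadCorner(2, remap), quadCorner(3, remap));
}

// Usage check: remapped, the strip fills the whole 2 x 2 range; without the
// remap it covers an area of 1, a quarter of it.
inline void checkQuadCoverage() {
    assert(stripArea(true) == 4.0f);
    assert(stripArea(false) == 1.0f);
}
```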

As a result of these shaders, you will see a quad that fills the whole viewport with a smooth gradient: blue in the bottom-left corner, magenta in the bottom-right, cyan in the top-left, and white in the top-right.

Other functions

There are many more supported function types. For example:

[[visible]] - makes a function accessible from outside its Metal file. Since Metal 2.3.
[[stitchable]] - lets the function be used with the Metal function stitching API; a stitchable function is also visible. SwiftUI shader effects use these functions. Since Metal 2.4.
[[intersection]] - for ray-tracing pipelines; it computes the "behaviour" of objects when rays hit them. Since Metal 2.3.
[[object]] - for mesh pipelines; it runs before the mesh stage, decides how much mesh work to launch, and can pass a payload to the mesh function. Since Metal 3.
[[mesh]] - for generating geometry procedurally on the GPU and handing it to the rasterizer in place of the vertex stage. Since Metal 3.

Attributes

We've already mentioned a few attributes—those enclosed in [[]]. These are special keywords that give the Metal compiler extra instructions on how to handle specific functions, variables, or parameters during shader execution. There are quite a few of them, and it's impossible to cover them all here, so be sure to refer to the documentation for a comprehensive list.

Types

Metal supports scalar, vector, matrix primitives, textures, buffers, samplers, etc.

Primitives

Scalars are signed and unsigned integers from 8 to 64 bits, plus 16- and 32-bit floats (half and float). Integer arithmetic can be noticeably slower than floating-point arithmetic on the GPU, so keep it to a minimum.

Vectors use the same element types as scalars and contain 2, 3, or 4 elements. A nice property of vectors is that you can read their components in any order and combination (swizzling), e.g. vecValue.xx, vecValue.wz, etc. Initialization is just as flexible:

float2 a = 5;               // (5.0, 5.0)
float4 b = float4(2, a, 4); // (2.0, 5.0, 5.0, 4.0)
float3 c = b.wxw;           // (4.0, 2.0, 4.0)

Matrices are sized MxN, where M and N are both in [2, 4], and hold floating-point elements (half or float). Keep in mind that matrices in Metal and simd are column-major, so if you pass [0, 1, 2, 3] as the initializer of a 2x2 matrix, the result is actually

0 2
1 3
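
A tiny CPU-side C++ sketch of that layout, in case it helps to see the index arithmetic. Matrix2x2 and at() are hypothetical names for illustration and do not come from Metal or simd.

```cpp
#include <cassert>

// Column-major 2x2 matrix: the flat initializer fills column 0 first, then column 1.
struct Matrix2x2 {
    float values[4];

    // Element at the given row and column of the matrix as written on paper.
    float at(int row, int column) const {
        assert(row >= 0 && row < 2 && column >= 0 && column < 2);
        return values[column * 2 + row];
    }
};
```

With Matrix2x2 m = {{0, 1, 2, 3}}, the top row reads m.at(0, 0) = 0 and m.at(0, 1) = 2, matching the picture above. In MSL the built-in matrix types are indexed the same way round, column first: m[1][0] is column 1, row 0.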

Buffers

Buffers are declared in function arguments with the [[ buffer(n) ]] attribute (where n is the binding index) and act as pointers or references in a specific address space:

device: Refers to the memory of a buffer object (MTLBuffer) and can be both readable and writable. Textures are always in the device space, so they don't require this attribute.
constant: Refers to read-only memory. You don't need to create an MTLBuffer object to pass a constant argument, but there are some limitations.
thread: This memory is visible only within a single thread.
threadgroup: This memory is visible only within a thread group, but threads in the group can share it.
threadgroup_imageblock (since Metal 2.3): Similar to threadgroup, but specifically for image blocks.
ray_data (since Metal 2.3): Refers to memory accessible only in an intersection function.
object_data: Used to pass a payload to a mesh function.

For example:

kernel void krnParticles(
    constant Quark *in [[ buffer(0) ]],                 // (1)
    device Quark *out [[ buffer(1) ]],                  // (2)

    constant float *relations [[ buffer(2) ]],          // (3)
    constant SimParameters &parameters [[ buffer(3) ]], // (4)

    uint gid [[thread_position_in_grid]]
) {...}

(1) Input buffer in the constant (read-only) address space, holding an array of structures.
(2) Output buffer in the writable device address space.
(3) Parameter buffer in the constant address space, holding an array of floats.
(4) Parameter buffer in the constant address space, holding a single structure with the simulation parameters.

Keep in mind that a buffer is just a block of memory, so it's your responsibility to ensure the correct types and alignments on both the CPU and GPU sides.
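
A classic place where this bites is float3: in MSL it occupies 16 bytes and is 16-byte aligned, while a plain struct of three floats on the CPU takes 12 bytes. The C++ sketch below mirrors a hypothetical MSL struct (Particle is my own example, not the Quark type used above) and checks the layout at compile time. Float3, PackedFloat3, and particleBufferLength are likewise names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// CPU-side mirror of a hypothetical MSL struct:
//     struct Particle { float3 position; float mass; float3 velocity; };

// Stand-in for MSL's float3: three floats padded and aligned to 16 bytes.
struct alignas(16) Float3 {
    float x, y, z;
};

struct Particle {
    Float3 position;
    float mass;
    Float3 velocity;
};

static_assert(sizeof(Float3) == 16, "a float3 slot is 16 bytes wide");
static_assert(offsetof(Particle, mass) == 16, "mass follows the padded position");
static_assert(offsetof(Particle, velocity) == 32, "velocity starts on a 16-byte boundary");
static_assert(sizeof(Particle) == 48, "stride between array elements");

// Tightly packed variant; this is the layout of MSL's packed_float3 instead.
struct PackedFloat3 {
    float x, y, z;
};
static_assert(sizeof(PackedFloat3) == 12, "no padding in the packed layout");

// Byte length to request for a buffer holding `count` particles.
inline std::size_t particleBufferLength(std::size_t count) {
    assert(count > 0);
    return count * sizeof(Particle);
}
```

In a Swift or Objective-C project, the simd types (for example simd_float3) already follow these rules, so sharing one header between the CPU and GPU sides is a common way to avoid mirroring structs by hand.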

Textures and Samplers

MSL supports 1D, 2D, 3D textures, as well as arrays of these textures. Texture objects have types for processing (which may differ from the actual type) and access methods, such as read, write, sample, and read-write (since Metal 1.2).

To access texture data, you can use methods like .read(), .write(), and .sample(). Sampling a texture requires passing a sampler object, which comes with a variety of parameters:

Coordinates (coord::): You can choose between normalized or pixel coordinates, depending on what is most convenient for your task.
Addressing out-of-bounds (address::): Control how texture coordinates that fall outside the valid range are handled with options such as repeat, mirrored_repeat, clamp_to_edge (default), clamp_to_zero, or clamp_to_border. You can also specify different addressing modes for individual texture coordinates, using s_address, t_address, or r_address.
Border color (border_color::): You can set the border_color to transparent_black (default), opaque_black, or opaque_white.
Filter mode (filter::): Choose between nearest (default) or linear filtering, and you can apply different filters for magnification (mag_filter) and minification (min_filter).
Mip filter (mip_filter::): Options include none (default), nearest, or linear filtering for mipmaps.
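
To get a feel for the addressing modes, here is a simplified CPU-side C++ sketch of what three of them do to a normalized coordinate. It is an approximation for illustration only: Address and wrapCoordinate are hypothetical names, and a real sampler works at the texel level, so edge behaviour differs in the details.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical enum mirroring three of the sampler addressing modes.
enum class Address { repeat, mirroredRepeat, clampToEdge };

// Brings a normalized coordinate back into [0, 1] according to the mode.
inline float wrapCoordinate(float coord, Address mode) {
    switch (mode) {
    case Address::repeat:
        return coord - std::floor(coord);                    // keep the fractional part
    case Address::mirroredRepeat: {
        float t = coord - 2.0f * std::floor(coord * 0.5f);   // period of 2, t in [0, 2)
        return t <= 1.0f ? t : 2.0f - t;                     // fold the second half back
    }
    case Address::clampToEdge:
        return std::fmin(std::fmax(coord, 0.0f), 1.0f);
    }
    assert(false && "unknown addressing mode");
    return coord;
}
```

For a coordinate of 1.25, repeat gives 0.25, mirroredRepeat gives 0.75, and clampToEdge gives 1.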

When sampling, keep in mind the coordinate scaling rules:

0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
  0   |   1   |   2   |   3     
      0       |       1

newCoord = (coord + 0.5) * scale - 0.5

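The 0.5 offsets account for pixel centers sitting at half-integer positions: shift to the pixel center, scale, then shift back. In the diagram above, pixels 0 and 1 of the top row both land on pixel 0 of the row below (scale = 0.5).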
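
The formula is easy to try out on the CPU. Here is a minimal C++ sketch, with scaleCoord as a hypothetical helper name, that applies it to pixel indices:

```cpp
#include <cassert>

// Maps a pixel coordinate into a grid scaled by `scale`, keeping pixel
// centers (index + 0.5) aligned between the two grids.
inline float scaleCoord(float coord, float scale) {
    assert(scale > 0.0f);
    return (coord + 0.5f) * scale - 0.5f;
}
```

Going from the 8-pixel row to the 4-pixel row (scale = 0.5), pixels 0 and 1 map to -0.25 and 0.25, which both round to pixel 0. Going the other way (scale = 2), pixel 0 maps to 0.5, exactly halfway between pixels 0 and 1 of the finer row.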

Other Types

There are several other specialized types used for ray tracing, mesh shaders, atomics, and more. I'll dive into these in detail in future episodes, as they are quite complex and beyond the scope of this introduction.

Standard Library and Operators

MSL supports most of the operators available in C++14, and it also handles vector and matrix arithmetic efficiently.

Metal includes a robust standard library for GPU computing, which you can access simply by including <metal_stdlib>. It's worth noting that some standard functions combine multiple operations in an optimized way. Below is a quick overview of what's available; for detailed descriptions, please refer to the documentation.

<metal_stdlib>: Common utility functions like clamp, mix, etc.
<metal_integer>: Integer operations.
<metal_relational>: Comparison and selection operations.
<metal_math>: Mathematical functions and constants.
<metal_matrix>: Matrix operations like determinant and transpose.
<metal_simdgroup_matrix>: Operations on SIMD matrices.
<metal_geometric>: Geometric functions like distance, refraction, etc.
<metal_compute>: Threadgroup synchronization.
<metal_simdgroup>: SIMD group operations and structures.
<metal_graphics>: Functions for discarding, coordinate derivatives in fragments, and sample counting.
<metal_interpolate>: Explicit interpolation operations in fragments.
<metal_texture>: A wide range of texture operations.
<metal_pack>: Functions for packing/unpacking and type conversions.
<metal_raytracing>: Ray tracing functions.

There are many more libraries available—refer to the documentation for your specific use case, or explore it to discover the full range of functions.

Compatibility with GLSL

In most cases, migrating your OpenGL shaders to Metal is fairly straightforward. However, migrating Metal shaders back to OpenGL can present some challenges. Here's a brief comparison between OpenGL Shading Language (GLSL) and Metal Shading Language (MSL), in case you're switching from OpenGL:

Feature	GLSL	MSL
Language Base	Based on C	Based on C++14
Platform Support	Cross-platform (Windows, Linux, macOS)	Apple platforms (iOS, macOS)
Memory Model	Implicit memory management	Explicit memory management with address spaces
Shader Types	Vertex, Fragment, Geometry, Tessellation, Compute	Vertex, Fragment, Compute, Mesh, Ray-tracing
Vector/Matrix Arithmetic	Supported	Supported, with more advanced template usage
Function Overloading	Limited	More flexible, supports overloading (except for some specific cases)
Pointers and References	Not supported	Supported with address qualifiers
Function Pointers	Not supported	Supported (since Metal 2.3)
Recursion	Not supported	Supported in kernels (since Metal 2.4)
Built-in Libraries	Less optimized	Extensive optimized standard library for GPU computing
Parallelism Features	Available but less explicit	Explicit support for threadgroups, SIMD, and thread synchronization
API Integration	Tied to OpenGL	Tied to Metal API
Entry Point	`main` function	Explicit function name

Conclusion

We covered the basics of Metal Shading Language (MSL), including key functions, attributes, and how the render pipeline operates.
Topics like ray tracing, mesh shaders, and advanced types were not covered in depth but will be addressed in future discussions.
Tools like KodeLife are excellent for experimenting with Metal Shading Language and shader programming in general.
MSL is evolving, with many new features and optimizations - stay tuned for more in upcoming episodes.