A key (and possibly the most critical) part of Metal is writing functions that are executed on the GPU. Apple provides a specialized language for this task, known as Metal Shading Language. In this episode, I'll break down its most important aspects, giving you what you need to start using and understanding it.
Let's begin with the documentation, as it's very well written and available here. You'll likely refer to it often, especially for specific details, while the implementation of certain algorithms is a topic for another discussion.
Although Metal Shading Language (MSL) is based on C++14, it comes with some significant limitations (see section 1.5.4 of the specification):
- No support for dynamic_cast, type identification, new and delete, noexcept, goto, register, thread_local storage attributes, virtual functions, derived classes, and exception handling.
- Pointer arguments to graphics and kernel functions must be declared with the device, constant, threadgroup, or threadgroup_imageblock address attributes.
- A function cannot be named main.

However, MSL does support:
The documentation includes a fairly extensive section on compilation parameters. However, in most cases, you can simply rely on Xcode by adding a .metal file (or multiple files for better organization) to your project. All the functions within those files will be accessible from the default library without extra effort.
So, what is a library? Your Metal code is compiled into a library, either static or dynamic (more details here). In your CPU-side code, this library is accessed through an MTLLibrary, from which you create instances of your MTLFunction objects and then bundle them into pipelines.
MSL includes a variety of function types, and as the language evolves, new ones continue to emerge. Here, I'll cover just a few of the basics to get you started:
A kernel is a compute function that processes many data elements in parallel, making tasks like rendering graphics or running simulations faster by splitting the workload across multiple threads. For example:
// (1)
inline static float rand(float2 pos) {
    return fract(sin(dot(pos, float2(12.9898, 78.233))) * 43758.5453123);
}

kernel // (2)
void krnRandom(
    texture2d<float, access::write> out [[ texture(0) ]], // (3)
    constant float &seed [[ buffer(0) ]], // (4)
    constant float &threshold [[ buffer(1) ]], // (5)
    uint2 gid [[thread_position_in_grid]]) // (6)
{
    uint2 size(out.get_width(), out.get_height()); // (7)
    if (any(gid >= size)) { // (8)
        return;
    }
    float value = rand(float2(gid) + seed); // (9)
    out.write(step(threshold, value), gid); // (10)
}
This kernel generates a random pattern of white and black pixels:
1. A helper function that returns a pseudo-random value in [0, 1) for a given position.
2. The kernel or [[kernel]] keyword designates the function as a compute kernel function that you can link with an MTLFunction.
3. The texture bound at index 0 (with the attribute [[texture(0)]]) is given write access and is associated with the float type. The actual type can vary, but for normalized types like .rgba8unorm, using float values is convenient.
4. The seed parameter is stored in a buffer bound at index 0 (with the attribute [[buffer(0)]]), accessed as constant. This is mandatory when passing parameters as raw bytes without creating a buffer object, and useful when no writing is required.
5. The second parameter, threshold, is bound at index 1.
6. The position of the current thread in the grid is marked with [[thread_position_in_grid]]. This grid can be 1D, 2D, or 3D, depending on what best fits the task.
7. The size of the output texture is read.
8. Threads that fall outside the texture return early. Although the Apple4 family supports non-uniform thread group dispatching, this check is still needed for simulators.
9. A pseudo-random value is computed from the thread position and the seed.
10. The kernel writes 0 or 1 to the texture at the corresponding position, depending on the random value.

Before diving into the vertex and fragment shaders themselves, let's briefly go over the render pipeline at a very high level:

In essence, the vertex shader runs for every vertex in your geometry, while the fragment shader runs for every pixel within the geometry in the viewport.
Let's walk through a simple example (and by the way, if you'd like to experiment with Metal or OpenGL shaders, tools like KodeLife are great for that):
// (1)
typedef struct {
    float4 position [[position]]; // (2)
    float4 color; // (3)
} ColorInOut;

vertex // (4)
ColorInOut vshSimpleQuad(
    unsigned int vid [[vertex_id]] // (5)
) {
    ColorInOut out;
    constexpr float3 vertices[] = { // (6)
        float3(0, 0, 1),
        float3(1, 0, 1),
        float3(0, 1, 1),
        float3(1, 1, 1)
    };
    float3 vCoord = vertices[vid % 4]; // (7)
    vCoord.xy = vCoord.xy * 2 - 1; // (8)
    out.color = float4(vertices[vid % 4], 1); // (9)
    out.position = float4(vCoord, 1.0); // (10)
    return out;
}

fragment // (11)
float4 fshSimpleQuad(
    ColorInOut in [[stage_in]] // (12)
) {
    return in.color; // (13)
}
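1. A structure that carries data from the vertex shader to the fragment shader.
2. The vertex position, marked with the [[position]] attribute.
3. The color that is passed along with the position.
4. The vertex or [[vertex]] keyword marks the function as a vertex shader.
5. The index of the current vertex, marked with [[vertex_id]].
6. The four corners of the quad, declared directly in the shader.
7. The corner for the current vertex index is selected.
8. Since normalized device coordinates lie in the range [-1, 1], we need to remap the values from [0, 1], or else the geometry would only cover a quarter of the viewport.
9. The original, unmapped coordinates are reused as the vertex color.
10. The remapped coordinates become the output position.
11. The fragment or [[fragment]] keyword indicates that the function is a fragment shader.
12. The input (marked with [[stage_in]]) from our structure is read for this specific fragment (already interpolated).
13. The interpolated color is returned as the color of the fragment.

As a result of these shaders, assuming the four vertices are drawn as a quad, you will see the viewport filled with a color gradient: blue at corner (0, 0), magenta at (1, 0), cyan at (0, 1), and white at (1, 1).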
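To see why the remap in step (8) matters, the arithmetic can be replayed on the CPU. The following is a plain C++ sketch, not Metal code; Float2, quadCorner, and remapToClipSpace are hypothetical names introduced only for this illustration.

```cpp
#include <array>

// Minimal stand-in for a two-component float vector (illustrative only).
struct Float2 {
    float x;
    float y;
};

// Corner for a given vertex index, mirroring vertices[vid % 4] in the shader.
Float2 quadCorner(unsigned vid) {
    static constexpr std::array<Float2, 4> corners = {{
        {0.0f, 0.0f}, {1.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 1.0f}
    }};
    return corners[vid % 4];
}

// Same remap as step (8): [0, 1] -> [-1, 1] on both axes.
Float2 remapToClipSpace(Float2 unit) {
    return {unit.x * 2.0f - 1.0f, unit.y * 2.0f - 1.0f};
}
```

Without the remap, the four corners would span only [0, 1] on each axis, which is one quarter of the [-1, 1] square. With it, corner (0, 0) lands at (-1, -1) and corner (1, 1) at (1, 1).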

There are many more supported function types. For example:
- [[visible]] - for accessing functions from outside the Metal file. Since Metal 2.3
- [[stitchable]] - for using the function in the Metal Framework Function Stitching API. Such a function also becomes visible. SwiftUI shader effects use these functions. Since Metal 2.4
- [[intersection]] - for ray-tracing pipelines; it computes the "behaviour" of objects when they are hit by rays. Since Metal 2.3
- [[object]] - for mesh-shading pipelines; it runs before the mesh function, decides how much mesh work to launch, and passes a payload to it. Since Metal 3
- [[mesh]] - for generating geometry procedurally on the GPU; it takes the place of the vertex function and feeds the rasterizer. Since Metal 3

We've already mentioned a few attributes (those enclosed in [[]]). These are special keywords that give the Metal compiler extra instructions on how to handle specific functions, variables, or parameters during shader execution. There are quite a few of them, and it's impossible to cover them all here, so be sure to refer to the documentation for a comprehensive list.
MSL supports scalar, vector, and matrix types, as well as textures, buffers, samplers, and more.
Scalars are signed and unsigned integers from 8 to 64 bits (integer arithmetic is slow on the GPU, so keep it to a minimum) and 16- and 32-bit floats.
Vectors can have the same element types as scalars and contain 2, 3, or 4 elements. A nice thing about vectors is that you can access their elements in any order and combination, e.g. vecValue.xx, vecValue.wz, etc. The same goes for initialization:
float2 a = 5; // (5.0, 5.0)
float4 b = float4(2, a, 4); // (2.0, 5.0, 5.0, 4.0)
float3 c = b.wxw; // (4.0, 2.0, 4.0)
Matrices hold floating-point elements and come in sizes MxN, where M and N both range from 2 to 4. Keep in mind that matrices in Metal and simd are column-major, so if you pass [0, 1, 2, 3] as the initializer of a 2x2 matrix, it will actually be
0 2
1 3
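The column-major rule is easy to check on the CPU. The sketch below is plain C++ rather than Metal code, and Matrix2x2 with its at() helper is a hypothetical type introduced only to mimic how a flat initializer ends up in a 2x2 matrix.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a 2x2 float matrix stored column by column,
// the way the text above describes Metal and simd matrices.
struct Matrix2x2 {
    std::array<float, 4> data; // {col0.row0, col0.row1, col1.row0, col1.row1}

    // Element at (row, column): each column is contiguous, so the column
    // index selects a block of 2 values and the row index picks within it.
    float at(std::size_t row, std::size_t column) const {
        return data[column * 2 + row];
    }
};
```

With `Matrix2x2 m{{0.0f, 1.0f, 2.0f, 3.0f}};`, the first row reads `m.at(0, 0)` and `m.at(0, 1)`, which are 0 and 2, matching the layout shown above.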
Buffers are declared in function arguments using the attribute [[ buffer(n) ]] (where n is the binding index) and act as pointers (or references) with a defined address space:
- device: Refers to the memory of a buffer object (MTLBuffer) and can be both readable and writable. Textures are always in the device space, so they don't require this attribute.
- constant: Refers to read-only memory. You don't need to create an MTLBuffer object to pass a constant argument, but there are some limitations.
- thread: This memory is visible only within a single thread.
- threadgroup: This memory is visible only within a thread group, but threads in the group can share it.
- threadgroup_imageblock (since Metal 2.3): Similar to threadgroup, but specifically for image blocks.
- ray_data (since Metal 2.3): Refers to memory accessible only in an intersection function.
- object_data: Used to pass a payload to a mesh function.

For example:
kernel void krnParticles(
    constant Quark *in [[ buffer(0) ]], // (1)
    device Quark *out [[ buffer(1) ]], // (2)
    constant float *relations [[ buffer(2) ]], // (3)
    constant SimParameters &parameters [[ buffer(3) ]], // (4)
    uint gid [[thread_position_in_grid]]
) {...}
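1. A read-only input array of Quark structures.
2. A writable output array of Quark structures.
3. A read-only array of floats.
4. A read-only reference to the SimParameters structure.

Keep in mind that a buffer is just a block of memory, so it's your responsibility to ensure the correct types and alignments on both the CPU and GPU sides.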
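A common alignment pitfall is the three-component vector: in MSL, float3 occupies 16 bytes with 16-byte alignment, not 12. The following CPU-side C++ sketch mirrors such a layout so the offsets can be checked at compile time. Float3 and Particle are hypothetical types made up for this illustration; they are not the Quark or SimParameters structures from the example above.

```cpp
#include <cstddef>

// Stand-in for MSL's float3: three floats padded to 16 bytes,
// aligned on a 16-byte boundary.
struct alignas(16) Float3 {
    float x;
    float y;
    float z;
};

// Hypothetical structure shared between CPU and GPU code.
struct Particle {
    Float3 position; // offset 0
    float mass;      // offset 16
    Float3 velocity; // offset 32, rounded up to the next multiple of 16
};

// Compile-time checks of the layout the GPU side is assumed to expect.
static_assert(sizeof(Float3) == 16, "float3 is padded to 16 bytes");
static_assert(offsetof(Particle, velocity) == 32, "velocity starts at 32");
static_assert(sizeof(Particle) == 48, "stride between array elements");
```

If the CPU side packed the same fields tightly (12 + 4 + 12 = 28 bytes), every element after the first would be read from the wrong offset on the GPU.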
MSL supports 1D, 2D, 3D textures, as well as arrays of these textures. A texture is declared with the type used for processing (which may differ from the actual stored type) and an access mode: read, write, sample, or read-write (since Metal 1.2).
To access texture data, you can use methods like .read(), .write(), and .sample(). Sampling a texture requires passing a sampler object, which comes with a variety of parameters:
- Coordinates (coord::): You can choose between normalized or pixel coordinates, depending on what is most convenient for your task.
- Addressing out-of-bounds (address::): Control how texture coordinates that fall outside the valid range are handled with options such as repeat, mirrored_repeat, clamp_to_edge (default), clamp_to_zero, or clamp_to_border. You can also specify different addressing modes for individual texture coordinates, using s_address, t_address, or r_address.
- Border color (border_color::): You can set the border_color to transparent_black (default), opaque_black, or opaque_white.
- Filter mode (filter::): Choose between nearest (default) or linear filtering, and you can apply different filters for magnification (mag_filter) and minification (min_filter).
- Mip filter (mip_filter::): Options include none (default), nearest, or linear filtering for mipmaps.

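To get a feel for the addressing modes, here is a rough CPU-side approximation in plain C++ for a single normalized coordinate. The function names are hypothetical, and real hardware applies addressing per texel together with filtering, so treat this as an illustration of the idea rather than the exact GPU behaviour.

```cpp
#include <algorithm>
#include <cmath>

// repeat: keep only the fractional part, so the texture tiles endlessly.
float addressRepeat(float coord) {
    return coord - std::floor(coord);
}

// mirrored_repeat: every other tile is flipped, so neighbouring tiles
// meet at matching edges.
float addressMirroredRepeat(float coord) {
    float t = std::fmod(std::fabs(coord), 2.0f); // position inside a 2-tile period
    return 1.0f - std::fabs(t - 1.0f);
}

// clamp_to_edge: anything outside [0, 1] sticks to the nearest edge.
float addressClampToEdge(float coord) {
    return std::min(std::max(coord, 0.0f), 1.0f);
}
```

For a coordinate of 1.25, repeat gives 0.25, mirrored_repeat gives 0.75, and clamp_to_edge gives 1.0.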
When sampling, keep in mind the coordinate scaling rules:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
0 | 1 | 2 | 3
0 | 1
newCoord = (coord + 0.5) * scale - 0.5
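Here scale is the ratio between the two resolutions. Pixel centers sit half a pixel away from pixel edges, so the formula adds 0.5 before scaling and subtracts it afterwards; scaling the raw coordinate alone would shift the result by a fraction of a pixel.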
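The formula is small enough to try on the CPU. The C++ sketch below uses a hypothetical rescaleCoord helper and assumes that scale means target size divided by source size, for example 0.5 when going from the 8-pixel row to the 4-pixel row.

```cpp
// Converts a pixel coordinate between two resolutions of the same image.
// Assumption: scale = targetSize / sourceSize.
float rescaleCoord(float coord, float scale) {
    return (coord + 0.5f) * scale - 0.5f;
}
```

With a scale of 0.5, pixels 0 and 1 of the 8-pixel row map to -0.25 and 0.25, which straddle pixel 0 of the 4-pixel row symmetrically. Multiplying by the scale alone would give 0 and 0.5, pulling both samples to one side.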
There are several other specialized types used for ray tracing, mesh shaders, atomics, and more. I'll dive into these in detail in future episodes, as they are quite complex and beyond the scope of this introduction.
MSL supports most of the operators available in C++14, and it also handles vector and matrix arithmetic efficiently.
Metal includes a robust standard library for GPU computing, which you can access by including <metal_stdlib>. It's worth noting that some standard functions combine multiple operations in an optimized way. Below is a quick overview of what's available; for detailed descriptions, please refer to the documentation.
- <metal_common>: Common utility functions like clamp, mix, etc.
- <metal_integer>: Integer operations.
- <metal_relational>: Comparison and selection operations.
- <metal_math>: Mathematical functions and constants.
- <metal_matrix>: Matrix operations like determinant and transpose.
- <metal_simdgroup_matrix>: Operations on SIMD matrices.
- <metal_geometric>: Geometric functions like distance, refraction, etc.
- <metal_compute>: Threadgroup synchronization.
- <metal_simdgroup>: SIMD group operations and structures.
- <metal_graphics>: Functions for discarding, coordinate derivatives in fragments, and sample counting.
- <metal_interpolate>: Explicit interpolation operations in fragments.
- <metal_texture>: A wide range of texture operations.
- <metal_pack>: Functions for packing/unpacking and type conversions.
- <metal_raytracing>: Ray tracing functions.

There are many more libraries available. Refer to the documentation for your specific use case, or explore it to discover the full range of functions.
In most cases, migrating your OpenGL shaders to Metal is fairly straightforward. However, migrating Metal shaders back to OpenGL can present some challenges. Here's a brief comparison between OpenGL Shading Language (GLSL) and Metal Shading Language (MSL) in case you're switching from OpenGL:
| Feature | GLSL | MSL |
|---|---|---|
| Language Base | Based on C | Based on C++14 |
| Platform Support | Cross-platform (Windows, Linux, macOS) | Apple platforms (iOS, macOS) |
| Memory Model | Implicit memory management | Explicit memory management with address spaces |
| Shader Types | Vertex, Fragment, Geometry, Tessellation, Compute | Vertex, Fragment, Compute, Mesh, Ray-tracing |
| Vector/Matrix Arithmetic | Supported | Supported, with more advanced template usage |
| Function Overloading | Limited | More flexible, supports overloading (except for some specific cases) |
| Pointers and References | Not supported | Supported with address qualifiers |
| Function Pointers | Not supported | Supported (since Metal 2.3) |
| Recursion | Not supported | Supported in kernels (since Metal 2.4) |
| Built-in Libraries | Less optimized | Extensive optimized standard library for GPU computing |
| Parallelism Features | Available but less explicit | Explicit support for threadgroups, SIMD, and thread synchronization |
| API Integration | Tied to OpenGL | Tied to Metal API |
| Entry Point | main function | Explicit function name |