CHAPTER 25

Camera and perspective math

The map camera model, pitch and bearing, perspective projection matrices, view frustum, and unprojecting pixels to world coordinates.


A 3D map positions a virtual camera somewhere above the Earth. This chapter covers how that camera works: pitch, bearing, the matrices that turn world coordinates into screen pixels, and how to reverse them to convert a pixel back to a geographic coordinate.

Figure: camera view frustum at 40° pitch (slight tilt, some depth).

The map camera model

A 3D map camera has four parameters:

Parameter	Description	Range
Center	The lat/lon at screen centre	Any valid coordinate
Zoom	Scale level	0–22
Pitch	Tilt away from vertical	0° (top-down) – 85°
Bearing	Rotation from north	0°–360°

At pitch 0 the camera points straight down, giving the classic 2D slippy-map view. As pitch increases, the horizon comes into view and the scene gains depth. At high pitch values the near and far clip planes matter: ground near the horizon is extremely far from the camera, which strains the precision of the depth buffer.
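To get a feel for how quickly the far end of the scene recedes, here is a minimal sketch. It assumes a flat ground plane, a vertical field of view, and a camera that looks at the map centre from `cameraDistance` away. The function name `farGroundDistance` is illustrative and not taken from any library.

```javascript
// Sketch under stated assumptions: flat ground, vertical FOV, camera aimed at the map centre.
function degToRad(deg) {
  return (deg * Math.PI) / 180;
}

// Distance from the camera to the ground point seen at the top edge of the screen.
function farGroundDistance(pitchDeg, fovDeg, cameraDistance) {
  const pitch = degToRad(pitchDeg);
  const halfFov = degToRad(fovDeg) / 2;
  // Height of the camera above the ground plane.
  const height = cameraDistance * Math.cos(pitch);
  // Angle between straight down and the ray through the top edge of the screen.
  const topRayAngle = pitch + halfFov;
  // Once that ray is horizontal or points upward it never meets flat ground.
  if (topRayAngle >= Math.PI / 2) return Infinity;
  return height / Math.cos(topRayAngle);
}

for (const pitch of [0, 40, 60, 70, 80]) {
  const d = farGroundDistance(pitch, 36, 1);
  console.log(`pitch ${pitch}°: farthest ground point at ${d.toFixed(2)} × camera distance`);
}
```

Under these assumptions the distance grows without bound as pitch plus half the FOV approaches 90°, which is why the far plane has to be chosen carefully at high pitch.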

From world to screen

Refresher: 3D rendering always follows the same pipeline, model → world → camera → clip → screen. For maps, the "model" step is skipped because features are already in world coordinates. The two key matrices are:

View matrix: moves the world so the camera sits at the origin, looking forward
Projection matrix: applies perspective (far things appear smaller)

The transformation chain: world coordinates → camera space → clip space → screen pixels

Step 1: World to camera (view matrix)

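The view matrix rotates and translates the world so the camera sits at the origin looking down the -Z axis. In gl-matrix each call post-multiplies the matrix, so the call written first is the one applied to a point last:

import { mat4 } from 'gl-matrix';

// Simplified: combine bearing and pitch into a 4×4 view matrix.
// Effect on a world point: rotate by bearing, tilt by pitch, then push away from the camera.
const viewMatrix = mat4.create();
mat4.translate(viewMatrix, viewMatrix, [0, 0, -cameraDistance]); // keeps the map centre on the view axis
mat4.rotateX(viewMatrix, viewMatrix, pitchRad); // tilt; the sign depends on the world's Y-axis convention
mat4.rotateZ(viewMatrix, viewMatrix, bearingRad); // rotate about the vertical axis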
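To make the intended effect of the view transform concrete, the following sketch applies it to a single point by hand, without a matrix library. The world frame is an assumption (x east, y north, z up, map centre at the origin), and `worldToCamera` is a hypothetical helper name. Real renderers often use tile coordinates with y pointing south, which flips the sign of the pitch rotation.

```javascript
// Sketch: transform one world point into camera space.
// Assumed world frame: x east, y north, z up, map centre at the origin.
// Camera space: x right, y up, camera at the origin looking down -Z.
function worldToCamera([x, y, z], pitchRad, bearingRad, cameraDistance) {
  // 1. Bearing: rotate about Z so the bearing direction points up the screen (+Y).
  const cb = Math.cos(bearingRad);
  const sb = Math.sin(bearingRad);
  const x1 = x * cb - y * sb;
  const y1 = x * sb + y * cb;

  // 2. Pitch: rotate about X by -pitch so points "ahead" recede along -Z.
  const cp = Math.cos(pitchRad);
  const sp = Math.sin(pitchRad);
  const y2 = y1 * cp + z * sp;
  const z2 = -y1 * sp + z * cp;

  // 3. Translate: push the scene away so the camera ends up at the origin.
  return [x1, y2, z2 - cameraDistance];
}

// The map centre stays on the view axis at any pitch and bearing.
console.log(worldToCamera([0, 0, 0], Math.PI / 3, Math.PI / 4, 10));
```

A useful sanity check for any view matrix is the one shown in the last line: the map centre should land at `[0, 0, -cameraDistance]` regardless of pitch and bearing. If it does not, the translation is being applied in the wrong order.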

Step 2: Camera to clip (projection matrix)

The projection matrix applies perspective: far things shrink. For a 36° vertical FOV (Mapbox GL JS defaults to about 36.87°), the OpenGL-style perspective matrix is:

$$
P = \begin{bmatrix}
\frac{f}{a} & 0 & 0 & 0 \\
0 & f & 0 & 0 \\
0 & 0 & \frac{z_f + z_n}{z_n - z_f} & \frac{2 z_f z_n}{z_n - z_f} \\
0 & 0 & -1 & 0
\end{bmatrix}
$$

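where $f = 1/\tan(\text{FOV}/2)$, $a$ is the aspect ratio (width / height), $z_n$ is the distance to the near plane, and $z_f$ is the distance to the far plane.

The $f$ term scales coordinates so that, after the perspective divide, the edges of the field of view land at -1 and 1 in normalized device coordinates. With a 36° vertical FOV, a point 18° above or below the central axis sits at the top or bottom edge of the screen. A wider FOV shows more of the scene but exaggerates perspective distortion; a narrower FOV flattens the apparent depth.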
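The matrix above can be checked numerically. The sketch below builds it in plain JavaScript as an array of rows and projects a few camera-space points. The function names (`perspectiveMatrix`, `transformPoint`, `toNdc`) and the near/far values are illustrative assumptions, not part of any library.

```javascript
// Sketch: OpenGL-style perspective matrix, stored as an array of rows.
function perspectiveMatrix(fovYRad, aspect, near, far) {
  const f = 1 / Math.tan(fovYRad / 2);
  return [
    [f / aspect, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
    [0, 0, -1, 0],
  ];
}

// Multiply a row-array matrix by a column vector [x, y, z, w].
function transformPoint(m, [x, y, z, w = 1]) {
  return m.map((row) => row[0] * x + row[1] * y + row[2] * z + row[3] * w);
}

// Perspective divide: clip space to normalized device coordinates.
function toNdc([x, y, z, w]) {
  return [x / w, y / w, z / w];
}

const fov = (36 * Math.PI) / 180;
const P = perspectiveMatrix(fov, 16 / 9, 1, 1000);

// A point 18° above the view axis, 10 units in front of the camera.
const edgePoint = [0, 10 * Math.tan(fov / 2), -10];
console.log("top-edge point:", toNdc(transformPoint(P, edgePoint)));

// Depth is not spread evenly: compare the near plane, the midpoint, and the far plane.
for (const depth of [1, 500.5, 1000]) {
  const ndc = toNdc(transformPoint(P, [0, 0, -depth]));
  console.log(`eye depth ${depth} -> NDC z ${ndc[2].toFixed(4)}`);
}
```

With these example values, the point 18° off axis lands at NDC y = 1, and the midpoint between the near and far planes already maps to an NDC depth of roughly 0.998. Most of the depth range is spent close to the camera, which is the precision problem that appears at high pitch.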

Step 3: Clip to screen (viewport transform)

screenX = (clipX / clipW + 1) / 2 * viewportWidth
screenY = (1 - clipY / clipW) / 2 * viewportHeight
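The two viewport formulas translate directly into a small function. This is a minimal sketch; `clipToScreen` is a hypothetical name, and it assumes screen coordinates with the origin at the top-left corner and y growing downward, which is what the `1 - clipY / clipW` term implies.

```javascript
// Sketch: clip-space position [x, y, z, w] to pixel coordinates.
// Assumes a top-left screen origin with y pointing down.
function clipToScreen([clipX, clipY, , clipW], viewportWidth, viewportHeight) {
  // Perspective divide: clip space to normalized device coordinates in [-1, 1].
  const ndcX = clipX / clipW;
  const ndcY = clipY / clipW;
  return [
    ((ndcX + 1) / 2) * viewportWidth,
    ((1 - ndcY) / 2) * viewportHeight, // flip Y: NDC points up, pixels point down
  ];
}

console.log(clipToScreen([0, 0, 0, 1], 800, 600)); // [ 400, 300 ], the viewport centre
console.log(clipToScreen([-2, 2, 0, 2], 800, 600)); // [ 0, 0 ], the top-left corner
```

The function does not check `clipW`: points with `clipW <= 0` lie behind the camera and should be clipped before this step.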

View frustum

The view frustum is the truncated pyramid of space visible to the camera: everything outside it gets clipped. It is bounded by six planes: left, right, top, bottom, near, and far.

Understanding the frustum matters for:

Tile culling: only request tiles inside the frustum
Label placement: only show labels for visible features
LOD selection: use higher-detail tiles close to the camera and coarser tiles toward the far end of the frustum
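Tile culling, the first use listed above, can be sketched as a test in clip space, where the six frustum planes reduce to comparisons against `w`. This is one common conservative approach, not necessarily what a given renderer does. The function names are hypothetical, and the tile is assumed to be given as an array of corner positions already multiplied by the projection and view matrices.

```javascript
// Sketch: frustum tests in clip space, where a point is visible when
// -w <= x <= w, -w <= y <= w and -w <= z <= w.
function isInsideFrustum([x, y, z, w]) {
  return w > 0 && Math.abs(x) <= w && Math.abs(y) <= w && Math.abs(z) <= w;
}

// Conservative tile test: a tile is certainly invisible only if all of its
// corners are outside the same frustum plane. Tiles that pass may still
// turn out to be hidden, but no visible tile is rejected.
function tileMayBeVisible(clipCorners) {
  const outsideTests = [
    ([x, , , w]) => x < -w, // left
    ([x, , , w]) => x > w, // right
    ([, y, , w]) => y < -w, // bottom
    ([, y, , w]) => y > w, // top
    ([, , z, w]) => z < -w, // near
    ([, , z, w]) => z > w, // far
  ];
  return !outsideTests.some((isOutside) => clipCorners.every(isOutside));
}

// Every corner is beyond the right plane, so this tile can be skipped.
const offscreenTile = [[2, 0, 0, 1], [3, 0, 0, 1], [2, 1, 0, 1], [3, 1, 0, 1]];
console.log(tileMayBeVisible(offscreenTile)); // false
```

Testing each corner individually with `isInsideFrustum` would wrongly reject a large tile that surrounds the whole view, which is why the tile test works per plane instead.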


Related chapters

3D buildings and feature extrusion (what the camera is usually looking at)
ECEF and 3D coordinate systems (the underlying 3D world frame)
How maps render: tiles, vectors, and the GPU pipeline (where camera math sits in the pipeline)