CHAPTER 25
Camera and perspective math
The map camera model, pitch and bearing, perspective projection matrices, view frustum, and unprojecting pixels to world coordinates.
A 3D map positions a virtual camera somewhere above the Earth. This chapter covers how that camera works — pitch, bearing, the matrices that turn world coordinates into screen pixels, and how to reverse them to convert a pixel back to a geographic coordinate.
The map camera model
A 3D map camera has four parameters:
| Parameter | Description | Range |
|---|---|---|
| Center | The lat/lon at screen centre | Any valid coordinate |
| Zoom | Scale level | 0–22 |
| Pitch | Tilt away from vertical | 0° (top-down) – 85° |
| Bearing | Rotation from north | 0°–360° |
At pitch 0, the camera points straight down — a classic 2D slippy map. Increase pitch and the horizon tilts into view, revealing depth. At high pitch values, the near and far clip planes matter — objects at the horizon are extremely far from the camera, creating precision challenges with the depth buffer.
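To get a feel for why high pitch stresses the depth buffer, the sketch below estimates how far the camera has to see along the ray through the top edge of the screen. It assumes a flat ground plane, a vertical field of view, and a camera orbiting the map centre at a fixed distance; the function name and its parameters are illustrative and not taken from any library.

```javascript
// Distance from the camera to the ground along the ray through the top edge
// of the screen, for a flat ground plane (an assumption of this sketch).
// pitchDeg: camera tilt away from vertical
// fovDeg: vertical field of view
// cameraDistance: distance from the camera to the map centre
function distanceToTopEdge(pitchDeg, fovDeg, cameraDistance) {
  const toRad = Math.PI / 180;
  const pitch = pitchDeg * toRad;
  const halfFov = (fovDeg / 2) * toRad;

  // Height of the camera above the ground plane.
  const height = cameraDistance * Math.cos(pitch);

  // Angle between "straight down" and the top-edge ray.
  const rayAngle = pitch + halfFov;

  // At 90° or more the ray is parallel to the ground or points above it:
  // the horizon is on screen and the ray never reaches the ground.
  if (rayAngle >= Math.PI / 2 - 1e-9) return Infinity;

  return height / Math.cos(rayAngle);
}

for (const pitch of [0, 40, 60, 70, 75]) {
  console.log(pitch, distanceToTopEdge(pitch, 36, 1));
}
```

With a 36° field of view the result grows from about 1.05 times the camera distance at pitch 0 to nearly 10 times at pitch 70°, and becomes infinite once pitch plus half the field of view reaches 90°. That growth is what forces the far plane outward.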
From world to screen
3D rendering always involves the same pipeline: model → world → camera → clip → screen. For maps, the "model" step is skipped (features are already in world coordinates). The two key matrices are:
- View matrix — moves the world so the camera sits at the origin pointing forward
- Projection matrix — applies perspective (far things appear smaller)
The transformation chain: world coordinates → camera space → clip space → screen pixels
Step 1 — World to camera (view matrix)
The view matrix rotates and translates the world so the camera sits at the origin looking down the -Z axis:
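// Simplified: combine bearing and pitch into a 4×4 view matrix.
// gl-matrix post-multiplies, so the last call below is applied to a point first:
// rotate by bearing, tilt by pitch, then push the scene away along -Z.
// (Rotation signs depend on the world's axis convention.)
import { mat4 } from 'gl-matrix';
const viewMatrix = mat4.create();
mat4.translate(viewMatrix, viewMatrix, [0, 0, -cameraDistance]);
mat4.rotateX(viewMatrix, viewMatrix, pitchRad);
mat4.rotateZ(viewMatrix, viewMatrix, bearingRad);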
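The snippet above uses `cameraDistance` without saying where it comes from. One common choice, offered here as an assumption rather than something this chapter specifies, is to pick the distance at which the map plane exactly fills the viewport height at pitch 0, so that one world unit at the centre covers one screen pixel. The helper name `cameraDistanceFor` is hypothetical.

```javascript
// Distance at which a plane perpendicular to the view axis exactly fills the
// viewport height, given a vertical field of view in radians.
// Half the viewport height subtends half the field of view.
function cameraDistanceFor(viewportHeight, fovRad) {
  return (viewportHeight / 2) / Math.tan(fovRad / 2);
}

const fovRad = 36 * Math.PI / 180;
const cameraDistance = cameraDistanceFor(600, fovRad);
console.log(cameraDistance.toFixed(1));
```

For a 600-pixel-high viewport and a 36° field of view this comes to roughly 923 units. A narrower field of view pushes the camera further back for the same on-screen scale.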
Step 2 — Camera to clip (projection matrix)
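The projection matrix applies perspective: far things shrink. For a standard 36° FOV, using the OpenGL clip-space convention:

$$
P = \begin{pmatrix}
\dfrac{f}{a} & 0 & 0 & 0 \\
0 & f & 0 & 0 \\
0 & 0 & \dfrac{z_f + z_n}{z_n - z_f} & \dfrac{2 z_f z_n}{z_n - z_f} \\
0 & 0 & -1 & 0
\end{pmatrix}
$$

where $f = 1/\tan(\mathrm{FOV}/2)$, $a$ = aspect ratio, $z_n$ = near plane, $z_f$ = far plane.
The $1/\tan(\mathrm{FOV}/2)$ term maps the field of view to the clip-space range [-1, 1]. With the FOV measured vertically, a 36° FOV means objects 18° above or below the center axis land on the top or bottom edge of the screen. A wider FOV (larger angle) gives more peripheral vision but more perspective distortion; a narrower FOV compresses depth.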
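As a quick check of the claim about the screen edge, here is a minimal sketch that builds an OpenGL-style perspective matrix by hand and projects a camera-space point sitting 18° above the view axis. The matrices are plain row-major nested arrays chosen for readability (gl-matrix stores column-major typed arrays), and the near/far values are arbitrary illustration values.

```javascript
// Build an OpenGL-style perspective matrix as row-major nested arrays.
// fovRad is the vertical field of view in radians.
function perspective(fovRad, aspect, near, far) {
  const f = 1 / Math.tan(fovRad / 2);
  return [
    [f / aspect, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
    [0, 0, -1, 0],
  ];
}

// Multiply a row-major 4×4 matrix by a column vector [x, y, z, w].
function transform(m, v) {
  return m.map((row) => row.reduce((sum, value, i) => sum + value * v[i], 0));
}

const fov = 36 * Math.PI / 180;
const P = perspective(fov, 16 / 9, 0.1, 1000);

// A camera-space point 10 units ahead (-Z) and 18° above the view axis.
const point = [0, 10 * Math.tan(fov / 2), -10, 1];
const clip = transform(P, point);
const ndcY = clip[1] / clip[3]; // perspective divide
console.log(ndcY); // approximately 1: the top edge of the screen
```

The same helpers show the depth mapping: a point on the near plane divides out to -1 and a point on the far plane to +1.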
Step 3 — Clip to screen (viewport transform)
screenX = (clipX / clipW + 1) / 2 * viewportWidth
screenY = (1 - clipY / clipW) / 2 * viewportHeight
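The three steps can be chained into a single world-to-screen function. The sketch below does so with plain row-major arrays rather than a matrix library, so it runs on its own. It assumes world +Y points north and +Z points up (hence the negative pitch in the rotation), omits bearing for brevity, and uses illustrative near/far values; all helper names are hypothetical.

```javascript
// Row-major 4×4 helpers (plain arrays, no library).
function multiply(a, b) {
  return a.map((row) =>
    b[0].map((_, col) => row.reduce((sum, value, k) => sum + value * b[k][col], 0))
  );
}

function transform(m, v) {
  return m.map((row) => row.reduce((sum, value, i) => sum + value * v[i], 0));
}

function translation(x, y, z) {
  return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]];
}

function rotationX(rad) {
  const c = Math.cos(rad), s = Math.sin(rad);
  return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]];
}

function perspective(fovRad, aspect, near, far) {
  const f = 1 / Math.tan(fovRad / 2);
  return [
    [f / aspect, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
    [0, 0, -1, 0],
  ];
}

// World point -> screen pixel: view matrix, projection matrix, viewport transform.
// Returns null for points at or behind the camera plane (clipW <= 0).
function worldToScreen(world, view, projection, width, height) {
  const clip = transform(multiply(projection, view), [world[0], world[1], world[2], 1]);
  const [clipX, clipY, , clipW] = clip;
  if (clipW <= 0) return null;
  return [
    (clipX / clipW + 1) / 2 * width,
    (1 - clipY / clipW) / 2 * height,
  ];
}

// Example: 800×600 viewport, 36° FOV, camera orbiting the origin at pitch 40°.
const width = 800, height = 600;
const fov = 36 * Math.PI / 180;
const distance = (height / 2) / Math.tan(fov / 2);
const pitch = 40 * Math.PI / 180;
// Tilt the world first, then push it away from the camera along -Z.
const view = multiply(translation(0, 0, -distance), rotationX(-pitch));
const projection = perspective(fov, width / height, distance / 10, distance * 10);

console.log(worldToScreen([0, 0, 0], view, projection, width, height));   // map centre
console.log(worldToScreen([0, 100, 0], view, projection, width, height)); // 100 units north
```

The map centre lands on the viewport centre, and the point to the north lands above it on screen. The `clipW <= 0` guard matters in practice: dividing by a negative `clipW` would mirror points from behind the camera onto the screen.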
View frustum
The view frustum is the truncated pyramid of space visible to the camera; everything outside it gets clipped. It is bounded by six planes: left, right, top, bottom, near, and far.
Understanding the frustum matters for:
- Tile culling: only request tiles inside the frustum
- Label placement: only show labels for visible features
- LOD selection: use higher-detail tiles close to the camera and coarser tiles toward the far end of the frustum
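For tile culling, one widely used approach (not necessarily the one any particular map library uses) is to read the six planes directly off the combined view-projection matrix, often called the Gribb-Hartmann method, and test each tile's corners against them. The sketch below assumes OpenGL clip conventions, row-major plain-array matrices, and flat tiles lying on the ground plane z = 0; the tile shape `{ minX, minY, maxX, maxY }` is a hypothetical structure for illustration.

```javascript
// Row-major 4×4 helpers (plain arrays, no library).
function multiply(a, b) {
  return a.map((row) =>
    b[0].map((_, col) => row.reduce((sum, value, k) => sum + value * b[k][col], 0))
  );
}

function translation(x, y, z) {
  return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]];
}

function perspective(fovRad, aspect, near, far) {
  const f = 1 / Math.tan(fovRad / 2);
  return [
    [f / aspect, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
    [0, 0, -1, 0],
  ];
}

// Extract the six frustum planes [a, b, c, d] from a row-major
// view-projection matrix. A world point is on the inner side of a plane
// when a*x + b*y + c*z + d >= 0.
function frustumPlanes(m) {
  const add = (r, s) => r.map((value, i) => value + s[i]);
  const sub = (r, s) => r.map((value, i) => value - s[i]);
  return [
    add(m[3], m[0]), // left
    sub(m[3], m[0]), // right
    add(m[3], m[1]), // bottom
    sub(m[3], m[1]), // top
    add(m[3], m[2]), // near
    sub(m[3], m[2]), // far
  ];
}

// Conservative test for a flat ground tile (z = 0): the tile is culled only
// if all four corners lie outside the same plane. Some tiles just outside
// the frustum may be kept, but no visible tile is dropped.
function tileIntersectsFrustum(tile, planes) {
  const corners = [
    [tile.minX, tile.minY], [tile.maxX, tile.minY],
    [tile.minX, tile.maxY], [tile.maxX, tile.maxY],
  ];
  return planes.every(([a, b, , d]) =>
    corners.some(([x, y]) => a * x + b * y + d >= 0)
  );
}

// Example: top-down camera 1000 units above the origin, 36° FOV, square viewport.
const fov = 36 * Math.PI / 180;
const viewProjection = multiply(perspective(fov, 1, 10, 5000), translation(0, 0, -1000));
const planes = frustumPlanes(viewProjection);

console.log(tileIntersectsFrustum({ minX: -100, minY: -100, maxX: 100, maxY: 100 }, planes));     // true
console.log(tileIntersectsFrustum({ minX: 2000, minY: 2000, maxX: 2256, maxY: 2256 }, planes)); // false
```

Testing corners against unnormalised planes is enough for an inside/outside decision. Normalising each plane by the length of `(a, b, c)` is only needed when a true distance is required, for example when culling bounding spheres.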
Related chapters
- 3D buildings and feature extrusion — what the camera is usually looking at
- ECEF and 3D coordinate systems — the underlying 3D world frame
- How maps render — tiles, vectors, and the GPU pipeline — where camera math sits in the pipeline