The CRT:
The RGB color space corresponds in the most direct and natural way to the manner in which colors are displayed on a cathode-ray tube (CRT) display, such as your TV or your computer monitor. Part of the surface of the tube is covered with phosphors of three types: red, green, and blue. These phosphors emit red, green, or blue light for a short while when hit by a beam of electrons. The electrons are emitted from three electron guns inside the tube. A shadow mask controls which phosphors are hit by which beam. Because the phosphors only light up for a brief period of time, the screen must be refreshed (60 or more times each second) in order to maintain a steady image. The image on the CRT is created by the three electron beams scanning the screen line after line. The lines can be scanned in either an interlaced or a non-interlaced order.
Summary: a graphics application ultimately generates pixel values, written into the frame buffer. The frame buffer physically resides in the video memory on the graphics adapter. On this adapter there is some hardware (RAMDAC) that scans the digital values stored in the video memory and converts them to three separate analog signals (R, G, and B). These signals travel along the video cable to the CRT monitor, where they drive the three electron guns.
The geometric part of the imaging process with a pinhole camera is modelled by a transformation from 3D to 2D, which we call the perspective projection. The mathematics of this and other transformations in graphics is covered in depth in the computer graphics course.
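As a rough illustration of the perspective projection, here is a minimal Python sketch. It assumes a particular convention that these notes do not fix: the pinhole sits at the origin, the camera looks along the +z axis, and the image plane lies at distance focal_length in front of the pinhole. The function name project_point and its parameters are made up for this example.

```python
def project_point(point, focal_length=1.0):
    """Project a 3D point (x, y, z), given in camera coordinates, to 2D.

    Assumed convention (not from the notes): pinhole at the origin,
    camera looking along +z, image plane at z = focal_length.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Similar triangles: image coordinates shrink in proportion to depth.
    return (focal_length * x / z, focal_length * y / z)


# The same offset from the optical axis, seen at twice the depth,
# projects to half the size.
near = project_point((1.0, 2.0, 2.0))
far = project_point((1.0, 2.0, 4.0))
```

The division by z is what makes distant objects appear smaller; it is also the reason the perspective projection is not a linear map in ordinary 3D coordinates.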
Establishing what is visible through each pixel in the image is called "hidden-surface removal". There are many different hidden-surface removal algorithms. Several of them are covered in the computer graphics class. Here, we'll briefly explain two: ray tracing and Z-buffer.
Ray Tracing: The idea is simple. If we want to find out what is visible at a particular pixel in the image, we take a ray whose origin is the pinhole of the camera and direct it at the center of the pixel. We then intersect the ray with all the surfaces in our model and find the intersection point that is closest to the origin of the ray. The surface at that point is the one visible through the pixel.
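The closest-intersection search can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the model consists of spheres (chosen because the ray-sphere intersection has a short closed form), each described by a hypothetical (name, center, radius) tuple, and the brute-force loop tests every surface with no acceleration structure.

```python
import math


def intersect_sphere(origin, direction, center, radius):
    """Return the smallest positive t with origin + t*direction on the sphere, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    # Substitute the ray into |p - center|^2 = radius^2 and solve the quadratic in t.
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None  # the ray misses the sphere
    sqrt_disc = math.sqrt(disc)
    for t in ((-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a)):
        if t > 1e-9:  # ignore intersections behind the ray origin
            return t
    return None


def closest_hit(origin, direction, spheres):
    """Return (t, name) for the nearest sphere hit by the ray, or None."""
    best = None
    for name, center, radius in spheres:
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, name)
    return best


spheres = [("far", (0.0, 0.0, 10.0), 1.0), ("near", (0.0, 0.0, 5.0), 1.0)]
hit = closest_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), spheres)    # hits "near" first
miss = closest_hit((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), spheres)   # hits nothing
```

Calling closest_hit once per pixel, with the ray direction pointing from the pinhole through the pixel center, gives the visible surface for the whole image.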
Z-buffer: Assume that we have a triangulated B-rep model. It turns out that we can very efficiently scan-convert a triangle. That is, we can very quickly generate the coordinates of all the pixels covered by this triangle on the screen. Furthermore, at each (x,y) pixel coordinate covered by the triangle we can generate (with very little extra effort) the depth of the corresponding surface point (depth = orthogonal distance from the image plane). So, in order to get an image of a model with hidden surfaces removed we maintain another buffer alongside the frame buffer. This buffer is called the Z-buffer. The Z-buffer is initialized to some large value. Now we scan-convert the triangles in the model one by one, but for each pixel, before writing it to the frame buffer, we test if its depth is smaller than the one already in the Z-buffer. Only pixels that pass this test are written to the frame buffer (and their depth value is written to the Z-buffer).
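The depth test at the heart of the Z-buffer can be sketched in a few lines of Python. To keep the example short it does not scan-convert triangles; instead it assumes each primitive has already been turned into a list of (x, y, depth, color) fragments. The helper rect_fragments is a hypothetical stand-in that produces such fragments for a constant-depth rectangle.

```python
def render(fragment_lists, width, height, background="."):
    """Resolve visibility with a Z-buffer; returns (frame_buffer, z_buffer)."""
    frame = [[background] * width for _ in range(height)]
    # Initialize every depth to "infinitely far" so the first fragment always wins.
    zbuf = [[float("inf")] * width for _ in range(height)]
    for fragments in fragment_lists:          # primitives in arbitrary order
        for x, y, depth, color in fragments:
            if depth < zbuf[y][x]:            # closer than what is stored so far?
                zbuf[y][x] = depth
                frame[y][x] = color
    return frame, zbuf


def rect_fragments(x0, y0, x1, y1, depth, color):
    """Hypothetical stand-in for scan conversion: a constant-depth rectangle."""
    return [(x, y, depth, color) for y in range(y0, y1) for x in range(x0, x1)]


frame, zbuf = render(
    [
        rect_fragments(0, 0, 3, 3, 5.0, "A"),
        rect_fragments(1, 1, 4, 4, 2.0, "B"),  # closer: overwrites A where they overlap
        rect_fragments(2, 2, 4, 4, 9.0, "C"),  # farther: fails the depth test everywhere
    ],
    width=4,
    height=4,
)
```

Note that the result does not depend on the order in which the primitives are drawn, which is the main practical appeal of the method.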
In order to assign more realistic colors to the surface points visible at each pixel we attempt to simulate the physical world. This is difficult to do precisely, so we will resort to various shading models, which are simplified models of the physical reality.
Lambertian reflection is the simplest shading model in computer graphics. The brightness of a Lambertian surface is proportional to the cosine of the angle between the surface normal and a vector directed at the light source; it does not depend on the position of the observer. A constant ambient term is often added in order to create some illumination in areas that face away from the light source.
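A minimal Python sketch of Lambertian shading with an ambient term might look like this. The coefficient names diffuse and ambient and their default values are assumptions made for the example; the cosine is clamped at zero so that surfaces facing away from the light receive only the ambient contribution.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(normal, to_light, diffuse=1.0, ambient=0.1):
    """Brightness of a Lambertian surface point.

    normal   -- surface normal at the point
    to_light -- vector from the point towards the light source
    diffuse, ambient -- illustrative coefficients (assumed values)
    """
    n = normalize(normal)
    l = normalize(to_light)
    # For unit vectors the dot product is the cosine of the angle between them.
    cos_theta = sum(a * b for a, b in zip(n, l))
    return ambient + diffuse * max(0.0, cos_theta)


facing = lambert((0, 0, 1), (0, 0, 1))            # light straight above the surface
grazing = lambert((0, 0, 1), (0, math.sqrt(3), 1))  # light 60 degrees off the normal
behind = lambert((0, 0, 1), (0, 0, -1))           # light behind the surface
```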
The Lambertian reflection model is appropriate for rough, matte surface finishes. For shiny surfaces we expect to see specular highlights whose location depends on the position of the observer relative to the surface and the light source; the Lambertian model cannot produce them because it is view-independent. This shortcoming is addressed by the Phong and Blinn-Phong reflection models. Both add a specular term, a cosine raised to a power, that succeeds in producing nice-looking view-dependent highlights.
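The two specular terms can be compared with a short Python sketch. It shows only the highlight term, not the full shading equation, and the function names, the shininess parameter, and its default value are assumptions for illustration. Phong uses the angle between the mirror-reflected light direction and the viewer; Blinn-Phong uses the angle between the normal and the "halfway" vector between light and viewer. Returning zero when the light or the viewer is below the surface is a common convention adopted here, not something stated in these notes.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phong_specular(normal, to_light, to_viewer, shininess=32):
    """Phong highlight: (R . V) ** shininess, with R the mirror reflection of L."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0 or dot(n, v) <= 0:
        return 0.0  # light or viewer is below the surface
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    return max(0.0, dot(r, v)) ** shininess


def blinn_phong_specular(normal, to_light, to_viewer, shininess=32):
    """Blinn-Phong highlight: (N . H) ** shininess, with H halfway between L and V."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    if dot(n, l) <= 0 or dot(n, v) <= 0:
        return 0.0
    h = normalize(tuple(lc + vc for lc, vc in zip(l, v)))
    return max(0.0, dot(n, h)) ** shininess


n, light = (0, 0, 1), (1, 0, 1)
# Viewer exactly in the mirror direction: both highlights peak.
peak_phong = phong_specular(n, light, (-1, 0, 1))
peak_blinn = blinn_phong_specular(n, light, (-1, 0, 1))
# Viewer moved away from the mirror direction: both highlights fall off.
off_phong = phong_specular(n, light, (0, 0, 1))
off_blinn = blinn_phong_specular(n, light, (0, 0, 1))
```

For the same exponent the Blinn-Phong highlight is broader than the Phong one, so the two models generally need different shininess values to look alike.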
The Cook-Torrance model is a physically-based model. It produces more accurate approximations than the empirical models, but it is also considerably more complicated, and requires various physical information about the surface and the material. The analysis is based on the assumption that the surface is made out of mirror-like microfacets, whose normals are distributed about the average surface normal according to some distribution function. The analysis takes into account the self-occlusion of the surface by its own microfacets.
Even more complex physically-based models have been derived, but they are rarely used in practice.