A typical photogrammetry workflow consists of the following stages:

... Image capture
Sufficient photographic images are taken of the subject, from a range of angles and distances. Every point to be modelled should ideally be visible from at least three different angles – two views technically allow a match, but three or more allow the match to be made with greater confidence (apparent agreement between points in two images might be sheer coincidence; the more views that agree on the existence of a matching point at a given position, the better). Increasing the number of distinct views of a point also allows its position to be calculated with greater accuracy, by averaging the results of a larger number of calculations.
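The accuracy gain from extra views can be sketched as a least-squares intersection of viewing rays. This is a toy two-dimensional illustration, not any particular package's implementation – real software works with calibrated cameras in three dimensions, and the camera positions below are invented:

```python
import math

# Toy 2D sketch: estimate a point's position as the least-squares
# intersection of viewing rays from several camera positions. More rays
# means more constraints, so noise in any one view is averaged out.

def intersect_rays(rays):
    """Each ray is (origin, unit_direction). Find the point minimising the
    sum of squared perpendicular distances to every ray, by solving the
    2x2 normal equations A p = b with A = sum(I - d d^T)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (ox, oy), (dx, dy) in rays:
        # I - d d^T for a unit direction (dx, dy)
        m11, m12, m22 = 1 - dx * dx, -dx * dy, 1 - dy * dy
        a11 += m11; a12 += m12; a22 += m22
        b1 += m11 * ox + m12 * oy
        b2 += m12 * ox + m22 * oy
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def ray_towards(origin, target):
    """Build a (origin, unit_direction) ray pointing at a target."""
    ox, oy = origin
    tx, ty = target
    n = math.hypot(tx - ox, ty - oy)
    return (origin, ((tx - ox) / n, (ty - oy) / n))

# Three cameras all sighting the same (unknown) point at (2, 3):
cams = [(0.0, 0.0), (5.0, 0.0), (0.0, 6.0)]
rays = [ray_towards(c, (2.0, 3.0)) for c in cams]
x, y = intersect_rays(rays)
print(round(x, 6), round(y, 6))  # recovers (2.0, 3.0) with exact rays
```

With noisy ray directions the same solver still returns the single point that best agrees with all the views, which is why more views give a better-constrained answer.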
The images are best taken in sequence, moving slowly around the object. Modern photogrammetry algorithms developed for use with video assume that each image is taken from a position very similar to the previous one, with perhaps two-thirds of the detail from the previous image also present in the current one. These algorithms have also turned out to be efficient for processing a succession of still photographs, as long as the images form an overlapping positional sequence: looking for close matches between successive images is much more efficient than checking every image against every other image in a set, many of which may have no details in common.
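The efficiency difference is easy to quantify: exhaustive matching grows with the square of the number of images, while sequential matching grows linearly. A small sketch (the `overlap` window is an illustrative parameter, not from any specific package):

```python
# Number of image-pair comparisons for a sequence of n photographs:
# exhaustive (every image against every other) versus sequential
# (each image against only the next few in the capture order).

def exhaustive_pairs(n):
    return n * (n - 1) // 2          # quadratic growth

def sequential_pairs(n, overlap=1):
    # compare each image only with the next `overlap` images in sequence
    return sum(min(overlap, n - 1 - i) for i in range(n))

for n in (10, 100, 1000):
    print(n, exhaustive_pairs(n), sequential_pairs(n))
# 1000 images: 499500 exhaustive comparisons versus 999 sequential ones
```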
The images are usually taken with a fixed focal length, or with a series of fixed focal lengths, which can be read from the photos' embedded metadata.
Parts of some images that are to be ignored are masked out.
... Tiepoint calculation
Distinctive sets of features (corners, line segments) in multiple images are compared for likely matches, and the software tries to work out how these points might be distributed in three-dimensional space to explain where they appear in the images. Some images may be rejected, and some prospective points and features may be rejected if they can't be reconciled with other, more certain matches. If someone walks past a building while you photograph it, the software will hopefully accept the building features and reject the moving person's points, rather than the other way around.
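A common way to keep only confident matches is the "ratio test": accept a candidate match only when the nearest feature descriptor is clearly closer than the second-nearest, so ambiguous matches are discarded. This sketch uses tiny invented descriptors; real ones (SIFT and similar) are much longer vectors:

```python
import math

# Toy sketch of tiepoint matching with a nearest-neighbour ratio test.
# A match is kept only if the best candidate in image B is clearly
# better than the runner-up, rejecting ambiguous or coincidental matches.

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_features(desc_a, desc_b, ratio=0.7):
    matches = []
    for i, da in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: dist(da, desc_b[j]))
        best, second = ranked[0], ranked[1]
        if dist(da, desc_b[best]) < ratio * dist(da, desc_b[second]):
            matches.append((i, best))   # confident, unambiguous match
    return matches

img_a = [(1.0, 0.1), (5.0, 5.0), (9.0, 1.0)]
img_b = [(9.1, 1.1), (1.1, 0.0), (5.0, 4.9)]  # same features, reordered
print(match_features(img_a, img_b))  # [(0, 1), (1, 2), (2, 0)]
```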
This gives a rough three-dimensional point model, which is then refined during subsequent passes with progressively greater confidence. This crude model can be viewed from different angles, to check that enough of the intended subject is present, and to mask off or delete points that shouldn't be there.
... Dense cloud creation
Armed with this initial model, and with calculated positions of the camera with respect to the model for each shot, the software then compares related images more closely in order to extract further intermediate points.
This stage is usually the most time-consuming, and depending on the amount of detail and the complexity of the model, can take anywhere from hours to days to calculate. The result is a "dense cloud" of coloured points, which can again be edited to remove spurious points.
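Removing spurious points can be done by hand, but a simple automatic pass is also common: drop any point with too few neighbours nearby. This is a minimal sketch of that idea (real packages use k-d trees rather than this brute-force O(n²) loop, and the radius and threshold here are invented example values):

```python
# Sketch of cleaning a dense cloud: drop points that have fewer than
# min_neighbours other points within the given radius, on the assumption
# that genuine surface points sit in dense clusters and strays don't.

def remove_sparse_points(points, radius=1.0, min_neighbours=2):
    kept = []
    r2 = radius * radius
    for i, p in enumerate(points):
        neighbours = sum(
            1 for j, q in enumerate(points)
            if j != i
            and (p[0]-q[0])**2 + (p[1]-q[1])**2 + (p[2]-q[2])**2 <= r2
        )
        if neighbours >= min_neighbours:
            kept.append(p)
    return kept

cloud = [(0, 0, 0), (0.5, 0, 0), (0, 0.5, 0), (0.4, 0.4, 0),
         (10, 10, 10)]                      # one isolated stray point
print(len(remove_sparse_points(cloud)))    # the stray is dropped
```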
... Mesh creation
The software looks at adjacent points and experiments with linking them together in groups of three to produce a set of triangular facets that, hopefully, make up a geometrically consistent surface. Points that can't easily be included in a consistent surface are rejected.
We then have a three-dimensional surface made of triangles, where each triangle can be given a colour based on the colours of its three corner-points. The software can be asked to "fill in" smaller holes with an additional triangular mesh.
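Colouring a facet from its corner points can be as simple as averaging the three corner colours, one channel at a time. A minimal sketch (the indexing scheme here is illustrative; packages store mesh connectivity in various ways):

```python
# Sketch: colour a triangular facet by averaging the RGB colours of its
# three corner points, which carry colours taken from the dense cloud.

def facet_colour(triangle, colours):
    """triangle: three point indices; colours: index -> (r, g, b)."""
    r = sum(colours[i][0] for i in triangle) / 3
    g = sum(colours[i][1] for i in triangle) / 3
    b = sum(colours[i][2] for i in triangle) / 3
    return (round(r), round(g), round(b))

colours = {0: (255, 0, 0), 1: (0, 255, 0), 2: (0, 0, 255)}
print(facet_colour((0, 1, 2), colours))  # (85, 85, 85)
```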
Once we have this mesh model, it can be lit from different angles, or a feature called "ambient occlusion" can be used to colour cavities darker, reflecting the greater difficulty light has in reaching those regions.
Finally, knowing the positions of the cameras with respect to the mesh, the software can emulate a set of slide-projectors located at those positions, projecting the original photographs back onto the mesh. This turns individual facets into snippets of photograph, averaged and overlaid from multiple images.
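The "slide projector" emulation rests on the standard pinhole camera model: a 3-D point expressed in a camera's coordinate frame maps to a pixel position, which tells the software which part of which photograph covers each facet. A minimal sketch, with invented example values for the focal length and principal point:

```python
# Pinhole projection sketch: map a 3-D point in camera coordinates to a
# pixel position. Running this in reverse over every facet is what lets
# the software paste the original photographs back onto the mesh.
# (f, cx, cy below are invented example values, not real calibration.)

def project(point, f=1000.0, cx=960.0, cy=540.0):
    x, y, z = point          # camera coordinates, z pointing forward
    if z <= 0:
        return None          # behind the camera: this photo can't see it
    return (cx + f * x / z, cy + f * y / z)

print(project((0.2, -0.1, 2.0)))   # roughly (1060.0, 490.0)
print(project((0.0, 0.0, -1.0)))   # None: point is behind the camera
```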
This photographic mesh is then unwrapped from the surface and distorted into a flat image (like a tigerskin rug). The image file is included with the mesh model, and can be re-wrapped onto the model for added realism. For extra resolution and higher-detail skins (without making the skin file too large and unwieldy), the unwrapped mesh can be saved at higher resolution by having it automatically broken up and saved as a set of two, four, eight, sixteen or more images.
We then have a coloured model made of triangles that can be easily manipulated, angled and lit by standard 3D graphics routines, and which can also show "photographic" detail that's smaller than the size of the mesh.
Depending on the sophistication (and cost!) of the software, it may include additional "added value" features, such as the ability to use the camera's GPS and tilt data to correctly set the model's horizon, scaling and geographical orientation (extremely useful when creating models of archaeological sites from drone images). Alternatively, scale markers can be used to set scale, with barcode-style markings that the software can read and identify. "Disc" markers with QR-Code-style ID marks based on a concentric circle grid allow the software to locate a point at the centre of the marker and, by extrapolation from the surrounding rings, use the centre of the marker as an agreed point between photographs, located to better than single-pixel accuracy.
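Setting scale from a marker reduces to a single ratio: the marker's known physical size divided by its measured size in model units. A minimal sketch under that assumption (the coordinates and marker size are invented examples):

```python
import math

# Sketch: derive a model's scale factor from a scale marker of known
# physical size. The marker's two reference points are located in model
# units; the known real-world separation gives the factor by which every
# model coordinate should be multiplied.

def scale_factor(model_p1, model_p2, real_distance_m):
    model_d = math.dist(model_p1, model_p2)   # separation in model units
    return real_distance_m / model_d

# Marker endpoints found 2.5 model units apart; marker is really 0.5 m:
s = scale_factor((0.0, 0.0, 0.0), (0.0, 0.0, 2.5), 0.5)
print(s)  # 0.2 metres per model unit
```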
However, these "professional" features tend not to appear in normal editions of photogrammetry software, and tend to be reserved for professional editions costing over £10k. Given that an archaeological project can cost hundreds of thousands (or millions!) of pounds, and employ highly paid, highly trained professionals, it's perhaps not unreasonable that the critical piece of software at the heart of one of those large projects (or used by a consultancy company for multiple clients) might cost fifteen grand.