Apparent Distortions in Photography and the Geometry of Visual Space
by
Robert French (email: robert.e.french@gmail.com)
Abstract:
In this paper I contrast the geometric structure of phenomenal visual space with that of photographic images. I argue that topologically both are two-dimensional and that both involve central projections of scenes being depicted. However, I also argue that the metric structures of the spaces differ inasmuch as two types of “apparent distortions” (marginal distortions in wide-angle photography and close-up distortions) which occur in photography do not occur in the corresponding visual experiences. In particular, I argue that the absence of marginal distortions in vision is evidence for a holistic metric of visual space that is spherical, and that the absence of close-up distortions shows that the local metric structure possesses a variable curvature which is dependent upon the physical distance of the objects being viewed.
In this paper I will compare and contrast the geometric structure of visual space, which I am using to refer to the space of phenomenal visual experience, and the spatial structure of photographic images. I will argue that topologically both are two-dimensional and that both involve central projections of scenes being depicted. However, I will also argue that the metric structures of the spaces differ inasmuch as two types of “apparent distortions” occurring in photography do not occur in the corresponding visual experience. In particular, I will take note of both so-called “marginal distortions” in wide-angle photography and also so-called “close-up” or “perspective” distortions. I will argue that the absence of these “apparent distortions” in visual space shows that the space possesses a non-Euclidean (in the sense that not all of Euclid’s postulates apply to the space) metric structure.
Before proceeding further it may be helpful to make a few comments on the way in which I will be using the term “apparent distortion.” In the photographic literature1 a distinction is often made between so-called “real” and so-called “apparent” distortions. “Real distortions” are defined as deviations from a rectilinear perspective due to defective lenses, as for example occur with the “curvilinear” distortions of wide-angle photography whereby straight lines are projected as being curved near the margins of the resulting photographs, diverging apart in pincushion distortion and converging together in barrel distortion. In contrast, “apparent distortions” are defined as deviations between the geometric character of visual experiences and the geometric character of corresponding photographs even when no defects in photographic lenses are involved; i.e., where straight lines are projected as being straight in the photographs. I will now turn to some comments comparing the topology of visual space and photographic images, in particular on the issue of the dimensionality of the spaces.
While, due to the existence of visual depth perception, it is often held that visual experience, unlike photographic images, is three-dimensional, I believe that this is clearly not the case. The standard recursive definition2 of dimensionality is not directly applicable to either case, being an “in the small” criterion in terms of the dimensionality of bounding spaces for all infinitely small neighborhoods of points of the space in question, the dimensionality of the original space being one greater than that of the bounding spaces. Inasmuch as these neighborhoods will be too small to be visually distinguishable (the limit of phenomenal visual spatial acuity being approximately one minute of arc), it is not possible to strictly apply the criterion to visual space. However, it is possible to apply an analogous “in the large” criterion which was given earlier by Poincaré3 whereby the dimensionality of a space will be one greater than that of the dimensionality of the bounding space for any given region. Since regions of both photographs and also visual experiences can be bounded by a one-dimensional space (a line constituted by the boundary between a projected object and its background), it follows that the spaces themselves must be two-dimensional.
While counterexamples can be given to Poincaré’s analysis, such as in the cases of a point giving boundaries to two cones at their intersection, or a circle giving boundaries to a solid sphere embedded in a solid torus, it would seem that neither vision nor photography have much in common with these types of cases. In fact it can be noted that in the one case where prima facie it might be thought that visual experience is three-dimensional, when features of objects depicted possess different physical depths from the eye, this very fact can be used to constitute an edge, which constitutes a bounding line around the features in question. Further considerations pointing to the two-dimensionality of both visual space and photographic images include the points that, except in cases of semi-transparency, we neither see the interiors of objects nor intervening points between our eyes and objects being seen, and that these do not appear in photographs either.
A second feature held in common between visual space and photographic images involves the fact that they both correspond to the results of forming a central projection of what is depicted onto a two-dimensional surface. Of course in the visual space case I do not mean that an optical image is literally formed by means of focusing light rays on the space, but rather that the geometric structure of images depicted in the space is isomorphic with that which would be formed by such a projection. Thus, in both the case of photography and that of visual space the image is such as would be formed by having a pencil of straight lines (e.g., light rays) pass from each exterior point on the front surfaces of the objects being depicted through a fixed point (the center of projection), and then cutting this pencil of straight lines by a two-dimensional surface at some location either before or after the fixed point. If the cut occurs before the fixed point the projection is known as a direct linear projection, and if it occurs afterwards it is known as an indirect linear projection. It can be noted that in indirect linear projections (which occur on the negative in photography and also on the retina of the eye), the image is in reverse.
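The difference between direct and indirect linear projections can be illustrated with a small numerical sketch. The Python fragment below is only an illustration under stated assumptions: the center of projection is placed at the origin, the cutting surface is taken to be a flat plane at z = +1 (direct) or z = -1 (indirect), and the scene point and the function name central_projection are hypothetical choices made for the example.

```python
def central_projection(point, plane_z):
    """Project a 3-D point through the origin (the center of projection)
    onto the plane z = plane_z. Scene points are assumed to have z > 0."""
    x, y, z = point
    scale = plane_z / z
    return (x * scale, y * scale)

# Hypothetical scene point, above and to the right of the optical axis.
scene_point = (2.0, 1.0, 10.0)

# Direct projection: the cutting plane lies between the scene and the center.
direct = central_projection(scene_point, plane_z=1.0)

# Indirect projection: the cutting plane lies behind the center,
# as with a photographic negative or the retina.
indirect = central_projection(scene_point, plane_z=-1.0)

print("direct:  ", direct)
print("indirect:", indirect)
```

Under these assumptions the indirect image has the same size as the direct one but with both coordinates reversed in sign, which is the reversal noted above.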
Due to the fact that both visual space and photographic images constitute central projections of what is depicted, it follows that both topologic and metric invariants of projection will be held in common between vision and photography. In particular, note can be taken of such topologic invariants as (aside from complications involving occlusion) the total number of objects being depicted and the number of sides possessed by individual objects being depicted. Even while rectangles, when viewed askew, will be projected as trapezoids and circles, when viewed askew, will be projected as ellipses, it can be noted that there are still metric invariants here, notably the “cross ratio.”4
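The invariance of the cross ratio can be checked numerically. The sketch below is illustrative only: it assumes one common convention for the cross ratio of four collinear points (conventions differ in the ordering of the points), and it models the effect of a central projection from one line onto another by a hypothetical fractional linear map whose coefficients are arbitrary.

```python
from fractions import Fraction

def cross_ratio(a, b, c, d):
    """Cross ratio of four collinear points given by their coordinates
    along the line, using the convention (c-a)(d-b) / ((c-b)(d-a))."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def project(x, coeffs=(2, 1, 1, 3)):
    """Hypothetical projective map of a line, x -> (p*x + q) / (r*x + s),
    the kind of map a central projection induces between two lines.
    The coefficients are arbitrary, with p*s - q*r nonzero."""
    p, q, r, s = coeffs
    return (p * x + q) / (r * x + s)

points = [Fraction(0), Fraction(1), Fraction(2), Fraction(4)]
images = [project(x) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*images)

print("spacings before:", [points[i + 1] - points[i] for i in range(3)])
print("spacings after: ", [images[i + 1] - images[i] for i in range(3)])
print("cross ratio before and after:", before, after)
```

In this example the distances between neighboring points change under the map, while the cross ratio keeps the same value before and after.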
Rather than develop in more detail the just-mentioned issue of the nature of invariants of projection in the geometric structure of what is depicted, I wish to instead address issues concerning the shape of the two-dimensional surface which in vision cuts the pencil of rays of a projection. Is it a flat plane like the film of most cameras? Is it a cylinder like the film of a panoramic camera or the screens where wide-angle movies are shown? Or is it a sphere like a planetarium or the retina of the eye? Or is it a still more complex surface? By taking note of two “distortions” which occur in photography but not in vision (and where thus the geometric structure is not invariant between the two), I believe that it can be demonstrated that it is a more complex surface than any of these. I will first take note of a distortion, “marginal distortion,” which takes place in peripheral regions of wide-angle photographs in order to show that holistically the metric structure of visual space is spherical, and will then take note of a second type of distortion, sometimes referred to as “close-up” or “perspective” distortion, in order to show that when objects depicted possess different physical depths from the eye, visual space possesses a variable curvature.
It can be noted that in central projections onto planes, as in most photographs, straight lines are projected as straight lines. However, areas in peripheral regions of a wide-angle photograph are not, as in the corresponding visual experiences, proportional to the visual angles subtended, but instead contain “marginal distortions” whereby areas in peripheral regions of the photograph subtend significantly smaller visual angles than do equal areas near the centers of the photographs. Since the field of view of vision is approximately 180° by 120°, which is greater than the field of view of even ultra-wide-angle lenses, I take it that the fact that marginal distortions do not occur in vision is evidence that the metric structure of visual space is fundamentally different from that of a flat photograph. Before proceeding to make some positive remarks on what the metric structure of visual space is, I will enter into a short digression into the geometry of marginal distortion.
To simplify matters concerning the optics of wide-angle lenses here, I will consider the geometric optics of a pinhole camera. It can be noted that the distance along the film from the foot of the perpendicular dropped from the center of projection (the pinhole) to a given image point is proportional to the tangent of the angle Θ formed between that perpendicular and the ray passing through the pinhole to the image point. Thus, the marginal distortion present along the radial direction at a given angle Θ will be proportional to the derivative of the tangent of this angle, or the secant squared. Since the stretch at right angles to the radial direction is proportional only to the secant of Θ, corresponding areas subtended on that region of the film will be proportional to the secant of Θ raised to the third power. In wide-angle photography the result is analogous to what occurs in polar regions of the Mercator projection of the globe (where a sphere is projected onto a cylinder, which is then flattened out) whereby the polar regions are disproportionately enlarged (e.g., with Greenland being projected as being larger than Australia while in fact possessing less than one third of the area). By similar reasoning, it can also be noted that a sphere in the periphery of a wide-angle photograph will be projected as an ellipse, although this effect does not occur in vision, as L. P. Clerc notes as follows:
“From whatever angle we may look at a sphere its outline always appears exactly circular. On the contrary, the plane perspective of a sphere is an ellipse, except in the case where the center of the sphere is on the perpendicular from the viewpoint to the projection plane. As the visual ray to the center of the sphere makes an increasing angle with this perpendicular, so the distortion also becomes greater.”5
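The growth of marginal distortion with the angle Θ can be sketched numerically for the pinhole case. The Python fragment below is an illustration under stated assumptions: the film is flat, the focal distance is set to one unit, and the function names are hypothetical. It estimates the radial stretch as the numerical derivative of tan Θ and compares it with the secant squared; it also reports the ratio of radial to transverse stretch (sec Θ), which approximates the elongation of the outline of a small sphere imaged at that angle.

```python
import math

def radial_stretch(theta_deg, step_deg=1e-4):
    """Central-difference estimate of d(tan Θ)/dΘ, i.e. the stretch along
    the radial direction of a flat film relative to the film's center."""
    theta = math.radians(theta_deg)
    h = math.radians(step_deg)
    return (math.tan(theta + h) - math.tan(theta - h)) / (2 * h)

def elongation(theta_deg):
    """Approximate ratio of the long to the short axis of the image of a
    small sphere: radial stretch sec^2(Θ) over transverse stretch sec(Θ)."""
    return 1.0 / math.cos(math.radians(theta_deg))

for theta_deg in (0, 30, 45, 60):
    numeric = radial_stretch(theta_deg)
    analytic = 1.0 / math.cos(math.radians(theta_deg)) ** 2
    print(f"Θ = {theta_deg:2d}°  radial stretch ≈ {numeric:.3f}  "
          f"sec² = {analytic:.3f}  elongation ≈ {elongation(theta_deg):.3f}")
```

On these assumptions the radial stretch is about 1.33 at 30°, 2 at 45°, and 4 at 60° off axis, and a small sphere at 60° is imaged roughly twice as long as it is wide.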
It is true that in extreme wide-angle photography many of the foregoing effects can be avoided with a fish-eye lens (where the effects are analogous to those present in a conic projection of a globe whereby a projection onto a cone is flattened out so as to form a circular image), but then, due to very pronounced barrel distortion, straight lines are no longer projected as straight lines. In fact it is noteworthy that in vision also, unlike photography utilizing a rectilinear perspective, straight lines are projected as arcs of great circles which converge at both poles. For example, if a long rail fence is looked at head-on then the top and bottom rails will appear to converge towards each other in both directions, although this effect will not be present in a wide-angle photograph, it being compensated by the marginal distortion which it was just noted also occurs in these photographs. While it is true that such a convergence also occurs with photographs taken with a fish-eye lens due to the just-noted barrel distortion in the resulting photographs, unlike vision, these photographs will also possess a latitudinal “stretching” in peripheral regions in the sense that the projections of objects along azimuthal (circumferential) axes will be disproportionately large compared with the projections along the polar axes.
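The rail fence example can be made concrete with a short calculation. The sketch below is illustrative and rests on assumptions not in the text: the eye is at the origin, the fence lies in a vertical plane 10 units away and faces the viewer head-on, and the rails run at heights of 1 unit above and below eye level. It compares the visual angle between the rails (the separation on a sphere centered on the eye) with their separation on a flat film parallel to the fence at unit distance.

```python
import math

def rail_gaps(azimuth_deg, distance=10.0, half_height=1.0):
    """Return (angular_gap_deg, planar_gap) for the point of the fence seen
    at the given azimuth, measured from the head-on direction."""
    phi = math.radians(azimuth_deg)
    x = distance * math.tan(phi)              # position along the fence
    # Visual angle between the rails: projection onto a sphere about the eye.
    horizontal_range = math.hypot(x, distance)
    elevation = math.atan2(half_height, horizontal_range)
    angular_gap = 2 * math.degrees(elevation)
    # Separation on a flat film at unit distance, parallel to the fence:
    # each rail is imaged at height ±half_height / distance, whatever x is.
    planar_gap = 2 * half_height / distance
    return angular_gap, planar_gap

for azimuth in (0, 30, 60, 80):
    angular, planar = rail_gaps(azimuth)
    print(f"azimuth {azimuth:2d}°: visual angle between rails ≈ {angular:5.2f}°, "
          f"flat-film separation = {planar:.2f}")
```

Under these assumptions the visual angle between the rails shrinks steadily towards either side, while the separation on the flat film stays constant, which is the contrast between vision and rectilinear photography described above.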
It can be noted now that in cases of projections onto spherical surfaces equal solid angles of the projection will subtend equal areas of the sphere. Hence it will both be the case that no marginal distortions will occur in these projections and also that there will be no disproportionate “stretchings” in projected images such as those occurring with the projection of a fish-eye lens onto a flat surface. I take it then that the fact that neither of these “apparent distortions” is present in vision is strong evidence that at least the holistic metric structure of visual experience corresponds to the metric structure of the surface of a sphere.
I wish to turn now to a second type of “apparent distortion,” namely “close-up” or “perspective” distortion, whereby close objects appear to be disproportionately large in a photograph. A good example here is that of a close-up photograph of a human face where the nose appears to be disproportionately large. Even taking into account the comparison between the position of the camera in taking the picture and the distance from which the corresponding photograph is viewed (the proper viewing distance is determined by the focal length of the camera lens, that is, the distance from the lens to the film when the lens is focused on infinity, multiplied by the extent to which the photographic print is enlarged), this effect is still quite pronounced, as Nelson Goodman notes as follows:
“And even we who are most inured to perspective rendering do not always accept it as faithful representation: the photograph of a man with his feet thrust forward looks distorted, and Pike’s Peak dwindles dismally in a snapshot. As the saying goes, there is nothing like a camera to make a molehill out of a mountain.”6
Unlike Goodman, I do not believe that the foregoing effect demonstrates any claims that seeing in central perspective is a matter of habit and thus a learned phenomenon that may not have been universally acquired. Instead, I believe that what is actually going on here is related to various “constancies” noted by Gestalt psychologists; in particular, the tendencies towards size and shape constancy. I wish to take particular note here of what Robert Thouless7 has termed the “phenomenal regression to the real object,” whereby he claims that perceived size and shape possess intermediate values between those resulting from a projection onto a surface of constant curvature and the real physical sizes and shapes of the objects. Care must be taken in interpreting results of experiments here, and particular note should be taken of the nature of the instructions given to the participants in the experiments. For example, if the instructions are to make estimates of the actual sizes and shapes of the objects much more constancy is reported than in the case of “projective” instructions where the participants are asked to take the attitude of an artist and report the geometric nature of appearances. It is noteworthy that even with projective instructions a significant degree of size and shape constancy is reported.8 It is also noteworthy that the reported constancies are significantly greater under binocular vision than under monocular vision.9
I will now show how the foregoing results can be accounted for in terms of the internal metric structure of a space possessing a variable curvature and will also show how, at least in principle, phenomenal visual depth perception may also be accountable for by this structure. The key point here is that since the area on a sphere subtending a given solid angle is a function of the square of the radius of the sphere, the tendency towards size constancy can be accounted for in terms of the effects of distorting a sphere by means of a “depth function” defined as a function of the physical distance of perceived objects. Thus, the depth function ρ can be defined here as
ρ = f(Θ, Ф, t) [1]
where ρ is the physical distance from the “cyclopean eye” (an imaginary eye centered between the two eyes) to the physical object being seen in the direction Θ, Ф at time t. While in normal circumstances a unique object will be determined by these conditions, and thus the depth function will be a genuine function, there are special cases involving semi-transparency where this is not the case. I will not attempt to deal with these special cases here though.
The question thus arises as to how to use the just-defined depth function so as to define a transformation on the metric structure of a sphere which will account for the size constancy tendency. Inasmuch as the tendency towards size constancy seems to be independent of the direction being looked at, Θ, Ф, although, as was noted, being dependent on the physical depth, ρ, of the object being looked at, it follows that a visual space will be spherical when objects constituted in it are equidistant from the cyclopean eye, in particular when they are infinitely distant, a case approximated by looking at the sky. If two physical objects at different depths subtend equal solid angles with respect to the eye, the closer object will be constituted smaller in visual space than the more distant one, due to the tendency towards size constancy. Thus, inasmuch as the area on the surface of a sphere projected by a given solid angle is proportional to the square of the radius of the sphere, it follows that a visual space constituted from objects at greater physical depths will possess a greater radius than one constituted from objects at less depth. It also follows that a visual space constituted from objects infinitely distant will possess the greatest possible radius.
The foregoing considerations suggest a possible “external” description of visual space in which the two-dimensional visual space is embedded in a three-dimensional space, and its shape is determined by means of a depth function operating in that three-dimensional space. A mapping can then be defined between the physically-defined spherical polar coordinates, Θ, Ф, and ρ, and a set of spherical polar coordinates Θ’, Ф’, and ρ’, used for the external description of visual space, with the correlations being defined as follows:
Θ’ = Θ [2]
Ф’ = Ф [3]
ρ’ = C – 1/(aρ) [4]
C is the radius of a visual space in which all of the objects constituted in it are infinitely distant, and a is a proportionality constant. My hypothesis now is that these transformations constitute an external description of the geometry of visual space, where ρ’ gives the distance to a point in visual space in the direction Θ’, Ф’, from a point in a third dimension not in visual space (the center of the sphere in the special case where ρ’ is the same in all directions). It can be noted that ρ’ approaches 0 when aρ approaches 1/C, but inasmuch as there exists a minimum threshold on the depth at which physical objects can be brought into sharp focus, about 6 inches from the eye, this result is consistent with the visual perception process. Also, it is possible that the transformation on the depth function would need to be more complex in order to fully account for the size constancy tendency, but due to unclarity in precisely what the mathematical character of this tendency is, it would seem to be wise to keep the transformation as simple as possible here. It should also be emphasized that ρ’ has no immediate phenomenal significance, inasmuch as it itself is not contained in visual space, and thus it does not, for example, correspond to perceived visual depth.10
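The behavior of the transformation in Equation 4 can be illustrated with a few sample values. The sketch below is only an illustration: the values C = 1 and a = 10 are hypothetical and chosen for convenience (with them ρ’ vanishes at ρ = 0.1), the function names are not from the text, and the “relative area” assumes, as a local approximation, that the area given to a fixed solid angle varies as the square of ρ’.

```python
def rho_prime(rho, C=1.0, a=10.0):
    """Equation 4: rho' = C - 1/(a*rho). C and a are hypothetical values."""
    return C - 1.0 / (a * rho)

def relative_area(rho, C=1.0, a=10.0):
    """Area given to a fixed solid angle at depth rho, relative to the
    case of infinitely distant objects (radius C), assuming the area
    varies locally as the square of rho'."""
    return (rho_prime(rho, C, a) / C) ** 2

for rho in (0.1, 0.5, 1.0, 10.0, 100.0):
    print(f"rho = {rho:6.1f}   rho' = {rho_prime(rho):.4f}   "
          f"relative area = {relative_area(rho):.4f}")
```

With these illustrative values ρ’ rises from 0 towards C as ρ increases, so that of two objects subtending equal solid angles the closer one is given the smaller area, in line with the tendency towards size constancy described above.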
I will close the paper by briefly mentioning some ways in which the preceding analysis may be able to explain other phenomenal aspects of visual perception. For one point, such an analysis predicts the existence of discontinuities in visual space when there are sharp physical discontinuities in the depth function for relatively close objects, and these discontinuities would seem to be capable of explaining the phenomenon of seeing an “edge” when one relatively close physical object partially occludes the view of another. Another phenomenal characteristic of vision which the analysis seems to be at least in principle capable of explaining is the tendency towards shape constancy; that is, the tendency for the visual shapes of objects seen askew to vary not in accordance with the laws of geometrical perspective, but instead to remain closer to the shapes projected when these objects are seen head-on. The main point to be made here is that if an object is seen askew, then one edge of it must be closer to the viewer than the other, and thus both ρ and ρ’ will possess different values for these two edges, and the object will be constituted in visual space at a slant also. However, due to the inverse nature of Equation 4 relating ρ’ to ρ, the area of the object as constituted on this slant will not be sufficiently great for the object to retain the same shape as when seen head-on, and so it will instead be perceived at a compromise shape in between that shape and the one given by the laws of perspective.
I also believe that in principle my analysis can account for the aspect of phenomenal depth perception which is enhanced by binocular vision. My claim here is that one can apprehend to some extent the internal metric structure of visual space, as for example by noticing the presence of a “corner,” of convexity or concavity, or the presence of a discontinuity, in the space. Certainly visual space does not appear to be “flat” when the objects constituted in it possess different physical depths, and I wish to claim that one can use this “lack of flatness,” that is, the apprehension of an internal curvature, as a phenomenal cue for depth. It seems clear that these sorts of phenomenal depth cues are present to some extent even in monocular vision, since one can reverse a “Necker cube,” or see movies in depth using only one eye. As I previously noted, there is also a great deal of evidence that both the tendency towards size constancy and the tendency towards shape constancy are greatly enhanced in binocular vision. Thus, it would seem that the binocular depth cue of retinal disparity enhances monocular phenomenal depth effects by means of increasing the value of the proportionality constant a in the equation relating ρ’ to ρ, and I wish to equate the apprehension of the resulting changes in the internal metric structure of the space with the phenomenal apprehension of visual depth. This may also account for the striking so-called “3D” effects of random stereograms and so-called “3D” movies, although of course I am maintaining that the experiences, even here, are literally still two-dimensional; they just are not flat.
FOOTNOTES
1. See for example Barbara Upton and John Upton, Photography, Third Edition (Boston: Little Brown, 1985), p. 68.
2. See Karl Menger, “What is Dimension?” in American Mathematical Monthly 50 (1943), pp. 2-7.
3. Henri Poincaré, Dernières Pensées, translated by John Bolduc (New York: Dover Publications, 1963).
4. The “cross ratio” of four points is defined as follows:
5. L. P. Clerc, Photography Theory and Practice, edited by D. A. Spencer (New York: Focal Press Ltd., 1970), p. 31.
6. Nelson Goodman, Languages of Art (Indianapolis: Hackett Publishing Co., 1976), p. 15.
7. Robert Thouless, “Phenomenal Regression to the Real Object, I,” in British Journal of Psychology 21 (1931), pp. 339-359.
8. A. S. Gilinsky, “The Effect of Attitude upon the Perception of Size,” in American Journal of Psychology 68 (1955), pp. 173-192.
9. On the tendency towards size constancy being enhanced in binocular vision, see Edwin Boring and D. W. Taylor, “Apparent Visual Size as a Function of Distance for Monocular Observers,” in American Journal of Psychology 55 (1942), pp. 102-105; and E. L. Chalmers, Jr., “Monocular Cues in the Perception of Size and Distance,” in American Journal of Psychology 65 (1952), pp. 415-423. On the tendency towards shape constancy being enhanced in binocular vision, see Thouless, “Phenomenal Regression to the Real Object, II,” in British Journal of Psychology 22 (1931), pp. 1-30.
10. For a description of the resulting internal geometry of visual space see Robert French, “The Geometry of Visual Space,” in Noûs 21 (1987), pp. 115-133.