I'm very confused about the format of the "Coordinate Transformation Matrix" (especially about the discrepancies between its mathematical definition and its "X server definition"), and for lack of informative sources I need to ask a few questions here in the hope of clearing things up.
- initially I thought the "Coordinate Transformation Matrix" was just a simple mapping between points and that its definition was exactly the one used in mathematics (as given by Wikipedia etc.). I assumed its application was as simple as "take the matrix and an input point, multiply them, and get the output point as a result". The section "Using the Coordinate Transformation Matrix" on this website:
seems to support my initial theory with an example showing how the point (400,197) multiplied by an identity matrix results in the same point. But aside from this trivial case, things can't be that simple. What if a multiplication results in a point with negative coordinate(s)? That is mathematically fine in a standard Cartesian coordinate system, but points on a computer screen never have negative coordinates, so how does the X server handle/translate such negative values? A simple example: the 90-degree rotation matrix taken from the section "Using the Coordinate Transformation Matrix" of the aforementioned website (BTW: why do they call a left rotation "clockwise 90 deg" and a right rotation "counterclockwise 90 deg"? It should be the other way around, right?) maps the point (x,y) to (-y+1, x), which looks like nonsense, since the first coordinate of the resulting point is negative for every y greater than 1 (see the sketch below). Obviously there must be some kind of internal translation/normalization, but I found absolutely no sources explaining how the whole process works "under the hood" and what end result it produces.
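To make concrete what I tried, here is the computation I have in mind as a small Python/numpy sketch. The point (400,197) and the rotation matrix are the ones from the article; the assumption that the matrix is applied directly to raw pixel coordinates is mine, and is presumably exactly where I go wrong:

```python
import numpy as np

# Identity matrix: the trivial example from the article.
identity = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1]])

# The "90 deg" rotation matrix quoted in the same section.
rotation = np.array([[0, -1, 1],
                     [1,  0, 0],
                     [0,  0, 1]])

# The article's sample point, in homogeneous coordinates.
point = np.array([400, 197, 1])

print(identity @ point)  # [400 197 1] -- matches the article
print(rotation @ point)  # [-196 400 1] -- negative x coordinate!
```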
- what's the relevance of the "translation components" for a rotation matrix? For example, a left rotation would be mathematically represented by the matrix "0 -1 0 1 0 0 0 0 1" (row-major order), while for the X server/xinput etc. a left rotation is represented by the matrix "0 -1 1 1 0 0 0 0 1", which would mathematically correspond to a left rotation followed by a translation by the vector [1,0]. The only reference on the subject I could find was the following quote on the libinput reference website:
"The translation component is expected to be normalized to the device coordinate range"
which is vague to say the least and really explains nothing to me. Could you please provide a few examples showing how the transformation process really works for rotations, and how the "math definition" and the "X server definition" of a rotation differ?
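In case it helps to pin down what I mean, here is the comparison I tried myself (again numpy). I apply both matrices to a point with coordinates in [0,1], since that is my best guess at what "normalized to the device coordinate range" means; I have not found this guess confirmed anywhere:

```python
import numpy as np

# "Math" left rotation: 90 degrees counterclockwise about the origin.
math_rotation = np.array([[0, -1, 0],
                          [1,  0, 0],
                          [0,  0, 1]])

# The matrix xinput uses for the same rotation (extra translation column).
x_rotation = np.array([[0, -1, 1],
                       [1,  0, 0],
                       [0,  0, 1]])

# A test point, assuming coordinates normalized to [0,1].
point = np.array([0.25, 0.75, 1.0])

print(math_rotation @ point)  # [-0.75  0.25  1.  ] -- leaves [0,1]
print(x_rotation @ point)     # [ 0.25  0.25  1.  ] -- stays in [0,1]
```

If that reading is right, the [1,0] translation is just what shifts the rotated unit square back into [0,1]x[0,1], but I would like to see this confirmed and explained in general.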
- some websites describe a general form of the "Coordinate Transformation Matrix" as a matrix with the following elements:
"a 0 c 0 e f 0 0 1"
where:
a = touch_area_width / total_width
c = touch_area_x_offset / total_width
e = touch_area_height / total_height
f = touch_area_y_offset / total_height

Is it an "official" definition? This, again, seems to differ significantly from the mathematical definition of a transformation matrix and leads to some surprising results. Example: we have N identical monitors stacked vertically, with a touchscreen attached to the bottom one. Now we want coordinates on the touchscreen to map correctly to the monitor it is attached to. Intuitively, point (x,y) on the touchscreen should be mapped to point (x, (N-1)*MONITOR_HEIGHT + y) of the virtual screen consisting of all monitors (the x coordinate remains unchanged, while the y coordinate must be increased by the combined height of the monitors above the bottom one). However, when we follow the aforementioned definition of the a, c, e and f coefficients, the result is completely different. In our case, for N identical monitors stacked vertically:
a = touch_area_width / total_width = 1
c = touch_area_x_offset / total_width = 0
e = touch_area_height / total_height = 1/N
f = touch_area_y_offset / total_height = (N-1)/NSo the resulting matrix is "1 0 0 0 1/N (N-1)/N 0 0 1" and it maps point (x,y) to (x,(y/N + (N-1)/N)) which is obviously a different result. For N=2 the former transformation maps (x,y) to (x,MONITOR_HEIGHT+y) while the latter maps (x,y) to (x,(0.5*y + 0.5)) which seems absolutely counterintuitive as second coordinate of resulting point does not depend at all on the number of monitors. What's wrong with my reasoning?