Why, historically, do we multiply matrices as we do?

Multiplication of matrices (taking the dot product of the $i$th row of the first matrix and the $j$th column of the second to yield the $ij$th entry of the product) is not a very intuitive operation: if you were to ask someone how to multiply two matrices, he probably would not think of that method. Of course, it turns out to be very useful: matrix multiplication is precisely the operation that represents composition of transformations. But it's not intuitive. So my question is where it came from. Who thought of multiplying matrices in that way, and why? (Was it perhaps multiplication of a matrix and a vector first? If so, who thought of multiplying them in that way, and why?) My question stands whether matrix multiplication was done this way only after it was used to represent composition of transformations, or whether, on the contrary, matrix multiplication came first. (Again, I'm not asking about the utility of multiplying matrices as we do: this is clear to me. I'm asking a question about history.)

  • As to the last, matrix multiplication definitely came first (centuries first), and I'm reasonably certain it arose from a compact representation of systems of linear equations. Leibniz already had a determinant formula. As I have no historic sources for first use, this doesn't answer your question though.
    – gnometorule
    Jan 7 '13 at 4:19

  • Matrices are linear operators and have meaning only when they act on vectors. Given matrices $A$ and $B$, what would we want the operator/matrix $BA$ to mean? Ideally, for all vectors $x$, we want $(BA)x = B(Ax)$. Once we have this, i.e. $(BA)x = B(Ax)$ for all $x$, we are forced to live with the way we currently multiply matrices. And as to why the matrix-vector product is defined the way it is, the primary reason for introducing matrices was to handle linear transformations in a notationally convenient way.
    – user17762
    Jan 7 '13 at 4:21

  • There is another question, of course, which is not so much why matrix multiplication was defined like this, but why it stuck - why this apparently curious definition took off, and didn't die the death of so many putative definitions. And that was because it proved mathematically fruitful.
    – Mark Bennet
    Jan 7 '13 at 8:47

  • Possible duplicate of math.stackexchange.com/questions/192835/….
    – lhf
    Jun 2 '14 at 12:45

  • Why is it "not intuitive"? If you ask someone how to multiply two matrices and they think about what that multiplication is supposed to mean, they absolutely will come up with the usual definition.
    – m_t_
    Nov 30 '16 at 15:25

linear-algebra matrices math-history

edited Jan 7 '13 at 4:47

asked Jan 7 '13 at 4:12
msh210

3 Answers

accepted

Matrix multiplication is a symbolic way of substituting one linear change of variables into another one. If $x' = ax + by$ and $y' = cx+dy$, and
$x'' = a'x' + b'y'$ and $y'' = c'x' + d'y'$ then we can plug the first pair of formulas into the second to express $x''$ and $y''$ in terms of $x$ and $y$:
$$
x'' = a'x' + b'y' = a'(ax + by) + b'(cx+dy) = (a'a + b'c)x + (a'b + b'd)y
$$
and
$$
y'' = c'x' + d'y' = c'(ax+by) + d'(cx+dy) = (c'a+d'c)x + (c'b+d'd)y.
$$
It can be tedious to keep writing the variables, so we use arrays to track the coefficients, with the formulas for $x'$ and $x''$ on the first row and for $y'$ and $y''$ on the second row. The above two linear substitutions coincide with the matrix product
$$
\left(
\begin{array}{cc}
a'&b'\\c'&d'
\end{array}
\right)
\left(
\begin{array}{cc}
a&b\\c&d
\end{array}
\right)
=
\left(
\begin{array}{cc}
a'a+b'c&a'b+b'd\\c'a+d'c&c'b+d'd
\end{array}
\right).
$$
So matrix multiplication is just a bookkeeping device for systems of linear substitutions plugged into one another (order matters). The formulas are not intuitive, but it's nothing other than the simple idea of combining two linear changes of variables in succession.
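
The substitution argument above can be checked numerically. The following is a minimal Python sketch, not part of the original answer: the helper names `matmul` and `apply` and the sample coefficients are assumptions chosen only for illustration. It compares substituting step by step with applying the single product matrix.

```python
# Illustrative sketch: helper names and coefficients are hypothetical.

def matmul(A, B):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def apply(M, v):
    """Evaluate the linear substitution with coefficient array M at the values v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

first = [[1, 2], [3, 4]]    # x'  = 1x  + 2y,   y'  = 3x  + 4y
second = [[5, 6], [7, 8]]   # x'' = 5x' + 6y',  y'' = 7x' + 8y'

# Coefficients of x'' and y'' written directly in terms of x and y.
combined = matmul(second, first)

x, y = 2, -1
step_by_step = apply(second, apply(first, [x, y]))  # substitute twice
in_one_go = apply(combined, [x, y])                 # use the product once

print(combined)                 # [[23, 34], [31, 46]]
print(step_by_step, in_one_go)  # [12, 16] [12, 16]
```

Note that the second substitution appears on the left of the product, matching the order of the primed and unprimed arrays in the displayed formula.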



Matrix multiplication was first defined explicitly in print by Cayley in 1858, in order to reflect the effect of composition of linear transformations. See paragraph 3 at http://darkwing.uoregon.edu/~vitulli/441.sp04/LinAlgHistory.html. However, the idea of tracking what happens to coefficients when one linear change of variables is substituted into another (which we view as matrix multiplication) goes back further. For instance, the work of number theorists in the early 19th century on binary quadratic forms $ax^2 + bxy + cy^2$ was full of linear changes of variables plugged into each other (especially linear changes of variables that we would recognize as coming from ${\rm SL}_2(\mathbf Z)$). For more on the background, see the paper by Thomas Hawkins on matrix theory in the 1974 ICM. Google "ICM 1974 Thomas Hawkins" and you'll find his paper among the top 3 hits.






    Here is an answer directly reflecting the historical perspective from the paper "A Memoir on the Theory of Matrices" by Arthur Cayley, 1857. This paper is available here.



    This paper is credited with "containing the first abstract definition of a matrix" and "a matrix algebra defining addition, multiplication, scalar multiplication and inverses" (source).



    In this paper a nonstandard notation is used. I will do my best to place it in a more "modern" (but still nonstandard) notation. The bulk of the contents of this post will come from pages 20-21.



    To introduce notation, $$ (X,Y,Z)= \left( \begin{array}{ccc}
    a & b & c \\
    a' & b' & c' \\
    a'' & b'' & c'' \end{array} \right)(x,y,z)$$



    will represent the set of linear functions $(ax + by + cz, a'x + b'y + c'z, a''x + b''y + c''z)$ which are then called $(X,Y,Z)$.



    Cayley defines addition and scalar multiplication and then moves to matrix multiplication or "composition". He specifically wants to deal with the issue of:



    $$(X,Y,Z)= \left( \begin{array}{ccc}
    a & b & c \\
    a' & b' & c' \\
    a'' & b'' & c'' \end{array} \right)(x,y,z) \quad \text{where} \quad (x,y,z)= \left( \begin{array}{ccc}
    \alpha & \beta & \gamma \\
    \alpha' & \beta' & \gamma' \\
    \alpha'' & \beta'' & \gamma'' \end{array} \right)(\xi,\eta,\zeta)$$



    He now wants to represent $(X,Y,Z)$ in terms of $(\xi,\eta,\zeta)$. He does this by creating another matrix that satisfies the equation:



    $$(X,Y,Z)= \left( \begin{array}{ccc}
    A & B & C \\
    A' & B' & C' \\
    A'' & B'' & C'' \end{array} \right)(\xi,\eta,\zeta)$$



    He continues to write that the value we obtain is:



    $$\begin{align}\left( \begin{array}{ccc}
    A & B & C \\
    A' & B' & C' \\
    A'' & B'' & C'' \end{array} \right) &= \left( \begin{array}{ccc}
    a & b & c \\
    a' & b' & c' \\
    a'' & b'' & c'' \end{array} \right)\left( \begin{array}{ccc}
    \alpha & \beta & \gamma \\
    \alpha' & \beta' & \gamma' \\
    \alpha'' & \beta'' & \gamma'' \end{array} \right)\\[.25cm] &= \left( \begin{array}{ccc}
    a\alpha+b\alpha' + c\alpha'' & a\beta+b\beta' + c\beta'' & a\gamma+b\gamma' + c\gamma'' \\
    a'\alpha+b'\alpha' + c'\alpha'' & a'\beta+b'\beta' + c'\beta'' & a'\gamma+b'\gamma' + c'\gamma'' \\
    a''\alpha+b''\alpha' + c''\alpha'' & a''\beta+b''\beta' + c''\beta'' & a''\gamma+b''\gamma' + c''\gamma''\end{array} \right)\end{align}$$



    This is the standard definition of matrix multiplication. I must believe that matrix multiplication was defined to deal with this specific problem. The paper continues to mention several properties of matrix multiplication such as non-commutativity, composition with unity and zero and exponentiation.



    Here is the written rule of composition:




    Any line of the compound matrix is obtained by combining the corresponding line of the first component matrix successively with the several columns of the second matrix (p. 21)
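
The quoted rule of composition can be read almost literally as a procedure. Here is a small Python sketch under stated assumptions: the function names `combine` and `compose` and the two sample 3-by-3 matrices are hypothetical, chosen for illustration and not taken from Cayley's memoir. It also illustrates the non-commutativity mentioned above.

```python
# Illustrative sketch: names and sample matrices are hypothetical.

def combine(line, column):
    """Combine one line (row) with one column: sum of pairwise products."""
    return sum(a * b for a, b in zip(line, column))

def compose(first, second):
    """Each line of the result combines the corresponding line of `first`
    successively with the several columns of `second`."""
    columns = list(zip(*second))  # transpose `second` to get its columns
    return [[combine(line, column) for column in columns] for line in first]

outer = [[1, 2, 0], [0, 1, 0], [0, 0, 1]]  # (X, Y, Z) in terms of (x, y, z)
inner = [[1, 0, 0], [3, 1, 0], [0, 0, 1]]  # (x, y, z) in terms of (xi, eta, zeta)

compound = compose(outer, inner)  # (X, Y, Z) in terms of (xi, eta, zeta)
swapped = compose(inner, outer)   # the two matrices composed in the other order

print(compound)  # [[7, 2, 0], [3, 1, 0], [0, 0, 1]]
print(swapped)   # [[1, 2, 0], [3, 7, 0], [0, 0, 1]]
```

The two results differ, so the order in which the substitutions are composed matters.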







    • Should the set of linear functions $(ax + by + cz, a'z + b'y + c'z, a''z + b''y + c''z)$ be $(ax + by + cz, a'x + b'y + c'z, a''x + b''y + c''z)$?
      – Vilhelm Gray
      May 11 '17 at 18:11

    • Brad's is THE answer, I think. The eventual coefficients obtained from a double transformation of (x, y, z) require row elements of the left transform matrix to be multiplied by their corresponding column elements of the right transform matrix. My own old idea was that if multiplication of a vector by a matrix is done like A (x, y, z) = (x', y', z'), i.e. using row vectors, then the resulting system will not "read" as clearly as if (x, y, z) and (x', y', z') were written as column vectors. And this is only for a 3-D system. Higher-order systems are harder still. But Brad's idea is dead-on.
      – Trunk
      Dec 28 '17 at 15:57

    \begin{align}
    u & = 3x + 7y \\ v & = -2x + 11y \\[1em]
    p & = 13u-20v \\ q & = 2u+6v
    \end{align}
    Given $x$ and $y$, how do you find $p$ and $q$? How do you write:
    \begin{align}
    p & = \bullet\, x + \bullet\, y \\ q & = \bullet\, x+\bullet\, y\quad\text{?}
    \end{align}
    What numbers go where the four $\bullet$'s are?



    That is what matrix multiplication is. The rationale is mathematical, not historical.
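
For readers who want to check their answer, one way to fill in the four bullets is to substitute directly. This is a worked sketch that uses only the numbers given above:
\begin{align}
p & = 13(3x+7y) - 20(-2x+11y) = 79x - 129y \\
q & = 2(3x+7y) + 6(-2x+11y) = -6x + 80y
\end{align}
Written with arrays of coefficients, the same computation is the product
$$
\left( \begin{array}{cc} 13 & -20 \\ 2 & 6 \end{array} \right)
\left( \begin{array}{cc} 3 & 7 \\ -2 & 11 \end{array} \right)
=
\left( \begin{array}{cc} 79 & -129 \\ -6 & 80 \end{array} \right).
$$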






    share|cite|improve this answer






















      Your Answer




      StackExchange.ifUsing("editor", function ()
      return StackExchange.using("mathjaxEditing", function ()
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
      );
      );
      , "mathjax-editing");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "69"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      noCode: true, onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f271927%2fwhy-historically-do-we-multiply-matrices-as-we-do%23new-answer', 'question_page');

      );

      Post as a guest






























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      63
      down vote



      accepted










      Matrix multiplication is a symbolic way of substituting one linear change of variables into another one. If $x' = ax + by$ and $y' = cx+dy$, and
      $x'' = a'x' + b'y'$ and $y'' = c'x' + d'y'$ then we can plug the first pair of formulas into the second to express $x''$ and $y''$ in terms of $x$ and $y$:
      $$
      x'' = a'x' + b'y' = a'(ax + by) + b'(cx+dy) = (a'a + b'c)x + (a'b + b'd)y
      $$
      and
      $$
      y'' = c'x' + d'y' = c'(ax+by) + d'(cx+dy) = (c'a+d'c)x + (c'b+d'd)y.
      $$
      It can be tedious to keep writing the variables, so we use arrays to track the coefficients, with the formulas for $x'$ and $x''$ on the first row and for $y'$ and $y''$ on the second row. The above two linear substitutions coincide with the matrix product
      $$
      left(
      beginarraycc
      a'&b'\c'&d'
      endarray
      right)
      left(
      beginarraycc
      a&b\c&d
      endarray
      right)
      =
      left(
      beginarraycc
      a'a+b'c&a'b+b'd\c'a+d'c&c'b+d'd
      endarray
      right).
      $$
      So matrix multiplication is just a bookkeeping device for systems of linear substitutions plugged into one another (order matters). The formulas are not intuitive, but it's nothing other than the simple idea of combining two linear changes of variables in succession.



      Matrix multiplication was first defined explicitly in print by Cayley in 1858, in order to reflect the effect of composition of linear transformations. See paragraph 3 at http://darkwing.uoregon.edu/~vitulli/441.sp04/LinAlgHistory.html. However, the idea of tracking what happens to coefficients when one linear change of variables is substituted into another (which we view as matrix multiplication) goes back further. For instance, the work of number theorists in the early 19th century on binary quadratic forms $ax^2 + bxy + cy^2$ was full of linear changes of variables plugged into each other (especially linear changes of variable that we would recognize as coming from $rm SL_2(mathbf Z)$). For more on the background, see the paper by Thomas Hawkins on matrix theory in the 1974 ICM. Google "ICM 1974 Thomas Hawkins" and you'll find his paper among the top 3 hits.






      share|cite|improve this answer


























        up vote
        63
        down vote



        accepted










        Matrix multiplication is a symbolic way of substituting one linear change of variables into another one. If $x' = ax + by$ and $y' = cx+dy$, and
        $x'' = a'x' + b'y'$ and $y'' = c'x' + d'y'$ then we can plug the first pair of formulas into the second to express $x''$ and $y''$ in terms of $x$ and $y$:
        $$
        x'' = a'x' + b'y' = a'(ax + by) + b'(cx+dy) = (a'a + b'c)x + (a'b + b'd)y
        $$
        and
        $$
        y'' = c'x' + d'y' = c'(ax+by) + d'(cx+dy) = (c'a+d'c)x + (c'b+d'd)y.
        $$
        It can be tedious to keep writing the variables, so we use arrays to track the coefficients, with the formulas for $x'$ and $x''$ on the first row and for $y'$ and $y''$ on the second row. The above two linear substitutions coincide with the matrix product
        $$
        left(
        beginarraycc
        a'&b'\c'&d'
        endarray
        right)
        left(
        beginarraycc
        a&b\c&d
        endarray
        right)
        =
        left(
        beginarraycc
        a'a+b'c&a'b+b'd\c'a+d'c&c'b+d'd
        endarray
        right).
        $$
        So matrix multiplication is just a bookkeeping device for systems of linear substitutions plugged into one another (order matters). The formulas are not intuitive, but it's nothing other than the simple idea of combining two linear changes of variables in succession.



        Matrix multiplication was first defined explicitly in print by Cayley in 1858, in order to reflect the effect of composition of linear transformations. See paragraph 3 at http://darkwing.uoregon.edu/~vitulli/441.sp04/LinAlgHistory.html. However, the idea of tracking what happens to coefficients when one linear change of variables is substituted into another (which we view as matrix multiplication) goes back further. For instance, the work of number theorists in the early 19th century on binary quadratic forms $ax^2 + bxy + cy^2$ was full of linear changes of variables plugged into each other (especially linear changes of variable that we would recognize as coming from $rm SL_2(mathbf Z)$). For more on the background, see the paper by Thomas Hawkins on matrix theory in the 1974 ICM. Google "ICM 1974 Thomas Hawkins" and you'll find his paper among the top 3 hits.






        share|cite|improve this answer
























          up vote
          63
          down vote



          accepted







          up vote
          63
          down vote



          accepted






          Matrix multiplication is a symbolic way of substituting one linear change of variables into another one. If $x' = ax + by$ and $y' = cx+dy$, and
          $x'' = a'x' + b'y'$ and $y'' = c'x' + d'y'$ then we can plug the first pair of formulas into the second to express $x''$ and $y''$ in terms of $x$ and $y$:
          $$
          x'' = a'x' + b'y' = a'(ax + by) + b'(cx+dy) = (a'a + b'c)x + (a'b + b'd)y
          $$
          and
          $$
          y'' = c'x' + d'y' = c'(ax+by) + d'(cx+dy) = (c'a+d'c)x + (c'b+d'd)y.
          $$
          It can be tedious to keep writing the variables, so we use arrays to track the coefficients, with the formulas for $x'$ and $x''$ on the first row and for $y'$ and $y''$ on the second row. The above two linear substitutions coincide with the matrix product
          $$
          left(
          beginarraycc
          a'&b'\c'&d'
          endarray
          right)
          left(
          beginarraycc
          a&b\c&d
          endarray
          right)
          =
          left(
          beginarraycc
          a'a+b'c&a'b+b'd\c'a+d'c&c'b+d'd
          endarray
          right).
          $$
          So matrix multiplication is just a bookkeeping device for systems of linear substitutions plugged into one another (order matters). The formulas are not intuitive, but it's nothing other than the simple idea of combining two linear changes of variables in succession.



          Matrix multiplication was first defined explicitly in print by Cayley in 1858, in order to reflect the effect of composition of linear transformations. See paragraph 3 at http://darkwing.uoregon.edu/~vitulli/441.sp04/LinAlgHistory.html. However, the idea of tracking what happens to coefficients when one linear change of variables is substituted into another (which we view as matrix multiplication) goes back further. For instance, the work of number theorists in the early 19th century on binary quadratic forms $ax^2 + bxy + cy^2$ was full of linear changes of variables plugged into each other (especially linear changes of variable that we would recognize as coming from $rm SL_2(mathbf Z)$). For more on the background, see the paper by Thomas Hawkins on matrix theory in the 1974 ICM. Google "ICM 1974 Thomas Hawkins" and you'll find his paper among the top 3 hits.






          share|cite|improve this answer














          Matrix multiplication is a symbolic way of substituting one linear change of variables into another one. If $x' = ax + by$ and $y' = cx+dy$, and
          $x'' = a'x' + b'y'$ and $y'' = c'x' + d'y'$ then we can plug the first pair of formulas into the second to express $x''$ and $y''$ in terms of $x$ and $y$:
          $$
          x'' = a'x' + b'y' = a'(ax + by) + b'(cx+dy) = (a'a + b'c)x + (a'b + b'd)y
          $$
          and
          $$
          y'' = c'x' + d'y' = c'(ax+by) + d'(cx+dy) = (c'a+d'c)x + (c'b+d'd)y.
          $$
          It can be tedious to keep writing the variables, so we use arrays to track the coefficients, with the formulas for $x'$ and $x''$ on the first row and for $y'$ and $y''$ on the second row. The above two linear substitutions coincide with the matrix product
          $$
          left(
          beginarraycc
          a'&b'\c'&d'
          endarray
          right)
          left(
          beginarraycc
          a&b\c&d
          endarray
          right)
          =
          left(
          beginarraycc
          a'a+b'c&a'b+b'd\c'a+d'c&c'b+d'd
          endarray
          right).
          $$
          So matrix multiplication is just a bookkeeping device for systems of linear substitutions plugged into one another (order matters). The formulas are not intuitive, but it's nothing other than the simple idea of combining two linear changes of variables in succession.



          Matrix multiplication was first defined explicitly in print by Cayley in 1858, in order to reflect the effect of composition of linear transformations. See paragraph 3 at http://darkwing.uoregon.edu/~vitulli/441.sp04/LinAlgHistory.html. However, the idea of tracking what happens to coefficients when one linear change of variables is substituted into another (which we view as matrix multiplication) goes back further. For instance, the work of number theorists in the early 19th century on binary quadratic forms $ax^2 + bxy + cy^2$ was full of linear changes of variables plugged into each other (especially linear changes of variable that we would recognize as coming from ${\rm SL}_2(\mathbf Z)$). For more on the background, see the paper by Thomas Hawkins on matrix theory in the 1974 ICM. Google "ICM 1974 Thomas Hawkins" and you'll find his paper among the top 3 hits.







          edited Jun 2 '16 at 18:26 by msh210
          answered Jan 7 '13 at 4:31 by KCd




















              Here is an answer directly reflecting the historical perspective from the paper "Memoir on the theory of matrices" by Arthur Cayley, 1857. This paper is available here.



              This paper is credited with "containing the first abstract definition of a matrix" and "a matrix algebra defining addition, multiplication, scalar multiplication and inverses" (source).



              In this paper a nonstandard notation is used. I will do my best to place it in a more "modern" (but still nonstandard) notation. The bulk of the contents of this post will come from pages 20-21.



              To introduce notation, $$ (X,Y,Z)= \left( \begin{array}{ccc}
              a & b & c \\
              a' & b' & c' \\
              a'' & b'' & c'' \end{array} \right)(x,y,z)$$

              will represent the set of linear functions $(ax + by + cz, a'x + b'y + c'z, a''x + b''y + c''z)$ which are then called $(X,Y,Z)$.



              Cayley defines addition and scalar multiplication and then moves to matrix multiplication or "composition". He specifically wants to deal with the issue of:



              $$(X,Y,Z)= \left( \begin{array}{ccc}
              a & b & c \\
              a' & b' & c' \\
              a'' & b'' & c'' \end{array} \right)(x,y,z) \quad \text{where} \quad (x,y,z)= \left( \begin{array}{ccc}
              \alpha & \beta & \gamma \\
              \alpha' & \beta' & \gamma' \\
              \alpha'' & \beta'' & \gamma'' \\ \end{array} \right)(\xi,\eta,\zeta)$$

              He now wants to represent $(X,Y,Z)$ in terms of $(\xi,\eta,\zeta)$. He does this by creating another matrix that satisfies the equation:



              $$(X,Y,Z)= \left( \begin{array}{ccc}
              A & B & C \\
              A' & B' & C' \\
              A'' & B'' & C'' \\ \end{array} \right)(\xi,\eta,\zeta)$$



              He continues to write that the value we obtain is:



              $$\begin{align}\left( \begin{array}{ccc}
              A & B & C \\
              A' & B' & C' \\
              A'' & B'' & C'' \\ \end{array} \right) &= \left( \begin{array}{ccc}
              a & b & c \\
              a' & b' & c' \\
              a'' & b'' & c'' \end{array} \right)\left( \begin{array}{ccc}
              \alpha & \beta & \gamma \\
              \alpha' & \beta' & \gamma' \\
              \alpha'' & \beta'' & \gamma'' \\ \end{array} \right)\\[.25cm] &= \left( \begin{array}{ccc}
              a\alpha+b\alpha' + c\alpha'' & a\beta+b\beta' + c\beta'' & a\gamma+b\gamma' + c\gamma'' \\
              a'\alpha+b'\alpha' + c'\alpha'' & a'\beta+b'\beta' + c'\beta'' & a'\gamma+b'\gamma' + c'\gamma'' \\
              a''\alpha+b''\alpha' + c''\alpha'' & a''\beta+b''\beta' + c''\beta'' & a''\gamma+b''\gamma' + c''\gamma''\end{array} \right)\end{align}$$



              This is the standard definition of matrix multiplication. I must believe that matrix multiplication was defined to deal with this specific problem. The paper continues to mention several properties of matrix multiplication such as non-commutativity, composition with unity and zero and exponentiation.



              Here is the written rule of composition:




              Any line of the compound matrix is obtained by combining the corresponding line of the first component matrix successively with the several columns of the second matrix (p. 21)
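
The quoted rule can be read almost literally as a procedure. The following is a minimal Python sketch under that reading; the names `combine`, `compose` and `power` and the sample matrices are hypothetical, chosen only to illustrate the rule and the properties mentioned above (non-commutativity, composition with unity and zero, exponentiation). It is not Cayley's notation.

```python
# Illustrative sketch of the quoted rule of composition; all names are hypothetical.

def combine(line, column):
    """Combine a line (row) with a column: the sum of the pairwise products."""
    return sum(p * q for p, q in zip(line, column))

def compose(first, second):
    """Compound matrix: each line of `first` combined with each column of `second`."""
    columns = list(zip(*second))
    return [[combine(line, col) for col in columns] for line in first]

def power(m, n):
    """Compose m with itself n times, for n >= 0 (n = 0 gives the unit matrix)."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = compose(result, m)
    return result

# Sample 3x3 matrices (assumed values, for illustration only).
M = [[1, 2, 0], [0, 1, 0], [0, 0, 1]]
N = [[1, 0, 0], [3, 1, 0], [0, 0, 1]]
unity = power(M, 0)
zero = [[0] * 3 for _ in range(3)]

print(compose(M, N) != compose(N, M))                      # order of composition matters
print(compose(M, unity) == M and compose(unity, M) == M)   # unity leaves M unchanged
print(compose(M, zero) == zero)                            # composing with zero gives zero
print(power(M, 3))                                         # M composed with itself three times
```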







              • Should the set of linear functions $(ax + by + cz, a'z + b'y + c'z, a''z + b''y + c''z)$ be $(ax + by + cz, a'x + b'y + c'z, a''x + b''y + c''z)$?
                – Vilhelm Gray
                May 11 '17 at 18:11










              • Brad's is THE answer, I think. The eventual coefficients obtained from a double transformation of (x, y, z) require the row elements of the left transform matrix to be multiplied by the corresponding column elements of the right transform matrix. My own old idea was that if multiplication of a vector by a matrix is done like A (x, y, z) = (x', y', z'), i.e. using row vectors, then the resulting system will not "read" as clearly as if (x, y, z) and (x', y', z') were written as column vectors. And this is only for a 3-D system. Higher-order systems are harder still. But Brad's idea is dead-on.
                – Trunk
                Dec 28 '17 at 15:57















              answered Jun 2 '14 at 13:35 by Brad







              \begin{align}
              u & = 3x + 7y \\ v & = -2x + 11y \\[2ex]
              p & = 13u-20v \\ q & = 2u+6v
              \end{align}
              Given $x$ and $y$, how do you find $p$ and $q$? How do you write:
              \begin{align}
              p & = \bullet\, x + \bullet\, y \\ q & = \bullet\, x+\bullet\, y\quad\text{?}
              \end{align}
              What numbers go where the four $\bullet$'s are?



              That is what matrix multiplication is. The rationale is mathematical, not historical.
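
For concreteness, here is the substitution carried out for the system above (a worked step added for illustration; the answer itself leaves it as an exercise). Plugging $u$ and $v$ into $p$ and $q$ gives
\begin{align}
p & = 13(3x+7y) - 20(-2x+11y) = 79x - 129y \\
q & = 2(3x+7y) + 6(-2x+11y) = -6x + 80y
\end{align}
and the same four numbers are the entries of the matrix product
$$
\left(\begin{array}{cc} 13 & -20 \\ 2 & 6 \end{array}\right)
\left(\begin{array}{cc} 3 & 7 \\ -2 & 11 \end{array}\right)
=
\left(\begin{array}{cc} 79 & -129 \\ -6 & 80 \end{array}\right).
$$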






                  edited Nov 6 '17 at 21:12
                  answered Jan 7 '13 at 4:45 by Michael Hardy



























                       
