Vector Sum Reduction

I’m currently reading Jeremy Howard’s paper “The Matrix Calculus You Need For Deep Learning” and came across a bit I don’t understand.



In section 4.4 (Vector Sum Reduction), they are calculating the derivative of $y = \operatorname{sum}(\mathbf{f}(\mathbf{x}))$, like so:
\begin{align*}
\frac{\partial y}{\partial \mathbf{x}}
&= \left[
\frac{\partial y}{\partial x_1},
\frac{\partial y}{\partial x_2},
\dotsc,
\frac{\partial y}{\partial x_n}
\right] \\
&= \left[
\frac{\partial}{\partial x_1} \sum_i f_i(\mathbf{x}),
\frac{\partial}{\partial x_2} \sum_i f_i(\mathbf{x}),
\dotsc,
\frac{\partial}{\partial x_n} \sum_i f_i(\mathbf{x})
\right] \\
&= \left[
\sum_i \frac{\partial f_i(\mathbf{x})}{\partial x_1},
\sum_i \frac{\partial f_i(\mathbf{x})}{\partial x_2},
\dotsc,
\sum_i \frac{\partial f_i(\mathbf{x})}{\partial x_n}
\right]
\end{align*}
(Original image here.)
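As a sanity check (this sketch is not from the paper, and the particular $\mathbf{f}$ below is an arbitrary choice, picked so that every $f_i$ genuinely depends on all of $\mathbf{x}$), the identity can be verified numerically: the gradient of $y = \operatorname{sum}(\mathbf{f}(\mathbf{x}))$ should equal the column sums of the Jacobian of $\mathbf{f}$.

```python
# Numerical check (assumed example, not from the paper): for y = sum(f(x)),
# dy/dx_j should equal sum_i df_i/dx_j, i.e. the j-th column sum of the Jacobian of f.
import numpy as np

def f(x):
    # f_i(x) = x_i * sum(x): every component of f uses the whole vector x
    return x * x.sum()

def y(x):
    return f(x).sum()

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6
basis = np.eye(len(x))

# Central-difference gradient of y
grad_fd = np.array([(y(x + eps * e) - y(x - eps * e)) / (2 * eps) for e in basis])

# Central-difference Jacobian of f (J[i, j] = df_i/dx_j), then sum each column over i
jac_fd = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in basis])
grad_from_jacobian = jac_fd.sum(axis=0)

print(grad_fd)             # approximately [12. 12. 12.]
print(grad_from_jacobian)  # approximately [12. 12. 12.] -- matches, as the derivation says
```

Both prints give (approximately) $[12, 12, 12]$ here, since for this $\mathbf{f}$ we have $y = \left(\sum_i x_i\right)^2$ and $\partial y / \partial x_j = 2\sum_i x_i = 12$ at $\mathbf{x} = (1, 2, 3)$.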



Here, the following is noted:




Notice we were careful here to leave the parameter as a vector $\mathbf{x}$ because each function $f_i$ could use all values in the vector, not just $x_i$.




What does that mean? I thought it meant that we can’t reduce $f_i(\mathbf{x})$ to $f_i(x_i)$ to $x_i$, but right afterwards, that is precisely what they do:




Let’s look at the gradient of the simple $y = \operatorname{sum}(\mathbf{x})$.
The function inside the summation is just $f_i(\mathbf{x}) = x_i$ and the gradient is then:
\begin{align*}
\nabla y
&= \left[
\sum_i \frac{\partial f_i(\mathbf{x})}{\partial x_1},
\sum_i \frac{\partial f_i(\mathbf{x})}{\partial x_2},
\dotsc,
\sum_i \frac{\partial f_i(\mathbf{x})}{\partial x_n}
\right] \\
&= \left[
\sum_i \frac{\partial x_i}{\partial x_1},
\sum_i \frac{\partial x_i}{\partial x_2},
\dotsc,
\sum_i \frac{\partial x_i}{\partial x_n}
\right].
\end{align*}



(Original image here.)
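For completeness (this step is not shown in the quoted excerpt): since $\frac{\partial x_i}{\partial x_j} = 1$ when $i = j$ and $0$ otherwise, each of those sums collapses to a single term, giving
\begin{align*}
\nabla y = \left[ 1, 1, \dotsc, 1 \right],
\end{align*}
the row vector of ones.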




Does it have something to do with the summation being taken out? Or am I reading this completely wrong?

























  • It’s reminding you that in general $f_i(\mathbf{x}) = f_i(x_1, x_2, \ldots, x_n)$ is a function of all the variables $x_j$. Then they do a particular example where $f_i(x_1, x_2, \ldots, x_n) = x_i$ just depends on the one variable.
    – Lord Shark the Unknown, Aug 24 at 5:03










  • Ohh got it! Thank you!
    – General Thalion, Aug 25 at 8:42
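To make the comment’s distinction concrete, here is a small sketch along the same lines as the one above (again not from the paper or the thread; the second $\mathbf{f}$ is an arbitrary choice):

```python
# Contrast the two cases from the comment: f_i depending only on x_i vs. on all of x.
import numpy as np

def jacobian(f, x, eps=1e-6):
    # Central-difference Jacobian: J[i, j] = df_i / dx_j
    return np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                            for e in np.eye(len(x))])

x = np.array([1.0, 2.0, 3.0])

# Case 1: f_i(x) = x_i -- each f_i depends only on its own component
J1 = jacobian(lambda v: v, x)
print(np.round(J1, 3))    # identity matrix
print(J1.sum(axis=0))     # column sums -> [1. 1. 1.], the gradient of sum(x)

# Case 2: f_i(x) = x_i * sum(x) -- each f_i depends on every component of x
J2 = jacobian(lambda v: v * v.sum(), x)
print(np.round(J2, 3))    # dense: sum(x) * I + outer(x, ones)
print(J2.sum(axis=0))     # column sums -> approximately [12. 12. 12.]
```

In the first case the off-diagonal entries are all zero, which is exactly why the sums $\sum_i \partial x_i / \partial x_j$ collapse to $1$; the general derivation never assumes that, which is the point of keeping the argument as the full vector $\mathbf{x}$.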













