If we were to write a standalone backprop function, it would take the derivative of the loss with respect to the output activation as input and would have to calculate two values from it. The first is the derivative of the loss with respect to the weights; this is used in the gradient descent step to update the weights. The second is the derivative of the loss with respect to the input activation; this must be returned so that backpropagation can continue, since the input activation of this layer is nothing but the output activation of the previous layer.
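As a concrete sketch, here is what such a function might look like for a simple linear layer (`y = x @ W`); the layer type and the names `linear_backward`, `grad_out`, `grad_W`, and `grad_x` are assumptions for illustration, not a fixed API:

```python
import numpy as np

def linear_backward(grad_out, x, W):
    """Backward pass for an assumed linear layer y = x @ W.

    grad_out : dL/dy, shape (batch, out_features) -- gradient flowing in
    x        : input activation, shape (batch, in_features)
    W        : weights, shape (in_features, out_features)

    Returns:
      grad_W : dL/dW, consumed by the gradient descent update
      grad_x : dL/dx, returned so it can serve as grad_out
               for the previous layer's backward pass
    """
    grad_W = x.T @ grad_out   # derivative of loss w.r.t. the weights
    grad_x = grad_out @ W.T   # derivative of loss w.r.t. the input activation
    return grad_W, grad_x

# Tiny usage example with one sample, two inputs, one output
x = np.array([[1.0, 2.0]])
W = np.array([[1.0], [0.0]])
grad_out = np.array([[1.0]])   # pretend dL/dy = 1
grad_W, grad_x = linear_backward(grad_out, x, W)
```

Note that `grad_x` has the same shape as `x`, which is exactly what lets it be fed straight into the previous layer's backward function as its `grad_out`.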