Stochastic Gradient Descent is preceded by Stochastic Approximation, first described by Robbins and Monro in their paper, A Stochastic Approximation Method. Kiefer and Wolfowitz subsequently published their paper, Stochastic Estimation of the Maximum of a Regression Function, which is more recognizable to people familiar with the ML variant of Stochastic Approximation (i.e., Stochastic Gradient Descent), as pointed out by Mark Stone in the comments. The 1950s and '60s saw plenty of research along that vein -- Dvoretzky, Powell, and Blum all published results that we take for granted today. It is a relatively minor leap from the Robbins-Monro method to the Kiefer-Wolfowitz method, and merely a reframing of the problem to then arrive at Stochastic Gradient Descent (for regression problems). The above papers are widely cited as the antecedents of Stochastic Gradient Descent, as mentioned in the review paper by Bottou, Curtis, and Nocedal, which provides a brief historical perspective from a Machine Learning point of view.
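To make that "minor leap" concrete, here is a schematic one-dimensional comparison of the update rules. The notation is my own shorthand rather than a faithful transcription of the original papers, so treat it as a sketch. Robbins and Monro seek a root $\theta^*$ of an unknown function $M(\theta) = \alpha$ given only noisy measurements $N(\theta)$ with $\mathbb{E}[N(\theta)] = M(\theta)$:

$$\theta_{n+1} = \theta_n - a_n \bigl(N(\theta_n) - \alpha\bigr), \qquad a_n \to 0, \quad \sum_n a_n = \infty, \quad \sum_n a_n^2 < \infty.$$

Kiefer and Wolfowitz instead seek a maximum of $M$, replacing the unavailable derivative with a noisy central difference (the "minor leap"):

$$\theta_{n+1} = \theta_n + a_n \, \frac{N(\theta_n + c_n) - N(\theta_n - c_n)}{2 c_n}.$$

For regression, take an expected loss $L(w) = \mathbb{E}_{\xi}\bigl[\ell(w; \xi)\bigr]$ and apply Robbins-Monro to the root-finding problem $\nabla L(w) = 0$, using a single sample $\xi_n$ per step as the noisy measurement of the gradient:

$$w_{n+1} = w_n - a_n \nabla_w \ell(w_n; \xi_n),$$

which is Stochastic Gradient Descent; that is the "reframing of the problem" referred to above.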
I believe that Kushner and Yin, in their book Stochastic Approximation and Recursive Algorithms and Applications, suggest that the notion had been used in control theory as far back as the 1940s, but I don't recall whether they had a citation for that or whether it was anecdotal, nor do I have access to their book to confirm it.
Herbert Robbins and Sutton Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, Vol. 22, No. 3 (Sep. 1951), pp. 400-407.
J. Kiefer and J. Wolfowitz, Stochastic Estimation of the Maximum of a Regression Function, The Annals of Mathematical Statistics, Vol. 23, No. 3 (1952), pp. 462-466.
Leon Bottou, Frank E. Curtis, and Jorge Nocedal, Optimization Methods for Large-Scale Machine Learning, Technical Report, arXiv:1606.04838.