We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set $s_1,\dots,s_n \in \mathbb{R}^p$ with corresponding responses $t_1,\dots,t_n \in \mathbb{R}^q$, fitting a $k$-layer neural network $\nu_\theta : \mathbb{R}^p \to \mathbb{R}^q$ involves estimation of the weights $\theta \in \mathbb{R}^m$ via an ERM:
\[
\inf_{\theta \in \mathbb{R}^m} \sum_{i=1}^n \lVert t_i - \nu_\theta(s_i) \rVert_2^2.
\]
We show that even for $k = 2$, this infimum is not attainable in general for common activations like ReLU, hyperbolic tangent, and sigmoid functions. In addition, we deduce that if one attempts to minimize such a loss function in the event that its infimum is not attainable, it necessarily results in values of $\theta$ diverging to $\pm\infty$. We will show that for the smooth activations $\sigma(x) = 1/(1 + \exp(-x))$ and $\sigma(x) = \tanh(x)$, such failure to attain an infimum can happen on a positive-measured subset of responses. For the ReLU activation $\sigma(x) = \max(0, x)$, we completely classify cases where the ERM for a best two-layer neural network approximation attains its infimum. In recent applications of neural networks, where overfitting is commonplace, the failure to attain an infimum is avoided by ensuring that the system of equations $t_i = \nu_\theta(s_i)$, $i = 1,\dots,n$, has a solution. For a two-layer ReLU-activated network, we will show when such a system of equations has a solution generically, i.e., when such a neural network can be fitted perfectly with probability one.
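The divergence phenomenon described above can be illustrated numerically. The following is a minimal sketch, not a result from the paper but a standard construction consistent with its claims: a two-layer sigmoid network with two hidden neurons, evaluated along an explicit parameter curve that realizes the divided difference $j\bigl(\sigma(s + 1/j) - \sigma(s)\bigr) \to \sigma'(s)$. With responses chosen as $t_i = \sigma'(s_i)$, the ERM loss along this curve tends to zero while $\lVert\theta\rVert$ tends to infinity; this does not by itself prove that the infimum is unattained for this particular data set, it only exhibits the mechanism by which driving the loss down forces the weights to diverge.

```python
# Hedged numerical sketch (illustrative construction, not taken from the paper).
# Two-layer sigmoid network with two hidden neurons:
#     nu_theta(s) = a1*sigma(w1*s + b1) + a2*sigma(w2*s + b2) + c,
# evaluated along the explicit curve
#     a1 = j, w1 = 1, b1 = 1/j,  a2 = -j, w2 = 1, b2 = 0,  c = 0,
# which realizes j*(sigma(s + 1/j) - sigma(s)) -> sigma'(s) as j -> infinity.

import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigma(x):
    return sigma(x) * (1.0 - sigma(x))  # derivative of the sigmoid

# training inputs and responses t_i = sigma'(s_i)
s = np.array([-1.0, 0.0, 0.5, 2.0])
t = d_sigma(s)

for j in [1e1, 1e2, 1e3, 1e4]:
    # theta_j = (a1, w1, b1, a2, w2, b2, c)
    theta = np.array([j, 1.0, 1.0 / j, -j, 1.0, 0.0, 0.0])
    a1, w1, b1, a2, w2, b2, c = theta
    nu = a1 * sigma(w1 * s + b1) + a2 * sigma(w2 * s + b2) + c
    loss = np.sum((t - nu) ** 2)
    # loss decreases like O(1/j^2) while ||theta_j|| grows like sqrt(2)*j
    print(f"j = {j:8.0f}   loss = {loss:.3e}   ||theta|| = {np.linalg.norm(theta):.3e}")
```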