OperatorLearning.DeepONet — Type
DeepONet(architecture_branch::Tuple, architecture_trunk::Tuple, act_branch = identity, act_trunk = identity; init_branch = Flux.glorot_uniform, init_trunk = Flux.glorot_uniform, bias_branch=true, bias_trunk=true)
DeepONet(branch_net::Flux.Chain, trunk_net::Flux.Chain)
Create an (unstacked) DeepONet architecture as proposed by Lu et al. arXiv:1910.03193
The model works as follows:
x --- branch --
               |
                -⊠--u-
               |
y --- trunk ---
Here x represents the input function, discretely evaluated at its respective sensors, so the input is of shape [m] for one instance or [m x b] for a training set. y are the probing locations for the operator to be trained; it has shape [N x n] for N different variables in the PDE (i.e. spatial and temporal coordinates), each with n distinct evaluation points. u is the solution of the queried instance of the PDE, given by the specific choice of parameters.
The outputs of the branch and trunk nets, b and t, are combined via the dot product Σᵢ bᵢⱼ tᵢₖ.
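As a plain-Julia illustration of that contraction (not a call into the package; the 72-feature latent width and the 200/100 instance and query counts are made-up values for the sketch):

b_out = rand(Float32, 72, 200)   # branch output bᵢⱼ: 72 latent features for 200 input functions
t_out = rand(Float32, 72, 100)   # trunk output tᵢₖ: 72 latent features for 100 query points
u = b_out' * t_out               # computes Σᵢ bᵢⱼ tᵢₖ, giving a 200 x 100 array of operator values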
You can set up this architecture in two ways:
1. By specifying the architecture and all its parameters as given above. This always creates Dense layers for the branch and trunk net and corresponds to the DeepONet proposed by Lu et al.
2. By passing two architectures in the form of two Chain structs directly. Do this if you want more flexibility and e.g. use an RNN or CNN instead of simple Dense layers.
Strictly speaking, DeepONet does not require either the branch or the trunk net to be a simple DNN. Usually, though, this is the case, which is why it is treated as the default here.
Example
Consider a transient 1D advection problem ∂ₜu + u ⋅ ∇u = 0, with an IC u(x,0) = g(x). We are given several (b = 200) instances of the IC, discretized at 50 points each, and want to query the solution at 100 different locations and times in [0;1].
That makes the branch input of shape [50 x 200] and the trunk input of shape [2 x 100]. So the input size for the branch net is 50 and 2 for the trunk net.
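A minimal sketch of this setup (the latent width 72 and the activations are arbitrary illustrative choices, and the output shape assumes the branch-transpose-times-trunk contraction described above):

using Flux, OperatorLearning

model = DeepONet((50, 64, 72), (2, 64, 72), σ, tanh)

xtrain  = rand(Float32, 50, 200)   # 200 ICs, each evaluated at 50 sensors
sensors = rand(Float32, 2, 100)    # 100 query points in (x, t)

u = model(xtrain, sensors)         # expected shape: 200 x 100, one value per instance and query point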
Usage
julia> model = DeepONet((32,64,72), (24,64,72))
DeepONet with
branch net: (Chain(Dense(32, 64), Dense(64, 72)))
Trunk net: (Chain(Dense(24, 64), Dense(64, 72)))
julia> model = DeepONet((32,64,72), (24,64,72), σ, tanh; init_branch=Flux.glorot_normal, bias_trunk=false)
DeepONet with
branch net: (Chain(Dense(32, 64, σ), Dense(64, 72, σ)))
Trunk net: (Chain(Dense(24, 64, tanh; bias=false), Dense(64, 72, tanh; bias=false)))
julia> branch = Chain(Dense(2,128),Dense(128,64),Dense(64,72))
Chain(
Dense(2, 128), # 384 parameters
Dense(128, 64), # 8_256 parameters
Dense(64, 72), # 4_680 parameters
) # Total: 6 arrays, 13_320 parameters, 52.406 KiB.
julia> trunk = Chain(Dense(1,24),Dense(24,72))
Chain(
Dense(1, 24), # 48 parameters
Dense(24, 72), # 1_800 parameters
) # Total: 4 arrays, 1_848 parameters, 7.469 KiB.
julia> model = DeepONet(branch,trunk)
DeepONet with
branch net: (Chain(Dense(2, 128), Dense(128, 64), Dense(64, 72)))
Trunk net: (Chain(Dense(1, 24), Dense(24, 72)))
OperatorLearning.FourierLayer — Type
FourierLayer(in, out, grid, modes, σ=identity, init=glorot_uniform)
FourierLayer(Wf::AbstractArray, Wl::AbstractArray, [bias_f, bias_l, σ])
Create a layer of the Fourier Neural Operator as proposed by Li et al. arXiv:2010.08895
The layer does a Fourier transform along the grid dimension of the input array, filters out higher modes via the weight matrix, and transforms the result to the specified output dimension, such that In x M x N -> Out x M x N. The output, however, only contains the retained Fourier modes, with the rest padded to zero in the last axis as a result of the filtering.
The input x should be a rank 3 tensor of shape (num parameters (in) x num grid points (grid) x batch size (batch)).
The output y will be a rank 3 tensor of shape (out x num grid points (grid) x batch size (batch)).
You can specify biases for the paths as you like, though the convolutional path is originally not intended to perform an affine transformation.
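As a quick shape check (a minimal sketch; the channel count of 2 here is made up and unrelated to the diffusion example below):

using Flux, OperatorLearning

layer = FourierLayer(2, 2, 64, 16, σ)   # in = 2, out = 2, 64 grid points, keep the first 16 modes
x = rand(Float32, 2, 64, 200)           # (in x grid x batch)
size(layer(x))                          # expected: (2, 64, 200), i.e. (out x grid x batch)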
Examples
Say you're considering a 1D diffusion problem on a 64 point grid. The input is comprised of the grid points as well as the IC at these points. The data consists of 200 instances of the solution. Beforehand we convert the two input channels into a higher-dimensional latent space with 128 nodes by using a regular Dense layer. So the input takes the dimension 128 x 64 x 200. The output would be the diffused variable at a later time, which initially makes the output of the form 128 x 64 x 200 as well. Finally, we have to squeeze this high-dimensional output into the one quantity of interest again by using a Dense layer.
We wish to only keep the first 16 modes of the input and work with the classic sigmoid function as activation.
So we would have:
model = FourierLayer(128, 128, 64, 16, σ)
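Wrapping the lifting and projection Dense layers around it, a minimal sketch of the full pipeline for this example could look as follows (assuming Flux's Dense applies along the first, i.e. channel, dimension of the rank-3 array):

using Flux, OperatorLearning

model = Chain(
    Dense(2, 128),                      # lift the 2 input channels (grid point, IC value) to 128 latent channels
    FourierLayer(128, 128, 64, 16, σ),  # spectral convolution keeping the first 16 modes
    Dense(128, 1),                      # project back to the single quantity of interest
)

x = rand(Float32, 2, 64, 200)           # (channels x grid x batch)
size(model(x))                          # expected: (1, 64, 200)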
OperatorLearning.cglorot_normal — Method
cglorot_normal([rng=GLOBAL_RNG], dims...)
A modification of the glorot_normal function provided by Flux to accommodate complex numbers. This is necessary since the parameters of the global convolution operator in the Fourier layer are generally complex.
OperatorLearning.cglorot_uniform — Method
cglorot_uniform([rng=GLOBAL_RNG], dims...)
A modification of the glorot_uniform function provided by Flux to accommodate complex numbers. This is necessary since the parameters of the global convolution operator in the Fourier layer are generally complex.
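A small usage sketch (assuming both initializers are called like Flux's built-in ones and return a complex-valued array of the requested size):

using Random, OperatorLearning

W  = OperatorLearning.cglorot_uniform(16, 16)                      # expected: 16 x 16 matrix of ComplexF32
Wn = OperatorLearning.cglorot_normal(MersenneTwister(42), 16, 16)  # same, with an explicit RNG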
OperatorLearning.construct_subnet — Function
Construct a Chain of Dense layers from a given tuple of integers.
Input: A tuple (m,n,o,p) of integers, each describing the width of the i-th Dense layer to construct.
Output: A Flux Chain of Dense layers whose input and output widths are given by consecutive tuple elements.
Example
julia> model = OperatorLearning.construct_subnet((2,128,64,32,1))
Chain(
Dense(2, 128), # 384 parameters
Dense(128, 64), # 8_256 parameters
Dense(64, 32), # 2_080 parameters
Dense(32, 1), # 33 parameters
) # Total: 8 arrays, 10_753 parameters, 42.504 KiB.
julia> model([2,1])
1-element Vector{Float32}:
-0.7630446