Multitask deep-learning-based design of chiral plasmonic metamaterials
1. INTRODUCTION
Chiral nanostructures cannot be superimposed on their mirror images, and they respond differently to left circularly polarized (LCP) and right circularly polarized (RCP) light [1–3].
Deep learning (DL), as a data-driven technique for analysis and prediction, has permeated several disciplines such as natural language processing [13], image recognition [14], genetics, and biology [15,16]. Instead of generating simulation results by running through predefined systems of equations in a given geometry, a DL architecture can be trained to recognize patterns in a given dataset, identify attributes, and predict responses, thanks to its capability to reproduce arbitrarily complex functions. As a type of representation or feature learning, DL brings machine learning (ML) closer to artificial intelligence, where human-like and exceedingly challenging tasks can be completed by trained systems. A typical example of the capabilities of this technique is the success of deep neural networks with reinforcement learning in playing games such as shogi, chess, and Go [17–19].
In this communication, we propose an end-to-end multitask DL (MDL)-based model for the design and optimization of 3D chiral metamaterials. MDL models have taken root in the computational study of semantics [32] and transportation [33,34], as well as pose and action recognition [35]. Multitask learning draws on implicit data augmentation, eavesdropping, attention focusing, representation bias, and regularization for effective and efficient generalization [36,37], eliminating the need for auxiliary networks or equivalent approaches to stabilize the model’s output of physically relevant information.
Our MDL model comprises a single bidirectional neural network that solves two tasks. The forward prediction path predicts the full chiroptical response of a chiral metamaterial from a set of geometric parameters. The inverse prediction path solves the inverse problem: it retrieves the geometric parameters that can produce a given full chiroptical response. To bridge the mismatch gap and enhance the accuracy of both forward and inverse predictions, especially at circular dichroism (CD) and plasmonic resonances, a joint-learning feature is incorporated in the model training. This feature ensures that errors are compared across the learned tasks, allowing for a well-generalized system. Consequently, the MDL model makes efficient use of the training set to achieve faster convergence [38,39] and provides a practical guideline for implementing similar ML systems in a variety of design problems with nanotechnological applications.
This work is organized in three major parts. The first describes the chiral metamaterial absorber modeled after the yin-yang symbol and its optical properties calculated by the finite element method (FEM). The second discusses the MDL model, composed of the forward and inverse design paths (FDP and IDP, respectively). Finally, in the third, we apply a sample MDL-optimized structure to the sensing of biomolecular enantiomers. We envisage that our work will inspire the use of ML as an effective and efficient data-driven metamaterial design and optimization tool, taking advantage of the multitask technique that we detail herein.
2. CHIRAL METAMATERIAL MODEL AND FORMALISM
Generally, the metamaterial absorber structure used in this study consists of top yin-yang-shaped Au nanoparticles (YNPs), a PMMA layer, a Au backreflector, and a bottom glass layer. Figure
Fig. 1. Schematic of (a) a single YNP chiral meta-absorber array with definition of incident circularly polarized lights and a unit cell with dimensions. (b) Three single YNP metastructure configurations: Au YNP/Glass (YG), Au YNP/PMMA/Glass (YPG), and Au YNP/PMMA/Au/Glass (YPAG). (c) Absorption and CD spectra of the three metastructure configurations (YG, YPG, and YPAG), showing their plasmonic resonances ( , 625 nm, and 645 nm, respectively) and revealing a strong chiroptical response for the metamaterial absorber case (YPAG). Here, , , , , and .
To illustrate the relevance of the multilayered structure to the intensity of the fields and the chiroptical response under resonance, we consider a single YNP in three structures of comparable dimensions: YNP/Glass (YG), YNP/PMMA/Glass (YPG), and YNP/PMMA/Au/Glass (YPAG). Figure
The enhanced chiral field and CD response in the metamaterial absorber case make it ideally competitive for chirality-related applications. The base YNP structure can be modified to generate other complex chiral designs, i.e., dimers and trimers (see Appendix
Fig. 2. Schematic of the bidirectional multitask deep-learning model for chiral metamaterial design consisting of forward design path (FDP) and inverse design path (IDP). Each path is composed of shared layers and task-specific layers with joint optimization functionality. The model is set up in an end-to-end fashion where the geometric design parameters, CD, and LCP/RCP absorption spectra can be treated as input or output at specific ports. Here, the geometric design parameters are the YNPs thickness, PMMA thickness, YNPs radius, and YNPs (respectively represented as , ). has been taken as a constant in the data shown in this study, but is nonetheless included in the model to represent the general parametrization of the system. The inset shows the metamaterial absorber geometry used to exemplify the use of the MDL.
3. MULTITASK DEEP-LEARNING ARCHITECTURE
As depicted in Fig.
A. Forward Design
In the forward prediction path, the differing scales of the input design parameters, absorption, and CD responses would make the model data-scale-dependent if it were trained directly, resulting in poor generalization. To eliminate the effect of the varying input scales on the generalization of the model, we employ a normalization layer following the relation
$$\hat{x}_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j},$$
where $i$ and $j$ index the row and column, respectively, such that $x_j$ is the $j$th column of the input parameter matrix, and $\mu_j$ and $\sigma_j$ are the mean and standard deviation of that column, respectively.
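As an illustration of the normalization step, the following is a minimal NumPy sketch of column-wise standardization (subtract each column's mean, divide by its standard deviation). It is not the authors' code: the function name `standardize_columns`, the zero-spread guard, and the sample parameter rows are assumptions made for the example.

```python
import numpy as np

def standardize_columns(x):
    """Column-wise z-score: subtract each column's mean, divide by its std."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    # Assumption: a column with zero spread (a parameter held constant)
    # is left centered at zero instead of being divided by zero.
    sigma_safe = np.where(sigma == 0.0, 1.0, sigma)
    return (x - mu) / sigma_safe, mu, sigma_safe

# Hypothetical rows of five geometric parameters in nm (made-up values).
params = np.array([
    [100.0,  50.0,  10.0,  5.0, 100.0],
    [150.0, 100.0,  45.0, 25.0, 100.0],
    [200.0, 160.0, 100.0, 50.0, 100.0],
])
z, mu, sigma = standardize_columns(params)
```

Keeping `mu` and `sigma` from the training set allows the same transformation to be applied to unseen inputs at prediction time.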
The normalization provides a well-conditioned dataset for optimization by ensuring that the training is less sensitive to the scale of features to be processed by the shared layer [59]. At the shared layer, we adopt hard parameter sharing across four hidden layers, each with 1024 nodes. These hidden layers are shared between all the individual task-specific output layers of the network. The shared hidden layers hold computational weights from the task-specific layers. That is, the CD learning leverages the LCP/RCP spectra task-specific learning to enhance accuracy.
We regularize all hidden layers by applying penalties on layer activity during optimization with an regularizer in order to learn sparse features and internal representations of raw observations [60]. The task-specific layer consists of two independent parts: the main task and the auxiliary tasks. The auxiliary tasks are subtasks expected to assist in finding a rigorous, rich, and robust representation of the input design parameters to benefit the main task. Learning auxiliary tasks restricts the parameter space during optimization and pushes for faster convergence. The main task, which characterizes the desired output response, exploits and jointly learns from the auxiliary tasks via the shared layer. The main and auxiliary tasks correspondingly generate three single-task losses.
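The hard-parameter-sharing layout described here (a shared trunk of four hidden layers with 1024 nodes feeding one main and two auxiliary output layers) can be sketched as a plain NumPy forward pass. This is only a structural sketch with untrained random weights: the ReLU activation, the initialization, the 241-point output size per spectrum, and all names are assumptions, and the activity regularization is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PARAMS, N_HIDDEN, N_WAVELENGTHS = 5, 1024, 241

def dense(n_in, n_out):
    """Random weights and zero biases for one dense layer (untrained)."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out)

def relu(a):
    return np.maximum(a, 0.0)

# Shared trunk: four hidden layers of 1024 nodes, used by every task.
trunk = [dense(N_PARAMS, N_HIDDEN)] + [dense(N_HIDDEN, N_HIDDEN) for _ in range(3)]
# Task-specific output layers: main task (CD) and two auxiliary tasks (LCP, RCP).
heads = {name: dense(N_HIDDEN, N_WAVELENGTHS) for name in ("cd", "lcp", "rcp")}

def forward(x):
    """Map normalized design parameters to the three predicted spectra."""
    h = x
    for w, b in trunk:
        h = relu(h @ w + b)
    return {name: h @ w + b for name, (w, b) in heads.items()}

batch = rng.normal(size=(8, N_PARAMS))  # eight hypothetical normalized parameter sets
outputs = forward(batch)
```

Because every head reads the same trunk activations, a gradient from any of the three task losses updates the shared weights, which is how the auxiliary tasks can influence the main task.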
To optimize the MDL network, the joint multitask cost function comprising the three single-task losses is minimized. Here, the principal multitask cost function, subject to optimization, is expressed as [61] where and index the training set and the three learning tasks, respectively. refers to the simulated outputs from the three tasks (CD and LCP/RCP absorption signals). The model function takes as input , which is the design parameter matrix comprising the YNP radius , the gap distance d, the YNP thickness , the polymer thickness , and the Au backreflector thickness with weight, . The definition of is illustrated by the inset in Fig.
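One simple way to combine three single-task losses into a joint cost is a weighted sum of per-task mean squared errors. The sketch below assumes equal task weights by default and no additional regularization term; the function names and the toy spectra are hypothetical and are not taken from the original model.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all samples and output points."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def joint_loss(targets, predictions, task_weights=None):
    """Weighted sum of per-task MSE losses (weights default to 1 per task)."""
    task_weights = task_weights or {name: 1.0 for name in targets}
    per_task = {name: mse(targets[name], predictions[name]) for name in targets}
    total = sum(task_weights[name] * loss for name, loss in per_task.items())
    return total, per_task

# Toy two-point "spectra" for the main task (CD) and the auxiliary tasks.
targets = {"cd": np.array([0.0, 0.4]), "lcp": np.array([0.9, 0.5]), "rcp": np.array([0.9, 0.1])}
preds = {"cd": np.array([0.0, 0.2]), "lcp": np.array([0.9, 0.5]), "rcp": np.array([0.8, 0.1])}
total, per_task = joint_loss(targets, preds)
```

Minimizing the sum rather than each loss separately means that an error in one task, for example a missed resonance in the RCP spectrum, also contributes to the update of the shared layers.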
Fig. 3. MDL model performance. (a) Numerical simulation and (b) MDL prediction CD results of the dimer structure at varying gap distance (50–160 nm), across the visible and near-IR regime. , , , and . (c) Numerical simulation and (d) MDL prediction results of the dimer at varying YNP thickness ( ). The color legend has been truncated at for clarity. Inset, definition of the gap d . (e) Learning curve within 3000 epochs. (f) Discretized model performance at selected = {60 nm, 80 nm, 100 nm} corresponding to the horizontal dots in (c) and (d). (g) Model performance at across varying , corresponding to the vertical short dashes through (c) and (d). Here, , , , and .
We adopt the mean squared error (MSE), the quadratic loss given by the mean of the squared differences between the target values (simulated CD) and the predicted CD values, to average the losses over the output. The adaptive moment estimation (Adam) stochastic optimizer is used to compute an adaptive learning rate for each of the internal parameters of the model [62]. ML models typically miss the resonances of highly volatile datasets during prediction. This is because the probability distribution is centered at the off-resonance for each neuron in the output layer, neglecting the local optima. The joint loss optimization functionality enables collective error correction for the forward prediction task, allowing an accurate prediction at the local optima. The training set utilized of the 640 collected samples with the remaining as the validation set. Each sample consists of the full LCP absorption, RCP absorption, and CD spectra, each sampled at 241 data points generated at 5 nm steps over the 400–1600 nm wavelength range. The MSE recorded is 0.000441, which is indicative of the model’s accuracy. After training, we use the validation dataset, which is unseen throughout the training, to evaluate the model. Within short prediction intervals, the MDL model exhibits prediction results comparable to simulation data. A comparison between simulated [Fig.
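The count of 241 points per spectrum follows directly from the stated 5 nm step over 400–1600 nm with both endpoints included. A short check, in which the variable names and the concatenated per-sample layout are assumptions:

```python
import numpy as np

# Wavelength grid from 400 nm to 1600 nm inclusive, in 5 nm steps.
wavelengths_nm = np.arange(400, 1600 + 5, 5)
n_points = wavelengths_nm.size

# Assumed layout: LCP, RCP, and CD spectra stored back to back per sample.
values_per_sample = 3 * n_points
```

Here `n_points` evaluates to 241, so a sample holding all three spectra carries 723 values.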
Given the design parameter space of the structure, vast sets of CD responses from varied parameter configurations can be retrieved across the visible and near-IR regime () via the trained MDL model. Figure
Fig. 4. MDL-predicted CD progressions. (a) CD evolution by varying YNP radius at = {100 nm, 150 nm, 200 nm}. CD map plot by varying concurrently, (b) YNP radius and polymer thickness at = {5 nm, 25 nm, 50 nm} for , (c) YNP thickness and polymer thickness at = {100 nm, 150 nm, 200 nm} and , and (d) YNP radius and YNP thickness at = {10 nm, 45 nm, 100 nm} and . The color legend has been truncated at for clarity, but high-CD areas have been highlighted by adding the contour regions corresponding to CD values of 0.5.
Figure
Having the trained network, we can easily extend its output to cover many more geometric parameter combinations, and for different wavelengths, although this sample should suffice for illustrating the method’s capabilities, as well as obtaining a qualitative understanding of the physical system.
It is relevant to underscore a particular advantage of such an MDL approach: it requires far fewer computational resources than producing an equivalently dense dataset through traditional simulation methods. Let us make a quick estimate to compare both approaches. For each combination of geometric parameters, a typical i7-CPU PC (which, for simplicity, we assume consumes the total output of its 330 W power source regardless of the task performed) takes more than 3 h to compute a full spectrum when using COMSOL with precision levels adequate for our system. This order of magnitude for the duration of the computation is also representative of other simulation packages and frameworks. In our case, the total energy used for creating the model (combining the energy required to generate the training set with COMSOL and actually training the network) is approximately 0.63 MWh spanning 80 days using the above example PC. After training, using the MDL model requires less than a second to produce the response for a given geometric parameter set. On the other hand, for the five geometric parameters and our chosen sampling density, the total number of simulations required to obtain the same result density that the trained MDL affords would be in the tens of thousands ( samples), which would require an outrageous number of years ( years) to produce, with a total energy consumption of approximately 29 GWh using the example PC. Therefore, in this (admittedly extreme) comparison, the MDL approach would outperform a naïve simulation-based approach by 7 orders of magnitude in terms of speed and energy expenditure.
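The energy figures in this estimate can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the paragraph (330 W, 80 days, 3 h per spectrum, 29 GWh) and assumes continuous operation at full power and exactly 3 h per simulation; the variable names are illustrative.

```python
POWER_W = 330.0                  # assumed constant draw of the example PC
TRAINING_DAYS = 80               # dataset generation plus training
HOURS_PER_SIMULATION = 3.0       # stated lower bound for one full spectrum
BRUTE_FORCE_ENERGY_WH = 29e9     # 29 GWh quoted for the dense simulation sweep

# Energy spent creating the model: power x time, converted from Wh to MWh.
training_energy_mwh = POWER_W * 24 * TRAINING_DAYS / 1e6

# Run time implied by the 29 GWh figure at the same constant power.
brute_force_hours = BRUTE_FORCE_ENERGY_WH / POWER_W
brute_force_years = brute_force_hours / (24 * 365)
implied_simulations = brute_force_hours / HOURS_PER_SIMULATION
```

Under these assumptions the model-creation energy comes out at about 0.63 MWh, matching the quoted value, while 29 GWh at 330 W corresponds to roughly 1.0 × 10^4 years of run time, or about 2.9 × 10^7 simulations of 3 h each.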
B. Inverse Design
After obtaining a high prediction accuracy in the forward path, we proceed to design an IDP. It is important to realize that this task is very complex and sensitive because of the large imbalance between the input and output dimensions of the model: five geometric parameters compared with the full CD, LCP, and RCP spectra ( versus , respectively). While it is plausible to apply up- and down-sampling approaches to resolve this mismatch, essential features may be lost in the process, especially in the case of complex structures like the YNP. Such lost features introduce prediction errors that compound over a training loop, resulting in wide variations in the retrieved geometric parameters for comparatively small changes in the input spectra. Reverse-engineering the multitask forward design path makes use of the entire set of CD data points and maps each output geometric parameter to the desired full input spectra. The IDP takes a simulated CD spectrum as input, with the objective of retrieving the geometric parameters required to produce it. To achieve this objective, we connect the input spectrum to three isolated dense networks with two layers, responsible for mapping the chiroptical response to a shared latent-space representation with four layers, each composed of 2048 neurons. The shared latent space links to five task-specific layers, with each layer managing the output of a designated geometric parameter (see Fig.
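The inverse path described above (three isolated two-layer networks, a shared latent block of four 2048-neuron layers, and five single-parameter output layers) can be sketched structurally in NumPy. This is an untrained shape-level illustration, not the authors' implementation: assigning one branch each to the CD, LCP, and RCP spectra, the branch width of 256, the ReLU activation, concatenation as the merge step, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_WAVELENGTHS, BRANCH_WIDTH, LATENT_WIDTH = 241, 256, 2048
SPECTRA = ("cd", "lcp", "rcp")
PARAMETERS = ("radius", "gap", "ynp_thickness", "pmma_thickness", "au_thickness")

def dense(n_in, n_out):
    """Random float32 weights and zero biases for one dense layer (untrained)."""
    w = rng.standard_normal((n_in, n_out), dtype=np.float32) * np.float32(np.sqrt(2.0 / n_in))
    return w, np.zeros(n_out, dtype=np.float32)

def mlp(x, layers):
    """Apply a stack of dense layers with ReLU activations."""
    for w, b in layers:
        x = np.maximum(x @ w + b, 0.0)
    return x

# Three isolated two-layer branches, one per input spectrum (assumed pairing).
branches = {s: [dense(N_WAVELENGTHS, BRANCH_WIDTH), dense(BRANCH_WIDTH, BRANCH_WIDTH)] for s in SPECTRA}
# Shared latent block: four layers of 2048 neurons.
latent = [dense(len(SPECTRA) * BRANCH_WIDTH, LATENT_WIDTH)]
latent += [dense(LATENT_WIDTH, LATENT_WIDTH) for _ in range(3)]
# Five task-specific output layers, one scalar geometric parameter each.
heads = {p: dense(LATENT_WIDTH, 1) for p in PARAMETERS}

def inverse(spectra):
    """Map a batch of spectra to one predicted value per geometric parameter."""
    joined = np.concatenate([mlp(spectra[s], branches[s]) for s in SPECTRA], axis=1)
    z = mlp(joined, latent)
    return {p: (z @ w + b)[:, 0] for p, (w, b) in heads.items()}

spectra = {s: rng.standard_normal((4, N_WAVELENGTHS), dtype=np.float32) for s in SPECTRA}
retrieved = inverse(spectra)
```

Giving each geometric parameter its own output layer keeps the five regressions separate at the last step while still letting them share the latent representation of the spectra.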
Running this inverter network gives us a set of geometric parameters, which will be close to the ones generating the target CD spectrum. However, given the interdependency of these parameters and the high sensitivity of the CD response to geometric parameters [see, for instance, Fig.
Fig. 5. Inverse design with the MDL model. (a), (b) Simulated (green solid lines) and predicted (red dots) CD spectra. (c), (d) Corresponding simulated (green bars) and retrieved (red bars) geometric parameters. Red dots in (a), (b) are predicted from the MDL model with geometric parameters retrieved [red bar in (c), (d)] for the target simulated CD spectra in (a), (b). [See Figs. 9(a) and 9(b) in Appendix C for absorption spectra comparison.]
Figures
4. CHIRAL BIOSENSING
The strength of the chiroptical response from biomolecules is mostly limited by the small magnitude of their geometric chiral features, relative to the periodicity of circularly polarized light (CPL), in comparison with chiral plasmonic nanoantennas. Therefore, it is often necessary to use large molecular concentrations of nonracemic mixtures to provide detectable CD signals [5,63,64]. Moreover, chiral biomolecular structures have chiroptical activity in the UV and are thus difficult to detect with common instrumentation. Figure
Fig. 6. Enantiomer detection. (a) The CD spectra of (red) left-handed medium ( ), and (blue) right-handed medium ( ) with molecular CD resonance ( ) in the UV. (b) CD spectra comparison of the right-handed chiral metamaterial absorber ( ) with (blue) and without (red) chiral medium (CM). Inset is the electric field at the plasmonic resonance, ( ), of the bare chiral metamaterial absorber for LCP and RCP light. (c) CD summation to remove metamaterial background CD signal to reveal the LH (blue solid line) and RH (green solid line) enantiomer pair CD signals. , , and represent the resonant wavelengths for the CD of the bare molecules, the plasmonic chiral metamaterial absorber, and the metamaterial covered with chiral media, respectively. Inset, schematic representation of an enantiomeric protein molecular pair ( and isomers). (d) Electric field, surface charge density, and optical chirality density distributions of the chiral metamaterial absorber covered by chiral media (CM) at .
In what follows, we present theoretical results illustrating the behavior of one of the ML-optimized nano-dimer metastructures in the presence of molecular enantiomers, using the excess CD method to quantify the chirality of the molecular sample [70]. The structure was optimized for a CD value of 0.35. The geometric parameters of the metamaterial are , , , , , and . To model the chiral molecules, the dimer is covered by a 40 nm thick chiral medium with a refractive index of 1.6.
The chiral dielectric medium is modeled following the constitutive equations: Here, and are the permittivity of free space and relative permittivity, respectively. Similarly, and are the permeability of free space and relative permeability, respectively. and are the complex electric field and magnetic flux density, respectively. and are the electric field displacement and the magnetic field, respectively. is the chirality factor of the molecular sample, which shows very low values for low-density or near-racemic samples. Using a two-state model for the molecules, can be expressed as a function of frequency as where controls the magnitude of the chiral asymmetry, is the angular frequency of the radiation, at the molecular excitation wavelength, , and defines the relaxation rate of the excited molecule, with its indices describing its quantum states. Here, and . The expression and values for the chirality factor, , are adopted from Govorov et al. [5,9] and follow from the quantum equation of motion for the electronic density matrix when assuming a dilute molecular sample.
Now, we proceed with the excess CD method to compute the chiral properties of the molecules, when in interaction with left-handed and right-handed chiral metamaterial ( and , respectively). First, a baseline CD is calculated from the metamaterial absorber with racemate () molecular coverage. In the process, a distinguishable dielectric medium-induced CD redshift () is observed as illustrated in Fig.
Finishing this section, we present additional details on the chiral properties of the example metastructure, showing its near-field enhancement, surface charge density, and optical chirality parameter $C$. The latter is calculated, for harmonically oscillating fields, as [71]
$$C = -\frac{\varepsilon_0 \omega}{2}\,\mathrm{Im}\left(\mathbf{E}^{*}\cdot\mathbf{B}\right).$$
Figure
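For time-harmonic fields, the optical chirality density is commonly written as C = -(ε0 ω / 2) Im(E* · B), with E and B the complex electric field and magnetic flux density. The NumPy sketch below assumes this standard expression and evaluates it for circularly polarized plane waves in vacuum, where the result should be ±ε0 ω |E0|² / (2c). The function names, the unit field amplitude, and the 645 nm wavelength are illustrative choices, and which helicity sign is labeled LCP or RCP depends on the convention adopted.

```python
import numpy as np

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
C0 = 299_792_458.0        # speed of light in vacuum, m/s

def optical_chirality(e_field, b_field, omega):
    """C = -(eps0 * omega / 2) * Im(E* . B) for complex field amplitudes."""
    # np.vdot conjugates its first argument, giving E* . B directly.
    return -0.5 * EPS0 * omega * np.imag(np.vdot(e_field, b_field))

def plane_wave_fields(helicity, e0=1.0):
    """Circularly polarized plane wave travelling along +z in vacuum."""
    e = e0 * np.array([1.0, 1j * helicity, 0.0]) / np.sqrt(2.0)
    b = np.cross([0.0, 0.0, 1.0], e) / C0   # B = (k_hat x E) / c
    return e, b

omega = 2.0 * np.pi * C0 / 645e-9           # illustrative wavelength of 645 nm
c_plus = optical_chirality(*plane_wave_fields(+1), omega)
c_minus = optical_chirality(*plane_wave_fields(-1), omega)
c_expected = EPS0 * omega / (2.0 * C0)      # analytic magnitude for |E0| = 1 V/m
```

The two helicities give values of equal magnitude and opposite sign, which provides a convenient reference when normalizing near-field chirality maps of a structure.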
5. CONCLUSION
We propose a highly portable and functional MDL model to comprehensively study 3D, arbitrarily complex chiral metamaterials, and exemplify its usage with a chiral metamaterial designed after the yin-yang symbol. The model is composed of a single end-to-end bidirectional architecture that is capable of performing optimization and inverse retrieval operations and that takes advantage of the supporting role of two auxiliary tasks to facilitate the learning of the primary task, CD in our case. This feature in particular distinguishes it from other approaches discussed in the literature, and represents a reduction of the complexity of the DL framework while ensuring an efficient use of the training set toward a highly generalized system. As a data-driven approach, this MDL model requires a prior database of results created with methods such as FEM simulations or experimental data, but it helps in avoiding the huge computational cost that would be required to explore the vast design space of the physical system in fine detail. An additional advantage is that a trained model is a fast, lightweight, highly transferable tool that can drastically reduce the computational time used for subsequent studies of the system, both for the designers and for other research groups. Given a set of geometric parameters, the forward design path of the model predicts CD spectra with values virtually identical to the simulations we used as ground truth. Conversely, for input CD spectra, the model retrieves the set of geometric parameters that would produce such spectra by solving the inverse problem. As a result, the trained model can be used to explore the entire design space and thus render a complete account of the intricate relationship between the metamaterial’s geometric parameters and its chiroptical response. This is made possible by the joint-learning feature incorporated in the model.
For nanophotonic applications, the design and prototyping process needs to be robust due to the complexity of light–matter interaction with nontrivial geometries. The multitasking DL-based prediction model presented herein can aid in engineering any potential fabrication of the nanophotonic structures for desired optical and chiroptical response toward a variety of applications. Illustrating a potential context for the utilization of this system, we have shown additional chiroptical properties of this chiral metamaterial absorber structure in the context of its interaction with molecular enantiomers. The high efficiency and accuracy of the end-to-end MDL model make it a valuable tool for the study of complex physical phenomena, particularly for the design and prototyping of nanophotonic structures toward their application as biosensors, as photodetectors, or in polarization-resolved imaging and CD spectroscopy, among others.
[1] S. Zu, Y. Bao, Z. Fang. Planar plasmonic chiral nanostructures. Nanoscale, 2016, 8: 3900-3905.
[7] J. R. Mejía-Salazar, O. N. Oliveira. Plasmonic biosensing. Chem. Rev., 2018, 118: 10617-10625.
[14]
[36] A. Maurer, M. Pontil, B. Romera-Paredes. The benefit of multitask representation learning. J. Mach. Learn. Res., 2016, 17: 2853-2884.
[37]
[39]
[57]
[62]
Eric Ashalley, Kingsley Acheampong, Lucas V. Besteiro, Peng Yu, Arup Neogi, Alexander O. Govorov, Zhiming M. Wang. Multitask deep-learning-based design of chiral plasmonic metamaterials[J]. Photonics Research, 2020, 8(7): 07001213.