Prometheus: A Recursively Self-Improving NAS System
- Alex Zhang
- Sep 3
Authors: Alex Zhekai Zhang†; Hui Liu
Affiliations: Cate School, Carpinteria, CA, USA; Missouri State University, Springfield, MO, USA
Corresponding author: Alex_Zhang@cate.org (†)
Abstract
Neural Architecture Search (NAS) automates model design, but in systems driven by a Reinforcement Learning (RL) controller, one limitation is the fixed capacity of the controller itself. We introduce Prometheus, a proof-of-concept NAS system that addresses this limitation through recursive self-improvement. Prometheus uses an RL agent that not only edits a target convolutional neural network using network morphism but also modifies its own architecture. This self-editing lets the controller increase its own capacity in pursuit of better rewards. We demonstrate that this approach, combining a self-editing GNN controller with heuristic-driven adaptation, achieves competitive performance on standard image classification benchmarks: CIFAR-10 (95.47%±0.60%), SVHN (97.09%±0.15%), and Fashion-MNIST (95.57%±0.20%), opening a new avenue of research in self-improving AI.
Introduction
Neural Architecture Search (NAS) automates network design, evolving from computation‑heavy early methods (Zoph and Le, 2017) to more efficient approaches such as weight sharing (Pham et al., 2018), differentiable search (Liu, Simonyan and Yang, 2019), and network morphism that enables function‑preserving edits during training (Chen, Goodfellow and Shlens, 2016). More recent advances include extensible zero-cost proxies like Eproxy (Li et al., 2023), robust training-free NAS techniques (He et al., 2024), and hardware-aware multi-objective differentiable NAS (Sukthanker et al., 2025). However, for NAS with RL, most systems rely on fixed‑capacity, human‑designed RL controllers. We ask: can a NAS agent improve both the target model and itself? We introduce Prometheus, a recursively self‑improving NAS system whose Graph Neural Network (GNN)‑based RL controller edits its own architecture as well as the target network. Unlike EAS (Cai et al., 2018), which applies morphism only to the target, Prometheus applies it to the agent too, using a block‑based search space, graph representations, and heuristic‑triggered self‑modification. Across standard image classification benchmarks, Prometheus achieves competitive performance (95.47%±0.60% on CIFAR-10, 97.09%±0.15% on SVHN, and 95.57%±0.20% on Fashion-MNIST) while adding a new level of autonomy to NAS.
Methods
Prometheus is a two-network system in which an RL controller network edits a target network (the network being trained on the target task, here image classification). The RL controller is a GNN, specifically a Graph Convolutional Network (GCN) (Kipf and Welling, 2017). In this formulation, the target CNN is represented as a graph G_t = (V, E), where each node v_i ∈ V represents an operation (e.g., Conv2D (LeCun et al., 1998), ReLU (Nair and Hinton, 2010)) with a feature vector encoding its properties, and the edges E represent the data flow. On every forward pass, the GCN performs message passing, allowing each node to aggregate information from its neighbors. The GCN encoder thus produces node embeddings Z from the graph: Z = GCN(G_t). These embeddings feed into policy heads that issue structurally informed actions.
The resulting matrix Z belongs to R^{|V| × d_h}, where R is the set of real numbers, |V| is the number of nodes in the target network graph, and d_h is the dimensionality of each node embedding. A global graph embedding z_g is then obtained by mean-pooling the node embeddings.
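As a rough illustration of the encoder, the sketch below runs one Kipf-Welling-style GCN layer over a three-node toy graph and mean-pools the result. It is not the authors' implementation: the function name `gcn_layer`, the one-hot node features, the single layer, d_h = 8, and the choice to treat data-flow edges as undirected (so that symmetric normalization applies) are all assumptions made for the example.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # degree of every node (>= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ feats @ weight, 0.0)

# Toy target graph: Conv2d -> BatchNorm2d -> ReLU as a 3-node chain.
# Edges are symmetrized here (assumption) so the normalization is well defined.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
feats = np.eye(3)                             # hypothetical one-hot op-type features
d_h = 8                                       # hypothetical embedding size
rng = np.random.default_rng(0)
weight = rng.normal(size=(feats.shape[1], d_h))

Z = gcn_layer(adj, feats, weight)             # node embeddings, shape (|V|, d_h)
z_g = Z.mean(axis=0)                          # global embedding via mean-pooling
print(Z.shape, z_g.shape)
```

Here Z has one row per node of the target graph, and z_g is the fixed-size summary that a policy or value head could consume regardless of how many nodes the graph has.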
At the architectural level, every permissible edit is a block transformation. To avoid performance drops and retraining from scratch, these edits leverage network morphism techniques (Chen, Goodfellow and Shlens, 2016), which let the agent modify the target network’s architecture without resetting its learned weights. Two of the underlying transformations are function-preserving and one is deliberately lossy:
Net2Wider (widen): Following Chen et al. (2016), this action increases a layer's width (number of output channels). The new weight tensor is populated by replicating units from the old one, and the subsequent layer's weights are rescaled to keep the network's output invariant.
Net2Deeper (deepen): This operation increases network depth by inserting a new layer initialized to perform an identity mapping.
Custom Thinning (thin): As the inverse of widening, this custom structured pruning operation reduces a layer's width by discarding filters. It is a "lossy" transformation that reduces complexity and relies on fine-tuning to recover performance.
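The widening step can be sketched for a pair of fully connected layers as follows. This is a minimal illustration of the Net2Wider idea, not the code used in Prometheus: `net2wider` and `forward` are hypothetical helper names, the layers are plain NumPy matrices, and the convolutional case (which replicates output channels in the same way) is omitted.

```python
import numpy as np

def net2wider(w1, b1, w2, new_width, rng):
    """Widen a hidden layer from w1.shape[0] to new_width units.

    w1: (hidden, in), b1: (hidden,), w2: (out, hidden).
    """
    old_width = w1.shape[0]
    # Keep every old unit and fill the extra slots with random copies.
    extra = rng.integers(0, old_width, size=new_width - old_width)
    mapping = np.concatenate([np.arange(old_width), extra])
    counts = np.bincount(mapping, minlength=old_width)  # copies per old unit
    w1_new = w1[mapping]
    b1_new = b1[mapping]
    # Divide outgoing weights by the copy count so the copies sum to the original.
    w2_new = w2[:, mapping] / counts[mapping]
    return w1_new, b1_new, w2_new

def forward(x, w1, b1, w2):
    """Two-layer network: linear -> ReLU -> linear."""
    return w2 @ np.maximum(w1 @ x + b1, 0.0)

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
w2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

widened = net2wider(w1, b1, w2, new_width=6, rng=rng)
before = forward(x, w1, b1, w2)
after = forward(x, *widened)
print(np.allclose(before, after))
```

Because each replicated unit's outgoing weights are divided by its number of copies, the widened network computes the same function as the original, which is what allows training to continue without a drop in accuracy.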
The controller can choose from the following edits:
Add Convolutional Block: Appends a Conv2d → BatchNorm2d → ReLU trio at a stage’s end. The Conv2d is identity-initialized (Net2Deeper style), and BatchNorm2d starts with γ = 1.0, β = 0.0. After the block is appended, the agent chooses a channel multiplier to set the block’s width, which is applied using Net2Wider.
Add Linear Block: Deepens the classifier by inserting a Linear → BatchNorm1d → ReLU right before the final layer, initialized as an identity mapping. The agent then chooses a width that is applied using Net2Wider.
Resize Layer: Selects any Conv2d or Linear node and scales its output dimension by a chosen factor using Net2Wider. A learned attention head pinpoints the most promising layer.
Add Skip Connection: Creates a shortcut between two nodes within the same stage. If channel counts differ, an identity-initialized 1 × 1 convolution is automatically inserted.
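Several of the edits above rely on an identity-initialized convolution. The sketch below shows what that means for a 3 × 3 kernel: only the center tap connecting each channel to itself is nonzero, so the new layer initially passes its input through unchanged. The helper names, the naive convolution loop, and the 3 × 3 kernel size are assumptions for illustration, and BatchNorm is left out for brevity.

```python
import numpy as np

def identity_conv_kernel(channels, k=3):
    """Kernel of shape (out, in, k, k) that copies each input channel."""
    kernel = np.zeros((channels, channels, k, k))
    center = k // 2
    for c in range(channels):
        kernel[c, c, center, center] = 1.0
    return kernel

def conv2d_same(x, kernel):
    """Naive stride-1, zero-padded cross-correlation. x: (C_in, H, W)."""
    c_out, _, k, _ = kernel.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = x.shape
    out = np.zeros((c_out, h, w))
    for i in range(h):
        for j in range(w):
            patch = xp[:, i:i + k, j:j + k]   # (C_in, k, k) window
            out[:, i, j] = np.tensordot(kernel, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
# Non-negative input, as it would be after a preceding ReLU.
x = np.abs(rng.normal(size=(4, 5, 5)))
y = np.maximum(conv2d_same(x, identity_conv_kernel(4)), 0.0)  # new conv + ReLU
print(np.allclose(x, y))
```

The trailing ReLU leaves the output unchanged only because the input is already non-negative, which is the situation when the new block is appended after an existing ReLU.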
The search process is initialized with a simple VGG-style CNN backbone (Simonyan and Zisserman, 2014). This starting architecture consists of three sequential stages, each featuring a Conv2d → BatchNorm2d → ReLU block, with channel dimensions increasing from 64 to 128 and finally to 256. The first two stages are followed by max-pooling. The network is connected to a classifier head composed of an adaptive average pooling layer and a single linear layer. The initial network was pre-trained for 50 epochs to ensure that the initial rewards for the RL agent are representative of the edit quality.
The controller is trained with Advantage Actor-Critic (A2C), and its operation is formalized as a Markov Decision Process (MDP). At each timestep t, the controller receives a state s_t representing the target network's graph structure and performance metrics, then samples an action a_t from its policy π_θ(a_t|s_t). The reward R_t combines accuracy with a quadratic penalty for exceeding a parameter budget, explicitly steering the search toward compact models. The reward function is:

where acc_{t+1} is the post-edit validation accuracy, P_{t+1} is the new parameter count, P_thresh is a budget (20M parameters), and λ_p is a penalty coefficient (0.2). If the parameter count exceeds 30 million, the model is automatically reverted, and the controller receives a −50 reward.
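One way to read this description in code is sketched below. The constants (20M budget, λ_p = 0.2, 30M hard cap, −50 on revert) come from the text; the exact functional form is an assumption, in particular normalizing the excess by P_thresh before squaring and using the accuracy in whatever units the caller supplies. The function name `reward` is hypothetical.

```python
def reward(acc, params, p_thresh=20e6, lambda_p=0.2, p_max=30e6, revert_reward=-50.0):
    """Accuracy minus a quadratic penalty on parameters above the budget.

    Assumed form: acc - lambda_p * max(0, (P - P_thresh) / P_thresh) ** 2.
    Models above p_max are treated as reverted and receive revert_reward.
    """
    if params > p_max:
        return revert_reward
    excess = max(0.0, (params - p_thresh) / p_thresh)  # relative overshoot
    return acc - lambda_p * excess ** 2

print(reward(95.0, 10e6))   # under budget: no penalty
print(reward(95.0, 25e6))   # over budget: small quadratic penalty
print(reward(95.0, 31e6))   # over the hard cap: edit reverted
```

Under this reading the penalty is zero up to the budget and grows smoothly beyond it, while the hard cap acts as a separate guard with a fixed, strongly negative reward.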
The controller's parameters θ and the value function's parameters φ are updated by minimizing a composite loss function L(θ, φ), composed of a policy loss, a value loss, and an entropy bonus:

where A_t = R_t − V_φ(s_t) is the advantage, V_φ(s_t) is the critic's value estimate, H is the policy entropy, and β_v, β_e are loss coefficients. The entropy term enters the loss as −0.0005 · H (i.e., β_e = 0.0005), which encourages exploration.
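For a single transition, a loss with these three components can be sketched as below, assuming the standard A2C form: policy loss −log π_θ(a_t|s_t) · A_t, squared-error value loss, and an entropy term subtracted with weight β_e. The value β_e = 0.0005 is taken from the text; β_v = 0.5 and the function name `a2c_loss` are assumptions.

```python
import math

def a2c_loss(probs, action, reward, value, beta_v=0.5, beta_e=0.0005):
    """Single-step A2C loss for a discrete policy.

    probs:  action probabilities pi_theta(. | s_t)
    action: index of the sampled action a_t
    reward: R_t; value: critic estimate V_phi(s_t)
    """
    advantage = reward - value                        # A_t = R_t - V_phi(s_t)
    # In an autograd framework the advantage is treated as a constant here.
    policy_loss = -math.log(probs[action]) * advantage
    value_loss = (reward - value) ** 2
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return policy_loss + beta_v * value_loss - beta_e * entropy

loss = a2c_loss(probs=[0.5, 0.5], action=0, reward=1.0, value=0.0)
print(round(loss, 6))
```

Subtracting the entropy term means that minimizing the loss pushes the policy toward higher entropy, which is how the small coefficient keeps the controller from collapsing onto a single edit too early.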
The number of post-edit training epochs, E_post, scales in proportion to the magnitude of the architectural change:

where E_base = 25, and P_t, P_{t+1} are the parameter counts before and after the edit.
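The schedule can be illustrated with a short sketch. Only E_base = 25 and the use of P_t and P_{t+1} come from the text; the specific rule below (E_base times the relative parameter change, rounded up, with a floor of one epoch) is an assumption chosen to match "scales in proportion to the magnitude of the architectural change", and `post_edit_epochs` is a hypothetical name.

```python
import math

def post_edit_epochs(p_before, p_after, e_base=25):
    """Fine-tuning epochs after an edit, proportional to relative size change.

    Assumed rule: ceil(e_base * |P_{t+1} - P_t| / P_t), at least 1 epoch.
    """
    rel_change = abs(p_after - p_before) / p_before
    return max(1, math.ceil(e_base * rel_change))

print(post_edit_epochs(1_000_000, 1_500_000))  # 50% more parameters
print(post_edit_epochs(1_000_000, 2_000_000))  # parameter count doubled
print(post_edit_epochs(1_000_000, 1_000_000))  # no size change, floor applies
```

Whatever the exact rule, the intent is that small edits cost little fine-tuning time while large structural changes get proportionally more epochs to recover.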
The RL agent can prune itself as well as grow, but only in response to heuristic triggers. If validation accuracy fails to improve for 5 iterations, a growth self-edit is triggered. If the model fails 3 consecutive dummy forward passes (a check for immediate NaN errors), a pruning self-edit occurs. Growth choices comprise deepening the GNN, widening its hidden layers, or deepening a policy head, all via function-preserving operations. Pruning options, which reverse these edits, are enabled only once the controller's parameter count exceeds 15,000.
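The trigger logic described above can be sketched as a small state machine. The thresholds (5 stalled iterations, 3 consecutive failed forward passes, 15,000 parameters) are from the text; the class name `SelfEditTrigger`, the counter-reset behavior after a trigger fires, and the priority of the NaN check over the stall check are assumptions.

```python
class SelfEditTrigger:
    """Decides when the controller should grow or prune itself."""

    def __init__(self, patience=5, nan_limit=3, prune_min_params=15_000):
        self.patience = patience
        self.nan_limit = nan_limit
        self.prune_min_params = prune_min_params
        self.best_acc = float("-inf")
        self.stalled = 0          # iterations without a new best accuracy
        self.nan_failures = 0     # consecutive failed dummy forward passes

    def update(self, val_acc, forward_ok, controller_params):
        """Return 'grow', 'prune', or None for this iteration."""
        self.nan_failures = 0 if forward_ok else self.nan_failures + 1
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.stalled = 0
        else:
            self.stalled += 1
        if self.nan_failures >= self.nan_limit:
            self.nan_failures = 0
            # Pruning is only available once the controller is large enough.
            return "prune" if controller_params > self.prune_min_params else None
        if self.stalled >= self.patience:
            self.stalled = 0
            return "grow"
        return None

trigger = SelfEditTrigger()
stall_events = [trigger.update(0.90, True, 10_000) for _ in range(6)]
print(stall_events)   # first call sets the best accuracy, then five stalled calls

nan_trigger = SelfEditTrigger()
nan_events = [nan_trigger.update(acc, False, 20_000) for acc in (0.5, 0.6, 0.7)]
print(nan_events)
```

Keeping the triggers outside the learned policy makes the behavior easy to inspect, at the cost of hard-coding when self-modification happens.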
Finally, the controller’s own learning rate is adaptive, annealed as a function of the target model’s accuracy to balance exploration and exploitation:

where acc_base = 0.80 and acc_target = 0.93 define the accuracy range for annealing.
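A simple schedule consistent with this description is sketched below. Only acc_base = 0.80 and acc_target = 0.93 come from the text; the linear interpolation, the clipping outside the range, the endpoint learning rates `lr_max` and `lr_min`, and the function name `controller_lr` are all assumptions.

```python
def controller_lr(acc, lr_max=1e-3, lr_min=1e-4, acc_base=0.80, acc_target=0.93):
    """Anneal the controller learning rate as target accuracy rises.

    Assumed rule: lr_max below acc_base, lr_min above acc_target,
    linear interpolation in between.
    """
    progress = (acc - acc_base) / (acc_target - acc_base)
    progress = min(1.0, max(0.0, progress))           # clip to [0, 1]
    return lr_max + progress * (lr_min - lr_max)

for acc in (0.70, 0.80, 0.865, 0.93, 0.96):
    print(acc, controller_lr(acc))
```

With a schedule of this shape the controller takes large steps while the target model is still weak and progressively smaller ones as accuracy approaches the target, shifting from exploration toward exploitation.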
Experiments and Results
We evaluated Prometheus on CIFAR-10, SVHN, and Fashion-MNIST over ten independent runs for each experiment. The search was performed on a single NVIDIA L4 GPU.
Analysis of Self-Editing Mechanism
The ablation shows an average gain of 0.37 percentage points when the controller’s self-editing capability is enabled. To determine whether this improvement was statistically significant, we performed a paired t-test on the per-seed results. The gain from self-editing is statistically significant (p = 0.012) at the α = 0.05 level. This supports the claim that allowing the controller to adapt its own architecture during the search leads to better final network architectures.
Table 1. Ablation study of the self-editing mechanism on CIFAR-10. Results are mean ± standard deviation over ten runs.
System Variant | Self-Editing | Peak Acc. (%) |
Prometheus (Ablated) | Disabled | 95.10±0.55 |
Prometheus (Full) | Enabled | 95.47±0.60 |
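The paired test used for this ablation works on per-seed differences between the two variants. A minimal sketch of the test statistic follows; the two lists are made-up toy values chosen for easy arithmetic, not the per-seed accuracies from the study, and `paired_t_statistic` is a hypothetical helper.

```python
import math

def paired_t_statistic(a, b):
    """t statistic of a paired t-test: mean(d) / (sd(d) / sqrt(n)), d = b - a."""
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)

# Toy placeholder scores (NOT real results): one entry per seed.
ablated = [1.0, 2.0, 3.0, 4.0]
full = [1.5, 2.5, 3.0, 5.0]
t_stat = paired_t_statistic(ablated, full)
print(t_stat)
```

The p-value would then be read from a Student t distribution with n − 1 degrees of freedom, for instance with `scipy.stats.ttest_rel` if SciPy is available. Pairing by seed removes seed-to-seed variance from the comparison, which is why a modest mean difference can still reach significance.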
Benchmark Performance
Table 2. CIFAR-10 accuracy vs. other controller-based NAS methods.
Method | Top-1 Acc. (%) |
NAS-RL (Zoph and Le, 2017) | 96.35 |
PNAS (Liu et al., 2018) | 96.60 |
ENAS (Pham et al., 2018) | 97.11 |
EAS (CNN only) (Cai et al., 2018) | 95.77 |
Prometheus (average) | 95.47±0.60 |
Prometheus (best of 10 runs) | 96.58 |
On CIFAR-10 (Table 2), Prometheus is competitive with its closest antecedent, EAS, while introducing the novel self-editing dynamic. It operates with a far smaller computational budget than methods like NAS-RL or PNAS. The search took an average of 23 GPU hours.
Table 3. Test accuracy comparison on SVHN.
Method | Accuracy (%) |
DrNAS (Chen et al., 2021) (reported by Lee et al., 2021) | 96.30±0.05 |
EAS (Cai et al., 2018) | 98.17 |
ResNet‑56 (baseline reported by Lee et al., 2021) | 96.13±0.19 |
Prometheus | 97.09±0.15 |
On SVHN (Table 3), Prometheus outperforms all compared baselines except for EAS, demonstrating strong generalization without any dataset-specific tuning. This search took an average of 49 GPU hours.
Table 4. Test accuracy comparison on Fashion-MNIST.
Method | Accuracy (%) |
MO-ResNet (Wang et al., 2025) | 95.91 |
DeepSwarm (Byla and Pang, 2019) | 93.56 |
Hierarchical NAS (Christoforidis et al., 2023) | 93.25 |
Prometheus | 95.57±0.20 |
On Fashion-MNIST (Table 4), Prometheus achieves 95.57%±0.20%, outperforming DeepSwarm and Hierarchical NAS and trailing MO-ResNet by 0.34 percentage points. The average search time was 20 GPU hours.
Conclusion
We introduced Prometheus, a NAS system built on the principle of recursive self-improvement. The system, which combines a self-editing GNN controller with block-based actions and heuristic-driven adaptation, achieves competitive accuracy on multiple benchmarks. The primary contribution of this work, however, is the mechanism. We present Prometheus as a successful proof-of-concept for a more autonomous class of NAS agents that can manage their own complexity. A key limitation is the reliance on hard-coded heuristics to trigger self-modification. Future work should aim to integrate this decision directly into the agent's learning process, for example, through a hierarchical RL policy. Solving this credit assignment problem is a key step toward building more general and truly autonomous machine learning systems.
References
Byla, E., & Pang, W. (2019). DeepSwarm: Optimising convolutional neural networks using swarm intelligence. Proceedings of the 19th UK Workshop on Computational Intelligence (UKCI).
Cai, H., Chen, T., Zhang, W., Yu, Y., & Wang, J. (2018). Efficient architecture search by network transformation. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), 2787–2794.
Chen, T., Goodfellow, I., & Shlens, J. (2016). Net2Net: Accelerating learning via knowledge transfer. arXiv preprint arXiv:1511.05641.
Christoforidis, A., Kyriakides, G., & Margaritis, K. (2023). A novel evolutionary algorithm for hierarchical neural architecture search. arXiv preprint arXiv:2107.08484.
Chen, X., Wang, R., Cheng, M., Tang, X., & Hsieh, C.-J. (2021). DrNAS: Dirichlet Neural Architecture Search. International Conference on Learning Representations (ICLR).
Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations (ICLR).
Lee, H., Hyung, E., & Hwang, S. J. (2021). Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets. arXiv preprint arXiv:2107.00860.
Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L.-J., Fei-Fei, L., Yuille, A., Huang, J., & Murphy, K. (2018). Progressive neural architecture search. Proceedings of the European Conference on Computer Vision (ECCV), 19–34.
Liu, H., Simonyan, K., & Yang, Y. (2019). DARTS: Differentiable architecture search. International Conference on Learning Representations (ICLR).
Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018). Efficient neural architecture search via parameter sharing. Proceedings of the International Conference on Machine Learning (ICML), 4095–4104.
Wang, S., Tang, H., & Ouyang, J. (2025). A neural architecture search method using auxiliary evaluation metrics based on resnet architecture. arXiv preprint arXiv:2505.01313.
Zoph, B., & Le, Q. V. (2017). Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578.
He, Z., Shu, Y., Dai, Z., & Low, B. K. H. (2024). Robustifying and Boosting Training‑Free Neural Architecture Search. ICLR Posters.
Sukthanker, R. S., Zela, A., Staffler, B., Dooley, S., Grabocka, J., & Hutter, F. (2025). Multi‑Objective Differentiable Neural Architecture Search. ICLR Posters.
White, C., Safari, M., Sukthanker, R., Ru, B., Elsken, T., Zela, A., Dey, D., & Hutter, F. (2023). Neural Architecture Search: Insights from 1000 Papers. arXiv preprint arXiv:2301.08727.
Li, Y., Li, J., Hao, C., Li, P., Xiong, J., & Chen, D. (2023). Extensible and Efficient Proxy for Neural Architecture Search. Proceedings of the IEEE/CVF International Conference on Computer Vision, 6199–6210.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Nair, V., & Hinton, G. E. (2010). Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of the 27th International Conference on Machine Learning (ICML), 807–814.
Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.