Facebook Open-Sources GHN-2 AI for Fast Initialization of Deep-Learning Models

Nov 30, 2021 · 3 min read
by Anthony Alford
A team from Facebook AI Research (FAIR) and the University of Guelph has open-sourced an improved Graph HyperNetworks (GHN-2) meta-model that predicts initial parameters for deep-learning neural networks. GHN-2 executes in less than a second on a CPU and predicts parameter values for computer vision (CV) networks that achieve up to 77% top-1 accuracy on CIFAR-10 with no additional training.
The researchers described the system and a series of experiments in a paper accepted for the upcoming Conference on Neural Information Processing Systems (NeurIPS). To solve the problem of predicting initial parameters for deep-learning models, the team generated a dataset called DeepNets-1M that contains one million examples of neural network architectures represented as computational graphs. They then used meta-learning to train a modified graph hyper-network (GHN) on this dataset, which can then be used to predict parameters for an unseen network architecture. The resulting meta-model is "surprisingly good" at the task, even for architectures much larger than the ones used in training. When used to initialize a 24M-parameter ResNet-50, the meta-model found parameters that achieved 60% accuracy on CIFAR-10 with no gradient updates. Along with their trained meta-model and code, the team released the DeepNets-1M training dataset as well as several benchmark test datasets. According to lead author Boris Knyazev,
Based on our…paper, we are one step closer to replacing hand-designed optimizers with a single meta-model. Our meta-model can predict parameters for almost any neural network in just one forward pass.
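In the released code, that one-pass prediction takes only a few lines. The sketch below follows the usage pattern documented in the ppuda repository's README; the `GHN2` class name and the `'cifar10'` argument are taken from that documentation and should be verified against the current repo before use.

```python
import torchvision.models as models
from ppuda.ghn.nn import GHN2  # assumes ppuda is installed per the repo instructions

ghn = GHN2('cifar10')                    # load the GHN-2 meta-model trained on CIFAR-10
model = models.resnet50(num_classes=10)  # an architecture unseen during meta-training
model = ghn(model)                       # predict all parameters in one forward pass
# `model` can now be evaluated directly or fine-tuned with a standard optimizer.
```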
Training a deep-learning model on a dataset is formalized as finding a set of model parameters that minimizes the model's loss function evaluated on the training data. This is typically done by using an iterative optimization algorithm, such as stochastic gradient descent (SGD) or Adam. The drawback to this method is that the minimization can take many hours of computation and a good deal of energy. In practice, researchers will often train many models in order to find the best network architecture and set of hyperparameters, compounding the cost.
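As a point of reference, a conventional training run looks like the loop below, a minimal PyTorch sketch (the small CNN is illustrative, not one of the paper's architectures), in which every parameter update requires a forward pass, a backward pass, and an optimizer step, repeated over thousands of batches:

```python
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

# Illustrative small CNN trained on CIFAR-10 with SGD.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(16, 10),
)
train_loader = torch.utils.data.DataLoader(
    datasets.CIFAR10('.', train=True, download=True,
                     transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for epoch in range(10):                  # many passes over the data...
    for images, labels in train_loader:  # ...and many batches per pass
        optimizer.zero_grad()
        loss = F.cross_entropy(model(images), labels)  # loss on training data
        loss.backward()                                # gradients via backprop
        optimizer.step()                               # one iterative update
```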
To help reduce the cost of training models, the Facebook team created a hyper-model that is trained for a specific dataset. Given a proposed network architecture, the hyper-model can predict performant parameters for the network. Inspired by work on a neural architecture search (NAS) algorithm called Differentiable ARchiTecture Search (DARTS), the team formulated a meta-learning task. This task requires a domain-specific dataset, such as ImageNet, as well as a training set of model network architectures expressed as computational graphs. The team then trained a hyper-model using graph-learning techniques; the hyper-model's objective is to predict parameters for the input network architectures that minimize the networks' loss on the domain-specific data.
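Schematically, the meta-learning objective can be illustrated with the toy loop below. This is a deliberately simplified sketch, not the paper's method: the real GHN-2 is a graph neural network operating over DeepNets-1M computational graphs, whereas here a plain MLP (`hyper_model`, an illustrative name) predicts the weights of a fixed-shape linear classifier. What it does show correctly is the key idea that the loss of the *predicted* parameters is backpropagated into the hyper-model itself:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
in_dim, n_classes, arch_dim = 20, 5, 8

# Stand-in for GHN-2: maps an "architecture" encoding to network parameters.
hyper_model = torch.nn.Sequential(
    torch.nn.Linear(arch_dim, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, in_dim * n_classes + n_classes),
)
meta_opt = torch.optim.Adam(hyper_model.parameters(), lr=1e-3)

# Stand-in for the domain-specific dataset (e.g. CIFAR-10).
X = torch.randn(512, in_dim)
y = torch.randint(0, n_classes, (512,))

for step in range(1000):
    arch = torch.randn(arch_dim)            # sample a training "architecture"
    params = hyper_model(arch)              # predict its parameters
    W = params[: in_dim * n_classes].view(n_classes, in_dim)
    b = params[in_dim * n_classes:]
    logits = X @ W.t() + b                  # run the predicted network
    loss = F.cross_entropy(logits, y)       # loss of the predicted parameters
    meta_opt.zero_grad()
    loss.backward()                         # gradients flow into hyper_model
    meta_opt.step()                         # update the hyper-model, not W or b
```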
[Figure: GHN-2 overview. Source: https://github.com/facebookresearch/ppuda]
To assess the performance of their technique, the team trained meta-models for two domain-specific datasets: ImageNet and CIFAR-10. They compared the performance of parameters generated by GHN-2 to those generated by two other baseline meta-models as well as to model parameters produced by standard iterative optimizers. The parameters were predicted for a set of network architectures not used for training the meta-models. GHN-2 "significantly" outperformed the baseline meta-models. Compared to iterative optimizers, the parameters predicted by GHN-2 with only a single forward pass achieved "an accuracy similar to ∼2500 and ∼5000 iterations of SGD on CIFAR-10 and ImageNet respectively."
The GHN-2 model does have some drawbacks. First, a new meta-model must be trained for each domain-specific dataset. Also, although GHN-2 can predict parameters that outperform random choices, Knyazev notes that "depending on the architecture," the predictions may not be very accurate. In a Reddit discussion about the paper, one user noted:
At the very least, as the author's tweet thread points out, the predicted parameters are likely a lot better than a random distribution for weight initialization… assuming it generalizes a bit within some class of learning network architectures, that is a very interesting and potentially useful development.
The trained GHN-2 model and code as well as the DeepNets-1M dataset are available on GitHub.

