Updated: Apr 19
The excitement about deep learning and foundation models
We’ve seen a lot of excitement and press coverage around recent breakthroughs in deep learning. This excitement comes from what are loosely called foundation models. Foundation models are huge neural networks trained, largely without human-labelled data, on vast amounts of data harvested from the web. But it’s what these networks can do in so-called “downstream tasks” that makes them so impressive. A “downstream task” is when a network that was trained for purpose A is then adapted to solve purpose B.
Networks such as OpenAI’s DALL·E 2 and Stability AI’s Stable Diffusion are all over social media because of their ability to generate art from text prompts. The results are amazing, but what is generating even more discussion are the issues of copyright and ethics. These networks can generate novel art, but they have been trained on art that was created by, and belongs to, artists. Did those artists give permission for their work to be used in this way, and is the network breaching their copyright?
Are modern language models as good as humans?
The latest excitement has been around OpenAI’s ChatGPT. ChatGPT is a chatbot, or conversational agent, fine-tuned from a model in OpenAI’s GPT-3.5 series, a successor to GPT-3. So here the GPT (Generative Pre-trained Transformer) model is the foundation model that ChatGPT is built on. GPT-3 is enormous, both in terms of its training data and its number of parameters. It is also remarkable in terms of the output it generates: in one human evaluation, readers could tell GPT-3-written news articles from human-written ones only slightly better than chance.
But the proof is in the pudding. Have you tried it? If not, I would recommend creating an account with OpenAI and giving it a go. It is indeed impressive, and trying to catch it out with complex requests only produces more impressive results. It can write letters, answer questions and even write code for you if asked.
So what makes these models so impressive?
It’s the sheer size of the models (in terms of their parameters) and the amount of data they are trained on that is key to their success. For example, Stable Diffusion was trained on 2.3 billion image-text pairs, using 256 GPUs for a total of 150,000 GPU-hours (about 24 days with all 256 GPUs running) at a cost of around $600,000. The largest GPT-3 model has 175 billion parameters and reportedly cost over $4.6 million to train.
So when will we have SignGPT?
Well, it is potentially not that far away. As with the foundation models above, what will make it work is the sheer quantity of data it is built on. We know that data is the key to bringing the same level of understanding to visual languages, such as sign language, that we are currently witnessing in written and spoken language. Signapse knows that the future of AI-generated sign language requires data, and for this reason we have just appointed our Head of Translation and Data, who will be starting in the new year. Of course, we want to ensure that our data policies are ethical and that we have the correct access rights to the data that will allow us to build the models the Deaf community needs. We are therefore also actively seeking partnerships. Watch this space in 2023: it’s going to be an exciting time for Sign Language.