BLOOM: A new member in the NLP space

On 12 July 2022, the world of artificial intelligence and data science, and NLP in particular, got exciting news in the field of large language models (LLMs). BigScience, an open collaboration of Hugging Face, GENCI and IDRIS and one of the most extensive research workshops in the field of NLP, released BLOOM, a fully transparent, open-source multilingual large language model. The name is short for BigScience Large Open-science Open-access Multilingual Language Model. Let's look at it in more detail under the following headings.

Table of contents

  • What is BLOOM?
  • Who can use BLOOM?
  • Technical specifications

What is BLOOM?

BLOOM is an autoregressive large language model, trained on a massive amount of text data to continue text from a prompt. It is not limited to one language: it can produce text in 46 natural languages and 13 programming languages. It can also be instructed to perform text tasks it has not been explicitly trained for, simply by casting them as text generation tasks.
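
To make "casting a task as text generation" concrete, here is a minimal sketch that turns a sentiment-labelling task into a prompt the model can simply continue. The function name, the "Review:"/"Sentiment:" layout and the example reviews are all made up for illustration; they are not taken from the BLOOM documentation.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Cast a labelling task as text generation.

    The prompt ends right where the label should go, so an autoregressive
    model answers the task simply by generating the next few tokens.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final entry is left unlabelled for the model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical example data, only to show the prompt layout.
prompt = build_few_shot_prompt(
    "Label each review as positive or negative.",
    [("The food was wonderful.", "positive"),
     ("The service was slow and rude.", "negative")],
    "I would happily come back again.",
)
print(prompt)
```

The same pattern works for translation, summarisation or question answering: only the instruction and the field labels change.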

It is the first fully transparent multilingual LLM. It has 176 billion parameters and was trained on the Jean Zay supercomputer. Building it involved around 1000 researchers from 70+ countries and more than 250 institutions, and training ran for 117 days.

Who can use BLOOM?

Any individual or organisation wanting to try out and investigate this model can download, run and study it here. Before using it, you must agree to these terms and conditions.

Since it is integrated into the Hugging Face ecosystem, it is used in the same way as the other Hugging Face transformer models. This means it can be imported with the transformers library and run with accelerate.
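
A minimal sketch of what that looks like in practice, assuming the transformers, accelerate and torch packages are installed. The checkpoint identifier `bigscience/bloom-560m` is an assumption, picked because it is small enough to run on modest hardware; it is not named in this article, so check the Hugging Face Hub for the current identifiers. The function name `generate_text` is also made up for illustration.

```python
# Assumed Hub identifier of a small BLOOM checkpoint (not from the article).
DEFAULT_CHECKPOINT = "bigscience/bloom-560m"


def generate_text(prompt, checkpoint=DEFAULT_CHECKPOINT, max_new_tokens=30):
    """Generate a continuation of `prompt` with a BLOOM checkpoint."""
    # Imported inside the function so that merely defining it does not
    # require the libraries or trigger any download.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # device_map="auto" relies on accelerate to place the weights on the
    # available GPU(s) or CPU.
    model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The decoded string contains the prompt followed by the generated text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("A simple recipe for pancakes:")` downloads the checkpoint on first use and returns the prompt plus the model's continuation. The full 176-billion-parameter model loads through the same calls but needs far more memory than a single consumer GPU provides.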

This model can be used in real-world use cases that require text generation, such as writing recipes, extracting information from articles, or composing new sentences from a series of texts. It is also an excellent starting point for aspiring data science developers and researchers who want to learn software such as PyTorch, apex and DeepSpeed in more depth.

Technical specifications

BLOOM is a modified version of Megatron-LM GPT2 with a decoder-only architecture. It has 176 billion parameters, 70 layers and 112 attention heads, with 14336-dimensional hidden layers. The objective function used for training is cross entropy with mean reduction.
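
The layer count and hidden size roughly account for the 176 billion parameters. The sketch below is a back-of-the-envelope estimate, not the model's exact accounting: it assumes a standard GPT-style block (four attention projection matrices plus a feed-forward block with 4x expansion), ignores biases and layer norms, and assumes an embedding table of 250,880 rows, a figure that does not appear in this article.

```python
HIDDEN_SIZE = 14336
NUM_LAYERS = 70
NUM_HEADS = 112
VOCAB_SIZE = 250_880  # assumption: embedding rows, not stated in the article

# Each attention head works on an equal slice of the hidden dimension.
head_dim = HIDDEN_SIZE // NUM_HEADS

# Assumed standard GPT-style block, weights only (no biases or layer norms):
attention_params = 4 * HIDDEN_SIZE**2  # query, key, value and output projections
mlp_params = 8 * HIDDEN_SIZE**2        # two matrices with 4x expansion
per_layer_params = attention_params + mlp_params

embedding_params = VOCAB_SIZE * HIDDEN_SIZE
approx_total = NUM_LAYERS * per_layer_params + embedding_params

print(f"head dimension: {head_dim}")
print(f"approximate parameters: {approx_total / 1e9:.1f}B")
```

Under these assumptions the estimate lands close to the quoted 176 billion, with almost all of the weights sitting in the 70 transformer layers rather than in the embeddings.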

As mentioned above, the model was trained on the Jean Zay public supercomputer, which was provided by the French government. Some of the specifications of the hardware used are as follows:

  • 384 A100 80GB GPUs (48 nodes)
  • CPU: AMD
  • CPU memory: 512GB per node
  • GPU memory: 640GB per node
  • Inter-node connect: Omni-Path Architecture (OPA)
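
The figures in the list above are consistent with each other, as the short calculation below shows. The last part is an assumption added for illustration: it supposes the weights are stored in 2 bytes per parameter (a 16-bit format) to show why the model has to be spread across several GPUs. The precision is not stated in this article.

```python
import math

TOTAL_GPUS = 384
NODES = 48
GPU_MEMORY_GB = 80

gpus_per_node = TOTAL_GPUS // NODES
gpu_memory_per_node_gb = gpus_per_node * GPU_MEMORY_GB
total_gpu_memory_gb = TOTAL_GPUS * GPU_MEMORY_GB

print(f"GPUs per node: {gpus_per_node}")
print(f"GPU memory per node: {gpu_memory_per_node_gb} GB")
print(f"total GPU memory: {total_gpu_memory_gb} GB")

# Assumption: 2 bytes per parameter (16-bit weights), weights only.
PARAMETERS = 176e9
BYTES_PER_PARAMETER = 2
weights_gb = PARAMETERS * BYTES_PER_PARAMETER / 1e9
min_gpus_for_weights = math.ceil(weights_gb / GPU_MEMORY_GB)

print(f"weights alone: {weights_gb:.0f} GB, at least {min_gpus_for_weights} GPUs")
```

Training needs considerably more memory than the weights alone, because gradients, optimizer state and activations also have to be held.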

Training used 1.6TB of pre-processed text.

[Figure: pie chart of the distribution of training text by language]

Training started on 11 March 2022, with an estimated end date of 5 July 2022. The model was trained for one epoch, and the estimated cost of training is equivalent to $2–5M in cloud computing.
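
The 117 days of training quoted earlier can be checked against these two dates. The sketch below uses only the Python standard library. Whether the 117-day figure was actually derived by counting both the start and end day is an assumption.

```python
from datetime import date

start = date(2022, 3, 11)
estimated_end = date(2022, 7, 5)

# Difference between the two dates, not counting the start day itself.
elapsed_days = (estimated_end - start).days
# Counting both the first and the last day as training days.
inclusive_days = elapsed_days + 1

print(f"elapsed: {elapsed_days} days, inclusive: {inclusive_days} days")
```

Counted inclusively, the dates agree with the 117-day figure.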

The following versions of BLOOM are available:

  • bloom-350m
  • bloom-760m
  • bloom-1b3
  • bloom-2b5
  • bloom-6b3
  • bloom (176B parameters)

Final words

We have seen what it took to build this model and how it can help the data science community. Taken together, it has the potential to be revolutionary in the field of NLP, as the development team considers it a beginning rather than a one-and-done model. You can try the BLOOM model here.

About DSW

Data Science Wizards (DSW) is an Artificial Intelligence and Data Science start-up that offers platforms, solutions, and consulting services in AI and data analytics, helping enterprises use data as a strategy and make data-driven decisions.

DSW’s flagship platform UnifyAI is an end-to-end AI-enabled platform for enterprise customers to build, deploy, manage, and publish their AI models. UnifyAI helps you to build your business use case by leveraging AI capabilities and improving analytics outcomes.

Connect with us at contact@datasciencewizards.ai and visit us at www.datasciencewizards.ai