What are foundation models in generative AI?

By Jacob Andra / Published August 7, 2024 
Last Updated: August 8, 2024

Foundation models are large-scale machine learning models that form the core technology for generative AI applications.

You could view foundation models as versatile raw materials, like high-quality steel, and AI applications as the specific tools crafted from this steel.

The steel possesses inherent properties such as strength, durability, and flexibility. It's not usable on its own for most practical purposes, but it forms the basis for a wide array of tools.

  • A surgical scalpel (medical diagnosis AI)
  • A construction beam (structural analysis AI)
  • A precision gear in a watch (financial modeling AI)

Each tool serves a distinct purpose, but they all benefit from the fundamental properties of the steel. Similarly, AI applications leverage the core capabilities of the foundation model but are shaped and optimized for specific tasks.

Main takeaways
  • Foundation models underpin advanced AI applications.
  • Leading foundation models include GPT-4, LLaMA 2, and Gemini.
  • Most enterprise AI users are more interested in use cases than in the underlying foundation models.
  • Even so, it’s good to have a basic understanding of how AI works.

Characteristics of foundation models

[Figure: a depiction of how foundation models work, from Stanford University]

Source: “On the Opportunities and Risks of Foundation Models,” Center for Research on Foundation Models (CRFM) Stanford Institute for Human-Centered Artificial Intelligence (HAI).

Foundation models have billions of parameters: adjustable numerical values that store the patterns the model learns during training. Because foundation models are trained on massive datasets, they can recognize highly complex patterns.

According to a report from Stanford University, “the scale of data used to train these models is enormous. For instance, GPT-3 was trained on hundreds of billions of words.”

They are versatile; a single foundation model can be fine-tuned for a wide range of tasks, from language translation to image generation. These models also exhibit transfer learning capabilities, meaning knowledge gained from one task can be applied to another, enhancing efficiency and performance.
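The fine-tuning idea can be sketched in a few lines of numpy. This is a toy stand-in, not a real foundation model: all sizes and values are illustrative. A frozen "base" plays the role of the pretrained model, and only a small task-specific head is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights were learned during large-scale pretraining.
# During fine-tuning they stay frozen; only the small task "head" is trained.
base_weights = rng.normal(size=(8, 4))

def base_features(x):
    """Frozen feature extractor standing in for a foundation model."""
    return np.tanh(x @ base_weights)

head = np.zeros(4)  # task-specific layer: the only trainable part

def predict(x):
    return base_features(x) @ head

# A few gradient steps on the head only (squared error on one toy example).
x, target = rng.normal(size=8), 1.0
for _ in range(50):
    feats = base_features(x)
    error = predict(x) - target
    head -= 0.1 * error * feats  # gradient of 0.5 * error**2 w.r.t. head

print(round(float(predict(x)), 2))  # approaches the 1.0 target
```

Because the expensive base is reused unchanged, adapting it to a new task costs only the training of the small head, which is why one foundation model can serve many applications.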

Here are some of the leading foundation models on the market:

| Tool | Developed by | Description | Parameters |
|------|--------------|-------------|------------|
| GPT-4 | OpenAI | Advanced language model for text generation and complex tasks | Not publicly disclosed (estimates vary widely) |
| LLaMA 2 | Meta | Open-source model optimized for research and academic use, strong text generation | Up to 70 billion |
| Turing-NLG | Microsoft | High-quality text generation, strong performance in various NLP tasks | 17 billion |
| Mistral 7B | Mistral AI | Efficient and lightweight model designed for a wide range of NLP tasks, high performance despite smaller size | 7 billion |
| Claude 2 | Anthropic | Focus on safety and alignment, designed to be more interpretable and controllable | Not publicly disclosed |
| Gemini | Google | Multimodal AI model capable of processing and generating text, images, audio, and video | Not publicly disclosed |
| Command R | Cohere | Optimized for retrieval-augmented generation (RAG), improved performance in tasks requiring external knowledge retrieval | Not publicly disclosed |
| StableLM | Stability AI | Open-source, designed for stability and reliability, strong performance in text generation and understanding | Not publicly disclosed |

Parameters in foundation models


Parameters are the building blocks of foundation models, functioning like the connection strengths between the model's "brain cells." They store the patterns and relationships the model learns from its training data. More parameters allow a model to capture more complexity and store broader knowledge. This increased capacity translates to better performance, but also to higher computational demands and greater resource usage.
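To make "parameters" concrete, here is a short sketch that counts the weights and biases of a toy fully connected network. The layer sizes are illustrative; a real foundation model does the same bookkeeping at the scale of billions.

```python
# Count parameters (weights + biases) of a toy fully connected network.
layer_sizes = [784, 256, 10]  # illustrative sizes, not a real model's

def count_parameters(sizes):
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out  # one weight per input-output connection
        total += n_out         # one bias per output neuron
    return total

print(count_parameters(layer_sizes))  # 203530
```

Even this three-layer toy has over 200,000 parameters, which gives a sense of how quickly the count grows with width and depth.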

Practical applications of foundation models

Foundation models power a wide variety of applications across industries.

Here are some practical examples:

  • Natural language processing (NLP): models like GPT-4 are used in chatbots, virtual assistants, and language translation services, improving communication and accessibility.
  • Content creation: tools like DALL-E 2 generate images from textual descriptions, while GPT models write articles, create marketing copy, and assist in creative writing.
  • Healthcare: AI models help in diagnosing diseases by analyzing medical images, suggesting treatment plans, and even predicting patient outcomes.
  • Finance: AI assists in fraud detection, risk management, and personalized financial advice by analyzing vast datasets and identifying patterns.
  • Retail: models enhance customer experiences through personalized recommendations, inventory management, and dynamic pricing strategies.

Looking to use these tools in your business?

Talbot West can help you harness the potential of AI to drive innovation and efficiency. Contact us today for a free consultation and explore how we can tailor AI solutions to meet your specific needs.


Future trends in foundation models

The field of generative AI and foundation models is rapidly evolving.

Some future trends include:

  • Improved efficiency: models will become more efficient even as they grow larger and more capable.
  • Enhanced interpretability: efforts are being made to improve the transparency of these models, making it easier to understand their decision-making processes.
  • Broader applications: as these models become more advanced, their applications will expand into new domains, including autonomous systems and advanced robotics.
  • Ethical AI development: increasing focus on developing ethical AI practices to mitigate biases, enhance fairness, and ensure responsible usage.

Foundation models FAQ

Is GPT-4 a foundation model?

GPT-4 is a foundation model. Foundation models are large-scale machine learning models that are pre-trained on vast amounts of diverse data, enabling them to perform a wide range of tasks. GPT-4, developed by OpenAI, fits this description. It has been trained on a comprehensive dataset and possesses a vast number of parameters, allowing it to generate human-like text and understand complex language patterns.

This extensive pre-training enables GPT-4 to be fine-tuned for various specific applications, such as chatbots, content creation, language translation, and more. As a foundation model, GPT-4 serves as a versatile and powerful tool in the realm of generative AI.

What are the four basic concepts of artificial intelligence?

There are four basic concepts that form the foundation of artificial intelligence. These concepts help in understanding how AI systems are designed, implemented, and function.

  1. Machine learning (ML): machine learning is a subset of AI that involves training algorithms to recognize patterns in data and make predictions or decisions without being explicitly programmed. It enables systems to learn and improve from experience, similar to how humans learn from their experiences. Examples include spam detection in email, recommendation systems in streaming services, and image recognition in social media.
  2. NLP: NLP is a field of AI focused on the interaction between computers and humans through natural language. It enables machines to understand, interpret, and respond to human language in a meaningful way. Examples include chatbots, language translation services, and voice-activated assistants like Siri and Alexa.
  3. Computer vision: computer vision is the field of AI that trains computers to interpret and make decisions based on visual data. It allows machines to gain high-level understanding from digital images or videos, enabling them to identify and classify objects accurately. Examples include facial recognition systems, autonomous vehicles, and medical imaging diagnostics.
  4. Robotics: robotics is a branch of AI that deals with the design, construction, operation, and use of robots. It involves creating intelligent machines that can assist humans in various tasks, often in environments that may be hazardous or difficult for humans to navigate. Examples include industrial robots on assembly lines, drones for delivery and surveillance, and robotic vacuum cleaners.

What is the core foundation of artificial intelligence?

The core foundation of artificial intelligence lies in its ability to simulate human intelligence and perform tasks that typically require human cognition.
This foundation is built upon these components:
1. Algorithms and models
2. Data
3. Computing power
4. Machine learning
5. Neural networks
6. Natural language processing
7. Computer vision

How do you create a foundation model?

Creating a foundation model involves a broad range of steps, from data collection to model deployment.

Here’s a high-level overview of the process:

  1. Define objectives and scope: clearly define what the foundation model will be used for (e.g., language understanding, image generation). Decide on the scope and limitations of the model, including the types of data it will handle and the specific tasks it will perform.
  2. Data collection and preparation: collect large and diverse datasets relevant to the model's objectives. This could include text, images, audio, and video data. Clean and preprocess the data to ensure it is free from errors, inconsistencies, and biases. Enhance the dataset through techniques like augmentation to improve model robustness.
  3. Model design: select an appropriate model architecture based on the task. Common architectures include transformers for text (e.g., GPT, BERT) and convolutional neural networks (CNNs) for images. Set the initial parameters, including the number of layers, neurons per layer, activation functions, and learning rate.
  4. Training the model: train the model on the collected data using high-performance computing resources. This phase can take a significant amount of time and computational power. Adjust the model on specific, smaller datasets to improve performance on particular tasks. Optimize the model's hyperparameters (e.g., learning rate, batch size) to enhance performance and efficiency.
  5. Validation and testing: use a separate validation dataset to periodically test the model during training to ensure it is learning correctly. Evaluate the model on a dedicated test set to assess its performance and generalization capabilities.
  6. Deployment and monitoring: deploy the trained model into a production environment where it can be used for real-world applications. Continuously monitor the model's performance and make adjustments as needed. This includes checking for data drift, retraining the model periodically, and ensuring it remains accurate and reliable.
  7. Ethical considerations: implement techniques to identify and mitigate biases in the training data and model outputs. Ensure the model's decisions can be understood and explained to end-users and stakeholders. Adhere to relevant regulations and ethical guidelines regarding data privacy and AI usage.
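Steps 4 and 5 above can be sketched in miniature. The following uses a toy linear model and synthetic data (all names and values are illustrative, nothing like real foundation-model scale) to show the shape of a train/validate loop:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data standing in for the curated training corpus (step 2).
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.normal(size=200)

# Hold out a validation split (step 5).
X_train, y_train = X[:160], y[:160]
X_val, y_val = X[160:], y[160:]

w = np.zeros(3)
learning_rate = 0.05  # a hyperparameter to tune (step 4)

for epoch in range(100):
    # Gradient descent step on the training split (step 4).
    grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= learning_rate * grad
    if epoch % 20 == 0:
        # Periodic validation (step 5): a rising loss here signals overfitting.
        val_loss = np.mean((X_val @ w - y_val) ** 2)

print(np.mean((X_val @ w - y_val) ** 2))  # small once trained
```

A real foundation model replaces the linear model with a transformer and the 200-row dataset with trillions of tokens, but the train-then-validate rhythm is the same.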

What AI model does ChatGPT use?

ChatGPT uses a type of artificial intelligence model known as a transformer, specifically the GPT architecture.

Can generative AI write code?

You can generate code using generative AI. Multiple AI models and tools are specifically designed for this purpose, leveraging the capabilities of generative AI to assist with coding tasks.

What are upstream and downstream tasks in AI development?

Foundation models are the result of upstream tasks and serve as the basis for downstream tasks in AI development:

  1. Upstream work creates versatile, powerful foundation models.
  2. Downstream applications build upon these models, saving time and resources compared to developing specialized AI from scratch.
  3. This approach allows rapid deployment of AI solutions across industries, as the core capabilities are already in place and only need adaptation for specific use cases.

How does generative AI work?

Generative AI uses algorithms to create new content based on existing data. This process is driven by advanced machine learning techniques, primarily deep learning and neural networks.

Neural networks

Neural network architecture mimics the human brain's structure and function. These networks consist of layers of nodes, or neurons, that process data and learn patterns.

When trained on large datasets, neural networks can generate new content by predicting and assembling elements based on learned patterns.

  1. Train the model. The first step in generative AI is training the model using a huge amount of data. For example, a generative AI model designed to create text (such as GPT-3) is trained on diverse text datasets, including books, articles, and websites. The model learns grammar, context, and nuances of language during this phase.
  2. Pattern recognition. As the model processes data, it recognizes patterns and relationships within the input data. For text generation, this might include understanding sentence structure, word associations, and contextual relevance. For image generation, it could involve recognizing shapes, colors, and textures.
  3. Generate new content. Once trained, the model uses its learned patterns to generate new content. For text, it can write essays, stories, or articles by predicting what comes next based on the initial input. For images, it can create new visuals by blending learned elements in novel ways.
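The three steps above can be shown with a deliberately tiny stand-in for a language model: a character bigram table. It "trains" by counting which character follows which (steps 1-2), then "generates" by repeatedly predicting the most frequent next character (step 3). The corpus here is illustrative.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat. the cat ran."  # tiny illustrative corpus

# Steps 1-2: "train" by counting which character follows each character.
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

# Step 3: generate by repeatedly predicting the most frequent next character.
def generate(seed, length):
    out = seed
    for _ in range(length):
        candidates = follows[out[-1]].most_common(1)
        if not candidates:
            break  # dead end: no observed successor
        out += candidates[0][0]
    return out

print(generate("t", 20))
```

A model like GPT-4 does the same "predict what comes next" at a vastly larger scale: it conditions on thousands of preceding tokens rather than one character, and its predictions come from billions of learned parameters rather than a count table.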

Deep learning

Deep learning, a subset of machine learning, drives the capabilities of neural networks through multiple layers of processing. These layers allow the model to learn complex patterns and representations from large data sets.

  • Layered learning. Deep learning models consist of many layers, each extracting higher-level features from the raw input. For example, in image generation, initial layers might detect edges and simple shapes, while deeper layers identify complex structures such as objects and scenes.
  • Backpropagation. This algorithmic approach trains deep learning models. It involves adjusting the weights of connections between neurons based on the error rate of the output and expected result.
  • High computational power. Deep learning requires substantial computational resources. Specialized hardware such as graphics processing units and tensor processing units handle the intense computations involved in training deep learning models.

Most generative models fall into the following three categories:

  • Large language models (LLMs) predict the next word in a sequence based on extensive training data. LLMs excel at creating textual content and power many chatbots, writing assistants, and text completion tools.
  • Generative adversarial networks (GANs) employ two competing neural networks to produce new content. One network generates content, while the other evaluates it. This back-and-forth process results in increasingly realistic outputs. They work well for creating visual and audio content, such as artificial images or synthetic voices.
  • Variational autoencoders (VAEs) compress input into a coded form and then decode it to create a new output. VAEs often produce visual content or code and excel at tasks such as image generation and style transfer.

Resources

  1. Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J. Q., Demszky, D., ... Liang, P. (2021). On the Opportunities and Risks of Foundation Models. Center for Research on Foundation Models (CRFM), Stanford Institute for Human-Centered Artificial Intelligence (HAI), Stanford University. Retrieved from https://crfm.stanford.edu/assets/report.pdf
  2. Hoiem, D. (2023). Foundation models: CLIP and GPT [Lecture notes]. Grainger College of Engineering, University of Illinois. Retrieved from https://courses.grainger.illinois.edu/cs441/sp2023/lectures/Lecture%2015%20-%20Foundation%20Models%20-%20CLIP%20and%20GPT.pdf
  3. IBM. (2024). A differentiated approach to AI foundation models: Scale generative AI with enterprise-grade models. Retrieved from https://www.ibm.com/downloads/cas/9RXLQYM0
  4. Yuan, Y. (2023). On the power of foundation models. Proceedings of the 40th International Conference on Machine Learning. Retrieved from https://proceedings.mlr.press/v202/yuan23b/yuan23b.pdf
  5. Schneider, J., Meske, C., & Kuss, P. (2024). Foundation Models: A New Paradigm for Artificial Intelligence. Business & Information Systems Engineering, 66(2), 221-231. https://doi.org/10.1007/s12599-024-00851-0

About the author

Jacob Andra is the founder of Talbot West and a co-founder of The Institute for Cognitive Hive AI, a not-for-profit organization dedicated to promoting Cognitive Hive AI (CHAI) as a superior architecture to monolithic AI models. Jacob serves on the board of 47G, a Utah-based public-private aerospace and defense consortium. He spends his time pushing the limits of what AI can accomplish, especially in high-stakes use cases. Jacob also writes and publishes extensively on the intersection of AI, enterprise, economics, and policy, covering topics such as explainability, responsible AI, gray zone warfare, and more.
