Generative AI

Generative AI is a subset of artificial intelligence that can produce new data, text, images, and videos, often closely resembling human-created content.
To imitate human creativity in the media it generates, it relies on several families of models, including Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs).
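
For readers who want the underlying objectives, the two model families named above are usually written as follows. The notation is the conventional one from the literature and does not come from this article: G is the generator, D the discriminator, p_data the data distribution, p_z the noise prior, q_phi the encoder, and p_theta the decoder.

```latex
% GAN: the generator G and the discriminator D play a minimax game
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% VAE: training maximizes the evidence lower bound (ELBO) on \log p_\theta(x)
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```

In the GAN objective, D learns to tell real samples from generated ones while G learns to fool it. In the VAE bound, the first term rewards accurate reconstruction and the second keeps the encoder's distribution close to the prior.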

Lower Entry Barrier

Pre-trained generative models can often be adapted to a new task with relatively small amounts of data or only a few examples, making them accessible to organizations that do not have large datasets readily available. In addition, APIs are available to streamline integration. Together, these factors lower the barrier to entry and allow organizations to start using AI capabilities sooner.
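
One common way of working with only a few examples is few-shot prompting, where a handful of labeled examples are placed directly in the prompt sent to a model. The sketch below only builds such a prompt as a string. The function name, the "Text:/Label:" layout, and the sentiment task are illustrative assumptions, and no particular vendor API is implied.

```python
def build_few_shot_prompt(examples, query,
                          instruction="Classify the sentiment as positive or negative."):
    """Assemble a few-shot prompt from (text, label) pairs and a new query.

    The layout is a hypothetical convention chosen for illustration.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")  # blank line between examples
    # The query is left unlabeled so the model can complete it.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


# Two labeled examples stand in for a "small dataset".
few_shot_examples = [
    ("The onboarding was quick and painless.", "positive"),
    ("The app crashes every time I log in.", "negative"),
]

prompt = build_few_shot_prompt(few_shot_examples, "Support answered within minutes.")
print(prompt)
```

The resulting string would then be sent to whichever text-generation API an organization has chosen. That call is deliberately left out because it depends on the provider.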

Based on their input and output types, generative AI models can be grouped into text-based models, video models, audio models, and more.

Text-Based Models

Large Language Models (LLMs): Models such as T5 and the GPT models behind ChatGPT are among the most advanced text-based generative models. They can generate contextually relevant text given a prompt or partial sentence. Other capabilities include summarization, translation, and question answering.
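
To illustrate what "generating text given a prompt" means mechanically, here is a toy sketch that extends a prompt one word at a time. It is not an LLM: a table of observed word pairs (a bigram table) stands in for the neural network, and the corpus, function names, and parameters are all assumptions made for illustration. The loop of "look at the text so far, pick a next word, append, repeat" is the part that carries over.

```python
import random
from collections import defaultdict


def build_bigram_table(corpus: str) -> dict:
    """Map each word to the list of words observed directly after it."""
    words = corpus.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


def generate(prompt: str, table: dict, max_new_words: int = 5, seed: int = 0) -> str:
    """Extend the prompt word by word, sampling each next word from the table."""
    rng = random.Random(seed)  # fixed seed keeps the toy example repeatable
    words = prompt.split()
    if not words:
        return prompt
    for _ in range(max_new_words):
        candidates = table.get(words[-1])
        if not candidates:  # no known continuation: stop early
            break
        words.append(rng.choice(candidates))
    return " ".join(words)


corpus = "the model reads the prompt and the model writes the next word"
table = build_bigram_table(corpus)
print(generate("the model", table))
```

A real LLM replaces the lookup table with a network that scores every possible next token given the whole preceding context, which is why its output stays relevant over much longer passages.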

Video Models

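Video Generation Models: Models such as Video Pixel Networks (an autoregressive model) and MoCoGAN (a GAN that separates motion from content) can learn representations of motion and generate realistic and diverse video content. They are often built with convolutional neural networks (CNNs), and variational autoencoders (VAEs) have also been applied to video.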

Audio Models

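Audio Generative Adversarial Networks (Audio-GANs): These models generate audio through adversarial training, and individual models tend to specialize in a particular type of audio, such as speech, music, or sound effects. Examples include GANSynth, which synthesizes musical notes, and HiFi-GAN, a vocoder that produces speech waveforms.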

3D Models

3D Generative Adversarial Networks (3D-GANs): 3D-GANs generate three-dimensional objects and can complete partial 3D shapes. EG3D is a leading 3D-aware GAN. AtlasNet is a related 3D surface generation model, although it is not itself a GAN.

Image Models

Deep Convolutional Generative Adversarial Networks (DCGANs): DCGANs are widely used for image generation and editing. Progressive GAN and BigGAN are popular later architectures that build on the same convolutional GAN approach.

Multimodal Models

Multimodal models, such as CLIP and DALL-E, work with more than one type of data, for example images and text. CLIP embeds images and text in a shared space, which lets it score how well a caption matches an image; it does not generate text itself. DALL-E generates images based on textual descriptions.
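
One way to picture how a model like CLIP relates images and text is as a similarity score between embedding vectors. The sketch below assumes the image and the candidate captions have already been turned into vectors by their encoders. The three-dimensional embeddings are made-up numbers, not real CLIP outputs, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_vec, captions):
    """Return the caption whose embedding is most similar to the image embedding."""
    return max(captions, key=lambda text: cosine_similarity(image_vec, captions[text]))


# Hypothetical embeddings standing in for encoder outputs.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a car": [0.1, 0.9, 0.3],
}

print(best_caption(image_embedding, caption_embeddings))  # prints "a photo of a dog"
```

Ranking a fixed set of candidate texts in this way is a matching task rather than text generation, which is the main difference from a model like DALL-E that synthesizes new images.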

Code Generating Models

GPT-based code models such as OpenAI Codex, along with program-synthesis systems such as DeepCoder, are specifically designed for code generation. These models can generate code snippets, functions, or even entire programs based on prompts or task specifications.
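
A task specification can be as simple as a few input/output examples. The toy sketch below searches for a short pipeline of list operations that reproduces every example. It is a plain brute-force search over a hypothetical set of four primitives and is not how any of the models above are implemented; systems in this area typically use a learned model to decide which candidates to try first.

```python
from itertools import product

# A tiny, hypothetical set of primitive operations on integer lists.
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "drop_first": lambda xs: xs[1:],
}


def run(program, xs):
    """Apply each named primitive in order to the input list."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def synthesize(examples, max_length=3):
    """Return the first shortest pipeline consistent with all examples, or None."""
    for length in range(1, max_length + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None


# Specification by example: sort descending, then double every element.
examples = [([3, 1, 2], [6, 4, 2]), ([5, 4], [10, 8])]
program = synthesize(examples)
print(program)
```

Because shorter pipelines are tried first, the search returns a minimal-length program for the given examples. The cost grows exponentially with pipeline length, which is the motivation for guiding the search rather than enumerating blindly.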