AI Digest Weekly - Week 14, Apr 3rd - Apr 9th, 2023

Summaries of top Open Sourced AI projects in domains of LLMs, Computer Vision and NLP

Aankur Bhatia
7 min read · Apr 10, 2023
Created using DALL-E (OpenAI)

Hi Readers,

Artificial Intelligence (AI) has become an increasingly important topic in recent years, with groundbreaking research being conducted across the globe. As the field continues to evolve at a rapid pace, keeping up with the latest developments can be a daunting task. That’s why our Weekly AI Digest provides readers with comprehensive summaries of the most important research papers in the field.

Starting this week, we’re excited to announce some changes to the Digest’s format. Going forward, we’ll focus exclusively on research papers that are open source, with code available in a GitHub repository. This lets us provide more in-depth explanations of papers that are particularly relevant to developers and researchers. We’ll also provide video demonstrations of how to run the code on local machines, with or without GPU support, making it easier than ever for readers to replicate the experiments described in the papers. We believe these changes will not only improve the quality of the Digest but also better serve the needs of our readers.

For last week’s (Week 13) top AI research paper summaries, please read here

1. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality

Paper link: vicuna.lmsys.org

GitHub link: github.com/lm-sys/FastChat

The team at LMSYS has released an open-source chatbot called Vicuna that is making waves for its impressive quality. Built on LLaMA, the chatbot reportedly reaches roughly 90% of ChatGPT’s quality, as judged by GPT-4, and outperforms comparable open models. The team has released the Vicuna weights in delta form to comply with the LLaMA model license: users apply the delta weights to the original LLaMA weights to obtain the Vicuna weights. The team has also provided methods for fine-tuning the model.
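The delta-weight step amounts to an element-wise addition over matching parameter tensors (FastChat ships a script for this; the helper below is an illustrative stand-in, not its actual API). A minimal sketch:

```python
# Illustrative sketch of delta-weight recovery: Vicuna = LLaMA base + released delta.
# Real checkpoints map names to large tensors; short lists stand in here.

def apply_delta(base_weights, delta_weights):
    """Element-wise add each delta tensor to the matching base tensor."""
    merged = {}
    for name, base in base_weights.items():
        delta = delta_weights[name]
        merged[name] = [b + d for b, d in zip(base, delta)]
    return merged

llama_base = {"layers.0.weight": [1.0, -2.0, 3.0]}
vicuna_delta = {"layers.0.weight": [0.5, 1.0, -3.0]}
vicuna = apply_delta(llama_base, vicuna_delta)
print(vicuna["layers.0.weight"])  # [1.5, -1.0, 0.0]
```

Distributing only the deltas keeps the release license-compliant, since the base LLaMA weights are never redistributed.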

Users can install Vicuna using pip or from source. Inference with the command-line interface requires around 28 GB of GPU memory for Vicuna-13B and 14 GB for Vicuna-7B. The model can also run on a CPU-only system, and a Metal backend is available for Macs with Apple Silicon or AMD GPUs.

Users can also serve Vicuna using a web GUI, which requires web servers to interface with users, model workers to host one or more models, and a controller to coordinate the web server and model workers. The commands for each component are listed in the blog post.

The team is actively exploring methods to make the model more accessible to run on different platforms. Contributions and pull requests are welcome. To keep up with the latest updates, users can join the LM-SYS Discord server and follow their Twitter account.

2. GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo

Paper link: s3.amazonaws.com/static.nomic.ai/gpt4all/2023_GPT4All_Technical_Report.pdf

GitHub link: github.com/nomic-ai/gpt4all

GPT4All is an assistant-style language model based on LLaMA, trained on approximately 800k generations from GPT-3.5-Turbo. This large language model can generate human-like text and can be used for a variety of purposes such as natural language processing, content creation, and chatbot development. GPT4All is available to the public and runs on a variety of hardware and operating systems.
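At its core, this kind of data distillation is simple: query a stronger "teacher" model for responses to a large pool of prompts and keep the resulting pairs as supervised training data for the student. A toy sketch with a stubbed teacher (the function names are illustrative; the real pipeline calls the GPT-3.5-Turbo API and applies heavy curation):

```python
# Toy sketch of assistant-style data distillation: a teacher model answers
# prompts, and the (prompt, response) pairs become the student's training set.

def teacher_generate(prompt):
    # Stand-in for a call to the teacher model (GPT-3.5-Turbo in the paper).
    return f"Assistant response to: {prompt}"

def build_dataset(prompts):
    dataset = []
    for prompt in prompts:
        response = teacher_generate(prompt)
        # The real pipeline filters malformed or refused generations here.
        if response:
            dataset.append({"prompt": prompt, "response": response})
    return dataset

pairs = build_dataset(["Explain LoRA briefly.", "Write a haiku about GPUs."])
print(len(pairs))  # 2 — one training pair per surviving prompt
```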

To use GPT4All, users can download the model checkpoint from the project’s GitHub repository and run it with the appropriate command for their OS. Alternatively, users can interact with GPT4All from Python via the new official Python bindings. There are two ways to use GPT4All on a GPU, both of which require more setup than the CPU model: clone the Nomic client repo and run pip install .[GPT4All] in the home directory, or install the additional dependencies from the wheels built in the repository.

GPT4All is compatible with several models in the GPT4All Ecosystem, including ggml-vicuna-7b-4bit, vicuna-13b-GPTQ-4bit-128g, LLaMa-Storytelling-4Bit, and more. The project’s short-term roadmap includes training a GPT4All model based on GPT-J to alleviate LLaMA distribution issues, creating improved CPU and GPU interfaces for the model, integrating llama.cpp bindings, building a conversational chat interface, and allowing users to opt in and submit their chats for subsequent training runs.

In the medium term, the GPT4All team plans to integrate the model with Atlas to allow for document retrieval, integrate the model with LangChain, and build easy custom training scripts so users can fine-tune models. In the long term, the team aims to democratize AI and allow anyone to curate training data for subsequent GPT4All releases using Atlas. The project emphasizes reproducibility, and users can access trained LoRA weights on the GPT4All website.

3. Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data

Paper link: arxiv.org/pdf/2304.01196.pdf

GitHub link: github.com/project-baize/baize-chatbot


Baize is an open-source chat model built on LLaMA and trained with the parameter-efficient LoRA method. It is trained on 100k dialogues generated by letting ChatGPT chat with itself, supplemented with Alpaca’s data to enhance performance, and is released in 7B, 13B, and 30B sizes. Baize is named after a mythical creature in Chinese folklore that speaks human languages and knows everything. The repository contains 54K/57K/47K dialogs seeded from Quora, Stack Overflow, and MedQuAD questions, the code for collecting self-chat data, the code for training Baize, and the code for the chat model demos. Baize is released for research use only, and commercial use is strictly prohibited. The Baize demo can be hosted on a local machine or accessed online; it fetches the LLaMA model and LoRA weights from the Hugging Face model hub and provides a user-friendly Gradio interface for chatting.
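Self-chat means seeding ChatGPT with a question (e.g., from Quora or Stack Overflow) and letting it generate both sides of the conversation, turn by turn. A stubbed sketch of that loop (the generator below is a placeholder, not the OpenAI API):

```python
def continue_dialogue(transcript):
    # Placeholder for a ChatGPT call that writes the next turn,
    # playing the human on even turns and the AI on odd turns.
    role = "[Human]" if len(transcript) % 2 == 0 else "[AI]"
    return f"{role}: generated turn {len(transcript)}"

def self_chat(seed_question, max_turns=4):
    transcript = [f"[Human]: {seed_question}"]
    while len(transcript) < max_turns:
        transcript.append(continue_dialogue(transcript))
    return transcript

dialogue = self_chat("How do I treat a headache?")
print(dialogue[1])  # [AI]: generated turn 1
```

Running this loop over tens of thousands of seed questions is what produces the 100k-scale self-chat corpus the model is fine-tuned on.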

To run Baize locally, users must have Python 3.8 installed along with the required packages, installed using the command given in the article. Baize requires a GPU with specific VRAM requirements for inference without int8, as listed in the article. Users with smaller-VRAM GPUs can perform inference with int8 by passing the 8bit argument.

To reproduce Baize, users can collect data from ChatGPT or use the released data, and preprocess it using the commands given in the article. If a specific dataset is required for self-chatting, users can modify the collect.py script to load their own data. The fine-tuning code is designed to run on an A100-80G GPU and accepts four parameters: foundation model size (i.e., 7B, 13B, or 30B), batch size, learning rate, and datasets. The batch size here is the per-device batch size before gradient accumulation; users training on a GPU with smaller VRAM can set it to a smaller value.
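Since the stated batch size is the per-device value before gradient accumulation, the effective batch size is a product of several factors. A quick sanity check (the variable names are illustrative, not Baize's actual config keys):

```python
def effective_batch_size(per_device, grad_accum_steps, num_gpus=1):
    # Gradients are accumulated over several micro-batches before each
    # optimizer step, so the effective batch grows multiplicatively.
    return per_device * grad_accum_steps * num_gpus

print(effective_batch_size(per_device=32, grad_accum_steps=8))  # 256
```

Halving the per-device batch size while doubling the accumulation steps keeps the effective batch (and thus the training dynamics) roughly the same on a smaller GPU.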

In conclusion, Baize is an open-source chat model trained using LoRA that has been released for research purposes only. The article provides users with the necessary code and commands to run Baize locally and reproduce the model. The article also highlights the VRAM requirements for inference and training for each model size.

4. Opt-Me-Out From Diffusion: This AI Model Can Remove Copyrighted Concepts from Text-to-Image Diffusion Models

Paper link: arxiv.org/abs/2303.13516

GitHub link: github.com/nupurkmr9/concept-ablation


Researchers from Carnegie Mellon University, Tsinghua University, and Adobe have developed an algorithm that can remove copyrighted or memorized concepts from pre-trained text-to-image diffusion models. These models are designed to generate images from text prompts, and can replicate styles of artists and memorize exact training samples. However, they are typically trained on a large amount of data that may contain copyrighted materials, licensed images, and personal photos. The algorithm proposed by the researchers ablates or removes target concepts by fine-tuning the model to have the same prediction given the prompt with and without the target concept, effectively switching the target distribution to an anchor distribution. The anchor distribution represents a closely related concept that should be preserved.
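The fine-tuning objective can be illustrated with a toy stand-in for the model: nudge the trainable weights so that the prediction for the target-concept prompt matches a frozen copy's prediction for the anchor prompt, while other concepts are left untouched. A pure-Python gradient-descent sketch (scalar values stand in for a diffusion model's noise predictions):

```python
# Toy concept ablation: map the target concept's prediction onto the
# anchor concept's, leaving unrelated concepts unchanged.

model = {"grumpy cat": 5.0, "cat": 2.0, "dog": 7.0}
frozen = dict(model)  # frozen copy provides the anchor target

target, anchor, lr = "grumpy cat", "cat", 0.1
for _ in range(200):
    # Gradient of the squared error between the trainable target
    # prediction and the frozen model's anchor prediction.
    grad = 2 * (model[target] - frozen[anchor])
    model[target] -= lr * grad

print(round(model[target], 3))  # 2.0 — the target now behaves like the anchor
print(model["dog"])             # 7.0 — unrelated concepts are preserved
```

In the real method the same idea operates on noise predictions conditioned on text prompts, so "Grumpy Cat" collapses onto the generic "cat" distribution without retraining from scratch.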

The researchers tested their method on various concept ablation tasks, such as artistic styles, object instances, and memorized images. The results showed that the proposed method can successfully remove the target concept while preserving related concepts. For instance, when ablating the Grumpy Cat instance, other cat breeds were still preserved. The researchers note that the method is efficient and does not require retraining the model from scratch.

To use the proposed algorithm, the researchers provide instructions for setting up the required environment, downloading pre-trained models and datasets, training the model on a specific target concept, and sampling images from the trained models. Each concept ablation takes some time, since training first generates images for the concept. Users must provide details such as the concept type, the caption target, and prompts, among others; optional parameters are also available to modify the fine-tuning process.

In summary, the proposed algorithm provides an efficient method of removing copyrighted or memorized concepts from pre-trained text-to-image diffusion models. The method works by switching the target distribution to an anchor distribution, preserving closely related concepts. The algorithm has potential applications in improving the reliability and fairness of text-to-image models, especially in applications that involve generating images from copyrighted or sensitive data.

Thanks for reading. As usual, you can reach me on LinkedIn, and subscribe to my YouTube channel for a video explanation. See you next week.


Aankur Bhatia

Aankur works as the Chief Data Scientist for a large multinational company. He is passionate about applying ML to cybersecurity and holds over 20 patents.