AI Digest Weekly — Top AI Research Papers Summary

Aankur Bhatia
Mar 8


Week of Feb 26th to Mar 4th

1. LLaMA: Open and Efficient Foundation Language Models

LLaMA is a family of open-source foundation language models developed by Meta and optimized to perform well at different inference budgets. There are four models ranging from 7 billion to 65 billion parameters. LLaMA-13B, the second smallest model, outperforms OpenAI's GPT-3 on most benchmarks despite being more than 10 times smaller, while LLaMA-65B is competitive with some of the best models, such as DeepMind's Chinchilla-70B and Google's PaLM-540B. The models are distributed under a non-commercial license, democratizing AI by allowing strong models to run on a single GPU. The training data sources include English CCNet, GitHub, Wikipedia, books, ArXiv, and Stack Exchange, and the models were evaluated on benchmarks such as BoolQ, WinoGrande, OpenBookQA, NaturalQuestions, RealToxicityPrompts, and WinoGender.

2. Deep Reinforcement Learning for Cyber System Defense under Dynamic Adversarial Uncertainties

Autonomous Cyber Defense Framework

Researchers at the Department of Energy's Pacific Northwest National Laboratory have developed an AI system that uses deep reinforcement learning (DRL) to produce autonomous cyber-defense strategies and action recommendations against attackers in custom simulated environments. The challenge lies in capturing the dynamics between attackers and defenders along with their uncertainties. The system trains four DRL neural networks to maximize a reward based on avoiding compromise and reducing network disruption. It uses the OpenAI Gym toolkit, with attacker entities modeled on a subset of 15 techniques and 7 tactics from the MITRE ATT&CK framework; attackers progress through a seven-step attack chain until they reach the impact and exfiltration phases. The DRL agents can be trained against multistage assault profiles with varying skill and persistence levels, and the researchers claim the system stops 95% of attacks in the simulated environment.
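To make the setup concrete, here is a minimal sketch of a toy defense environment following the Gym reset()/step() interface. All names and numbers are illustrative assumptions, not the paper's actual simulator; it only mirrors the stated reward idea of penalizing compromises while accounting for the disruption caused by defensive actions.

```python
import random

class ToyCyberEnv:
    """A toy cyber-defense environment with a Gym-style
    reset()/step() interface. The observation is the number of
    compromised hosts; the defender's reward penalizes both
    compromise and the disruption caused by defensive actions."""

    def __init__(self, n_hosts=10, attack_prob=0.3, seed=0):
        self.n_hosts = n_hosts
        self.attack_prob = attack_prob
        self.rng = random.Random(seed)

    def reset(self):
        self.compromised = set()
        self.steps = 0
        return len(self.compromised)

    def step(self, action):
        # action 0 = monitor (no disruption), 1 = isolate a compromised host
        disruption = 0.0
        if action == 1 and self.compromised:
            self.compromised.pop()
            disruption = 0.1  # isolating a host disrupts the network
        # the attacker advances with some probability each step
        if self.rng.random() < self.attack_prob:
            self.compromised.add(self.rng.randrange(self.n_hosts))
        self.steps += 1
        # reward: avoid compromise while limiting disruption
        reward = -len(self.compromised) - disruption
        done = self.steps >= 50
        return len(self.compromised), reward, done, {}
```

A DRL agent would then be trained on episodes of this environment, learning when isolating hosts is worth the disruption it causes.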

3. Optical Transformers (Running a Transformer Model on an Optical Network)

Optical Transformers Evaluation, Transformer Architecture

The problem with transformer models is their high energy consumption and latency. Researchers have found that optical neural networks can provide better efficiency, lower latency, and much lower cost by using analog computing. The study shows that the linear operations in transformers can be conducted accurately on real optical hardware despite errors and noise. The energy consumption was orders of magnitude lower than that of cutting-edge digital processors, making optics a viable platform for running the linear operations of large-scale transformer models.
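The core claim, that linear operations survive analog noise, can be illustrated with a simple simulation. This is a hedged sketch, not the study's actual noise model: Gaussian noise stands in for shot noise and component error in an optical matrix-vector multiply.

```python
import numpy as np

rng = np.random.default_rng(0)

def optical_matmul(x, W, noise_std=0.01):
    """Simulate a linear layer on noisy analog optical hardware:
    Gaussian noise (scaled to the output magnitude) is added to the
    exact product, as a stand-in for optical shot noise and error."""
    y = x @ W
    return y + rng.normal(0.0, noise_std * np.abs(y).mean(), size=y.shape)

# Compare the noisy analog result with an exact digital matmul
x = rng.normal(size=(4, 64))   # a batch of 4 activation vectors
W = rng.normal(size=(64, 64))  # one linear layer's weights
exact = x @ W
noisy = optical_matmul(x, W)
rel_err = np.abs(noisy - exact).mean() / np.abs(exact).mean()
```

With ~1% noise the relative output error stays small, which is the intuition behind running a transformer's matrix multiplications on optics while keeping nonlinearities digital.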

4. A high-performance speech neuro-prosthesis: Brain-Computer Interface (BCI)

Speech Neuro Prosthesis

Stanford has developed a high-performance speech neuro-prosthesis using a brain-computer interface (BCI) to help people with speech disorders or those who have lost the ability to speak. Intracortical microelectrode arrays implanted in the patient's brain record signals at single-neuron resolution; a recurrent neural network (RNN) based on gated recurrent units (GRUs) then decodes attempted speech from this neural activity, allowing the patient to communicate at 62 words per minute. The system has a word error rate of 9.1%.
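To show what the GRU at the heart of the decoder computes, here is a minimal NumPy sketch of a single GRU step over binned neural features. The feature and hidden sizes are illustrative assumptions, not the study's actual dimensions.

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One gated recurrent unit (GRU) step: x is the neural-feature
    vector for the current time bin, h is the hidden state."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x @ Wz + h @ Uz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
n_features, n_hidden = 256, 128  # e.g., spike counts per electrode channel
params = [rng.normal(scale=0.1, size=s)
          for s in [(n_features, n_hidden), (n_hidden, n_hidden)] * 3]

h = np.zeros(n_hidden)
for t in range(20):              # 20 time bins of neural features
    x = rng.normal(size=n_features)
    h = gru_step(x, h, *params)
# a final output layer would project h to phoneme probabilities
```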

5. RealFusion: 360-degree Reconstruction of Any Object from a Single Image

Multi-modal Image Reconstruction

Researchers from Oxford University have created RealFusion, a diffusion-based method that can reconstruct objects in 360 degrees from a single image. Off-the-shelf diffusion models are trained only on 2D images and provide no explicit supervision for 3D reconstruction. RealFusion extracts 3D information by fitting a neural radiance field (NeRF) to the image under the guidance of the diffusion prior, with an additional regularizer to smooth the reconstructed surface, and is evaluated on the CO3D dataset. The method has been successful in creating plausible 3D reconstructions of objects captured in the wild.

6. Unlocking the Secrets of Deep Learning with Tensorleap's Explainability Platform

Tensorleap is a deep learning explainability platform for troubleshooting, debugging, and visualizing neural networks. Deep learning explainability is currently inadequate, particularly in the presence of under-represented samples or categories. Tools such as DeepLIFT and Alibi are available for deep learning interpretability, but they are computationally expensive and focus on local explainability. Tensorleap provides population visualization, deep unit testing, and guided error analysis to identify and fix errors in a model, visualize network errors, and automatically detect features. Its population analysis flags erroneous data, and deep unit testing allows users to create multiple unit tests based on model features and samples, identifying potential clusters or abnormalities in model features and data.

7. DC-Check: A Data-Centric AI Checklist to Guide Development of Reliable Machine Learning Systems

Data-Centric Checklist

Researchers from the University of Cambridge and UCLA have developed a new data-centric AI framework called DC-Check, which provides an actionable checklist to guide practitioners in developing reliable machine learning systems. The checklist includes a set of questions and practical tools to help practitioners think critically about data at each stage of the pipeline. The framework focuses on characterizing, evaluating, and monitoring underlying data rather than models. It includes data selection, curation, quality evaluation, synthetic data generation, monitoring, feedback loops, and trustworthiness estimation using uncertainty estimation.
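As a concrete illustration, a DC-Check-style checklist can be encoded as data and queried programmatically. The stages and questions below are a paraphrased, illustrative subset, not the framework's official wording.

```python
# An illustrative (not official) encoding of a DC-Check-style checklist:
# questions grouped by pipeline stage, plus a helper that reports which
# items remain unaddressed before a system ships.
DC_CHECK = {
    "data": [
        "Is the data selection and curation process documented?",
        "Has data quality been evaluated (noise, label errors)?",
        "Is synthetic data generation needed to cover rare cases?",
    ],
    "deployment": [
        "Is the data distribution monitored for drift?",
        "Are feedback loops in place to update the data over time?",
        "Is uncertainty estimated to flag untrustworthy predictions?",
    ],
}

def open_items(answers):
    """Return checklist questions not yet marked as addressed."""
    return [q for stage in DC_CHECK.values() for q in stage
            if not answers.get(q, False)]
```

A team could gate deployment on `open_items(answers)` being empty, which captures the checklist's intent of forcing explicit data decisions at each pipeline stage.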

8. Tracr: Compiled Transformers as a Laboratory for Interpretability


DeepMind has developed Tracr, a new tool that addresses the shortage of ground-truth mechanistic explanations of Transformer models. Tracr, which stands for TRAnsformer Compiler for RASP, is a compiler that converts human-readable code into the weights of a neural network. It takes programs written in the domain-specific language Restricted Access Sequence Processing (RASP) and converts them into weights for standard decoder-only transformer architectures, such as GPT. Tracr can be used as a laboratory for assessing interpretability methods, since the ground-truth mechanism is known by construction, or for substituting parts of a model with hard-coded weights, which may improve model performance. The tool is expected to improve the understanding and transparency of Transformer models, which have been criticized for their lack of interpretability.
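To give a feel for what RASP programs look like, here is a pure-Python sketch of RASP's two core primitives, select (build an attention pattern) and aggregate (average the attended values), used to express the classic sequence-reversal program. This is an informal interpreter for illustration, not the actual Tracr API.

```python
def select(keys, queries, predicate):
    """RASP `select`: attention pattern S[q][k] = predicate(keys[k], queries[q])."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(pattern, values):
    """RASP `aggregate`: average the values each query position attends to."""
    out = []
    for row in pattern:
        attended = [v for v, keep in zip(values, row) if keep]
        out.append(sum(attended) / len(attended) if attended else 0)
    return out

def reverse(tokens):
    """The classic RASP reversal program: position i attends to
    position length-1-i and copies its value."""
    n = len(tokens)
    indices = list(range(n))
    opposite = [n - 1 - i for i in indices]
    pattern = select(indices, opposite, lambda k, q: k == q)
    return aggregate(pattern, tokens)
```

Tracr compiles programs like `reverse` into real transformer weights, so researchers can check whether interpretability tools recover the known mechanism.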

9. Composer: Creative and Controllable Image Synthesis with Composable Conditions

Image Reconfiguration Using Composer

Alibaba has introduced Composer, a controllable diffusion model that gives designers greater control over semantics, form, style, and color when building real-world applications on text-to-image models. Trained on billions of text-image pairs, Composer is implemented as a diffusion model with a UNet backbone. It exploits compositionality to smoothly reassemble visual elements into new images, working in two stages: a decomposition phase, where computer vision algorithms break images down into individual representations, and a composition phase, where Composer reconstructs images from subsets of those representations. At a capacity of 5B parameters, the model enables creative and controllable image synthesis.

10. PRIMEQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development

PRIMEQA Application Architecture

PRIMEQA is an open-source library designed by IBM Research AI for multilingual question-answering research and development. It aims to facilitate reproducibility and reusability of QA models by providing state-of-the-art pre-trained retriever and reader models. The library is built on top of popular open-source NLP libraries and provides simple Python scripts as entry points to its core components. It supports tasks such as information retrieval, reading comprehension, question generation, and question answering.

11. Enabling Conversational Interaction with Mobile UI using Large Language Models

Four UI Modeling Tasks

This paper from Google Research and the University of Toronto explores the feasibility of using a single large language model (LLM) to enable conversational interaction with mobile user interfaces (UIs). Currently, this requires multiple pretrained task-specific language models for the various natural-language UI tasks. The study provides a lightweight and generalizable approach to language-based mobile interaction, focusing on four UI modeling tasks: screen question generation, screen summarization, screen question answering, and mapping instructions to UI actions. Additionally, the paper discusses prompting techniques for adapting LLMs to mobile UIs.
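A prompting approach of this kind typically flattens the screen's view hierarchy into text before handing it to the LLM. The sketch below shows one plausible way to do that for the screen-summarization task; the element names and template are invented for illustration, not the paper's actual prompt format.

```python
# Hedged sketch: flatten a toy mobile view hierarchy into an LLM prompt
# for screen summarization. Field names and wording are assumptions.
def ui_to_prompt(screen):
    lines = [f'- {node["class"]}: "{node["text"]}"'
             for node in screen if node.get("text")]
    return ("Below is a list of UI elements on a mobile screen.\n"
            + "\n".join(lines)
            + "\nSummarize the screen in one sentence:")

screen = [
    {"class": "TextView", "text": "Flight Search"},
    {"class": "EditText", "text": "From: SFO"},
    {"class": "EditText", "text": "To: JFK"},
    {"class": "Button",   "text": "Search"},
    {"class": "ImageView"},  # no text: skipped in the prompt
]
prompt = ui_to_prompt(screen)
```

The same flattened representation can be reused across the four tasks by swapping the final instruction line, which is what makes a single LLM viable where task-specific models were needed before.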

12. Chain of Hindsight Aligns Language Models with Feedback

Chain of Hindsight Multiple Models

UC Berkeley developed “Chain of Hindsight,” a technique that allows language models to learn from any form of feedback. Reinforcement learning from human feedback (RLHF) is challenging to optimize because it relies on a learned reward function. Chain of Hindsight instead turns all feedback into sentences, which the model is fine-tuned on. During training, the technique randomly selects one or more of the model's outputs and uses them to construct a training sequence containing both positive and negative feedback. In experiments on summarization and dialogue tasks, Chain of Hindsight outperformed all baseline models trained with supervised fine-tuning (SFT) and RLHF.
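The key move, turning preferences into plain text the model can be fine-tuned on, can be sketched in a few lines. The feedback templates below are illustrative, not the paper's verbatim prompts.

```python
# Illustrative Chain-of-Hindsight training-text construction: model
# outputs are paired with natural-language feedback so the model sees
# both a good and a bad continuation in one sequence.
def chain_of_hindsight(prompt, good_output, bad_output):
    return (f"{prompt}\n"
            f"A good answer: {good_output}\n"
            f"A bad answer: {bad_output}")

example = chain_of_hindsight(
    "Summarize: The cat sat on the mat all afternoon.",
    "A cat lounged on a mat for the afternoon.",
    "The mat was red.",
)
```

At inference time the model is conditioned on the positive-feedback phrase (e.g. "A good answer:"), steering it toward the behavior associated with positive feedback during training.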

For a video explanation, please visit the following link and subscribe to my YouTube channel

AI-ML News and Research Papers, Week Feb 26th-Mar 4th



Aankur Bhatia

Aankur works as the Chief Data Scientist for a large multinational company. He is passionate about applying ML to cybersecurity and holds over 20 patents.