Graphs-of-Heads - The Second Literature Review
In this second part, we will focus mainly on GaCLLM (Du et al., 2024) and my new take on the system I imagined after digging deeper into the work.
Read the previous part here: Graphs-of-Heads - The First Literature Review.
GaCLLM
In this work, the authors propose four main modules:
- Conduct SFT (Supervised Fine-tuning) for the LLM
- Propose LLM-based graph-aware convolutional inference
- Align and fuse generated descriptions into graph-based embeddings
- Introduce the objective function and learning process
Supervised Fine-tuning
In the first stage, they fine-tune the LLM with domain-specific data. This involves training the LLM on descriptions from matched user-item pairs. They construct a prompt template for items, and the optimization minimizes the negative log-likelihood loss over these templates.
They adopt LoRA (Hu et al., 2022) for parameter-efficient learning to improve time and computational efficiency.
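To make this concrete, below is a minimal sketch of the SFT stage with LoRA adapters, assuming a Hugging Face causal LM and the peft library. The base model name, prompt wording, and hyperparameters are placeholders of mine, not the paper's exact setup.

```python
# Minimal sketch: LoRA-based SFT on matched user-item description pairs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA (Hu et al., 2022): train low-rank adapters instead of all weights.
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

def sft_loss(user_text: str, item_text: str) -> torch.Tensor:
    """Negative log-likelihood of an item description given a matched user.
    The prompt wording here is a hypothetical stand-in for the paper's template."""
    prompt = f"User profile: {user_text}\nDescribe an item this user would like:"
    inputs = tokenizer(prompt + " " + item_text, return_tensors="pt")
    # Mask out the prompt tokens so the loss covers only the item description
    # (the boundary is approximate here; fine for a sketch).
    labels = inputs["input_ids"].clone()
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    labels[:, :prompt_len] = -100
    return model(**inputs, labels=labels).loss
```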
LLM-based Convolutional Inference in Graph

My understanding of the proposed graph.
The graph is $G = (V, E)$, where $V$ contains the users and items. The edges $E$ are constructed from the interactions between users and items. Each node has a textual description.
We have prompt templates to rewrite/generate the descriptions of users and items. Let us call them $PROMPT_{user}$ and $PROMPT_{item}$.
To improve the quality of descriptions for users, we rewrite a user's raw description $T_u$ by
\[T'_u = LLM(PROMPT_{user}(T_u, \{T_i : (u,i) \in E\})).\]
Similarly, to improve the quality of descriptions for items, we rewrite an item's raw description $T_i$ by
\[T'_i = LLM(PROMPT_{item}(T_i, \{T_u : (u,i) \in E\})).\]
To do this, they stack layers by running the LLM in a loop, with the first layer taking the raw input. The LLM now takes on the role of an "aggregator" in the graph convolutional process. After $L$ iterations, we obtain $L$ layers of descriptions for both users and items.
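To illustrate the LLM's aggregator role, here is a rough sketch of the layer-by-layer rewriting loop. The adjacency representation, the prompt wording, and the `llm_rewrite` helper are my own assumptions; the paper's actual templates differ.

```python
from typing import Dict, List, Set, Tuple

def prompt_user(own_text: str, neighbour_texts: List[str]) -> str:
    # Hypothetical stand-in for PROMPT_user; the paper's template will differ.
    return ("Rewrite this user's description using their interacted items.\n"
            f"User: {own_text}\nItems: {' | '.join(neighbour_texts)}")

def prompt_item(own_text: str, neighbour_texts: List[str]) -> str:
    # Hypothetical stand-in for PROMPT_item.
    return ("Rewrite this item's description using its interacting users.\n"
            f"Item: {own_text}\nUsers: {' | '.join(neighbour_texts)}")

def llm_rewrite(prompt: str) -> str:
    # Placeholder: in practice this calls the SFT-tuned LLM from the previous stage.
    raise NotImplementedError

def convolve_descriptions(
    user_desc: Dict[str, str],       # T_u: raw user descriptions (layer 0)
    item_desc: Dict[str, str],       # T_i: raw item descriptions (layer 0)
    edges: Set[Tuple[str, str]],     # (u, i) interaction pairs
    num_layers: int,                 # L convolution layers
):
    """Each layer rewrites every node's text from its own previous-layer text
    plus the previous-layer texts of its one-hop neighbours."""
    user_layers, item_layers = [user_desc], [item_desc]
    for _ in range(num_layers):
        prev_u, prev_i = user_layers[-1], item_layers[-1]
        new_u = {u: llm_rewrite(prompt_user(
                    prev_u[u], [prev_i[i] for (uu, i) in edges if uu == u]))
                 for u in prev_u}
        new_i = {i: llm_rewrite(prompt_item(
                    prev_i[i], [prev_u[u] for (u, ii) in edges if ii == i]))
                 for i in prev_i}
        user_layers.append(new_u)
        item_layers.append(new_i)
    return user_layers, item_layers  # layers 0..L for both sides
```

Each call only sees a node's own text plus its one-hop neighbors' texts from the previous layer, which is what keeps the context length bounded.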

Different ways of using tokens in LLMs (Du et al., 2024).
They claim the proposed LLM-based convolutional strategy has two main strengths:
- High-quality descriptions of users and items in a least-to-most manner
  - Breaks the task of description enhancement down into multiple steps
- Effectively captures graph-related information within a limited context length
  - Each step (layer) only integrates the descriptions of one-hop neighbors, which decreases the number of tokens needed
  - Efficiently alleviates the redundancy of describing the graph for target nodes
Aligning GCN-based Embeddings for Recommendation
To bridge textual and structural information, they align the descriptions of users and items with their embeddings in a unified manner.
The first layer represents the initial step for both user and item representations. Each user and item is associated with an embedding derived from its ID. The textual descriptions are then encoded into constant text-based embeddings by an encoder.
Finally, an objective function measures the matching scores between users and items for the final predictions.
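As a sketch of how this could look, here is a toy model that fuses ID embeddings with the frozen text-based embeddings and scores user-item pairs. The additive fusion, dot-product score, and BPR loss are common choices I am assuming here, not necessarily the paper's exact objective.

```python
import torch.nn as nn
import torch.nn.functional as F

class AlignedRecModel(nn.Module):
    def __init__(self, num_users: int, num_items: int, text_dim: int, dim: int = 64):
        super().__init__()
        self.user_id_emb = nn.Embedding(num_users, dim)  # layer-0 ID embeddings
        self.item_id_emb = nn.Embedding(num_items, dim)
        # Project the constant (frozen) text-based embeddings into the ID space.
        self.text_proj = nn.Linear(text_dim, dim)

    def user_repr(self, u, user_text_emb):
        # Fuse structural (ID) and textual signals; simple additive fusion.
        return self.user_id_emb(u) + self.text_proj(user_text_emb)

    def item_repr(self, i, item_text_emb):
        return self.item_id_emb(i) + self.text_proj(item_text_emb)

    def score(self, u_vec, i_vec):
        # Matching score between a user and an item.
        return (u_vec * i_vec).sum(-1)

def bpr_loss(model, u, pos_i, neg_i, u_txt, pos_txt, neg_txt):
    """Bayesian Personalized Ranking: push matched pairs above unmatched ones."""
    u_vec = model.user_repr(u, u_txt)
    pos = model.score(u_vec, model.item_repr(pos_i, pos_txt))
    neg = model.score(u_vec, model.item_repr(neg_i, neg_txt))
    return -F.logsigmoid(pos - neg).mean()
```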
Some Thoughts
I am still confused and have not clarified my idea yet. Do you have any thoughts? Still, I guess re-enacting the work of GaCLLM would be interesting.
This afternoon I will have a meeting with my team to discuss this; I will update the progress in upcoming posts.
References
- Du et al. Large language model with graph convolution for recommendation. arXiv preprint arXiv:2402.08859, Sep 2024.
- Hu et al. LoRA: Low-rank adaptation of large language models. ICLR, 2022.