[Paper Reading Notes] 2021_RecSys_Together is Better: Hybrid Recommendations Combining Graph Embeddings and Contextualized Word Representations

Paper link: https://doi.org/10.1145/3460231.3474272
Venue: RecSys
Publication year: 2021
Authors and affiliations:

  • Marco Polignano marco.polignano@uniba.it University of Bari Aldo Moro Bari, Apulia, Italy
  • Cataldo Musto cataldo.musto@uniba.it University of Bari Aldo Moro Bari, Apulia, Italy
  • Marco de Gemmis marco.degemmis@uniba.it University of Bari Aldo Moro Bari, Apulia, Italy
  • Pasquale Lops pasquale.lops@uniba.it University of Bari Aldo Moro Bari, Apulia, Italy
  • Giovanni Semeraro giovanni.semeraro@uniba.it University of Bari Aldo Moro Bari, Apulia, Italy

Datasets:

  • Movielens 1M (ML1M) http://grouplens.org/datasets/movielens/
  • DBbook dataset http://challenges.2014.eswc-conferences.org/index.php/RecSys
  • Generally speaking, ML1M is larger than DBbook, but it is less sparse, so it is more suitable for collaborative filtering-based algorithms. On the other hand, DBbook is sparser and unbalanced towards negative preferences (only 45.85% positive ratings), which makes the recommendation task very challenging.

Code:

  • The source code of our hybrid recommendation algorithm https://github.com/ANONYMOUS (as released by the authors in the paper)
  • NeuRec library https://github.com/wubinzzu/NeuRec
  • Elliot framework https://github.com/sisinflab/elliot

Other:

  • 35% of Amazon’s revenues are generated through its recommendation engine: http://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers

Write-ups by others

    Brief summary of the novelty: the paper is overly long and repetitive, restating each point three or four times. The main idea/novelty is exactly what the title says: run the two kinds of representations together; the authors then try many combinations and strategies and discuss the results.

    • (1) We introduce a recommendation framework inspired by deep architectures, which learns a hybrid representation of users and items based on collaborative and content-based features;
    • (2) We design different strategies to combine graph embeddings and contextual word representations into a single embedding;
    • We propose a hybrid recommendation framework that combines collaborative features based on graph embeddings with content-based features based on contextual word representations.
    • In this paper we presented a hybrid recommendation framework based on the combination of graph embeddings and contextual word representations.
    • We evaluated the possibility of combining graph-based (i.e., TransH, TransE) and contextual word representation (i.e., BERT, USE) techniques through two concatenation strategies: entity-based and feature-based.
      • In the first one, we followed the intuition that by concatenating multiple representations of the same entity in the first level of the network, we could help the neural model better generalize the intrinsic characteristics of the entity.
      • In the feature-based approach, we instead tried to show that concatenating homogeneous representations of users and items (i.e., created with the same representation technique) could help the network better learn the relationships between these entities.

    ABSTRACT

    • In this paper, we present a hybrid recommendation framework based on the combination of graph embeddings and contextual word representations.

    • Our approach is based on the intuition that each of the above-mentioned representations models heterogeneous (and equally important) information that is worth taking into account to generate a recommendation.

    • Accordingly, we propose a strategy to combine both groups of features, based on the following steps:

      • first, we separately generate graph embeddings and contextual word representations by exploiting state-of-the-art techniques.
      • Next, these embeddings are used to feed a deep architecture that learns a hybrid representation based on the combination of the single groups of features.
      • Finally, we exploit the resulting embedding to identify suitable recommendations.
    • In the experimental session, we evaluate the effectiveness of our strategy on two datasets, and the results show that the use of a hybrid representation leads to an improvement in predictive accuracy. Moreover, our approach outperforms several competitive baselines, thus confirming the validity of this work.

    CCS CONCEPTS

    • Information systems → Recommender systems; • Computing methodologies → Natural language processing.

    KEYWORDS

    graph embeddings, BERT embeddings, USE embeddings, deep learning, recommender systems

    1 INTRODUCTION

    • (1) The ever-increasing amount of data and information available on the web makes it ever more important to develop systems that efficiently support users in finding and consuming only relevant elements. Among the different solutions for this task, recommender systems (RS) emerged over the years. Their effectiveness is confirmed by several pieces of evidence: for example, 35% of Amazon’s revenues are generated through its recommendation engine (see the McKinsey link above), and many companies frequently report that such systems contribute from 10% to 30% of total revenues [18]. Similar success stories from other web companies such as Facebook, Spotify and Netflix further confirm that RS nowadays play a central role in improving the effectiveness of web platforms.

    • (2) In short, RS use the preferences expressed by a user, explicitly or implicitly, to filter the available information and proactively propose elements that might be of interest to her. Several recommendation models have been proposed in the scientific literature. Most of them exploit the so-called wisdom of the crowd. In particular, collaborative filtering approaches [27] are based on the idea that, by observing the activities of a group of users similar to the active one, it is possible to identify a set of recommendations relevant to her. Content-based approaches [11, 34], instead, deal with user and product characteristics to identify the set of properties that are of interest to the user. Hybrid approaches [6, 39] try to put together the ideas behind collaborative and content-based RS: they use both the wisdom of the crowd and the properties of users and products in a comprehensive approach, combining the best features of content-based and collaborative filtering to deliver highly accurate results. However, one of the main issues in developing hybrid recommendation models concerns the representation of both content-based and collaborative features. How can we represent (and combine) both these groups of features in a hybrid recommender system?

    • (3) In this work, we address this research question and propose a hybrid recommendation model based on the combination of graph embeddings and contextual word representations.

      • (1) The former are based on the use of graphs as a tool to represent products, concepts and information, including their properties and relationships, and have shown very good performance in several machine learning tasks [16].

        • Roughly speaking, graph embedding techniques [7] take a graph as input and produce a vector-space representation of each node as output. In a recommendation scenario, they produce a vectorial representation of users and items such that users sharing similar preferences (and, respectively, items rated by the same users) are close in the space.
      • (2) The latter is based on the idea of using machine learning as a tool to create compact numerical representations of terms, and emerged as an evolution of widespread word embedding (WE) techniques such as Word2Vec [29].
        • In particular, recent contextual word representations [40] learn a vector-space representation based on word usage in large corpora of textual data, producing a representation in which terms sharing a similar meaning are close in the space. However, differently from static WE techniques, contextual word representations take into account the context in which a word is used. In particular, such techniques generate a different vector representation of a word depending on the terms that co-occur with it in the specific sentence being encoded. This is possible thanks to large pre-trained language models, which learn highly transferable and task-agnostic properties of a language [14]. As shown by several works, these techniques obtained outstanding results in many natural language processing scenarios [44].
    • (4) In this work we combine the best of these techniques into a hybrid recommendation model. Our methodology is based on the following steps:

      • first, graph embeddings and contextual word representations are separately generated by exploiting the previously mentioned techniques.
      • Next, such pre-trained embeddings are used to feed a deep architecture that combines the single groups of features into a single hybrid representation through a concatenation layer. Finally, the resulting representation is exploited to identify suitable recommendations.
      • The novelty of the proposed approach lies in:
        • (i) the combination of graph embeddings and contextual word representations in a single recommendation model. In particular, the adoption of contextual word representations is a particularly new contribution in the area of hybrid recommender systems;
        • (ii) the analysis of the effectiveness of different strategies to concatenate item and user representations in a hybrid recommendation model. Another strong point of the proposed methodology lies in the adoption of pre-trained embeddings learnt by exploiting exogenous state-of-the-art techniques. In this way, we lighten the load on our deep architecture, reduce training time and increase the scalability of the model.
    • (5) In the experimental session, we evaluate the effectiveness of our recommendation strategy on two different datasets, and the results show that the combination of graph embeddings and contextual word representations leads to an improvement in predictive accuracy. Moreover, our approach outperforms several competitive baselines, thus confirming the validity of the intuitions behind this work.

    • (6) To sum up, the contributions of this article are as follows:

      • We introduce a recommendation framework inspired by deep architectures, which learns a hybrid representation of users and items based on collaborative and content-based features;
      • We design different strategies to combine graph embeddings and contextual word representations into a single embedding;
      • We evaluate our strategy against two datasets, by ensuring the reproducibility of the whole experimental protocol.

    The rest of the article is organized as follows: Section 2 describes related work; Section 3 presents the adopted methodology; Section 4 discusses the findings emerging from the experimental evaluation, while conclusions and future work are sketched in Section 5.

    2 RELATED WORK

    • (1) Recommender systems [37] represent one of the most disruptive technologies that appeared on the scene in the last decade [23]. Basically, such systems acquire information about user needs, interests and preferences, and tailor their behavior based on such information, thus supporting people in several decision-making tasks [35, 36].

    • (2) The current state of the art is the result of a very long path: indeed, early research was characterized by a sharp dichotomy between collaborative filtering algorithms [27], based on the analysis of the ratings expressed by users, and content-based techniques [11], which exploited textual information (e.g., the plot of a movie) and descriptive features of the items. Subsequently, hybrid RS [6] emerged as a means to combine both approaches. Indeed, they are based on the intuition that content-based and collaborative features provide heterogeneous and complementary evidence, and can equally contribute to generating accurate recommendations. This work falls into this research line: we propose a hybrid recommendation framework that combines collaborative features based on graph embeddings with content-based features based on contextual word representations.

    • (3) As shown by several pieces of evidence (see [17, 30, 53], just to name a few), recommendation models exploiting graph embedding techniques outperform typical baselines. Based on the findings of previous research [33], this work exploits translation models such as TransE [5] and TransH [49]. More details about the algorithms are provided next.

    • (4) In parallel, contextual word representations [40] emerged as an evolution of widespread word embedding (WE) techniques. Even if WE techniques have been largely used to provide users with recommendations [32], the adoption of contextual word representations such as BERT [13], ELMo [4] and Universal Sentence Encoder (USE) [9] in recommendation tasks is poorly investigated. As an example, an approach that exploits ELMo embeddings based on item features and user reviews is presented in [42], while the exploitation of BERT embeddings in a recommendation task is discussed in [8]. In this work, USE and BERT were chosen as contextual word representation methods since they emerged as the best-performing techniques in [19], where a benchmark for a paper RS is presented.

    • (5) With respect to the above-mentioned literature, which exploits graph embeddings and contextual word representations separately, in this work we propose a strategy to learn a hybrid representation that combines both sources of information through a deep architecture.

    • (6) However, the idea of exploiting deep architectures to learn a vector-space representation of users and items is not completely new. As an example, a deep architecture for Neural Collaborative Filtering, which applies the principles of deep learning to the collaborative filtering framework, is presented in [21]. However, differently from [21], which is only based on collaborative features, we also encode in the model content-based features coming from contextual word representations.

    • (7) Another distinctive trait of our work lies in the adoption of a concatenation layer to combine the different groups of features in a deep architecture. This intuition, widely adopted in multi-modal deep learning [41], was also investigated in [21]. Another similar architecture is presented in [31]; in that case, the model learns a representation of textual content by exploiting Recurrent Neural Networks. With respect to that work, we preferred to exploit pre-trained embeddings learnt by BERT, which has been shown to outperform other techniques in several tasks [44]. Finally, our work shares some ideas with those presented in [52].

    • (8) As for attempts to combine graph-based and word embeddings, Wang et al. [46] propose to merge semantic-level and knowledge-level representations of news through their knowledge-aware convolutional neural network (KCNN). Results show that the usage of graph-based entity embeddings and contextual embeddings can improve AUC. The same authors recently investigated [47] the use of graph embeddings in a recommendation scenario based on several state-of-the-art datasets (i.e., MovieLens-20M, BookCrossing, Last.FM and Dianping-Food). Also in this case, their Graph Neural Networks with Label Smoothness Regularization demonstrated the effectiveness of graph embeddings in RS. Similarly, Cenikj et al. [8] propose to enhance a graph-based RS for Amazon products with two state-of-the-art representation models: BERT and GraphSage. The results obtained by the authors encourage following the idea of merging graph and contextual word representation techniques. With respect to these pieces of work, we consider more contextual word representation techniques (i.e., BERT and USE) and different graph embedding techniques (i.e., TransE and TransH). This allows us to better assess the impact of the single methodologies on the overall accuracy of the models, and to draw more precise take-home messages concerning the adoption of such techniques and the effectiveness of their combination.

    3 METHODOLOGY

    In this section we describe our methodology to learn a hybrid representation of users and items. We first introduce some basics of our recommendation framework. Next, we describe the deep architecture that combines such embeddings to provide users with recommendations.

    3.1 Basics of the Recommendation Framework

    • (1) Given a set of users $U = \{u_1, \dots, u_m\}$ and a set of items $I = \{i_1, \dots, i_n\}$, the goal of the framework is to build a hybrid vector-space representation for each $u \in U$ and each $i \in I$. Such representations, formally defined as $\overrightarrow{u}$ and $\overrightarrow{i}$, are used to feed a deep architecture that identifies suitable recommendations. In particular, each $\overrightarrow{u}$ is encoded starting from a collaborative embedding $\overrightarrow{u_c}$ and a content-based embedding $\overrightarrow{u_t}$; the same principle holds for $\overrightarrow{i}$, which is encoded starting from $\overrightarrow{i_c}$ and $\overrightarrow{i_t}$. In our recommendation framework, these representations will be concatenated by using different strategies.

    3.1.1 Graph Embedding Techniques

    • (1) Vectors $\overrightarrow{i_c}$ and $\overrightarrow{u_c}$ encode the collaborative information, which can be obtained by mining the graph connecting the users to the items they like. Accordingly, we first create $G = (N, E)$, where $N = I \cup U$ and $E = \{e_1, \dots, e_k\}$. $E$ encodes the available ratings, so each $e_i$ connects a $u \in U$ to an $i \in I$, based on the preferences expressed by $u$. Of course, we only represent positive ratings. Next, a graph embedding technique is run to represent the nodes of the above-described graph (that is, users and items) as vectors. Due to space reasons, we cannot go into the details of all the available methods; we refer to [7] for a comprehensive survey.
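
    Below is a minimal sketch of this graph construction, under the assumption (stated later in Sec. 4.1.2) that ratings of 4 and 5 count as positive; all names are illustrative, not the authors' actual code.

```python
from collections import namedtuple

# Nodes are users and items; edges encode only the positive ratings,
# all through the single 'feedback' relation described below.
Triple = namedtuple("Triple", ["head", "relation", "tail"])

def build_graph(ratings, positive_threshold=4):
    """ratings: iterable of (user_id, item_id, score) tuples."""
    edges = [Triple(u, "feedback", i)
             for u, i, r in ratings if r >= positive_threshold]
    nodes = {t.head for t in edges} | {t.tail for t in edges}
    return nodes, edges
```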

    • (2) In this work, based on the findings of previous research [33], we investigate translation models such as TransE [5] and TransH [49]. These models embed nodes and relations by treating relations as translations in the vector space. It should be pointed out that, in our case, the only relation encoded in the graph is the feedback relation, used to connect the users to the items they like.

    • (3) In order to build a vectorial representation of the nodes, these models need to be trained. Training is carried out by providing as input a set of positive triples, expressed as $(u, f, i)$, where $u$ and $i$ are nodes and $f$ is a relation. Each triple represents a fact that exists in the graph; thus, in our specific setting, for each rating $r \in R$ indicating that user $u$ expressed a feedback $f$ on item $i$, a triple is included in the training set.

      • In parallel, a set of negative triples (that is, facts that do not exist in the graph) is built by using a negative sampling strategy [29].
      • Next, models are trained by minimizing a pairwise ranking loss function, which compares the scores (as defined below) of positive triples to the scores of negative ones. Given this general setting, TransE learns a representation of entities and relations so that $\overrightarrow{u} + \overrightarrow{f} \approx \overrightarrow{i}$. This is done by using the scoring function $f(u, f, i) = D(u + f, i)$, where $D$ is a distance function such as the $L_1$ or the $L_2$ norm.
      • The second graph embedding technique we employed is TransH, an extension of TransE that can more effectively deal with one-to-many and many-to-many relations. Considering that in a recommendation setting each user typically rates more than one item (and vice versa), we can assume that TransH will likely produce a more precise representation of users and items. Formally, this is obtained by projecting entities on a hyperplane identified by the normal vector $w_f$. Accordingly, the scoring function becomes $f(u, f, i) = D(u_\bot + f, i_\bot)$, where $u_\bot = u - w_f^T u \, w_f$, $i_\bot = i - w_f^T i \, w_f$, and $D$ is a distance function such as the $L_1$ or the $L_2$ norm.
    • (4) As shown in [33], TransE and TransH can provide similar results. However, results may depend on the dataset, so we evaluated the effectiveness of the embeddings built by exploiting both techniques. For further details about TransE and TransH we again refer to the original papers describing the methods [5, 49].
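
    As a concrete reference for the two scoring functions above, here is a minimal NumPy sketch; real training (pairwise ranking loss, negative sampling) would rely on a knowledge-graph embedding library, and all names are illustrative.

```python
import numpy as np

def transe_score(u, f, i, norm=1):
    # TransE: f(u, f, i) = D(u + f, i), with D the L1 or L2 norm
    return np.linalg.norm(u + f - i, ord=norm)

def transh_score(u, f, i, w_f, norm=1):
    # TransH: project u and i on the hyperplane with unit normal w_f
    w_f = w_f / np.linalg.norm(w_f)
    u_perp = u - (w_f @ u) * w_f   # u⊥ = u - w_f^T u w_f
    i_perp = i - (w_f @ i) * w_f   # i⊥ = i - w_f^T i w_f
    return np.linalg.norm(u_perp + f - i_perp, ord=norm)
```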

    3.1.2 Contextual Word Representations

    • (1) The second part of the representation is obtained by learning $\overrightarrow{i_t}$ and $\overrightarrow{u_t}$. These vectors encode content-based information, obtained by processing descriptive features of the item. We assume that for each item $i \in I$ some textual content $text(i)$ is available (e.g., the plot of a movie). We encode such textual information in a vectorial form that preserves the meaning of the content features. As previously stated, we chose BERT [13] and USE [45] as encoding strategies, motivated by their ability to express the syntactic and semantic information of words observed in extensive document collections. Contextual word representations have proven to be a key component of state-of-the-art natural language processing (NLP) approaches that require text comprehension and formalization. For the sake of brevity, we only provide a synthetic overview of the techniques; for further details, we refer to the original papers [9, 13].

    • (2) BERT (Bidirectional Encoder Representations from Transformers) [13] is a language model based on the Transformer deep learning architecture [43].

      • Its base version is designed as a stack of 12 encoding layers, each of which learns an increasingly precise and refined representation.
      • Each layer is composed of a multi-head attention module, a normalization step, and a feed-forward layer.
      • In order to use BERT to learn $\overrightarrow{i_t}$, we followed these steps (a sketch follows the list):

        • (i) We pre-processed $text(i)$ by making it lowercase and by removing punctuation, stopwords, and non-ASCII characters;
        • (ii) We truncated the cleaned text to the first 128 words. This is due to a constraint of BERT, which cannot accept longer input sentences;
        • (iii) For each word $w \in cleaned\_text(i)$ we obtained its representation $\overrightarrow{bert(w)}$ by exploiting the pre-trained BERT base model. In particular, we extract the output of the last encoding layer for word $w$;
        • (iv) Finally, we obtain the embedding of item $i$ as the centroid vector of the word embeddings. Formally: $\overrightarrow{i_t} = \sum_{j=1}^{k} \overrightarrow{bert(w_j)} / k$.
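
    A minimal sketch of steps (i)-(iv), assuming the Hugging Face `bert-base-uncased` checkpoint; the stopword list and function names are illustrative.

```python
import re
import torch
from transformers import BertModel, BertTokenizer

STOPWORDS = frozenset({"the", "a", "an", "of"})   # placeholder list
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def bert_item_embedding(text):
    # (i) lowercase, strip punctuation/non-ASCII characters and stopwords
    text = text.lower().encode("ascii", "ignore").decode()
    words = [w for w in re.sub(r"[^a-z\s]", " ", text).split()
             if w not in STOPWORDS]
    # (ii) truncate the cleaned text to the first 128 words
    words = words[:128]
    # (iii) take the last encoding layer of the pre-trained BERT base model
    inputs = tokenizer(" ".join(words), return_tensors="pt", truncation=True)
    with torch.no_grad():
        token_vectors = model(**inputs).last_hidden_state[0]  # (tokens, 768)
    # (iv) item embedding = centroid of the word vectors
    return token_vectors.mean(dim=0)
```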
    • (3) In some cases, a centroid-based representation like the one we built from BERT is not the best approximation of the real meaning of a sentence. To tackle this problem, it is necessary to exploit deep neural models that learn a sentence-based representation, such as USE (Universal Sentence Encoder).

      • USE is typically trained on Wikipedia by using a modified skip-thought strategy [26]. The idea behind this approach is to predict the previous and the next sentence starting from the current one as input. In order to learn the model, each sentence $s$ in the training set is used as input to a Transformer made of six Transformer layers, each of which has a self-attention module followed by a feed-forward network. The model is the best solution for encoding sentences of different lengths without truncating or padding them. The last layer of the model is a 512-dimensional vector $\overrightarrow{s}$ representing the whole sentence. Once the model is trained, we can use it to map a new sentence into the existing space. In particular, we use $cleaned\_text(i)$ as input to the Google USE model, and we obtain as output the item embedding $\overrightarrow{i_t}$ based on USE.
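
    A minimal sketch, assuming the public TF-Hub release of the Universal Sentence Encoder; `cleaned_text` is the pre-processed item description.

```python
import tensorflow_hub as hub

use_model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def use_item_embedding(cleaned_text):
    # USE maps the whole sentence to a single 512-dimensional vector,
    # with no truncation or padding required
    return use_model([cleaned_text]).numpy()[0]   # shape: (512,)
```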
    • (4) Once the item embeddings are available, in both cases we adopt the following strategy to obtain a user profile $\overrightarrow{u_t}$: we borrow from content-based RS [11] the idea of building the profile as the centroid vector of the items positively rated by the user. Formally, for each user $u \in U$ all the embeddings $\overrightarrow{i_{rated}}$ of positively rated items are collected. Next, we calculate $\overrightarrow{u_t} = \sum_{j=1}^{z} \overrightarrow{i_{rated_j}} / z$, in order to build a vectorial representation of the user.
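
    A minimal sketch of this profile construction; the `item_embeddings` mapping (item id -> vector) is illustrative.

```python
import numpy as np

def user_profile(positive_item_ids, item_embeddings):
    # the profile is the centroid of the embeddings of the items the
    # user rated positively: u_t = (1/z) * sum_j i_rated_j
    vectors = np.stack([item_embeddings[i] for i in positive_item_ids])
    return vectors.mean(axis=0)
```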

    3.2 Description of the Architecture

    • (1) Once the single pre-trained embeddings are available, our deep architecture comes into play. As previously stated, it aims to:

      • (i) learn a hybrid representation of users and items, based on graph and word embeddings;
      • (ii) identify suitable recommendations, by predicting the interest of user $u$ in items $i \in I$.
    • (2) In order to also assess the effectiveness of the single embedding strategies, we first defined a very simple architecture that we exploit as a baseline in our experiments. In the following, we refer to this architecture as basic (Fig. 1). Specifically, given one of the above-mentioned representation techniques (graph embedding or word embedding, alternatively), a single concatenation layer merges the information regarding user and item. Then, through a dense layer with a sigmoid activation function, we proceed to the prediction step. As shown in the figure, no combination of different techniques is performed in this case.

    • (3) However, such a representation does not take the best out of the single data sources. Accordingly, we propose different strategies to combine both data representations, and we designed an architecture that we call hybrid. The overall architecture is presented in Figures 2-3.


    • (4) The process starts by encoding a representation of a user $u$ and an item $i$. Such a representation is based on the embeddings $\overrightarrow{u_c}$, $\overrightarrow{u_t}$, $\overrightarrow{i_c}$ and $\overrightarrow{i_t}$, learnt as described in Sec. 3.1.

      • Then, these vectors are passed as input to a dense layer, in order to reduce their dimension and make them homogeneous.
      • Next, the embeddings pass through the core of the whole architecture: a concatenation layer whose goal is to combine the single sources of information into a hybrid representation.
    • In particular, in this work we introduce two different concatenation strategies (Fig. 2-3), defined as follows (a sketch follows the list):

      • Entity-based concatenation: the idea behind this strategy is to first merge the user embeddings learnt through different techniques, before concatenating them with the item embeddings. The central insight guiding this configuration is that merging different representations of the same entity in the first layer of the network could help to better represent the entity itself. Specifically, we think that the heterogeneous information obtained by combining different groups of features could help the network more easily identify the characterizing dimensions of items and users. To this end,

        • we first use Concatenation Layer (1) to put together the vectors $\overrightarrow{u_c}$ and $\overrightarrow{u_t}$;
        • Next, in Concatenation Layer (2) we merge $\overrightarrow{i_c}$ and $\overrightarrow{i_t}$;
        • Finally, in Concatenation Layer (3) we merge the representations based on the different encoding strategies into a single vector, just before the top of our architecture.
      • Feature-based concatenation: in this case,

        • we start by concatenating $\overrightarrow{i_c}$ and $\overrightarrow{u_c}$, that is, both graph embeddings, through Concatenation Layer (1).
        • Next, we merge $\overrightarrow{i_t}$ and $\overrightarrow{u_t}$, that is, the contextual word representations, by using Concatenation Layer (2).
        • Finally, also in this strategy, we use Concatenation Layer (3) to merge the weights coming from the different encoding strategies. In other terms, we merge the embeddings based on the type of algorithm that generated them. In this case, the model relies on the idea that first concatenating the embeddings of the different entities (i.e., items and users) obtained with the same representation technique can help the network strengthen the learning of the relationships between them. Such relationships could lead to a more accurate prediction of the relevance of a recommendation for a specific user. Of course, many other concatenation strategies can be designed based on the combination of different groups of features. For the sake of simplicity, in this work we only evaluate these two; the analysis of further combinations is left as future work.
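
    A minimal Keras sketch of the two strategies; `u_c`, `u_t`, `i_c` and `i_t` are the four embeddings after the dimension-reducing dense layers, and the intermediate dense layers with ReLU activations are our assumption, not a detail confirmed by the paper.

```python
from tensorflow.keras import layers

def entity_based(u_c, u_t, i_c, i_t, units=64):
    user = layers.Concatenate()([u_c, u_t])    # Concatenation Layer (1)
    item = layers.Concatenate()([i_c, i_t])    # Concatenation Layer (2)
    user = layers.Dense(units, activation="relu")(user)
    item = layers.Dense(units, activation="relu")(item)
    return layers.Concatenate()([user, item])  # Concatenation Layer (3)

def feature_based(u_c, u_t, i_c, i_t, units=64):
    graph = layers.Concatenate()([i_c, u_c])   # Concatenation Layer (1)
    text = layers.Concatenate()([i_t, u_t])    # Concatenation Layer (2)
    graph = layers.Dense(units, activation="relu")(graph)
    text = layers.Dense(units, activation="relu")(text)
    return layers.Concatenate()([graph, text]) # Concatenation Layer (3)
```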
    • (5) Obtaining the Recommendations. Regardless of the specific concatenation strategy adopted, a simple concatenation of the original features is not enough to learn the underlying relationships among them, so the hybrid vectors are passed through a new dense layer just before the final prediction. As a final layer, we use a dense layer of size 1 with a sigmoid activation function. This layer estimates the probability that item $i$ is relevant for user $u$. Before making predictions, the architecture is trained by exploiting all the ratings in the form $(u, i)$ available in the dataset. Once the model is learned, it can predict to what extent user $u$ would like unseen items $j \in I$. More details about the training process are provided in the next section.
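
    Putting the pieces together, here is a minimal sketch of the end-to-end model, wired with the sizes reported later in Sec. 4.1.4 (dense layers of 64 and 32 neurons, a size-1 sigmoid output, binary cross-entropy, ADAM); the hidden-layer activations are our assumption, and `concat_fn` is one of the two strategies sketched above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_hybrid_model(dims, concat_fn):
    # dims: embedding sizes of (u_c, u_t, i_c, i_t), e.g. 768 for BERT
    inputs = [layers.Input(shape=(d,)) for d in dims]
    # dense layers first reduce the embeddings to a homogeneous size
    reduced = [layers.Dense(64, activation="relu")(x) for x in inputs]
    hybrid = concat_fn(*reduced)
    hidden = layers.Dense(32, activation="relu")(hybrid)
    # final dense layer of size 1: P(item i is relevant for user u)
    out = layers.Dense(1, activation="sigmoid")(hidden)
    model = tf.keras.Model(inputs, out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```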

    4 EXPERIMENTAL EVALUATION

    In the experimental session we evaluated the effectiveness of our methodology in the task of item recommendation. In particular, the experiments were designed to answer the following research questions:

    • how do the different strategies to combine graph embeddings and contextual word representations perform in a recommendation task? (Experiment 1);
    • how does our hybrid representation perform with respect to other state-of-the-art techniques? (Experiment 2)

    4.1 Experimental Design

    4.1.1 Datasets

    Experiments were carried out in a movie recommendation and in a book recommendation scenario. In the former case, Movielens 1M (ML1M) was exploited as dataset, while in the latter we used the DBbook dataset (links in the Datasets section above). Table 1 reports some statistics about the datasets.

    • Generally speaking, ML1M is larger than DBbook, but it is less sparse, so it is more suitable for collaborative filtering-based algorithms.
    • On the other hand, DBbook is sparser and unbalanced towards negative preferences (only 45.85% positive ratings), which makes the recommendation task very challenging.

    4.1.2 Protocol

    For both datasets, we used an 80%-20% training-test split. Data were split so as to maintain the ratio between positive and negative ratings.

    • As for MovieLens-1M, we considered as positive only the ratings of 4 and 5 out of 5.
    • As for DBbook, ratings were provided in a binary format (positive/negative).
    • The predictive accuracy of the algorithms was evaluated on top-5 and top-10 recommendation lists, calculated by following the TestRatings strategy [3].
    • For the sake of brevity, we only present the results for the top-5 list, since the analysis on top-10 leads to similar findings. In order to obtain the top-$k$ items, we first predict, through our deep architecture, interest scores for all the items $i$ in the test set of user $u$; then we rank the items based on the output of the neural network (a sketch follows).
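
    A minimal sketch of this ranking step; `score_fn` wraps the trained network's forward pass for a (user, item) pair, and all names are illustrative.

```python
def top_k_items(score_fn, user, test_items, k=5):
    # score every test item of user u, then rank by predicted relevance
    scored = [(item, score_fn(user, item)) for item in test_items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:k]]
```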

    4.1.3 Configurations

    We evaluate six different configurations of the framework, varying the encoding techniques as well as the concatenation strategy. In particular, we evaluated the representations learnt with TransE, TransH, BERT and USE by exploiting both the entity-based and the feature-based concatenation strategies. Moreover, in order to assess the effectiveness of our hybrid representation, we also evaluate an alternative implementation of our deep architecture where the embeddings based on just one technique (e.g., just graph embeddings or just contextual word representations) are combined as shown in Fig. 1. In the following, we refer to this as the basic configuration.

    4.1.4 Overview of the Parameters

    Our models were trained for 25 epochs,

    • with batch sizes of 512 for ML1M and 1536 for DBbook, respectively.
    • The parameter α is set to 0.9 and the learning rate to 0.001.
    • As cost function we chose binary cross-entropy.
    • As optimizer, we used ADAM for ML1M and RMSprop for DBbook.
    • In the basic configuration (see Fig. 1) the dimension of the dense layers is set to 64 neurons for the layers labeled dense_layer_1 and dense_layer_2, and to 32 for dense_layer_3.
    • The dimension of the last layer is set to 1 neuron in order to make predictions.
    • A similar setting is used for the hybrid configurations (see Fig. 2-3). In this case dense layers 1 to 6 are set to 64 neurons, while the others follow the sizes introduced for the previous configuration.
    • Finally, the dimension of the embeddings was set to 768 for BERT and 512 for USE.
    • Due to space reasons we do not report more results, but our experiments showed that different embedding dimensions did not significantly affect the overall accuracy.

    4.1.5 Source Code and Baselines

    The source code of our hybrid recommendation algorithm is available on GitHub (see the Code section above). Moreover, in Experiment 2 we compared our methodology to several baselines based on Deep Neural Networks implemented in the NeuRec library, such as:

    • (MLP) Multi-Layer Perceptron [21]: the authors present a multimodal deep learning model for developing an item-based collaborative filtering system. In particular, they propose a deep network that combines the descriptive features of items and users by concatenating them; a standard multilayer perceptron is used to learn the interactions between latent features. To run this model we set the number of learning epochs to 100 and the batch size to 256;
    • (NeuMF) Neural Matrix Factorization [21]: an item-based collaborative filtering deep learning model that combines matrix factorization and a multi-layer perceptron. In particular, the authors learn separate embeddings for matrix factorization and the multi-layer perceptron, and combine them by concatenating their last hidden layers. We configured the model to generate embeddings of 16 dimensions and ran it for 100 learning epochs with batches of 256 elements;
    • (NAIS) Neural Attentive Item Similarity [20]: an item-based collaborative filtering system. NAIS is an attention network capable of distinguishing which historical items in a user profile are more important for a prediction. It achieves strong representation power by adding to state-of-the-art models only the few additional parameters brought by the attention network. The model was run for 8 training epochs with an embedding size of 16 and a batch size of 256 elements;
    • (DeepICF) Deep Item-based Collaborative Filtering [51]: a deep learning model for item-based collaborative filtering that can model higher-order relationships among items. It uses multiple nonlinear layers above the pairwise interaction modeling to learn higher-order item relations. We ran the model with an embedding size of 16, 16 training epochs and a batch size of 256 items;
    • (FISM) Factored Item Similarity Models [25]: an item-based method for generating top-N recommendations that learns the item-item similarity matrix as the product of two low-dimensional latent factor matrices. The optimal parameter configuration of the model was obtained with an embedding size of 16, 16 learning epochs and a batch size of 256;
    • (DAE) Denoising Auto-Encoder [50]: a method for item-based top-N recommendation that exploits the idea of denoising auto-encoders. The model learns correlations between the user's item preferences by training a one-hidden-layer neural network on a corrupted version of the known preference set. The model was trained for 1000 epochs with a batch size of 256 elements;
    • (CFGAN) Collaborative Filtering Framework based on Generative Adversarial Networks [10]: a GAN-based collaborative filtering (CF) framework designed to provide higher recommendation accuracy. CFGAN aims at generating a purchase vector with real-valued elements, by distinguishing between the vectors generated by the model and the real ones from the ground truth. It was configured to train for 100 epochs with batches of 64 items;
    • (WRMF) Collaborative Filtering for Implicit Feedback Datasets [22]: a collaborative filtering approach for implicit feedback datasets. In particular, the authors estimate the relevance of an item for each user by using a confidence level; a latent factor model that uses these scores is consequently employed. The model was configured with an embedding size of 8 and 300 epochs of training.

    Moreover, in order to compare our approach to other methods exploiting side information, we also considered the following baselines available in the Elliot framework [1]:

    • kNN: the well-known nearest neighbor classifier. The kNN classifier searches for the k closest elements to a given element to be classified, commonly known as its nearest neighbors; the item label is then assigned according to the class of its neighbors [38]. In our experiments we investigate item-knn, user-knn and their hybrid variants att-item-knn and att-user-knn. In particular, attribute-based kNN approaches can access other types of side information in the form of item attributes [15]. For our purposes, we enriched each item with ontological, categorical and factual information extracted from DBpedia, by selecting the item properties from the knowledge base. We configured the model to use cosine similarity and 100 neighbors.
    • VSM: the classic vector space model. Every document is represented as a vector of term weights, where each weight indicates the degree of association between the document and the term [38]. In our runs, the features used for training the model are the same side information already used for the kNN classifiers; also in this case we used the cosine as similarity metric.
    • kaHFM: a Knowledge-aware Hybrid Factorization Machine RS. The model extends classic factorization machines by using the semantic information encoded in a knowledge graph; more details can be found in [2]. In our runs, the model uses the same side information already used for the kNN classifiers.

    For the sake of simplicity, we report here only the best parameters of the baselines, used for running them in the reported comparison. Indeed, for each baseline model we ran a grid-search optimization using the F1-measure as score function.

    4.1.6 Evaluation Metrics

    In order to evaluate the effectiveness of the models we used Precision, Recall and F1 score [12]. These metrics allow us to evaluate a model by considering whether or not what is suggested is relevant to the user. Their weakness is that they do not consider the position of the items in the list of suggestions. Accordingly, we extended our metrics by calculating the mean reciprocal rank (MRR) and the normalized discounted cumulative gain (nDCG) [24, 48]. Both are ranking measures that reward recommendation lists in which relevant items are placed in the first positions. Given that in our task a binary score is available for each item (like or dislike), we used the binary-relevance version of nDCG [48] (a sketch follows).
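
    A minimal sketch of MRR and binary-relevance nDCG for a single ranked list; `relevance` holds 1 (relevant) or 0 for each ranked position.

```python
import math

def mrr(relevance):
    # reciprocal rank of the first relevant item, 0 if none is relevant
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg(relevance):
    # DCG of the list divided by the DCG of the ideal reordering
    dcg = sum(rel / math.log2(rank + 1)
              for rank, rel in enumerate(relevance, start=1))
    ideal = sorted(relevance, reverse=True)
    idcg = sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```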

    4.2 Discussion of the Results

    • (1) In Experiment 1 we evaluated the effectiveness of our hybrid representation, varying the concatenation strategy of $\overrightarrow{u}$ and $\overrightarrow{i}$ and the encoding techniques. As shown in Table 2 and Table 3, the first outcome emerging from the experiment concerns the effectiveness of our hybrid representations, since all the configurations we took into account outperformed the basic representation strategies. As a reminder, the basic representation exploits the pre-trained embeddings based on TransE, TransH, BERT or USE without any combination or concatenation. This outcome can be observed for the F1 measure (see Table 2) and for MRR and nDCG (see Table 3).

      • In particular, considering the ML1M dataset, our best configuration (TransH + BERT with entity-based concatenation) shows an increase in performance ranging from 1.44% (compared to TransH alone) to 8.33% (compared to USE alone), in terms of F1 measure.
      • Similar findings can be observed for nDCG and MRR. For nDCG, we observe an increase in performance of 1.78% with respect to TransH and of 10.31% with respect to USE.
      • Finally, as for MRR, a gap in performance of 1.08% emerged with respect to TransH, and of 6.33% with respect to USE.

    • (2) As for DBbook, TransE + BERT with feature-based concatenation emerged as the best-performing configuration. In this case, considering the F1-measure, we can observe an increase of 0.8% w.r.t. TransE and of 4.64% w.r.t. USE. The trend observed for nDCG is equivalent: we also noted an increase in performance w.r.t. both TransE (0.65%) and USE (4.6%). Finally, analyzing MRR, we observe a slightly smaller difference: the increase in performance is only 0.77% w.r.t. TransE and 3.8% w.r.t. USE.

    • (3) Overall, it emerges that feature-based concatenation obtained the best results. Positive findings also emerge for entity-based concatenation, which outperformed the basic representations.

      • As regards the techniques, the configurations based on BERT obtained the best results on both datasets. This result does not surprise us: as known in the literature, the bi-directional Transformer used in the BERT model is able to better represent relationships between terms and obtain more effective results [28].
      • Conversely, as for the graph embedding techniques, TransH obtained the best results on MovieLens, while TransE beat the other configurations on DBbook. This is probably due to the characteristics of the datasets: indeed, MovieLens data are denser, thus more one-to-many relations are encoded in the dataset.
      • Accordingly, it is likely that the ability of TransH to better learn these relationships leads to a better representation. This outcome is also confirmed on the basic configurations exploiting graph embeddings.
    • (4) Finally, in order to validate the results, we performed a Wilcoxon signed-rank test. The statistical evaluation demonstrated a difference in performance between our best configuration and the basic approaches at $p < 0.01$. Based on these results, we can state that the intuition of learning a hybrid representation to encode information about users and items in a deep recommendation framework is supported by the results. A statistically significant gap also emerged when comparing the best-performing configuration to most of the alternative settings (a sketch of the test follows the list).

      • In particular,

        • as for MovieLens, 6 out of 8 comparisons led to a significant gap in terms of F1, nDCG and MRR.
        • As for DBbook, we noted a relevant pattern: the best configuration significantly outperformed all the other configurations, with the exception of those exploiting feature-based concatenation.
        • In this case, we can state that this concatenation strategy, regardless of the specific techniques adopted, led to the best results in terms of F1-measure.
        • As a final observation, it should be pointed out that our methodology relies on pre-trained embeddings; thus we showed that it is possible to avoid heavyweight training of deep neural networks and still obtain accurate recommendations.
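
    A minimal sketch of the significance test, assuming paired per-user scores (e.g., F1@5) for the two systems being compared; names are illustrative.

```python
from scipy.stats import wilcoxon

def significantly_different(scores_a, scores_b, alpha=0.01):
    # Wilcoxon signed-rank test on paired per-user scores
    _, p_value = wilcoxon(scores_a, scores_b)
    return p_value < alpha
```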
    • (5)Next, in Experiment 2, we compared our best-performing configuration in terms of F1-measure (’Our Best’ in Table 4), nDCG and MRR (’Our Best’ in Table 5) to several competitive baselines. As previously stated, we took into account both deep learning techniques, hybrid approaches and methods exploiting side information. (接下来,在实验2中,我们将F1测量(表4中的“我们最好的”)、nDCG和MRR(表5中的“我们最好的”)方面的最佳性能配置与几个竞争基线进行了比较。如前所述,我们考虑了深度学习技术、混合方法和利用辅助信息的方法。)

      • In the first case, we aimed to assess the effectiveness of our deep architecture w.r.t. other deep recommender systems. In the latter, we aimed to evaluate the intuition of combining state-of-the-art embeddings encoding both collaborative and content-based information. All these models represent the current state of the art in several recommendation tasks.
      • In this case, a relevant increase in F1-measure can be observed for both datasets.
        • In particular, the configuration based on TransH and BERT outperforms CFGAN, the best deep learning baseline on the ML1M dataset, by 5.85%.
        • Similarly, TransE and BERT outperforms WRMF by 5.72% on the DBbook dataset. For nDCG and MRR we observed comparable results: TransH and BERT outperforms CFGAN by 6.17% in nDCG and by 3.87% in MRR on ML1M, while TransE and BERT outperforms WRMF by 5.04% in nDCG but is 3.94% lower in MRR on DBbook (a sketch of how such signed gains are computed follows this list).
      • Interesting results are also observed when considering baselines based on hybrid techniques.
        • In particular, the results obtained are in line with those of the deep learning baselines but lower than those of our best configurations, considering both F1-measure and nDCG and MRR. This confirms the effectiveness of our approach and supports the underlying hypotheses of this work.
        • Also in this case, we validated the results with a Wilcoxon signed-rank test: the differences observed between our best methods and the baselines are statistically significant (p < 0.01).
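The signed percentages above can be read as relative gains over the baseline's metric value. A minimal sketch of the arithmetic, using arbitrary placeholder scores rather than the paper's raw numbers:

```python
def relative_gain(ours: float, baseline: float) -> float:
    """Signed relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0

# Placeholder metric values; a positive result means our configuration is
# better, a negative one (cf. MRR on DBbook) means it is worse.
print(f"{relative_gain(0.42, 0.40):+.2f}%")  # +5.00%
print(f"{relative_gain(0.38, 0.40):+.2f}%")  # -5.00%
```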


    5 CONCLUSIONS AND FUTURE WORK

    • (1)In this paper we presented a hybrid recommendation framework based on the combination of graph embeddings and contextual word representations.
    • (2)In particular, we evaluated the possibility of combining graph-based (i.e., TransH, TransE) and contextual word representation (i.e., BERT, USE) techniques through two concatenation strategies: entity-based and feature-based.
      • In the first one, we followed the intuition that concatenating more representations of the same entity in the first level of the network could help the neural model better generalize the intrinsic characteristics of that entity.
      • In the feature-based approach, we instead tried to show that concatenating homogeneous representations of users and items (i.e., created with the same representation technique) could help the network better learn the relationships between these entities.
    • (3)Results showed that hybridizing user and item representations yields excellent results compared both to many baselines and to the use of a single content representation technique. In particular, the feature-based concatenation approach showed the most promise.
    • (4)Indeed, the configuration using TransH + BERT proved to be the best for the ML1M dataset, while TransE + BERT was the winner for the DBbook dataset. These results confirm the validity of the intuition behind the proposed framework.
    • (5)As future work, we will extend the model by also encoding features coming from knowledge graphs such as DBpedia or Freebase. Moreover, we will extend our deep architecture by introducing attention mechanisms, and we will evaluate more content representation strategies in order to learn an even more precise representation of users and items.

    ACKNOWLEDGMENTS

    REFERENCES
