
3 algorithm

3.1 Different from previous attention based MIL algorithms, the method embeds two kinds of attention mechanisms into one network for MIL tasks.
In our method, the attention and self-attention mechanisms are employed because they can be combined naturally.
Unlike previous attention-based MIL algorithms, ASMI embeds the attention and self-attention mechanisms into one network.

3.2 The attention block can assign weights for each instance in the bag and aggregate each bag into a single vector.
Number mismatch: "weights" vs. "each instance".
The repeated "bag" reads as somewhat redundant.
The attention block assigns a weight to each instance and aggregates each bag into a single vector.

3.3 The self-attention block is constructed for qualify the influence of each instance to the fused single vector.
-> The self-attention block measures the influence of each instance on its fused single vector.

3.4 With the information of each instance, the distinguishability of vectors generated by self-attention block can be increased.

3.5 A: Data Model. Here, yi \in {0, …, K-1}
B : where N represents the cardinality. b is partitioned according to respective labels into { b0; b1; …; bL-1 }

3.6 where bi represents the mean vector of bi and d*(bi) represents the mean distance between each vector in bi and d(; ) represents the Euclidean distance.
where bi represents the mean vector of bi, d*(bi) represents the mean distance between each vector in bi and d(; ) represents the Euclidean distance.
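
To make the three quantities in this sentence concrete, here is a minimal NumPy sketch. It rests on an assumption: the quoted sentence is cut off after "between each vector in bi and", so reading d*(bi) as the mean Euclidean distance from each vector in the set to the set's mean vector is a guess, not the paper's confirmed definition. The function names and the toy sets `b0` and `b1` are hypothetical, and the sketch does not attempt to reproduce Eq. (1) itself.

```python
import numpy as np

def mean_vector(vectors):
    """Mean vector of one label's set of vectors (one vector per row)."""
    return np.asarray(vectors, dtype=float).mean(axis=0)

def mean_distance_to_center(vectors):
    """Assumed reading of d*(b_i): average Euclidean distance from each
    vector in the set to the set's mean vector."""
    vectors = np.asarray(vectors, dtype=float)
    center = vectors.mean(axis=0)
    return float(np.linalg.norm(vectors - center, axis=1).mean())

def euclidean(u, v):
    """Euclidean distance d(u, v) between two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.linalg.norm(u - v))

# Hypothetical toy data: two label sets in a 2-D space.
b0 = np.array([[0.0, 0.0], [2.0, 0.0]])
b1 = np.array([[10.0, 0.0], [10.0, 4.0]])

spread_0 = mean_distance_to_center(b0)                  # within-set spread of b0
spread_1 = mean_distance_to_center(b1)                  # within-set spread of b1
separation = euclidean(mean_vector(b0), mean_vector(b1))  # between-set distance
```

In this toy case the two sets are far apart relative to their internal spread, which is the situation the notes call high distinguishability.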

3.7 The denominator in Eq. (1) simply quantifies the separation degree between each vector set belonging to different labels.
quantify -> measure
The denominator in Eq. (1) measures the degree of separation between vector sets with different labels.

3.8 discrimination

3.9 Notably, it is uncertain that there is a strictly positive correlation between the distinguishability and the classification accuracy.
But the higher distinguishability usually followed by higher classification accuracy.
This is similar to SVM: maximizing the margin gives better generalization (better performance on the test set).
distinguishability -> margin -> generalizability -> more accurate in unknown instances
Many machine learning schemes such as SVM obtain better generalization ability through maximizing the classification margin.
Similarly, here we try to maximize distinguishability for the same purpose.

3.10 The main task of the attention block is to quantify the contribution of each instance to the bag Bi and obtain a fused vector hi.
Issue 1: "each instance" is ambiguous.
The main task of the attention block is to measure the contribution of each instance \mathbf{x}_ij to the bag Bi and obtain a fused vector hi.

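Issue 2: Compare Fig. 1 with Fig. 2. Because Fig. 1 has been simplified into a framework diagram, Fig. 2 should state clearly which part of Fig. 1 it depicts, so that readers can map one onto the other.
1) Fig. 2 is the attention block. In Fig. 1 the input to the attention block appears to be the bag Bi, but in Fig. 2 it is an instance.
2) In Fig. 1 the output of the attention block is hi, but in Fig. 2 it is a coefficient.

Following the analysis in Issue 2, explain this concretely, with an example: the attention block shown in Fig. 2 can only quantify the contribution of an instance to the bag; it cannot produce hi.
In Section E, the sentence "And the fused vector hi can be obtained based on the learned weights." is the one that states this accurately.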

The main task of the attention block is to measure the contribution of each instance xij to the bag Bi.
Then the fused vector hi can be obtained based on the learned weights.

3.11 The dimension of the new feature space depends on the richness of information contained in the original features.
"The richness of information" is not specific.
Min: dimension is a quantitative notion, whereas richness is a qualitative description, so it cannot justify a choice of dimension.
For sparser data, the dimension of the new feature space should be set lower.

3.12 The illustration of calculation process for one attention weight.
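Mind the number of nouns: a countable noun needs an article, a quantifier, etc., or else it must be plural. Check the caption under Fig. 3 in the same way.
An illustration of the calculation process for one attention weight.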

3.13 P( ) is firstly employed to transform…
Q() is utilized …
the softmax function is used to …
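The paper uses this sentence pattern almost everywhere. The passive voice describes a static relation; prefer a dynamic description in the active voice, e.g.: P( ) transforms…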

3.14 As the result, the fused vector hi of Bi can be computed as the sum of product between …:
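Suggest checking every "can …" in the paper. When describing the steps of an algorithm, simply use the active or passive voice directly. Use "can" only when a tentative or hedged statement is really intended.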
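
For reference while rewriting 3.13 and 3.14 in the active voice, the pipeline those sentences describe (P( ) transforms, Q( ) scores, softmax normalizes, and the weighted sum yields hi) can be sketched as follows. This is only an illustration under stated assumptions: the quoted sentences are elided, so the concrete forms chosen here (P as a linear map followed by tanh, Q as a linear map to a scalar, and hi as the weighted sum of the original instances) are hypothetical, as are the names `attention_pool`, `W_p`, and `w_q`.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def attention_pool(X, W_p, w_q):
    """Attention pooling of one bag.

    X   : (n_i, d) array, one instance x_ij per row.
    W_p : (d, m) array, weights of the assumed transform P.
    w_q : (m,) array, weights of the assumed scoring function Q.
    Returns the attention weights (n_i,) and the fused vector h_i (d,).
    """
    H = np.tanh(X @ W_p)        # P transforms each instance into the new feature space
    scores = H @ w_q            # Q maps each transformed instance to a scalar score
    weights = softmax(scores)   # softmax turns the scores into weights that sum to 1
    h = weights @ X             # h_i: weighted sum of the instances (assumed form)
    return weights, h

# Hypothetical bag with 5 instances of dimension 4, and random parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W_p = rng.normal(size=(4, 3))
w_q = rng.normal(size=3)
weights, h = attention_pool(X, W_p, w_q)
```

The sketch also illustrates Issue 2 in 3.10: the weights are one output and hi is a separate step computed from them, so a figure that stops at the weights does not yet show hi.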

3.15 Expressions like this appear throughout the paper: attention based MIL method
-> attention-based MIL method

3.16 However, it is possible that a large weight is assigned to an instance x_ik, k \in [1, \ldots, n_i] which is inconsistent with the bag
-> "it is possible …" is acceptable, but using a modal verb directly gives a shorter sentence. This case can be contrasted with 3.14, where the advice is not to use a modal verb.
However, a large weight might be assigned to an instance x_ik, k \in [1, \ldots, n_i] which is inconsistent with the bag Bi.

3.17 To alleviate this problem, the main task of self-attention block is to use information of each instance to generate an enhanced fused vector bi, which has more representation power than hi.
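Long sentence. "Use information of each instance to generate bi" is inaccurate: the attention block also uses the information of the instances, so this wording does not bring out the difference.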
To alleviate this problem, the self-attention block is proposed.
The enhanced fused vector bi is obtained by exploiting the contribution of each instance x_ij to the fused vector h_i.
This introduces two advantages. One… The other is …
Therefore, bi is more representative than hi.
"To alleviate this problem, the self-attention block is proposed." This sentence carries no information.
To address this issue, we design the self-attention block to obtain the enhanced fused vector bi by exploiting the contribution of xij to hi.
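
To check that the revised wording ("exploiting the contribution of xij to hi") is clear enough to act on, here is one possible reading written as code. It is purely an assumption for illustration: the notes do not give the paper's formulation, so this sketch swaps in plain scaled dot-product attention with hi as the query and the instances as keys and values. The name `enhance_fused_vector` and the toy inputs are hypothetical.

```python
import numpy as np

def enhance_fused_vector(X, h):
    """One assumed reading of the self-attention block.

    X : (n_i, d) array, one instance x_ij per row.
    h : (d,) fused vector h_i produced by the attention block.
    Returns the influence of each instance on h_i and an enhanced vector b_i.
    """
    X = np.asarray(X, dtype=float)
    h = np.asarray(h, dtype=float)
    d = X.shape[1]
    scores = X @ h / np.sqrt(d)        # scaled dot-product between each x_ij and h_i
    scores = scores - scores.max()     # shift for numerical stability
    influence = np.exp(scores) / np.exp(scores).sum()  # softmax over instances
    b = influence @ X                  # b_i: instances re-weighted by their influence
    return influence, b

# Hypothetical toy bag: the first instance is aligned with h_i, the second is not.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
h_toy = np.array([1.0, 0.0])
influence, b = enhance_fused_vector(X_toy, h_toy)
```

If the paper's block differs from this reading, the revised sentence should say how, for instance by naming what plays the role of query, key, and value.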

3.18 extractions -> extractors

3.19 As mentioned above, there are two types of mechanisms that are respectively utilized in the proposed algorithm.
As mentioned above, ASMI employs attention and self-attention mechanisms.

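3.21 In my view, the first paragraph of Section E and the first part of its second paragraph are not analysis; they read more like a conclusion.
Only from "The former computes attention weights…" onward does the text barely qualify as analysis.
"Analysis" usually means time and space complexity analysis, built around formulas rather than argued in prose.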
Analysis: time/space complexity analysis
If there is not enough analysis to support the heading, use "Discussions" instead.

