
Commit 4b0cdca

added hashtag recommendation classes for WWW2017 paper
1 parent 1ffe5a9 commit 4b0cdca


49 files changed (+6240 / -1557 lines changed)

README.md

Lines changed: 40 additions & 23 deletions
@@ -12,7 +12,9 @@ Furthermore, it contains algorithms to process datasets (e.g., p-core pruning, l
 
 The software already contains four novel tag-recommender approaches based on cognitive science theory. The first one ([3Layers](http://www.christophtrattner.info/pubs/cikm2013.pdf)) (Seitlinger et al., 2013) uses topic information and is based on the ALCOVE/MINERVA2 theories (Kruschke, 1992; Hintzman, 1984). The second one ([BLL+C](http://delivery.acm.org/10.1145/2580000/2576934/p463-kowald.pdf)) (Kowald et al., 2014b) uses time information and is based on the ACT-R theory (Anderson et al., 2004). The third one ([3LT](http://www.christophtrattner.info/pubs/msm8_kowald.pdf)) (Kowald et al., 2015b) is a combination of the former two approaches and integrates the time component on the level of tags and topics. Finally, the fourth one ([BLLac+MPr](http://www.christophtrattner.info/pubs/msm7_kowald.pdf)) extends the BLL+C algorithm with semantic correlations (Kowald et al., 2015a).
 
-Additionally, TagRec also contains algorithms for the personalized recommendation of resources / items in social tagging systems. In this respect TagRec includes a novel algorithm called [CIRTT](http://www.christophtrattner.info/pubs/sp2014.pdf) (Lacic et al., 2014) that integrates tag and time information using the BLL-equation coming from the ACT-R theory (Anderson et al., 2004). Furthermore, it contains another novel item-recommender called [SUSTAIN+CFu](http://arxiv.org/pdf/1501.07716v1.pdf) (Seitlinger et al., 2015) that improves user-based CF by integrating the attentional focus of users via the SUSTAIN model (Love et al., 2004).
+Apart from this, TagRec also contains algorithms for the personalized recommendation of resources / items in social tagging systems. In this respect TagRec includes a novel algorithm called [CIRTT](http://www.christophtrattner.info/pubs/sp2014.pdf) (Lacic et al., 2014) that integrates tag and time information using the BLL-equation coming from the ACT-R theory (Anderson et al., 2004). Furthermore, it contains another novel item-recommender called [SUSTAIN+CFu](http://arxiv.org/pdf/1501.07716v1.pdf) (Seitlinger et al., 2015) that improves user-based CF by integrating the attentional focus of users via the SUSTAIN model (Love et al., 2004).
+
+Finally, TagRec has also been used to recommend hashtags on Twitter (Kowald et al., 2017) and therefore contains an initial set of algorithms for this task as well. To support them, TagRec also includes a connection to the Apache Solr search engine.
 
 This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
 This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
@@ -29,9 +31,11 @@ The source-code can be directly checked-out through this repository. It contains
 * ml_core for MovieLens
 * lastfm_core for LastFM
 * wiki_core for Wikipedia (based on bookmarks from Delicious)
+* twitter_core/researchers for the Twitter CompSci dataset
+* twitter_core/general for the Twitter Random dataset
 
 ## How-to-use
-The _tagrecommender_ .jar uses three parameters:
+The _tagrec_ .jar uses three parameters:
 
 First the algorithm,
 
@@ -41,12 +45,12 @@ Tag-Recommender:
 * bll_c for BLL and BLL+C (based on ACT-R theory) (Kowald et al., 2014b)
 * bll_c_ac for BLL and BLL+MPr together with semantic correlations (Trattner et al., 2014)
 * lda for Latent Dirichlet Allocation (Krestel et al., 2009)
-* cf for (user-based) Collaborative Filtering (Jäschke et al., 2007)
-* cfr for (resource-based and mixed) Collaborative Filtering (Jäschke et al., 2007)
+* cf for (user-based) Collaborative Filtering (Jäschke et al., 2007)
+* cfr for (resource-based and mixed) Collaborative Filtering (Jäschke et al., 2007)
 * fr for Adapted PageRank and FolkRank (Hotho et al., 2006)
 * girptm for GIRP and GIRPTM (Zhang et al., 2012)
-* mp for MostPopular tags (Jäschke et al., 2007)
-* mp_u_r for MostPopular tags by user and/or resource (Jäschke et al., 2007)
+* mp for MostPopular tags (Jäschke et al., 2007)
+* mp_u_r for MostPopular tags by user and/or resource (Jäschke et al., 2007)
 
 Resource-Recommender:
 * item_sustain for the improved CF approach based on the SUSTAIN model
@@ -58,7 +62,17 @@ Resource-Recommender:
 * item_zheng for the tag- and time-based approach by Zheng et al. (2011)
 * item_huang for the tag- and time-based approach by Huang et al. (2014)
 
+Hashtag-Recommender (Kowald et al., 2017):
+* hashtag_analysis for analyzing the temporal effects on hashtag reuse
+* hashtag_socialmp for MostPopular hashtags of the followees (i.e., MPs)
+* hashtag_socialrecency for recency-ranked hashtags of the followees (i.e., MRs)
+* hashtag_socialbll for BLL-ranked hashtags of the followees (i.e., BLLs)
+* hashtag_hybrid for BLLi,s
+* hashtag_cb_res for BLLi,s,c on the Twitter CompSci dataset and Solr core
+* hashtag_cb_gen for BLLi,s,c on the Twitter Random dataset and Solr core
+
 Data-Processing:
+* stats for printing the dataset statistics
 * core for calculating p-cores on a dataset
 * split_l1o for splitting a dataset into training and test-sets using a leave-one-out method
 * split_8020 for splitting a dataset into training and test-sets using an 80/20 split
@@ -80,6 +94,8 @@ Data-Processing:
 * ml for MovieLens
 * lastfm for LastFM
 * wiki for Wikipedia (based on bookmarks from Delicious)
+* twitter_res for the Twitter CompSci dataset
+* twitter_gen for the Twitter Random dataset
 
 and third the filename (without file extension)
 
@@ -154,10 +170,10 @@ _Bibtex:_
 address = {New York, NY, USA},
 }`
 
-
-
 ## Publications
-* C. Trattner, D. Kowald, P. Seitlinger, S. Kopeinik, S. and T. Ley: [Modeling Activation Processes in Human Memory to Predict the Use of Tags in Social Bookmarking Systems](http://www.christophtrattner.info/pubs/bll_journal_final.pdf), Journal of Web Science, 2016.
+* D. Kowald, S. Pujari and E. Lex: Temporal Effects on Hashtag Reuse in Twitter: A Cognitive-Inspired Hashtag Recommendation Approach (preprint will be available soon), In Proc. of WWW, 2017
+* D. Kowald and E. Lex: [The Influence of Frequency, Recency and Semantic Context on the Reuse of Tags in Social Tagging Systems](https://arxiv.org/pdf/1604.00837v1.pdf), In Proc. of Hypertext, 2016
+* C. Trattner, D. Kowald, P. Seitlinger, S. Kopeinik and T. Ley: [Modeling Activation Processes in Human Memory to Predict the Use of Tags in Social Bookmarking Systems](http://www.christophtrattner.info/pubs/bll_journal_final.pdf), Journal of Web Science, 2016.
 * D. Kowald and E. Lex: [Evaluating Tag Recommender Algorithms in Real-World Folksonomies: A Comparative Study](http://dl.acm.org/citation.cfm?id=2799664), In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys 2015), ACM, New York, NY, USA, 2015.
 * S. Larrain, C. Trattner, D. Parra, E. Graells-Garrido and K. Norvag: [Good Times Bad Times: A Study on Recency Effects in Collaborative Filtering for Social Tagging](http://www.christophtrattner.info/pubs/recsys2015b.pdf), In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys 2015), ACM, New York, NY, USA, 2015.
 * P. Seitlinger, D. Kowald, S. Kopeinik, I. Hasani-Mavriqi, T. Ley, and Elisabeth Lex: [Attention Please! A Hybrid Resource Recommender Mimicking Attention-Interpretation Dynamics](http://arxiv.org/pdf/1501.07716v1.pdf). In Proc. of WWW'2015 Companion. ACM. 2015
@@ -168,22 +184,23 @@ _Bibtex:_
 * P. Seitlinger, D. Kowald, C. Trattner, and T. Ley.: [Recommending Tags with a Model of Human Categorization](http://www.christophtrattner.info/pubs/cikm2013.pdf). In Proceedings of The ACM International Conference on Information and Knowledge Management (CIKM 2013), ACM, New York, NY, USA, 2013.
 
 ## References
-* A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Information retrieval in folksonomies: Search and ranking. In The semantic web: research and applications, pages 411426. Springer, 2006.
-* L. Zhang, J. Tang, and M. Zhang. Integrating temporal usage pattern into personalized tag prediction. In Web Technologies and Applications, pages 354365. Springer, 2012.
-* R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in folksonomies. In Knowledge Discovery in Databases: PKDD 2007, pages 506514. Springer, 2007.
-* R. Krestel, P. Fankhauser, and W. Nejdl. Latent dirichlet allocation for tag recommendation. In Proceedings of the third ACM conference on Recommender systems, pages 6168. ACM, 2009.
-* J. R. Anderson, M. D. Byrne, S. Douglass, C. Lebiere, and Y. Qin. An integrated theory of the mind. Psychological Review, 111(4):10361050, 2004.
-* J. K. Kruschke et al. Alcove: An exemplar-based connectionist model of category learning. Psychological review, 99(1):2244, 1992.
-* D. L Hintzman. Minerva 2: A simulation model of human memory. Behavior Research Methods, Instruments, & Computers 16 (2), 96101, 1984.
+* A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Information retrieval in folksonomies: Search and ranking. In The semantic web: research and applications, pages 411–426. Springer, 2006.
+* L. Zhang, J. Tang, and M. Zhang. Integrating temporal usage pattern into personalized tag prediction. In Web Technologies and Applications, pages 354–365. Springer, 2012.
+* R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in folksonomies. In Knowledge Discovery in Databases: PKDD 2007, pages 506–514. Springer, 2007.
+* R. Krestel, P. Fankhauser, and W. Nejdl. Latent dirichlet allocation for tag recommendation. In Proceedings of the third ACM conference on Recommender systems, pages 61–68. ACM, 2009.
+* J. R. Anderson, M. D. Byrne, S. Douglass, C. Lebiere, and Y. Qin. An integrated theory of the mind. Psychological Review, 111(4):1036–1050, 2004.
+* J. K. Kruschke et al. Alcove: An exemplar-based connectionist model of category learning. Psychological review, 99(1):22–44, 1992.
+* D. L Hintzman. Minerva 2: A simulation model of human memory. Behavior Research Methods, Instruments, & Computers 16 (2), 96–101, 1984.
 * N. Zheng and Q. Li. A recommender system based on tag and time information for social tagging systems. Expert Syst. Appl., 2011.
 * C.-L. Huang, P.-H. Yeh, C.-W. Lin, and D.-C. Wu. Utilizing user tag-based interests in recommender systems for social resource sharing websites. Knowledge-Based Systems, 2014.
 * B. C. Love, D. L. Medin, and T. M. Gureckis. Sustain: A network model of category learning. Psychological review, 111(2):309, 2004.
 
-## Main contributor
-* Dominik Kowald, Know-Center, Graz University of Technology, dkowald@know-center.at (general contact)
+## Main contact and contributor
+* [Dominik Kowald](http://www.dominikkowald.info/), Know-Center, Graz University of Technology, dkowald [AT] know [MINUS] center [DOT] at (general contact)
 
-## Contacts and contributors (in alphabetically order)
-* Simone Kopeinik, Knowledge Technologies Institute, Graz University of Technology, [email protected] (sustain resource recommender algorithm)
-* Emanuel Lacic, Knowledge Technologies Institute, Graz University of Technology, [email protected] (huang, zheng and CIRTT resource recommender algorithms)
-* Elisabeth Lex, Knowledge Technologies Institute, Graz University of Technology, [email protected] (general contac)
-* Christoph Trattner, Know-Center, [email protected] (general contact)
+## Contacts and contributors
+* Simone Kopeinik, Knowledge Technologies Institute, Graz University of Technology, simone [DOT] [AT] tugraz [DOT] at (sustain resource recommender algorithm)
+* Emanuel Lacic, Knowledge Technologies Institute, Graz University of Technology, elacic [AT] know [MINUS] center [DOT] at (huang, zheng and CIRTT resource recommender algorithms)
+* Subhash Pujari, Knowledge Technologies Institute, Graz University of Technology, spujari [AT] student [DOT] tugraz [DOT] at (twitter hashtag recommender algorithms)
+* Elisabeth Lex, Knowledge Technologies Institute, Graz University of Technology, elisabeth [DOT] lex [AT] tugraz [DOT] at (general contact)
+* Christoph Trattner, Know-Center, Graz University of Technology, ctrattner [AT] know [MINUS] center [DOT] at (general contact)
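The three social baselines in the hashtag-recommender list of the README (MPs, MRs and BLLs) rank the hashtags that a user's followees have used by frequency, by recency, and by the base-level learning (BLL) equation of ACT-R, B(h) = ln(sum over j of (t_ref - t_j)^(-d)), which rewards both. The following Java sketch only illustrates these three scores and is not TagRec's implementation: the class and method names are hypothetical, timestamps are assumed to be in seconds, and the decay d = 0.5 is the ACT-R default, not necessarily the value used in Kowald et al. (2017).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not TagRec code): scores the hashtags used by a user's
 * followees. Each hashtag maps to a non-empty list of usage timestamps
 * (assumed: seconds); "now" must not be earlier than any timestamp.
 */
final class SocialHashtagRanker {

    /** Assumed decay exponent d (ACT-R default), not taken from the paper. */
    static final double DECAY = 0.5;

    /** MP: number of times the followees used the hashtag. */
    static Map<String, Double> mostPopular(Map<String, List<Long>> usages) {
        Map<String, Double> scores = new HashMap<>();
        usages.forEach((tag, times) -> scores.put(tag, (double) times.size()));
        return scores;
    }

    /** MR: negated time since the most recent use, so newer hashtags score higher. */
    static Map<String, Double> mostRecent(Map<String, List<Long>> usages, long now) {
        Map<String, Double> scores = new HashMap<>();
        usages.forEach((tag, times) -> scores.put(tag, -(double) (now - Collections.max(times))));
        return scores;
    }

    /** BLL: ln(sum_j (now - t_j)^(-d)), rewarding both frequent and recent use. */
    static Map<String, Double> baseLevelLearning(Map<String, List<Long>> usages, long now) {
        Map<String, Double> scores = new HashMap<>();
        usages.forEach((tag, times) -> {
            double sum = 0.0;
            for (long t : times) {
                // Clamp the gap to at least 1 s so a use at "now" does not yield an infinite term.
                double gap = Math.max(1L, now - t);
                sum += Math.pow(gap, -DECAY);
            }
            scores.put(tag, Math.log(sum));
        });
        return scores;
    }

    /** Returns up to k keys ordered by descending score. */
    static List<String> topK(Map<String, Double> scores, int k) {
        List<String> keys = new ArrayList<>(scores.keySet());
        keys.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return new ArrayList<>(keys.subList(0, Math.min(k, keys.size())));
    }
}
```

With usages {#java: [100, 200, 900], #ai: [950], #old: [10, 20, 30, 40]} and now = 1000, MP ranks #old first (most uses) and MR ranks #ai first (latest use), while BLL ranks #java first because it was used both several times and recently.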

src/common/CooccurenceMatrix.java

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ public class CooccurenceMatrix {
     private List<Integer> tagCounts;
 
     public CooccurenceMatrix(List<Bookmark> bookmarks, List<Integer> tagCounts, boolean normalize) {
-        System.out.println("Building matrix ...");
+        //System.out.println("Building matrix ...");
         this.coocurenceMatrix = new SparseMatrix();
         this.tagCounts = tagCounts;
         this.initMatrix(bookmarks);
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+package common;
+
+import java.util.Comparator;
+import java.util.Map;
+
+public class DoubleMapComparatorGeneric<T> implements Comparator<T> {
+    private Map<T, Double> map;
+
+    public DoubleMapComparatorGeneric(Map<T, Double> map) {
+        this.map = map;
+    }
+
+    @Override
+    public int compare(T key1, T key2) {
+        Double val1 = this.map.get(key1);
+        Double val2 = this.map.get(key2);
+        if (val1 != null && val2 != null) {
+            return Double.compare(val2, val1); // descending order; equal values compare as 0
+        }
+        return 0;
+    }
+}
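Since Java 8, the ordering that DoubleMapComparatorGeneric implements (keys sorted by their Double value, highest first) can also be built from the standard Comparator combinators, which satisfy the Comparator contract out of the box. A minimal sketch; the class and method names are hypothetical, and unlike the class above it assumes every key has a non-null value:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of TagRec. */
final class ValueOrdering {

    /** Returns the map's keys ordered by descending value; assumes no null values. */
    static <T> List<T> keysByDescendingValue(Map<T, Double> map) {
        Comparator<T> byValue = Comparator.comparing(map::get); // ascending by value
        List<T> keys = new ArrayList<>(map.keySet());
        keys.sort(byValue.reversed()); // reverse to put the highest value first
        return keys;
    }
}
```

Keys with equal values keep their relative order because List.sort is stable. Sorting a key list, as here, is also safer than using a value-based comparator as the ordering of a TreeMap or TreeSet, which would treat keys with equal values as duplicates.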

src/common/MapUtil.java

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+package common;
+
+import java.util.*;
+
+public class MapUtil
+{
+    public static <K, V extends Comparable<? super V>> Map<K, V> sortByValue( Map<K, V> map )
+    {
+        List<Map.Entry<K, V>> list =
+            new LinkedList<Map.Entry<K, V>>( map.entrySet() );
+        Collections.sort( list, new Comparator<Map.Entry<K, V>>()
+        {
+            public int compare( Map.Entry<K, V> o1, Map.Entry<K, V> o2 )
+            {
+                return o2.getValue().compareTo( o1.getValue() ); // descending order
+            }
+        } );
+
+        Map<K, V> result = new LinkedHashMap<K, V>();
+        for (Map.Entry<K, V> entry : list)
+        {
+            result.put( entry.getKey(), entry.getValue() );
+        }
+        return result;
+    }
+
+    public static void normalizeMap(Map<Integer, Double> map) { // softmax over the values
+        double denom = 0.0;
+        for (Map.Entry<Integer, Double> e : map.entrySet()) {
+            denom += Math.exp(e.getValue());
+        }
+        for (Map.Entry<Integer, Double> e : map.entrySet()) {
+            e.setValue(Math.exp(e.getValue()) / denom);
+        }
+    }
+}
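MapUtil.normalizeMap is a softmax: it maps each value v to exp(v) / sum(exp(v')). For large scores Math.exp overflows to Infinity (exp(1000) already does), and the division then yields NaN. Subtracting the maximum value before exponentiating gives the same result without overflow. A minimal sketch; the class and method names are hypothetical:

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper, not part of TagRec. */
final class StableSoftmax {

    /** Replaces each value v by exp(v - max) / sum(exp(v' - max)), in place. */
    static void normalizeInPlace(Map<Integer, Double> map) {
        if (map.isEmpty()) {
            return; // nothing to normalize (Collections.max would throw)
        }
        double max = Collections.max(map.values());
        double denom = 0.0;
        for (double v : map.values()) {
            // Each term is at most 1 and the maximum contributes exactly 1, so denom >= 1.
            denom += Math.exp(v - max);
        }
        for (Map.Entry<Integer, Double> e : map.entrySet()) {
            e.setValue(Math.exp(e.getValue() - max) / denom);
        }
    }
}
```

With values {1: 1000.0, 2: 1000.0}, normalizeMap computes Infinity / Infinity and stores NaN, whereas the sketch yields 0.5 for both keys.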

src/common/MergeUtil.java

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+package common;
+
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.TreeMap;
+
+public class MergeUtil {
+
+    public static Map<Integer, Double> mergeMapsWithThreshold(Map<Integer, Double> srcMap, Map<Integer, Double> targetMap, int limit) {
+        Map<Integer, Double> resultMap = new LinkedHashMap<Integer, Double>();
+        Map<Integer, Double> sortedTargetMap = MapUtil.sortByValue(targetMap);
+        double threshold = 0.0;
+        for (Map.Entry<Integer, Double> entry : sortedTargetMap.entrySet()) {
+            threshold = entry.getValue();
+            break;
+        }
+        System.out.println(threshold);
+
+        for (Map.Entry<Integer, Double> srcEntry : srcMap.entrySet()) {
+            if (srcEntry.getValue() >= threshold) {
+                resultMap.put(srcEntry.getKey(), srcEntry.getValue());
+            } else {
+                break;
+            }
+        }
+        for (Map.Entry<Integer, Double> targetEntry: sortedTargetMap.entrySet()) {
+            if (resultMap.size() < limit) {
+                if (!resultMap.containsKey(targetEntry.getKey())) {
+                    resultMap.put(targetEntry.getKey(), targetEntry.getValue());
+                }
+            } else {
+                break;
+            }
+        }
+
+        return resultMap;
+    }
+}
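mergeMapsWithThreshold sets the threshold to the highest score in targetMap (0.0 if it is empty), copies srcMap entries in iteration order until it meets one below that threshold, and then tops the result up to limit entries with the best-scored targetMap entries that are not yet present. Because the first loop stops at the first low score, it presumably expects srcMap to be sorted by descending score, and because it does not check limit, the result can hold more than limit entries. The sketch below restates this logic in a self-contained form; the class name is hypothetical, it sorts srcMap itself (an assumption about the intended input), and it omits the System.out.println of the threshold.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical restatement of MergeUtil.mergeMapsWithThreshold, not TagRec code. */
final class ThresholdMerge {

    /** Returns the map's entries sorted by value, highest first. */
    private static List<Map.Entry<Integer, Double>> sortedDesc(Map<Integer, Double> map) {
        List<Map.Entry<Integer, Double>> entries = new ArrayList<>(map.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return entries;
    }

    static Map<Integer, Double> merge(Map<Integer, Double> srcMap, Map<Integer, Double> targetMap, int limit) {
        Map<Integer, Double> result = new LinkedHashMap<>();
        List<Map.Entry<Integer, Double>> target = sortedDesc(targetMap);
        // Threshold = best score in targetMap (0.0 if targetMap is empty, as in the original).
        double threshold = target.isEmpty() ? 0.0 : target.get(0).getValue();

        // Step 1: keep source entries scoring at least the threshold (limit is not checked here).
        for (Map.Entry<Integer, Double> e : sortedDesc(srcMap)) {
            if (e.getValue() < threshold) {
                break; // entries are sorted, so all remaining ones are below the threshold too
            }
            result.put(e.getKey(), e.getValue());
        }
        // Step 2: fill up to limit with the best target entries that are not present yet.
        for (Map.Entry<Integer, Double> e : target) {
            if (result.size() >= limit) {
                break;
            }
            result.putIfAbsent(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

For srcMap {1: 0.9, 2: 0.4, 3: 0.7}, targetMap {3: 0.6, 4: 0.5, 5: 0.3} and limit 3, the threshold is 0.6, step 1 keeps keys 1 and 3, and step 2 adds key 4, giving {1=0.9, 3=0.7, 4=0.5}. Key 3 keeps its source score because step 2 never overwrites an existing key.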
