public code v1
@@ -0,0 +1,96 @@
base:
  data:
    groups:
      filepath_or_buffer:
        - 'datasets/stratigis/groupsWithHighRatings5.txt'
        - 'datasets/stratigis/groupsWithModerateRatings5untested.txt'
    testdata:
      filepath_or_buffer: 'datasets/fake_data.csv'
      sep: '\t'
      skiprows: 1
      names: [ 'userId', 'itemId', 'rating', 'timestamp']
    test:
      filepath_or_buffer: 'datasets/stratigis/ratings.csv'
      sep: ','
      skiprows: 1
      names: [ 'userId', 'itemId', 'rating', 'timestamp']
    ml32m:
      filepath_or_buffer: 'datasets/ml-32m/ratings.csv'
      sep: ','
      skiprows: 1
      names: [ 'userId', 'itemId', 'rating', 'timestamp']
    ml100k:
      filepath_or_buffer: 'datasets/ml-100k/u.data'
      sep: '\t'
      skiprows: 0
      names: [ 'userId', 'itemId', 'rating', 'timestamp']
    ml1m:
      filepath_or_buffer: 'datasets/ml-1m/ratings.dat'
      sep: '::'
      names: [ 'userId', 'itemId', 'rating', 'timestamp' ]
    tags:
      tags_file: 'datasets/stratigis/tags.csv'
  model:
    gmf:
      learning_rate: 0.005
      weight_decay: 0.0000001
      latent_dim: 8
      epochs: 30
      num_negative: 10
      batch_size: 1024
      cuda: False
      optimizer_name: 'adam'
    mlp:
      learning_rate: 0.005
      weight_decay: 0.0000001
      latent_dim: 8
      epochs: 30
      num_negative: 10
      batch_size: 1024
      cuda: False
      optimizer_name: 'adam'
    als:
      learning_rate: 0.1
      latent_dim: 100
      epochs: 10
      reg_term: 0.001

    bpr:
      learning_rate: 0.01
      latent_dim: 100
      epochs: 10
      reg_term: 0.001
    emf:
      learning_rate: 0.01
      reg_term: 0.001
      expl_reg_term: 0.0
      latent_dim: 80
      epochs: 10
      positive_threshold: 3
      knn: 10
    mf:
      learning_rate: 0.01
      reg_term: 0.001
      expl_reg_term: 0.0
      latent_dim: 80
      epochs: 10
      positive_threshold: 3
      knn: 10
    autoencoder:
      learning_rate: 0.005
      weight_decay: 0.0000001
      hidden_layer_features: 8
      epochs: 30
      cuda: False
      optimizer_name: 'adam'
      positive_threshold: 3
      knn: 10
      expl: true
  explainer:
    lore4groups:
      n_similar_for_tree: 100
      rating_threshold_for_like: 3.0
      max_tree_depth: 5
      top_n_labels: 5000
      min_rating_for_history: 1.0
      similarity_threshold: 0.1
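Note on the dataset entries in the config above: the keys `filepath_or_buffer`, `sep`, `skiprows` and `names` match parameter names of `pandas.read_csv`, so each entry is presumably forwarded to that function. That is an assumption; the diff does not show the loader. The dependency-free sketch below only illustrates what the four settings mean. `read_ratings` and the sample rows are hypothetical and not part of the commit.

```python
import io


def read_ratings(buffer, sep, names, skiprows=0):
    """Yield one dict per data row, mimicking the sep/skiprows/names settings."""
    for line_no, line in enumerate(buffer):
        if line_no < skiprows:
            continue  # e.g. skip a CSV header row
        line = line.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        yield dict(zip(names, line.split(sep)))


names = ["userId", "itemId", "rating", "timestamp"]

# ml1m-style input: '::' separator, no header row (sample values are made up)
ml1m_sample = io.StringIO("1::10::5::1000\n2::20::3::2000\n")
ml1m_rows = list(read_ratings(ml1m_sample, sep="::", names=names))

# ml32m-style input: ',' separator, one header row to skip
csv_sample = io.StringIO("userId,movieId,rating,timestamp\n1,17,4.0,944249077\n")
csv_rows = list(read_ratings(csv_sample, sep=",", names=names, skiprows=1))
```

All values stay strings in this sketch; a real loader would also convert `rating` and `timestamp` to numeric types.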
File diff suppressed because it is too large
@@ -0,0 +1,157 @@
SUMMARY & USAGE LICENSE
=============================================

MovieLens data sets were collected by the GroupLens Research Project
at the University of Minnesota.

This data set consists of:
     * 100,000 ratings (1-5) from 943 users on 1682 movies.
     * Each user has rated at least 20 movies.
     * Simple demographic info for the users (age, gender, occupation, zip)

The data was collected through the MovieLens web site
(movielens.umn.edu) during the seven-month period from September 19th,
1997 through April 22nd, 1998. This data has been cleaned up - users
who had less than 20 ratings or did not have complete demographic
information were removed from this data set. Detailed descriptions of
the data file can be found at the end of this file.

Neither the University of Minnesota nor any of the researchers
involved can guarantee the correctness of the data, its suitability
for any particular purpose, or the validity of results based on the
use of the data set.  The data set may be used for any research
purposes under the following conditions:

     * The user may not state or imply any endorsement from the
       University of Minnesota or the GroupLens Research Group.

     * The user must acknowledge the use of the data set in
       publications resulting from the use of the data set
       (see below for citation information).

     * The user may not redistribute the data without separate
       permission.

     * The user may not use this information for any commercial or
       revenue-bearing purposes without first obtaining permission
       from a faculty member of the GroupLens Research Project at the
       University of Minnesota.

If you have any further questions or comments, please contact GroupLens
<grouplens-info@cs.umn.edu>.

CITATION
==============================================

To acknowledge use of the dataset in publications, please cite the
following paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets:
History and Context. ACM Transactions on Interactive Intelligent
Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages.
DOI=http://dx.doi.org/10.1145/2827872


ACKNOWLEDGEMENTS
==============================================

Thanks to Al Borchers for cleaning up this data and writing the
accompanying scripts.

PUBLISHED WORK THAT HAS USED THIS DATASET
==============================================

Herlocker, J., Konstan, J., Borchers, A., Riedl, J. An Algorithmic
Framework for Performing Collaborative Filtering. Proceedings of the
1999 Conference on Research and Development in Information
Retrieval. Aug. 1999.

FURTHER INFORMATION ABOUT THE GROUPLENS RESEARCH PROJECT
==============================================

The GroupLens Research Project is a research group in the Department
of Computer Science and Engineering at the University of Minnesota.
Members of the GroupLens Research Project are involved in many
research projects related to the fields of information filtering,
collaborative filtering, and recommender systems. The project is led
by professors John Riedl and Joseph Konstan. The project began to
explore automated collaborative filtering in 1992, but is best
known for its worldwide trial of an automated collaborative filtering
system for Usenet news in 1996.  The technology developed in the
Usenet trial formed the base for the formation of Net Perceptions,
Inc., which was founded by members of GroupLens Research. Since then
the project has expanded its scope to research overall information
filtering solutions, integrating content-based methods as well as
improving current collaborative filtering technology.

Further information on the GroupLens Research project, including
research publications, can be found at the following web site:

        http://www.grouplens.org/

GroupLens Research currently operates a movie recommender based on
collaborative filtering:

        http://www.movielens.org/

DETAILED DESCRIPTIONS OF DATA FILES
==============================================

Here are brief descriptions of the data.

ml-data.tar.gz   -- Compressed tar file.  To rebuild the u data files do this:
                gunzip ml-data.tar.gz
                tar xvf ml-data.tar
                mku.sh

u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.
              Each user has rated at least 20 movies.  Users and items are
              numbered consecutively from 1.  The data is randomly
              ordered. This is a tab separated list of
                 user id | item id | rating | timestamp.
              The time stamps are unix seconds since 1/1/1970 UTC

u.info     -- The number of users, items, and ratings in the u data set.

u.item     -- Information about the items (movies); this is a pipe ('|')
              separated list of
              movie id | movie title | release date | video release date |
              IMDb URL | unknown | Action | Adventure | Animation |
              Children's | Comedy | Crime | Documentary | Drama | Fantasy |
              Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
              Thriller | War | Western |
              The last 19 fields are the genres, a 1 indicates the movie
              is of that genre, a 0 indicates it is not; movies can be in
              several genres at once.
              The movie ids are the ones used in the u.data data set.

u.genre    -- A list of the genres.

u.user     -- Demographic information about the users; this is a pipe ('|')
              separated list of
              user id | age | gender | occupation | zip code
              The user ids are the ones used in the u.data data set.

u.occupation -- A list of the occupations.

u1.base    -- The data sets u1.base and u1.test through u5.base and u5.test
u1.test       are 80%/20% splits of the u data into training and test data.
u2.base       Each of u1, ..., u5 have disjoint test sets; this is for
u2.test       5 fold cross validation (where you repeat your experiment
u3.base       with each training and test set and average the results).
u3.test       These data sets can be generated from u.data by mku.sh.
u4.base
u4.test
u5.base
u5.test

ua.base    -- The data sets ua.base, ua.test, ub.base, and ub.test
ua.test       split the u data into a training set and a test set with
ub.base       exactly 10 ratings per user in the test set.  The sets
ub.test       ua.test and ub.test are disjoint.  These data sets can
              be generated from u.data by mku.sh.

allbut.pl  -- The script that generates training and test sets where
              all but n of a user's ratings are in the training data.

mku.sh     -- A shell script to generate all the u data sets from u.data.

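Note on the u.data format described in the README above: each row is a tab separated `user id | item id | rating | timestamp` record with the timestamp in Unix seconds (UTC). A minimal sketch of reading one such row follows. `parse_rating` is a hypothetical helper and the sample row is made up; neither is part of the commit.

```python
from datetime import datetime, timezone


def parse_rating(line):
    """Split one tab-separated u.data row into typed fields."""
    user_id, item_id, rating, ts = line.rstrip("\n").split("\t")
    return {
        "user_id": int(user_id),
        "item_id": int(item_id),
        "rating": int(rating),
        # Unix seconds since 1970-01-01 UTC, per the README
        "rated_at": datetime.fromtimestamp(int(ts), tz=timezone.utc),
    }


row = parse_rating("1\t2\t3\t875000000\n")  # made-up sample row
```

Passing `tz=timezone.utc` keeps the result independent of the machine's local time zone.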
@@ -0,0 +1,34 @@
#!/usr/local/bin/perl

# get args
# base_name, start, stop and max_test are all required, so demand four.
if (@ARGV < 4) {
    print STDERR "Usage: $0 base_name start stop max_test [ratings ...]\n";
    exit 1;
}
$basename = shift;
$start = shift;
$stop = shift;
$maxtest = shift;

# open files
open( TESTFILE, ">$basename.test" ) or die "Cannot open $basename.test for writing\n";
open( BASEFILE, ">$basename.base" ) or die "Cannot open $basename.base for writing\n";

# init variables
$testcnt = 0;

# A user's start-th through stop-th ratings (in input order) go to the
# test file, up to max_test rows in total (no limit if max_test <= 0);
# every other row goes to the base file.
while (<>) {
    ($user) = split;
    if (! defined $ratingcnt{$user}) {
        $ratingcnt{$user} = 0;
    }
    ++$ratingcnt{$user};
    if (($testcnt < $maxtest || $maxtest <= 0)
        && $ratingcnt{$user} >= $start && $ratingcnt{$user} <= $stop) {
        ++$testcnt;
        print TESTFILE;
    }
    else {
        print BASEFILE;
    }
}
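Note on allbut.pl above: the split rule is easy to miss in the Perl, so here is the same logic as a small Python sketch. `allbut_split` is a hypothetical name and the sample rows are made up; this is an illustration of the rule, not a replacement for the script.

```python
from collections import defaultdict


def allbut_split(lines, start, stop, max_test=0):
    """Return (base, test): each user's start..stop-th ratings go to test."""
    seen = defaultdict(int)  # ratings seen so far, per user
    base, test = [], []
    for line in lines:
        user = line.split()[0]  # first whitespace-separated field
        seen[user] += 1
        under_cap = max_test <= 0 or len(test) < max_test
        if under_cap and start <= seen[user] <= stop:
            test.append(line)
        else:
            base.append(line)
    return base, test


sample = ["1\t10\t5\t0", "1\t11\t4\t0", "2\t10\t3\t0", "1\t12\t2\t0"]
base, test = allbut_split(sample, start=1, stop=2)
```

With `start=1, stop=2`, the first two ratings of every user land in `test`; user 1's third rating is the only row left in `base`.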
@@ -0,0 +1,25 @@
#!/bin/sh

# Remove the temp file if interrupted (signals HUP, INT, TERM).
trap 'rm -f tmp.$$; exit 1' 1 2 15

# u.data is tab separated, so sort needs a tab as its field separator.
TAB=`printf '\t'`

for i in 1 2 3 4 5
do
    head -`expr $i \* 20000` u.data | tail -20000 > tmp.$$
    sort -t"$TAB" -k 1,1n -k 2,2n tmp.$$ > u$i.test
    head -`expr \( $i - 1 \) \* 20000` u.data > tmp.$$
    tail -`expr \( 5 - $i \) \* 20000` u.data >> tmp.$$
    sort -t"$TAB" -k 1,1n -k 2,2n tmp.$$ > u$i.base
done

allbut.pl ua 1 10 100000 u.data
sort -t"$TAB" -k 1,1n -k 2,2n ua.base > tmp.$$
mv tmp.$$ ua.base
sort -t"$TAB" -k 1,1n -k 2,2n ua.test > tmp.$$
mv tmp.$$ ua.test

allbut.pl ub 11 20 100000 u.data
sort -t"$TAB" -k 1,1n -k 2,2n ub.base > tmp.$$
mv tmp.$$ ub.base
sort -t"$TAB" -k 1,1n -k 2,2n ub.test > tmp.$$
mv tmp.$$ ub.test

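Note on the loop in mku.sh above: fold i takes the i-th consecutive block of 20000 rows as its test set and everything before and after that block as its base set, then sorts both by user id and item id. The sketch below shows the same slicing on a tiny list. `fold_split` is a hypothetical name and the sample `(user, item)` pairs are made up.

```python
def fold_split(rows, fold, n_folds=5):
    """Fold `fold` (1-based): its consecutive block is test, the rest is base."""
    size = len(rows) // n_folds
    lo, hi = (fold - 1) * size, fold * size
    test = rows[lo:hi]                       # head -(fold*size) | tail -size
    tail_len = (n_folds - fold) * size       # tail -((n_folds-fold)*size)
    base = rows[:lo] + rows[len(rows) - tail_len:]

    def key(row):
        return (row[0], row[1])              # numeric sort: user id, item id

    return sorted(base, key=key), sorted(test, key=key)


rows = [(3, 1), (1, 2), (2, 5), (1, 1), (2, 2),
        (3, 3), (1, 4), (2, 1), (3, 2), (1, 3)]
base, test = fold_split(rows, fold=2)
```

Because the blocks are consecutive slices of the input, the five test sets are disjoint only if the input order is fixed, which the README says is random but stable for u.data.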
File diff suppressed because it is too large
@@ -0,0 +1,20 @@
unknown|0
Action|1
Adventure|2
Animation|3
Children's|4
Comedy|5
Crime|6
Documentary|7
Drama|8
Fantasy|9
Film-Noir|10
Horror|11
Musical|12
Mystery|13
Romance|14
Sci-Fi|15
Thriller|16
War|17
Western|18

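Note on u.genre above: according to the README, the last 19 fields of each u.item row are 0/1 flags in this same genre order. The sketch below turns those flags back into genre names. `load_genres` and `decode_genres` are hypothetical helpers, and the sample u.item row is made up because the u.item diff is not shown here.

```python
GENRE_FILE = """unknown|0
Action|1
Adventure|2
Animation|3
Children's|4
Comedy|5
Crime|6
Documentary|7
Drama|8
Fantasy|9
Film-Noir|10
Horror|11
Musical|12
Mystery|13
Romance|14
Sci-Fi|15
Thriller|16
War|17
Western|18
"""


def load_genres(text):
    """Map genre index -> genre name from 'name|index' lines."""
    genres = {}
    for line in text.splitlines():
        if line.strip():
            name, index = line.split("|")
            genres[int(index)] = name
    return genres


def decode_genres(item_line, genres):
    """Return the genre names whose flag is '1' in a u.item row."""
    fields = item_line.rstrip("\n").split("|")
    flags = fields[-len(genres):]  # the trailing genre flags
    return [genres[i] for i, flag in enumerate(flags) if flag == "1"]


genres = load_genres(GENRE_FILE)

# Made-up u.item row: five leading fields, then 19 genre flags.
flags = ["0"] * 19
flags[3] = flags[4] = flags[5] = "1"
sample_item = "|".join(
    ["1", "Example Movie (1995)", "01-Jan-1995", "", "http://example.invalid/"] + flags
)
names = decode_genres(sample_item, genres)
```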
@@ -0,0 +1,3 @@
943 users
1682 items
100000 ratings
File diff suppressed because it is too large
@@ -0,0 +1,21 @@
administrator
artist
doctor
educator
engineer
entertainment
executive
healthcare
homemaker
lawyer
librarian
marketing
none
other
programmer
retired
salesman
scientist
student
technician
writer
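Note on u.occupation above and the u.user rows in the next hunk: each u.user row is `user id|age|gender|occupation|zip code`, and the occupation should be one of the 21 values listed above. The sketch below parses such rows and keeps the zip code as text, since values such as `05201` (leading zero) and `T8H1N` (letters) would not survive conversion to an integer. `User` and `parse_user` are hypothetical names; the two sample rows are taken from the u.user hunk.

```python
from typing import NamedTuple

OCCUPATIONS = {
    "administrator", "artist", "doctor", "educator", "engineer",
    "entertainment", "executive", "healthcare", "homemaker", "lawyer",
    "librarian", "marketing", "none", "other", "programmer", "retired",
    "salesman", "scientist", "student", "technician", "writer",
}


class User(NamedTuple):
    user_id: int
    age: int
    gender: str
    occupation: str
    zip_code: str  # text on purpose: leading zeros and letters occur


def parse_user(line):
    """Parse one pipe-separated u.user row and validate the occupation."""
    user_id, age, gender, occupation, zip_code = line.rstrip("\n").split("|")
    if occupation not in OCCUPATIONS:
        raise ValueError(f"unknown occupation: {occupation!r}")
    return User(int(user_id), int(age), gender, occupation, zip_code)


users = [
    parse_user("8|36|M|administrator|05201"),
    parse_user("74|39|M|scientist|T8H1N"),
]
```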
@@ -0,0 +1,943 @@
1|24|M|technician|85711
2|53|F|other|94043
3|23|M|writer|32067
4|24|M|technician|43537
5|33|F|other|15213
6|42|M|executive|98101
7|57|M|administrator|91344
8|36|M|administrator|05201
9|29|M|student|01002
10|53|M|lawyer|90703
11|39|F|other|30329
12|28|F|other|06405
13|47|M|educator|29206
14|45|M|scientist|55106
15|49|F|educator|97301
16|21|M|entertainment|10309
17|30|M|programmer|06355
18|35|F|other|37212
19|40|M|librarian|02138
20|42|F|homemaker|95660
21|26|M|writer|30068
22|25|M|writer|40206
23|30|F|artist|48197
24|21|F|artist|94533
25|39|M|engineer|55107
26|49|M|engineer|21044
27|40|F|librarian|30030
28|32|M|writer|55369
29|41|M|programmer|94043
30|7|M|student|55436
31|24|M|artist|10003
32|28|F|student|78741
33|23|M|student|27510
34|38|F|administrator|42141
35|20|F|homemaker|42459
36|19|F|student|93117
37|23|M|student|55105
38|28|F|other|54467
39|41|M|entertainment|01040
40|38|M|scientist|27514
41|33|M|engineer|80525
42|30|M|administrator|17870
43|29|F|librarian|20854
44|26|M|technician|46260
45|29|M|programmer|50233
46|27|F|marketing|46538
47|53|M|marketing|07102
48|45|M|administrator|12550
49|23|F|student|76111
50|21|M|writer|52245
51|28|M|educator|16509
52|18|F|student|55105
53|26|M|programmer|55414
54|22|M|executive|66315
55|37|M|programmer|01331
56|25|M|librarian|46260
57|16|M|none|84010
58|27|M|programmer|52246
59|49|M|educator|08403
60|50|M|healthcare|06472
61|36|M|engineer|30040
62|27|F|administrator|97214
63|31|M|marketing|75240
64|32|M|educator|43202
65|51|F|educator|48118
66|23|M|student|80521
67|17|M|student|60402
68|19|M|student|22904
69|24|M|engineer|55337
70|27|M|engineer|60067
71|39|M|scientist|98034
72|48|F|administrator|73034
73|24|M|student|41850
74|39|M|scientist|T8H1N
75|24|M|entertainment|08816
76|20|M|student|02215
77|30|M|technician|29379
78|26|M|administrator|61801
79|39|F|administrator|03755
80|34|F|administrator|52241
81|21|M|student|21218
82|50|M|programmer|22902
83|40|M|other|44133
84|32|M|executive|55369
85|51|M|educator|20003
86|26|M|administrator|46005
87|47|M|administrator|89503
88|49|F|librarian|11701
89|43|F|administrator|68106
90|60|M|educator|78155
91|55|M|marketing|01913
92|32|M|entertainment|80525
93|48|M|executive|23112
94|26|M|student|71457
95|31|M|administrator|10707
96|25|F|artist|75206
97|43|M|artist|98006
98|49|F|executive|90291
99|20|M|student|63129
100|36|M|executive|90254
101|15|M|student|05146
102|38|M|programmer|30220
103|26|M|student|55108
104|27|M|student|55108
105|24|M|engineer|94043
106|61|M|retired|55125
107|39|M|scientist|60466
108|44|M|educator|63130
109|29|M|other|55423
110|19|M|student|77840
111|57|M|engineer|90630
112|30|M|salesman|60613
113|47|M|executive|95032
114|27|M|programmer|75013
115|31|M|engineer|17110
116|40|M|healthcare|97232
117|20|M|student|16125
118|21|M|administrator|90210
119|32|M|programmer|67401
120|47|F|other|06260
121|54|M|librarian|99603
122|32|F|writer|22206
123|48|F|artist|20008
124|34|M|student|60615
125|30|M|lawyer|22202
126|28|F|lawyer|20015
127|33|M|none|73439
128|24|F|marketing|20009
129|36|F|marketing|07039
130|20|M|none|60115
131|59|F|administrator|15237
132|24|M|other|94612
133|53|M|engineer|78602
134|31|M|programmer|80236
135|23|M|student|38401
136|51|M|other|97365
137|50|M|educator|84408
138|46|M|doctor|53211
139|20|M|student|08904
140|30|F|student|32250
141|49|M|programmer|36117
142|13|M|other|48118
143|42|M|technician|08832
144|53|M|programmer|20910
145|31|M|entertainment|V3N4P
146|45|M|artist|83814
147|40|F|librarian|02143
148|33|M|engineer|97006
149|35|F|marketing|17325
150|20|F|artist|02139
151|38|F|administrator|48103
152|33|F|educator|68767
153|25|M|student|60641
154|25|M|student|53703
155|32|F|other|11217
156|25|M|educator|08360
157|57|M|engineer|70808
158|50|M|educator|27606
159|23|F|student|55346
160|27|M|programmer|66215
161|50|M|lawyer|55104
162|25|M|artist|15610
163|49|M|administrator|97212
164|47|M|healthcare|80123
165|20|F|other|53715
166|47|M|educator|55113
167|37|M|other|L9G2B
168|48|M|other|80127
169|52|F|other|53705
170|53|F|healthcare|30067
171|48|F|educator|78750
172|55|M|marketing|22207
173|56|M|other|22306
174|30|F|administrator|52302
175|26|F|scientist|21911
176|28|M|scientist|07030
177|20|M|programmer|19104
178|26|M|other|49512
179|15|M|entertainment|20755
180|22|F|administrator|60202
181|26|M|executive|21218
182|36|M|programmer|33884
183|33|M|scientist|27708
184|37|M|librarian|76013
185|53|F|librarian|97403
186|39|F|executive|00000
187|26|M|educator|16801
188|42|M|student|29440
189|32|M|artist|95014
190|30|M|administrator|95938
191|33|M|administrator|95161
192|42|M|educator|90840
193|29|M|student|49931
194|38|M|administrator|02154
195|42|M|scientist|93555
196|49|M|writer|55105
197|55|M|technician|75094
198|21|F|student|55414
199|30|M|writer|17604
200|40|M|programmer|93402
201|27|M|writer|E2A4H
202|41|F|educator|60201
203|25|F|student|32301
204|52|F|librarian|10960
205|47|M|lawyer|06371
206|14|F|student|53115
207|39|M|marketing|92037
208|43|M|engineer|01720
209|33|F|educator|85710
210|39|M|engineer|03060
211|66|M|salesman|32605
212|49|F|educator|61401
213|33|M|executive|55345
214|26|F|librarian|11231
215|35|M|programmer|63033
216|22|M|engineer|02215
217|22|M|other|11727
218|37|M|administrator|06513
219|32|M|programmer|43212
220|30|M|librarian|78205
221|19|M|student|20685
222|29|M|programmer|27502
223|19|F|student|47906
224|31|F|educator|43512
225|51|F|administrator|58202
226|28|M|student|92103
227|46|M|executive|60659
228|21|F|student|22003
229|29|F|librarian|22903
230|28|F|student|14476
231|48|M|librarian|01080
232|45|M|scientist|99709
233|38|M|engineer|98682
234|60|M|retired|94702
235|37|M|educator|22973
236|44|F|writer|53214
237|49|M|administrator|63146
238|42|F|administrator|44124
239|39|M|artist|95628
240|23|F|educator|20784
241|26|F|student|20001
242|33|M|educator|31404
243|33|M|educator|60201
244|28|M|technician|80525
245|22|M|student|55109
246|19|M|student|28734
247|28|M|engineer|20770
248|25|M|student|37235
249|25|M|student|84103
250|29|M|executive|95110
251|28|M|doctor|85032
252|42|M|engineer|07733
253|26|F|librarian|22903
254|44|M|educator|42647
255|23|M|entertainment|07029
256|35|F|none|39042
257|17|M|student|77005
258|19|F|student|77801
259|21|M|student|48823
260|40|F|artist|89801
261|28|M|administrator|85202
262|19|F|student|78264
263|41|M|programmer|55346
264|36|F|writer|90064
265|26|M|executive|84601
266|62|F|administrator|78756
267|23|M|engineer|83716
268|24|M|engineer|19422
269|31|F|librarian|43201
270|18|F|student|63119
271|51|M|engineer|22932
272|33|M|scientist|53706
273|50|F|other|10016
274|20|F|student|55414
275|38|M|engineer|92064
276|21|M|student|95064
277|35|F|administrator|55406
278|37|F|librarian|30033
279|33|M|programmer|85251
280|30|F|librarian|22903
281|15|F|student|06059
282|22|M|administrator|20057
283|28|M|programmer|55305
284|40|M|executive|92629
285|25|M|programmer|53713
286|27|M|student|15217
287|21|M|salesman|31211
288|34|M|marketing|23226
289|11|M|none|94619
290|40|M|engineer|93550
291|19|M|student|44106
292|35|F|programmer|94703
293|24|M|writer|60804
294|34|M|technician|92110
295|31|M|educator|50325
296|43|F|administrator|16803
297|29|F|educator|98103
298|44|M|executive|01581
299|29|M|doctor|63108
300|26|F|programmer|55106
301|24|M|student|55439
302|42|M|educator|77904
303|19|M|student|14853
304|22|F|student|71701
305|23|M|programmer|94086
306|45|M|other|73132
307|25|M|student|55454
308|60|M|retired|95076
309|40|M|scientist|70802
310|37|M|educator|91711
311|32|M|technician|73071
312|48|M|other|02110
313|41|M|marketing|60035
314|20|F|student|08043
315|31|M|educator|18301
316|43|F|other|77009
317|22|M|administrator|13210
318|65|M|retired|06518
319|38|M|programmer|22030
320|19|M|student|24060
321|49|F|educator|55413
322|20|M|student|50613
323|21|M|student|19149
324|21|F|student|02176
325|48|M|technician|02139
326|41|M|administrator|15235
327|22|M|student|11101
328|51|M|administrator|06779
329|48|M|educator|01720
330|35|F|educator|33884
331|33|M|entertainment|91344
332|20|M|student|40504
333|47|M|other|V0R2M
334|32|M|librarian|30002
335|45|M|executive|33775
336|23|M|salesman|42101
337|37|M|scientist|10522
338|39|F|librarian|59717
339|35|M|lawyer|37901
340|46|M|engineer|80123
341|17|F|student|44405
342|25|F|other|98006
343|43|M|engineer|30093
344|30|F|librarian|94117
345|28|F|librarian|94143
346|34|M|other|76059
347|18|M|student|90210
348|24|F|student|45660
349|68|M|retired|61455
350|32|M|student|97301
351|61|M|educator|49938
352|37|F|programmer|55105
353|25|M|scientist|28480
354|29|F|librarian|48197
355|25|M|student|60135
356|32|F|homemaker|92688
357|26|M|executive|98133
358|40|M|educator|10022
359|22|M|student|61801
360|51|M|other|98027
361|22|M|student|44074
362|35|F|homemaker|85233
363|20|M|student|87501
364|63|M|engineer|01810
365|29|M|lawyer|20009
366|20|F|student|50670
367|17|M|student|37411
368|18|M|student|92113
369|24|M|student|91335
370|52|M|writer|08534
371|36|M|engineer|99206
372|25|F|student|66046
373|24|F|other|55116
374|36|M|executive|78746
375|17|M|entertainment|37777
376|28|F|other|10010
377|22|M|student|18015
378|35|M|student|02859
379|44|M|programmer|98117
380|32|M|engineer|55117
381|33|M|artist|94608
382|45|M|engineer|01824
383|42|M|administrator|75204
384|52|M|programmer|45218
385|36|M|writer|10003
386|36|M|salesman|43221
387|33|M|entertainment|37412
388|31|M|other|36106
389|44|F|writer|83702
390|42|F|writer|85016
391|23|M|student|84604
392|52|M|writer|59801
393|19|M|student|83686
394|25|M|administrator|96819
395|43|M|other|44092
396|57|M|engineer|94551
397|17|M|student|27514
398|40|M|other|60008
399|25|M|other|92374
400|33|F|administrator|78213
401|46|F|healthcare|84107
402|30|M|engineer|95129
403|37|M|other|06811
404|29|F|programmer|55108
405|22|F|healthcare|10019
406|52|M|educator|93109
407|29|M|engineer|03261
408|23|M|student|61755
409|48|M|administrator|98225
410|30|F|artist|94025
411|34|M|educator|44691
412|25|M|educator|15222
413|55|M|educator|78212
414|24|M|programmer|38115
415|39|M|educator|85711
416|20|F|student|92626
417|27|F|other|48103
418|55|F|none|21206
419|37|M|lawyer|43215
420|53|M|educator|02140
421|38|F|programmer|55105
422|26|M|entertainment|94533
423|64|M|other|91606
424|36|F|marketing|55422
425|19|M|student|58644
426|55|M|educator|01602
427|51|M|doctor|85258
428|28|M|student|55414
429|27|M|student|29205
430|38|M|scientist|98199
431|24|M|marketing|92629
432|22|M|entertainment|50311
433|27|M|artist|11211
434|16|F|student|49705
435|24|M|engineer|60007
436|30|F|administrator|17345
437|27|F|other|20009
438|51|F|administrator|43204
439|23|F|administrator|20817
440|30|M|other|48076
441|50|M|technician|55013
442|22|M|student|85282
443|35|M|salesman|33308
444|51|F|lawyer|53202
445|21|M|writer|92653
446|57|M|educator|60201
447|30|M|administrator|55113
448|23|M|entertainment|10021
449|23|M|librarian|55021
450|35|F|educator|11758
451|16|M|student|48446
452|35|M|administrator|28018
453|18|M|student|06333
454|57|M|other|97330
455|48|M|administrator|83709
456|24|M|technician|31820
457|33|F|salesman|30011
458|47|M|technician|Y1A6B
459|22|M|student|29201
460|44|F|other|60630
461|15|M|student|98102
462|19|F|student|02918
463|48|F|healthcare|75218
464|60|M|writer|94583
465|32|M|other|05001
466|22|M|student|90804
467|29|M|engineer|91201
468|28|M|engineer|02341
469|60|M|educator|78628
470|24|M|programmer|10021
471|10|M|student|77459
472|24|M|student|87544
473|29|M|student|94708
474|51|M|executive|93711
475|30|M|programmer|75230
476|28|M|student|60440
477|23|F|student|02125
478|29|M|other|10019
479|30|M|educator|55409
480|57|M|retired|98257
481|73|M|retired|37771
482|18|F|student|40256
483|29|M|scientist|43212
484|27|M|student|21208
485|44|F|educator|95821
486|39|M|educator|93101
487|22|M|engineer|92121
488|48|M|technician|21012
489|55|M|other|45218
490|29|F|artist|V5A2B
491|43|F|writer|53711
492|57|M|educator|94618
493|22|M|engineer|60090
494|38|F|administrator|49428
495|29|M|engineer|03052
496|21|F|student|55414
497|20|M|student|50112
498|26|M|writer|55408
499|42|M|programmer|75006
500|28|M|administrator|94305
501|22|M|student|10025
502|22|M|student|23092
503|50|F|writer|27514
504|40|F|writer|92115
505|27|F|other|20657
506|46|M|programmer|03869
507|18|F|writer|28450
508|27|M|marketing|19382
509|23|M|administrator|10011
510|34|M|other|98038
511|22|M|student|21250
512|29|M|other|20090
513|43|M|administrator|26241
514|27|M|programmer|20707
515|53|M|marketing|49508
516|53|F|librarian|10021
517|24|M|student|55454
518|49|F|writer|99709
519|22|M|other|55320
520|62|M|healthcare|12603
521|19|M|student|02146
522|36|M|engineer|55443
523|50|F|administrator|04102
524|56|M|educator|02159
525|27|F|administrator|19711
526|30|M|marketing|97124
527|33|M|librarian|12180
528|18|M|student|55104
529|47|F|administrator|44224
530|29|M|engineer|94040
531|30|F|salesman|97408
532|20|M|student|92705
533|43|M|librarian|02324
534|20|M|student|05464
535|45|F|educator|80302
536|38|M|engineer|30078
537|36|M|engineer|22902
538|31|M|scientist|21010
539|53|F|administrator|80303
540|28|M|engineer|91201
541|19|F|student|84302
542|21|M|student|60515
543|33|M|scientist|95123
544|44|F|other|29464
545|27|M|technician|08052
546|36|M|executive|22911
547|50|M|educator|14534
548|51|M|writer|95468
549|42|M|scientist|45680
550|16|F|student|95453
551|25|M|programmer|55414
552|45|M|other|68147
553|58|M|educator|62901
554|32|M|scientist|62901
555|29|F|educator|23227
556|35|F|educator|30606
557|30|F|writer|11217
558|56|F|writer|63132
559|69|M|executive|10022
560|32|M|student|10003
561|23|M|engineer|60005
562|54|F|administrator|20879
563|39|F|librarian|32707
564|65|M|retired|94591
565|40|M|student|55422
566|20|M|student|14627
567|24|M|entertainment|10003
568|39|M|educator|01915
569|34|M|educator|91903
570|26|M|educator|14627
571|34|M|artist|01945
572|51|M|educator|20003
573|68|M|retired|48911
574|56|M|educator|53188
575|33|M|marketing|46032
576|48|M|executive|98281
577|36|F|student|77845
578|31|M|administrator|M7A1A
579|32|M|educator|48103
580|16|M|student|17961
581|37|M|other|94131
582|17|M|student|93003
583|44|M|engineer|29631
584|25|M|student|27511
585|69|M|librarian|98501
586|20|M|student|79508
587|26|M|other|14216
588|18|F|student|93063
589|21|M|lawyer|90034
590|50|M|educator|82435
591|57|F|librarian|92093
592|18|M|student|97520
593|31|F|educator|68767
594|46|M|educator|M4J2K
595|25|M|programmer|31909
596|20|M|artist|77073
597|23|M|other|84116
598|40|F|marketing|43085
599|22|F|student|R3T5K
600|34|M|programmer|02320
601|19|F|artist|99687
602|47|F|other|34656
603|21|M|programmer|47905
604|39|M|educator|11787
605|33|M|engineer|33716
606|28|M|programmer|63044
607|49|F|healthcare|02154
608|22|M|other|10003
609|13|F|student|55106
610|22|M|student|21227
611|46|M|librarian|77008
612|36|M|educator|79070
613|37|F|marketing|29678
614|54|M|educator|80227
615|38|M|educator|27705
616|55|M|scientist|50613
617|27|F|writer|11201
618|15|F|student|44212
619|17|M|student|44134
620|18|F|writer|81648
621|17|M|student|60402
622|25|M|programmer|14850
623|50|F|educator|60187
624|19|M|student|30067
625|27|M|programmer|20723
626|23|M|scientist|19807
627|24|M|engineer|08034
628|13|M|none|94306
629|46|F|other|44224
630|26|F|healthcare|55408
631|18|F|student|38866
632|18|M|student|55454
633|35|M|programmer|55414
634|39|M|engineer|T8H1N
635|22|M|other|23237
636|47|M|educator|48043
637|30|M|other|74101
638|45|M|engineer|01940
639|42|F|librarian|12065
640|20|M|student|61801
641|24|M|student|60626
642|18|F|student|95521
643|39|M|scientist|55122
644|51|M|retired|63645
645|27|M|programmer|53211
646|17|F|student|51250
647|40|M|educator|45810
648|43|M|engineer|91351
649|20|M|student|39762
650|42|M|engineer|83814
651|65|M|retired|02903
652|35|M|other|22911
653|31|M|executive|55105
654|27|F|student|78739
655|50|F|healthcare|60657
656|48|M|educator|10314
657|26|F|none|78704
658|33|M|programmer|92626
659|31|M|educator|54248
660|26|M|student|77380
661|28|M|programmer|98121
662|55|M|librarian|19102
663|26|M|other|19341
664|30|M|engineer|94115
665|25|M|administrator|55412
666|44|M|administrator|61820
667|35|M|librarian|01970
668|29|F|writer|10016
669|37|M|other|20009
670|30|M|technician|21114
671|21|M|programmer|91919
672|54|F|administrator|90095
673|51|M|educator|22906
674|13|F|student|55337
675|34|M|other|28814
676|30|M|programmer|32712
677|20|M|other|99835
678|50|M|educator|61462
679|20|F|student|54302
680|33|M|lawyer|90405
681|44|F|marketing|97208
682|23|M|programmer|55128
683|42|M|librarian|23509
684|28|M|student|55414
685|32|F|librarian|55409
686|32|M|educator|26506
687|31|F|healthcare|27713
688|37|F|administrator|60476
|
||||
689|25|M|other|45439
|
||||
690|35|M|salesman|63304
|
||||
691|34|M|educator|60089
|
||||
692|34|M|engineer|18053
|
||||
693|43|F|healthcare|85210
|
||||
694|60|M|programmer|06365
|
||||
695|26|M|writer|38115
|
||||
696|55|M|other|94920
|
||||
697|25|M|other|77042
|
||||
698|28|F|programmer|06906
|
||||
699|44|M|other|96754
|
||||
700|17|M|student|76309
|
||||
701|51|F|librarian|56321
|
||||
702|37|M|other|89104
|
||||
703|26|M|educator|49512
|
||||
704|51|F|librarian|91105
|
||||
705|21|F|student|54494
|
||||
706|23|M|student|55454
|
||||
707|56|F|librarian|19146
|
||||
708|26|F|homemaker|96349
|
||||
709|21|M|other|N4T1A
|
||||
710|19|M|student|92020
|
||||
711|22|F|student|15203
|
||||
712|22|F|student|54901
|
||||
713|42|F|other|07204
|
||||
714|26|M|engineer|55343
|
||||
715|21|M|technician|91206
|
||||
716|36|F|administrator|44265
|
||||
717|24|M|technician|84105
|
||||
718|42|M|technician|64118
|
||||
719|37|F|other|V0R2H
|
||||
720|49|F|administrator|16506
|
||||
721|24|F|entertainment|11238
|
||||
722|50|F|homemaker|17331
|
||||
723|26|M|executive|94403
|
||||
724|31|M|executive|40243
|
||||
725|21|M|student|91711
|
||||
726|25|F|administrator|80538
|
||||
727|25|M|student|78741
|
||||
728|58|M|executive|94306
|
||||
729|19|M|student|56567
|
||||
730|31|F|scientist|32114
|
||||
731|41|F|educator|70403
|
||||
732|28|F|other|98405
|
||||
733|44|F|other|60630
|
||||
734|25|F|other|63108
|
||||
735|29|F|healthcare|85719
|
||||
736|48|F|writer|94618
|
||||
737|30|M|programmer|98072
|
||||
738|35|M|technician|95403
|
||||
739|35|M|technician|73162
|
||||
740|25|F|educator|22206
|
||||
741|25|M|writer|63108
|
||||
742|35|M|student|29210
|
||||
743|31|M|programmer|92660
|
||||
744|35|M|marketing|47024
|
||||
745|42|M|writer|55113
|
||||
746|25|M|engineer|19047
|
||||
747|19|M|other|93612
|
||||
748|28|M|administrator|94720
|
||||
749|33|M|other|80919
|
||||
750|28|M|administrator|32303
|
||||
751|24|F|other|90034
|
||||
752|60|M|retired|21201
|
||||
753|56|M|salesman|91206
|
||||
754|59|F|librarian|62901
|
||||
755|44|F|educator|97007
|
||||
756|30|F|none|90247
|
||||
757|26|M|student|55104
|
||||
758|27|M|student|53706
|
||||
759|20|F|student|68503
|
||||
760|35|F|other|14211
|
||||
761|17|M|student|97302
|
||||
762|32|M|administrator|95050
|
||||
763|27|M|scientist|02113
|
||||
764|27|F|educator|62903
|
||||
765|31|M|student|33066
|
||||
766|42|M|other|10960
|
||||
767|70|M|engineer|00000
|
||||
768|29|M|administrator|12866
|
||||
769|39|M|executive|06927
|
||||
770|28|M|student|14216
|
||||
771|26|M|student|15232
|
||||
772|50|M|writer|27105
|
||||
773|20|M|student|55414
|
||||
774|30|M|student|80027
|
||||
775|46|M|executive|90036
|
||||
776|30|M|librarian|51157
|
||||
777|63|M|programmer|01810
|
||||
778|34|M|student|01960
|
||||
779|31|M|student|K7L5J
|
||||
780|49|M|programmer|94560
|
||||
781|20|M|student|48825
|
||||
782|21|F|artist|33205
|
||||
783|30|M|marketing|77081
|
||||
784|47|M|administrator|91040
|
||||
785|32|M|engineer|23322
|
||||
786|36|F|engineer|01754
|
||||
787|18|F|student|98620
|
||||
788|51|M|administrator|05779
|
||||
789|29|M|other|55420
|
||||
790|27|M|technician|80913
|
||||
791|31|M|educator|20064
|
||||
792|40|M|programmer|12205
|
||||
793|22|M|student|85281
|
||||
794|32|M|educator|57197
|
||||
795|30|M|programmer|08610
|
||||
796|32|F|writer|33755
|
||||
797|44|F|other|62522
|
||||
798|40|F|writer|64131
|
||||
799|49|F|administrator|19716
|
||||
800|25|M|programmer|55337
|
||||
801|22|M|writer|92154
|
||||
802|35|M|administrator|34105
|
||||
803|70|M|administrator|78212
|
||||
804|39|M|educator|61820
|
||||
805|27|F|other|20009
|
||||
806|27|M|marketing|11217
|
||||
807|41|F|healthcare|93555
|
||||
808|45|M|salesman|90016
|
||||
809|50|F|marketing|30803
|
||||
810|55|F|other|80526
|
||||
811|40|F|educator|73013
|
||||
812|22|M|technician|76234
|
||||
813|14|F|student|02136
|
||||
814|30|M|other|12345
|
||||
815|32|M|other|28806
|
||||
816|34|M|other|20755
|
||||
817|19|M|student|60152
|
||||
818|28|M|librarian|27514
|
||||
819|59|M|administrator|40205
|
||||
820|22|M|student|37725
|
||||
821|37|M|engineer|77845
|
||||
822|29|F|librarian|53144
|
||||
823|27|M|artist|50322
|
||||
824|31|M|other|15017
|
||||
825|44|M|engineer|05452
|
||||
826|28|M|artist|77048
|
||||
827|23|F|engineer|80228
|
||||
828|28|M|librarian|85282
|
||||
829|48|M|writer|80209
|
||||
830|46|M|programmer|53066
|
||||
831|21|M|other|33765
|
||||
832|24|M|technician|77042
|
||||
833|34|M|writer|90019
|
||||
834|26|M|other|64153
|
||||
835|44|F|executive|11577
|
||||
836|44|M|artist|10018
|
||||
837|36|F|artist|55409
|
||||
838|23|M|student|01375
|
||||
839|38|F|entertainment|90814
|
||||
840|39|M|artist|55406
|
||||
841|45|M|doctor|47401
|
||||
842|40|M|writer|93055
|
||||
843|35|M|librarian|44212
|
||||
844|22|M|engineer|95662
|
||||
845|64|M|doctor|97405
|
||||
846|27|M|lawyer|47130
|
||||
847|29|M|student|55417
|
||||
848|46|M|engineer|02146
|
||||
849|15|F|student|25652
|
||||
850|34|M|technician|78390
|
||||
851|18|M|other|29646
|
||||
852|46|M|administrator|94086
|
||||
853|49|M|writer|40515
|
||||
854|29|F|student|55408
|
||||
855|53|M|librarian|04988
|
||||
856|43|F|marketing|97215
|
||||
857|35|F|administrator|V1G4L
|
||||
858|63|M|educator|09645
|
||||
859|18|F|other|06492
|
||||
860|70|F|retired|48322
|
||||
861|38|F|student|14085
|
||||
862|25|M|executive|13820
|
||||
863|17|M|student|60089
|
||||
864|27|M|programmer|63021
|
||||
865|25|M|artist|11231
|
||||
866|45|M|other|60302
|
||||
867|24|M|scientist|92507
|
||||
868|21|M|programmer|55303
|
||||
869|30|M|student|10025
|
||||
870|22|M|student|65203
|
||||
871|31|M|executive|44648
|
||||
872|19|F|student|74078
|
||||
873|48|F|administrator|33763
|
||||
874|36|M|scientist|37076
|
||||
875|24|F|student|35802
|
||||
876|41|M|other|20902
|
||||
877|30|M|other|77504
|
||||
878|50|F|educator|98027
|
||||
879|33|F|administrator|55337
|
||||
880|13|M|student|83702
|
||||
881|39|M|marketing|43017
|
||||
882|35|M|engineer|40503
|
||||
883|49|M|librarian|50266
|
||||
884|44|M|engineer|55337
|
||||
885|30|F|other|95316
|
||||
886|20|M|student|61820
|
||||
887|14|F|student|27249
|
||||
888|41|M|scientist|17036
|
||||
889|24|M|technician|78704
|
||||
890|32|M|student|97301
|
||||
891|51|F|administrator|03062
|
||||
892|36|M|other|45243
|
||||
893|25|M|student|95823
|
||||
894|47|M|educator|74075
|
||||
895|31|F|librarian|32301
|
||||
896|28|M|writer|91505
|
||||
897|30|M|other|33484
|
||||
898|23|M|homemaker|61755
|
||||
899|32|M|other|55116
|
||||
900|60|M|retired|18505
|
||||
901|38|M|executive|L1V3W
|
||||
902|45|F|artist|97203
|
||||
903|28|M|educator|20850
|
||||
904|17|F|student|61073
|
||||
905|27|M|other|30350
|
||||
906|45|M|librarian|70124
|
||||
907|25|F|other|80526
|
||||
908|44|F|librarian|68504
|
||||
909|50|F|educator|53171
|
||||
910|28|M|healthcare|29301
|
||||
911|37|F|writer|53210
|
||||
912|51|M|other|06512
|
||||
913|27|M|student|76201
|
||||
914|44|F|other|08105
|
||||
915|50|M|entertainment|60614
|
||||
916|27|M|engineer|N2L5N
|
||||
917|22|F|student|20006
|
||||
918|40|M|scientist|70116
|
||||
919|25|M|other|14216
|
||||
920|30|F|artist|90008
|
||||
921|20|F|student|98801
|
||||
922|29|F|administrator|21114
|
||||
923|21|M|student|E2E3R
|
||||
924|29|M|other|11753
|
||||
925|18|F|salesman|49036
|
||||
926|49|M|entertainment|01701
|
||||
927|23|M|programmer|55428
|
||||
928|21|M|student|55408
|
||||
929|44|M|scientist|53711
|
||||
930|28|F|scientist|07310
|
||||
931|60|M|educator|33556
|
||||
932|58|M|educator|06437
|
||||
933|28|M|student|48105
|
||||
934|61|M|engineer|22902
|
||||
935|42|M|doctor|66221
|
||||
936|24|M|other|32789
|
||||
937|48|M|educator|98072
|
||||
938|38|F|technician|55038
|
||||
939|26|F|student|33319
|
||||
940|32|M|administrator|02215
|
||||
941|20|M|student|97229
|
||||
942|48|F|librarian|78209
|
||||
943|22|M|student|77841
|
||||
File diff suppressed because it is too large
@@ -0,0 +1,170 @@
|
||||
SUMMARY
|
||||
================================================================================
|
||||
|
||||
These files contain 1,000,209 anonymous ratings of approximately 3,900 movies
|
||||
made by 6,040 MovieLens users who joined MovieLens in 2000.
|
||||
|
||||
USAGE LICENSE
|
||||
================================================================================
|
||||
|
||||
Neither the University of Minnesota nor any of the researchers
|
||||
involved can guarantee the correctness of the data, its suitability
|
||||
for any particular purpose, or the validity of results based on the
|
||||
use of the data set. The data set may be used for any research
|
||||
purposes under the following conditions:
|
||||
|
||||
* The user may not state or imply any endorsement from the
|
||||
University of Minnesota or the GroupLens Research Group.
|
||||
|
||||
* The user must acknowledge the use of the data set in
|
||||
publications resulting from the use of the data set
|
||||
(see below for citation information).
|
||||
|
||||
* The user may not redistribute the data without separate
|
||||
permission.
|
||||
|
||||
* The user may not use this information for any commercial or
|
||||
revenue-bearing purposes without first obtaining permission
|
||||
from a faculty member of the GroupLens Research Project at the
|
||||
University of Minnesota.
|
||||
|
||||
If you have any further questions or comments, please contact GroupLens
|
||||
<grouplens-info@cs.umn.edu>.
|
||||
|
||||
CITATION
|
||||
================================================================================
|
||||
|
||||
To acknowledge use of the dataset in publications, please cite the following
|
||||
paper:
|
||||
|
||||
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History
|
||||
and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4,
|
||||
Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872
|
||||
|
||||
|
||||
ACKNOWLEDGEMENTS
|
||||
================================================================================
|
||||
|
||||
Thanks to Shyong Lam and Jon Herlocker for cleaning up and generating the data
|
||||
set.
|
||||
|
||||
FURTHER INFORMATION ABOUT THE GROUPLENS RESEARCH PROJECT
|
||||
================================================================================
|
||||
|
||||
The GroupLens Research Project is a research group in the Department of
|
||||
Computer Science and Engineering at the University of Minnesota. Members of
|
||||
the GroupLens Research Project are involved in many research projects related
|
||||
to the fields of information filtering, collaborative filtering, and
|
||||
recommender systems. The project is led by professors John Riedl and Joseph
|
||||
Konstan. The project began to explore automated collaborative filtering in
|
||||
1992, but is best known for its worldwide trial of an automated
|
||||
collaborative filtering system for Usenet news in 1996. Since then the project
|
||||
has expanded its scope to research overall information filtering solutions,
|
||||
integrating content-based methods as well as improving current collaborative
|
||||
filtering technology.
|
||||
|
||||
Further information on the GroupLens Research project, including research
|
||||
publications, can be found at the following web site:
|
||||
|
||||
http://www.grouplens.org/
|
||||
|
||||
GroupLens Research currently operates a movie recommender based on
|
||||
collaborative filtering:
|
||||
|
||||
http://www.movielens.org/
|
||||
|
||||
RATINGS FILE DESCRIPTION
|
||||
================================================================================
|
||||
|
||||
All ratings are contained in the file "ratings.dat" and are in the
|
||||
following format:
|
||||
|
||||
UserID::MovieID::Rating::Timestamp
|
||||
|
||||
- UserIDs range between 1 and 6040
|
||||
- MovieIDs range between 1 and 3952
|
||||
- Ratings are made on a 5-star scale (whole-star ratings only)
|
||||
- Timestamp is represented in seconds since the epoch as returned by time(2)
|
||||
- Each user has at least 20 ratings
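
A minimal sketch of reading one record in the format described above. The `Rating` type, the `parse_rating_line` helper, and the sample record are illustrative assumptions, not part of the dataset or of any library.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """One record of ratings.dat (hypothetical container for illustration)."""
    user_id: int
    movie_id: int
    rating: int      # whole stars, 1 to 5
    timestamp: int   # seconds since the epoch


def parse_rating_line(line: str) -> Rating:
    """Split a 'UserID::MovieID::Rating::Timestamp' record into typed fields."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return Rating(int(user_id), int(movie_id), int(rating), int(timestamp))


# Made-up record that respects the documented ID ranges.
sample = "12::345::4::978300000"
parsed = parse_rating_line(sample)
print(parsed)
```

Because the separator is two characters long, tools that expect a single-character delimiter may need a regex-capable parser; the repository's config passes `sep: '::'` for this file.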
|
||||
|
||||
USERS FILE DESCRIPTION
|
||||
================================================================================
|
||||
|
||||
User information is in the file "users.dat" and is in the following
|
||||
format:
|
||||
|
||||
UserID::Gender::Age::Occupation::Zip-code
|
||||
|
||||
All demographic information is provided voluntarily by the users and is
|
||||
not checked for accuracy. Only users who have provided some demographic
|
||||
information are included in this data set.
|
||||
|
||||
- Gender is denoted by a "M" for male and "F" for female
|
||||
- Age is chosen from the following ranges:
|
||||
|
||||
* 1: "Under 18"
|
||||
* 18: "18-24"
|
||||
* 25: "25-34"
|
||||
* 35: "35-44"
|
||||
* 45: "45-49"
|
||||
* 50: "50-55"
|
||||
* 56: "56+"
|
||||
|
||||
- Occupation is chosen from the following choices:
|
||||
|
||||
* 0: "other" or not specified
|
||||
* 1: "academic/educator"
|
||||
* 2: "artist"
|
||||
* 3: "clerical/admin"
|
||||
* 4: "college/grad student"
|
||||
* 5: "customer service"
|
||||
* 6: "doctor/health care"
|
||||
* 7: "executive/managerial"
|
||||
* 8: "farmer"
|
||||
* 9: "homemaker"
|
||||
* 10: "K-12 student"
|
||||
* 11: "lawyer"
|
||||
* 12: "programmer"
|
||||
* 13: "retired"
|
||||
* 14: "sales/marketing"
|
||||
* 15: "scientist"
|
||||
* 16: "self-employed"
|
||||
* 17: "technician/engineer"
|
||||
* 18: "tradesman/craftsman"
|
||||
* 19: "unemployed"
|
||||
* 20: "writer"
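
The coded fields above can be decoded with two lookup tables. This is a sketch under stated assumptions: `AGE_GROUPS`, `OCCUPATIONS`, `parse_user_line`, and the sample record are hypothetical names and data introduced for illustration; only the code-to-label mappings come from the lists above.

```python
# Age codes are the lower bound of each range, as listed above.
AGE_GROUPS = {
    1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
    45: "45-49", 50: "50-55", 56: "56+",
}

# Occupation codes 0..20 map directly to list positions.
OCCUPATIONS = [
    "other or not specified", "academic/educator", "artist", "clerical/admin",
    "college/grad student", "customer service", "doctor/health care",
    "executive/managerial", "farmer", "homemaker", "K-12 student", "lawyer",
    "programmer", "retired", "sales/marketing", "scientist", "self-employed",
    "technician/engineer", "tradesman/craftsman", "unemployed", "writer",
]


def parse_user_line(line: str) -> dict:
    """Decode a 'UserID::Gender::Age::Occupation::Zip-code' record."""
    user_id, gender, age, occupation, zip_code = line.strip().split("::")
    return {
        "user_id": int(user_id),
        "gender": gender,
        "age_group": AGE_GROUPS[int(age)],
        "occupation": OCCUPATIONS[int(occupation)],
        "zip_code": zip_code,  # kept as text so leading zeros survive
    }


# Made-up record for illustration.
user = parse_user_line("7::F::25::12::02139")
print(user)
```

Keeping the zip code as a string is a deliberate choice here: converting it to an integer would drop leading zeros.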
|
||||
|
||||
MOVIES FILE DESCRIPTION
|
||||
================================================================================
|
||||
|
||||
Movie information is in the file "movies.dat" and is in the following
|
||||
format:
|
||||
|
||||
MovieID::Title::Genres
|
||||
|
||||
- Titles are identical to titles provided by the IMDB (including
|
||||
year of release)
|
||||
- Genres are pipe-separated and are selected from the following genres:
|
||||
|
||||
* Action
|
||||
* Adventure
|
||||
* Animation
|
||||
* Children's
|
||||
* Comedy
|
||||
* Crime
|
||||
* Documentary
|
||||
* Drama
|
||||
* Fantasy
|
||||
* Film-Noir
|
||||
* Horror
|
||||
* Musical
|
||||
* Mystery
|
||||
* Romance
|
||||
* Sci-Fi
|
||||
* Thriller
|
||||
* War
|
||||
* Western
|
||||
|
||||
- Some MovieIDs do not correspond to a movie due to accidental duplicate
|
||||
entries and/or test entries
|
||||
- Movies are mostly entered by hand, so errors and inconsistencies may exist
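
A short sketch of splitting a movie record into its parts, assuming the title ends with the release year in parentheses as described above. The `parse_movie_line` helper and the sample record are hypothetical; since entries are mostly entered by hand, the year is treated as optional.

```python
import re
from typing import Optional

# Matches a four-digit year in parentheses at the end of the title.
YEAR_PATTERN = re.compile(r"\((\d{4})\)\s*$")


def parse_movie_line(line: str) -> dict:
    """Decode a 'MovieID::Title::Genres' record with pipe-separated genres."""
    movie_id, title, genres = line.strip().split("::")
    match = YEAR_PATTERN.search(title)
    year: Optional[int] = int(match.group(1)) if match else None
    return {
        "movie_id": int(movie_id),
        "title": title,
        "year": year,                  # None when the title carries no year
        "genres": genres.split("|"),
    }


# Made-up record for illustration.
movie = parse_movie_line("42::Example Movie, The (1999)::Comedy|Romance")
print(movie)
```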
|
||||
File diff suppressed because it is too large
@@ -0,0 +1,17 @@
|
||||
522_385_234_452_594
|
||||
522_385_234_246_428
|
||||
452_246_220_586_82
|
||||
452_246_220_586_198
|
||||
452_246_220_586_50
|
||||
220_586_73_263_372
|
||||
220_586_73_263_365
|
||||
220_586_73_263_6
|
||||
73_263_563_119_66
|
||||
73_263_563_4_312
|
||||
73_263_563_4_354
|
||||
14_156_45_580_560
|
||||
14_156_45_560_318
|
||||
14_156_45_560_606
|
||||
14_156_45_89_28
|
||||
14_156_517_462_448
|
||||
14_156_517_89_28
|
||||
@@ -0,0 +1,23 @@
|
||||
86_384_143_297_401
|
||||
86_384_143_297_528
|
||||
86_384_143_297_579
|
||||
86_384_143_297_359
|
||||
86_384_143_297_566
|
||||
86_384_143_297_223
|
||||
401_528_579_359_241
|
||||
401_528_579_359_337
|
||||
401_528_579_359_445
|
||||
401_528_579_359_211
|
||||
401_528_579_359_422
|
||||
566_223_225_108_241
|
||||
566_223_225_108_370
|
||||
566_223_225_108_361
|
||||
566_223_225_241_191
|
||||
566_223_225_178_29
|
||||
225_523_108_241_543
|
||||
523_108_241_543_29
|
||||
523_108_241_543_500
|
||||
523_543_178_337_500
|
||||
523_543_178_445_500
|
||||
178_337_445_38_116
|
||||
178_337_445_96_116
|
||||
File diff suppressed because it is too large
@@ -0,0 +1,632 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7fa3d250",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Imports"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "6b55c6e8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Local application/library specific imports\n",
|
||||
"from pygrex.config import cfg\n",
|
||||
"from pygrex.data_reader import DataReader, GroupInteractionHandler\n",
|
||||
"# from pygrex.evaluator import SlidingWindowEvaluator\n",
|
||||
"from pygrex.explain import RuleBasedGroupRecExplainer\n",
|
||||
"from pygrex.models import ALS\n",
|
||||
"from pygrex.recommender import GroupRecommender\n",
|
||||
"from pygrex.utils import AggregationStrategy\n",
|
||||
"from pygrex.evaluator import ExplanationEvaluator\n",
|
||||
"\n",
|
||||
"import time\n",
|
||||
"import pandas as pd\n",
|
||||
"import pickle\n",
|
||||
"import os\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "adbf9967",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"✅ Data preparation complete.\n",
|
||||
"\n",
|
||||
"--- Data Summary ---\n",
|
||||
"👥 Unique Users: 610\n",
|
||||
"📦 Unique Items: 9,724\n",
|
||||
"⭐ Total Ratings: 100,836\n",
|
||||
"👨👩👧👦 Number of Groups: 17\n",
|
||||
"\n",
|
||||
"Processed Ratings DataFrame Head:\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>userId</th>\n",
|
||||
" <th>itemId</th>\n",
|
||||
" <th>rating</th>\n",
|
||||
" <th>timestamp</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>964982703</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>964981247</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>5</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>964982224</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>43</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>964983815</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>46</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>964982931</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" userId itemId rating timestamp\n",
|
||||
"0 0 0 1 964982703\n",
|
||||
"1 0 2 1 964981247\n",
|
||||
"2 0 5 1 964982224\n",
|
||||
"3 0 43 1 964983815\n",
|
||||
"4 0 46 1 964982931"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Read the ratings file.\n",
|
||||
"data = DataReader(**cfg.data.test)\n",
|
||||
"data.make_consecutive_ids_in_dataset()\n",
|
||||
"data.binarize(binary_threshold=1)\n",
|
||||
"\n",
|
||||
"# Read the file with the group ids\n",
|
||||
"group_handler = GroupInteractionHandler(**cfg.data.groups)\n",
|
||||
"available_groups = group_handler.read_groups(\"groupsWithHighRatings5.txt\")\n",
|
||||
"print(\"✅ Data preparation complete.\\n\")\n",
|
||||
"\n",
|
||||
"# --- Display Data Summary ---\n",
|
||||
"print(\"--- Data Summary ---\")\n",
|
||||
"print(f\"👥 Unique Users: {data.num_user:,}\")\n",
|
||||
"print(f\"📦 Unique Items: {data.num_item:,}\")\n",
|
||||
"print(f\"⭐ Total Ratings: {len(data.get_raw_dataset()):,}\")\n",
|
||||
"print(f\"👨👩👧👦 Number of Groups: {len(available_groups):,}\")\n",
|
||||
"print(\"\\nProcessed Ratings DataFrame Head:\")\n",
|
||||
"display(data.dataset.head())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5fc94aef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 2: Model Training & Evaluation\n",
|
||||
"\n",
|
||||
"With the data prepared, we now select and train a recommendation model. We will use **Alternating Least Squares (ALS)**, a matrix factorization technique for implicit feedback. After training, we will evaluate its performance using a train/test split to measure its Hit Ratio and NDCG."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8c13c283",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--- 2.1 Model Training ---\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"c:\\Users\\usuar\\miniconda3\\envs\\pygrex-exp-grs\\Lib\\site-packages\\implicit\\cpu\\als.py:95: RuntimeWarning: OpenBLAS is configured to use 8 threads. It is highly recommended to disable its internal threadpool by setting the environment variable 'OPENBLAS_NUM_THREADS=1' or by calling 'threadpoolctl.threadpool_limits(1, \"blas\")'. Having OpenBLAS use a threadpool can lead to severe performance issues here.\n",
|
||||
" check_blas_config()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "4a7883a7b94a4a13952cb1d9cf9a33a4",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0/10 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"✅ Model trained successfully in 1.00 seconds!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"--- 2.1 Model Training ---\")\n",
|
||||
"\n",
|
||||
"# Instantiate the recommendation model\n",
|
||||
"model = ALS(**cfg.model.als)\n",
|
||||
"\n",
|
||||
"# Train the model\n",
|
||||
"start_time = time.time()\n",
|
||||
"model.fit(data)\n",
|
||||
"end_time = time.time()\n",
|
||||
"training_time = end_time - start_time\n",
|
||||
"\n",
|
||||
"print(f\"✅ Model trained successfully in {training_time:.2f} seconds!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "047fe521",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"\\n--- 2.2 Offline Model Evaluation ---\")\n",
|
||||
"# For evaluation, a new model instance must be created.\n",
|
||||
"# The evaluation function handles its own internal data splitting and training.\n",
|
||||
"eval_model = ALS(**cfg.model.als)\n",
|
||||
"\n",
|
||||
"# Define evaluation parameters\n",
|
||||
"test_size = 0.2\n",
|
||||
"top_n = 10\n",
|
||||
"\n",
|
||||
"print(f\"Running evaluation with a {test_size*100:.0f}% test split (Top-{top_n})...\")\n",
|
||||
"\n",
|
||||
"# Run the evaluation\n",
|
||||
"evaluation_scores = run_evaluation_with_proper_split(\n",
|
||||
" data_reader=data,\n",
|
||||
" model=eval_model,\n",
|
||||
" test_size=test_size,\n",
|
||||
" top_n=top_n,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Display evaluation results\n",
|
||||
"print(\"\\n--- Evaluation Results ---\")\n",
|
||||
"print(f\"Hit Ratio @{top_n}: {evaluation_scores.get('Hit Ratio', 0.0):.2%}\")\n",
|
||||
"print(f\"NDCG @{top_n}: {evaluation_scores.get('NDCG', 0.0):.4f}\")\n",
|
||||
"print(f\"Evaluation Time: {evaluation_scores.get('evaluation_time', 0):.1f}s\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "49cb2659",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 3: Group Recommendation\n",
|
||||
"\n",
|
||||
"Now that we have a trained model, we can generate recommendations for a group. We will select a group, choose an aggregation strategy to combine individual member preferences, and generate a Top-10 list of recommended items."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "0a138815",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--- 3. Group Recommendation ---\n",
|
||||
"Generating Top-10 recommendations for group: 522_385_234_452_594\n",
|
||||
"👥 Group Members: [522, 385, 234, 452, 594]\n",
|
||||
"📊 Aggregation Strategy: AVG_PREDICTIONS\n",
|
||||
"\n",
|
||||
"✅ Recommendations generated successfully!\n",
|
||||
"\n",
|
||||
"Top 10 Recommended Items:\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>Rank</th>\n",
|
||||
" <th>Item ID</th>\n",
|
||||
" <th>Aggregated Score</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>543</td>\n",
|
||||
" <td>4.636274</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>757</td>\n",
|
||||
" <td>4.582981</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>3</td>\n",
|
||||
" <td>564</td>\n",
|
||||
" <td>4.504107</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>4</td>\n",
|
||||
" <td>441</td>\n",
|
||||
" <td>4.488708</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>5</td>\n",
|
||||
" <td>379</td>\n",
|
||||
" <td>4.341830</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>5</th>\n",
|
||||
" <td>6</td>\n",
|
||||
" <td>475</td>\n",
|
||||
" <td>4.279482</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>6</th>\n",
|
||||
" <td>7</td>\n",
|
||||
" <td>43</td>\n",
|
||||
" <td>4.268454</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>7</th>\n",
|
||||
" <td>8</td>\n",
|
||||
" <td>19</td>\n",
|
||||
" <td>4.225248</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>8</th>\n",
|
||||
" <td>9</td>\n",
|
||||
" <td>748</td>\n",
|
||||
" <td>4.178329</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>9</th>\n",
|
||||
" <td>10</td>\n",
|
||||
" <td>64</td>\n",
|
||||
" <td>4.147735</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" Rank Item ID Aggregated Score\n",
|
||||
"0 1 543 4.636274\n",
|
||||
"1 2 757 4.582981\n",
|
||||
"2 3 564 4.504107\n",
|
||||
"3 4 441 4.488708\n",
|
||||
"4 5 379 4.341830\n",
|
||||
"5 6 475 4.279482\n",
|
||||
"6 7 43 4.268454\n",
|
||||
"7 8 19 4.225248\n",
|
||||
"8 9 748 4.178329\n",
|
||||
"9 10 64 4.147735"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"--- 3. Group Recommendation ---\")\n",
|
||||
"\n",
|
||||
"# Select a group and strategy\n",
|
||||
"selected_group_id = available_groups[0] # Let's use the first group as an example\n",
|
||||
"group_members = group_handler.parse_group_members(selected_group_id)\n",
|
||||
"aggregation_strategy = AggregationStrategy.AVG_PREDICTIONS # Use the simple average strategy\n",
|
||||
"top_k = 10\n",
|
||||
"\n",
|
||||
"print(f\"Generating Top-{top_k} recommendations for group: {selected_group_id}\")\n",
|
||||
"print(f\"👥 Group Members: {group_members}\")\n",
|
||||
"print(f\"📊 Aggregation Strategy: {aggregation_strategy.name}\")\n",
|
||||
"\n",
|
||||
"# --- Generate Recommendations ---\n",
|
||||
"# 1. Instantiate the GroupRecommender\n",
|
||||
"group_recommender = GroupRecommender(data=data)\n",
|
||||
"\n",
|
||||
"# 2. Setup the recommendation process\n",
|
||||
"group_recommender.setup_recommendation(\n",
|
||||
" model=model,\n",
|
||||
" members=group_members, # type: ignore\n",
|
||||
" data=data,\n",
|
||||
" aggregation_strategy=aggregation_strategy,\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 3. Get the final recommendation list\n",
|
||||
"recommended_items = group_recommender.get_group_recommendations(top_k=top_k)\n",
|
||||
"recommendation_scores = group_recommender.get_recommendation_scores()\n",
|
||||
"\n",
|
||||
"print(\"\\n✅ Recommendations generated successfully!\")\n",
|
||||
"\n",
|
||||
"# --- Display Results ---\n",
|
||||
"rec_data = [\n",
|
||||
"    {\n",
|
||||
"        \"Rank\": i + 1,\n",
|
||||
"        \"Item ID\": item_id,\n",
|
||||
"        \"Aggregated Score\": recommendation_scores.get(item_id, 0.0),\n",
|
||||
"    }\n",
|
||||
"    for i, item_id in enumerate(recommended_items)  # type: ignore\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"rec_df = pd.DataFrame(rec_data)\n",
|
||||
"print(f\"\\nTop {top_k} Recommended Items:\")\n",
|
||||
"display(rec_df)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6268a2ed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 4: Explanation (EXPGRS)\n",
|
||||
"\n",
|
||||
"Finally, we generate explanations for the recommended items. We will use the **EXPGRS** method to find rule-based explanations. This method calculates the Model Fidelity: the percentage of the Top-N list that can be explained by pre-computed association rules loaded from cached files.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "367063db",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--- 4. Rule based Explanation (EXPGRS) ---\n",
|
||||
"Explanation Fidelity:\n",
|
||||
"10.00%\n",
|
||||
"--------------------\n",
|
||||
"Advanced Explanation Fidelity:\n",
|
||||
"0.00%\n",
|
||||
"--------------------\n",
|
||||
"Explanation Diversity (GILD):\n",
|
||||
"0.0000\n",
|
||||
"--------------------\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"--- 4. Rule based Explanation (EXPGRS) ---\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def load_cached_data_rules(min_support, min_confidence, rating_threshold):\n",
|
||||
"    \"\"\"\n",
|
||||
"    Loads pre-computed association rules from the cached_rules folder.\n",
|
||||
"    Returns the loaded object (typically a dict with key \"rules\") if found, None otherwise.\n",
|
||||
"    Searches several common locations to be robust in notebooks.\n",
|
||||
"    \"\"\"\n",
|
||||
"    from pathlib import Path\n",
|
||||
"\n",
|
||||
"    filename = f\"rules_sup{min_support:.2f}_conf{min_confidence:.1f}_rating{rating_threshold:.0f}\"\n",
|
||||
"    possible_extensions = [\".pkl\", \".pickle\", \".json\"]\n",
|
||||
"\n",
|
||||
"    cwd = Path.cwd()\n",
|
||||
"    search_dirs = [\n",
|
||||
"        cwd / \"cached_rules\",  # current working directory\n",
|
||||
"        cwd.parent / \"cached_rules\",  # parent (useful when running from notebooks/)\n",
|
||||
"        Path(__file__).resolve().parent / \"cached_rules\" if '__file__' in globals() else None,  # script dir if available\n",
|
||||
"    ]\n",
|
||||
"    search_dirs = [p for p in search_dirs if p is not None]\n",
|
||||
"\n",
|
||||
"    tried_paths = []\n",
|
||||
"    for base in search_dirs:\n",
|
||||
"        for ext in possible_extensions:\n",
|
||||
"            filepath = base / f\"{filename}{ext}\"\n",
|
||||
"            tried_paths.append(str(filepath))\n",
|
||||
"            if filepath.exists():\n",
|
||||
"                try:\n",
|
||||
"                    if ext in [\".pkl\", \".pickle\"]:\n",
|
||||
"                        with open(filepath, \"rb\") as f:\n",
|
||||
"                            return pickle.load(f)\n",
|
||||
"                    elif ext == \".json\":\n",
|
||||
"                        import json\n",
|
||||
"                        with open(filepath, \"r\") as f:\n",
|
||||
"                            return json.load(f)\n",
|
||||
"                except Exception as e:\n",
|
||||
"                    print(f\"Error loading cached rules from {filepath}: {e}\")\n",
|
||||
"                    continue\n",
|
||||
"\n",
|
||||
"    print(\"Cached rules not found. Tried paths:\")\n",
|
||||
"    for p in tried_paths:\n",
|
||||
"        print(\" -\", p)\n",
|
||||
"    return None\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def get_user_history(rating_threshold):\n",
|
||||
"    \"\"\"\n",
|
||||
"    Generates the user interaction history based only on the rating threshold.\n",
|
||||
"    The keys of the returned dictionary are the ORIGINAL user IDs.\n",
|
||||
"    \"\"\"\n",
|
||||
"    df_filtered = data.dataset[data.dataset[\"rating\"] >= rating_threshold]\n",
|
||||
"\n",
|
||||
"    # Group by the 'userId' column (which contains the new, consecutive IDs)\n",
|
||||
"    history_by_new_id = df_filtered.groupby(\"userId\")[\"itemId\"].apply(set).to_dict()\n",
|
||||
"\n",
|
||||
"    # Create the final dictionary mapping original user IDs to sets of new item IDs\n",
|
||||
"    history_by_original_id = {}\n",
|
||||
"    for new_id, item_set in history_by_new_id.items():\n",
|
||||
"        try:\n",
|
||||
"            original_id = data.get_original_user_id(int(new_id))\n",
|
||||
"            # The explainer needs the item IDs to be strings to match the rules\n",
|
||||
"            history_by_original_id[original_id] = {str(item) for item in item_set}\n",
|
||||
"        except (ValueError, KeyError):\n",
|
||||
"            continue\n",
|
||||
"\n",
|
||||
"    return history_by_original_id\n",
|
||||
"\n",
|
||||
"# ----------------------------------------------------------------------- #\n",
|
||||
"\n",
|
||||
"min_support = 0.1\n",
|
||||
"min_confidence = 0.1\n",
|
||||
"rating_threshold = 1\n",
|
||||
"minimum_members = 1\n",
|
||||
"\n",
|
||||
"# Load cached rules (no Streamlit dependencies)\n",
|
||||
"expected_filename = f\"rules_sup{min_support:.2f}_conf{min_confidence:.1f}_rating{rating_threshold:.0f}\"\n",
|
||||
"cached_data_rules = load_cached_data_rules(min_support, min_confidence, rating_threshold)\n",
|
||||
"if cached_data_rules is None:\n",
|
||||
"    print(\"⚠️ Cached rules not found.\")\n",
|
||||
"    print(\"Looked for:\", \", \".join(\n",
|
||||
"        [os.path.join(\"cached_rules\", expected_filename + ext) for ext in [\".pkl\", \".pickle\", \".json\"]]\n",
|
||||
"    ))\n",
|
||||
"    raise SystemExit(\"Please place the cached rules file in the 'cached_rules/' folder.\")\n",
|
||||
"\n",
|
||||
"# Extract rules from loaded structure\n",
|
||||
"cached_rules = cached_data_rules.get(\"rules\") if isinstance(cached_data_rules, dict) else None\n",
|
||||
"if cached_rules is None:\n",
|
||||
"    raise ValueError(\n",
|
||||
"        \"Loaded cached rules file does not contain a 'rules' key. Check the file format.\"\n",
|
||||
"    )\n",
|
||||
"\n",
|
||||
"# Get user history\n",
|
||||
"user_history = get_user_history(rating_threshold)\n",
|
||||
"\n",
|
||||
"# Create explainer with cached rules\n",
|
||||
"explainer = RuleBasedGroupRecExplainer(\n",
|
||||
"    rules=cached_rules,\n",
|
||||
"    data=data,\n",
|
||||
"    pool_recommendations=recommended_items,\n",
|
||||
"    members=group_members,\n",
|
||||
"    user_history=user_history,\n",
|
||||
"    min_members_threshold=minimum_members,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Compute explanations and metrics\n",
|
||||
"fidelity_score = explainer.find_explanation()\n",
|
||||
"advanced_fidelity_score = explainer.compute_group_fidelity_advanced()\n",
|
||||
"explanation_details = explainer.get_explanation_details()\n",
|
||||
"\n",
|
||||
"explanation_results = {\n",
|
||||
"    \"fidelity\": fidelity_score,\n",
|
||||
"    \"advanced_fidelity\": advanced_fidelity_score,\n",
|
||||
"    \"details\": explanation_details,\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"# Evaluate results\n",
|
||||
"_evaluator = ExplanationEvaluator()\n",
|
||||
"metrics = _evaluator.evaluate(explanation_results, explainer_type=\"EXPGRS\")\n",
|
||||
"\n",
|
||||
"print(\"Explanation Fidelity:\")\n",
|
||||
"print(f\"{metrics.get('fidelity', 0.0):.2%}\")\n",
|
||||
"print(\"-\" * 20)\n",
|
||||
"\n",
|
||||
"print(\"Advanced Explanation Fidelity:\")\n",
|
||||
"print(f\"{explanation_results.get('advanced_fidelity', 0.0):.2%}\")\n",
|
||||
"print(\"-\" * 20)\n",
|
||||
"\n",
|
||||
"print(\"Explanation Diversity (GILD):\")\n",
|
||||
"print(f\"{metrics.get('gild', 0.0):.4f}\")\n",
|
||||
"print(\"-\" * 20)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "pygrex-exp-grs",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,616 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7fa3d250",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Imports"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "6b55c6e8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Local application/library specific imports\n",
|
||||
"from pygrex.config import cfg\n",
|
||||
"from pygrex.data_reader import DataReader, GroupInteractionHandler\n",
|
||||
"from pygrex.evaluator import ExplanationEvaluator, run_evaluation_with_proper_split\n",
|
||||
"from pygrex.explain.groups.lore4groups_explainer import LORE4GroupsExplainer\n",
|
||||
"from pygrex.models import ALS\n",
|
||||
"from pygrex.recommender import GroupRecommender\n",
|
||||
"from pygrex.utils import AggregationStrategy\n",
|
||||
"\n",
|
||||
"import time\n",
|
||||
"import pandas as pd\n",
|
||||
"import os\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "adbf9967",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"✅ Data preparation complete.\n",
|
||||
"\n",
|
||||
"--- Data Summary ---\n",
|
||||
"👥 Unique Users: 610\n",
|
||||
"📦 Unique Items: 9,724\n",
|
||||
"⭐ Total Ratings: 100,836\n",
|
||||
"👨👩👧👦 Number of Groups: 17\n",
|
||||
"\n",
|
||||
"Processed Ratings DataFrame Head:\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>userId</th>\n",
|
||||
" <th>itemId</th>\n",
|
||||
" <th>rating</th>\n",
|
||||
" <th>timestamp</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>4.0</td>\n",
|
||||
" <td>964982703</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>4.0</td>\n",
|
||||
" <td>964981247</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>5</td>\n",
|
||||
" <td>4.0</td>\n",
|
||||
" <td>964982224</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>43</td>\n",
|
||||
" <td>5.0</td>\n",
|
||||
" <td>964983815</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>46</td>\n",
|
||||
" <td>5.0</td>\n",
|
||||
" <td>964982931</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" userId itemId rating timestamp\n",
|
||||
"0 0 0 4.0 964982703\n",
|
||||
"1 0 2 4.0 964981247\n",
|
||||
"2 0 5 4.0 964982224\n",
|
||||
"3 0 43 5.0 964983815\n",
|
||||
"4 0 46 5.0 964982931"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Read the ratings file.\n",
|
||||
"data = DataReader(**cfg.data.test)\n",
|
||||
"data.make_consecutive_ids_in_dataset()\n",
|
||||
"# data.binarize(binary_threshold=1)\n",
|
||||
"\n",
|
||||
"# Read the file with the group ids\n",
|
||||
"group_handler = GroupInteractionHandler(**cfg.data.groups)\n",
|
||||
"available_groups = group_handler.read_groups(\"groupsWithHighRatings5.txt\")\n",
|
||||
"print(\"✅ Data preparation complete.\\n\")\n",
|
||||
"\n",
|
||||
"# --- Display Data Summary ---\n",
|
||||
"print(\"--- Data Summary ---\")\n",
|
||||
"print(f\"👥 Unique Users: {data.num_user:,}\")\n",
|
||||
"print(f\"📦 Unique Items: {data.num_item:,}\")\n",
|
||||
"print(f\"⭐ Total Ratings: {len(data.get_raw_dataset()):,}\")\n",
|
||||
"print(f\"👨👩👧👦 Number of Groups: {len(available_groups):,}\")\n",
|
||||
"print(\"\\nProcessed Ratings DataFrame Head:\")\n",
|
||||
"display(data.dataset.head())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5fc94aef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 2: Model Training & Evaluation\n",
|
||||
"\n",
|
||||
"With the data prepared, we now select and train a recommendation model. We will use **Alternating Least Squares (ALS)**, a matrix factorization technique for implicit feedback. After training, we will evaluate its performance using a train/test split to measure its Hit Ratio and NDCG."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8c13c283",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--- 2.1 Model Training ---\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"c:\\Users\\usuar\\miniconda3\\envs\\pygrex-exp-grs\\Lib\\site-packages\\implicit\\cpu\\als.py:95: RuntimeWarning: OpenBLAS is configured to use 8 threads. It is highly recommended to disable its internal threadpool by setting the environment variable 'OPENBLAS_NUM_THREADS=1' or by calling 'threadpoolctl.threadpool_limits(1, \"blas\")'. Having OpenBLAS use a threadpool can lead to severe performance issues here.\n",
|
||||
" check_blas_config()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "40a7ec46978a413e80b045c0f60fbce6",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0/10 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"✅ Model trained successfully in 0.95 seconds!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"--- 2.1 Model Training ---\")\n",
|
||||
"\n",
|
||||
"# Instantiate the recommendation model\n",
|
||||
"model = ALS(**cfg.model.als)\n",
|
||||
"\n",
|
||||
"# Train the model\n",
|
||||
"start_time = time.time()\n",
|
||||
"model.fit(data)\n",
|
||||
"end_time = time.time()\n",
|
||||
"training_time = end_time - start_time\n",
|
||||
"\n",
|
||||
"print(f\"✅ Model trained successfully in {training_time:.2f} seconds!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "047fe521",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"\\n--- 2.2 Offline Model Evaluation ---\")\n",
|
||||
"# For evaluation, a new model instance must be created.\n",
|
||||
"# The evaluation function handles its own internal data splitting and training.\n",
|
||||
"eval_model = ALS(**cfg.model.als)\n",
|
||||
"\n",
|
||||
"# Define evaluation parameters\n",
|
||||
"test_size = 0.2\n",
|
||||
"top_n = 10\n",
|
||||
"\n",
|
||||
"print(f\"Running evaluation with a {test_size*100:.0f}% test split (Top-{top_n})...\")\n",
|
||||
"\n",
|
||||
"# Run the evaluation\n",
|
||||
"evaluation_scores = run_evaluation_with_proper_split(\n",
|
||||
"    data_reader=data,\n",
|
||||
"    model=eval_model,\n",
|
||||
"    test_size=test_size,\n",
|
||||
"    top_n=top_n,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Display evaluation results\n",
|
||||
"print(\"\\n--- Evaluation Results ---\")\n",
|
||||
"print(f\"Hit Ratio @{top_n}: {evaluation_scores.get('Hit Ratio', 0.0):.2%}\")\n",
|
||||
"print(f\"NDCG @{top_n}: {evaluation_scores.get('NDCG', 0.0):.4f}\")\n",
|
||||
"print(f\"Evaluation Time: {evaluation_scores.get('evaluation_time', 0):.1f}s\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "49cb2659",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 3: Group Recommendation\n",
|
||||
"\n",
|
||||
"Now that we have a trained model, we can generate recommendations for a group. We will select a group, choose an aggregation strategy to combine individual member preferences, and generate a Top-10 list of recommended items."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "0a138815",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--- 3. Group Recommendation ---\n",
|
||||
"Generating Top-10 recommendations for group: 522_385_234_452_594\n",
|
||||
"👥 Group Members: [522, 385, 234, 452, 594]\n",
|
||||
"📊 Aggregation Strategy: AVG_PREDICTIONS\n",
|
||||
"\n",
|
||||
"✅ Recommendations generated successfully!\n",
|
||||
"\n",
|
||||
"Top 10 Recommended Items:\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>Rank</th>\n",
|
||||
" <th>Item ID</th>\n",
|
||||
" <th>Aggregated Score</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>543</td>\n",
|
||||
" <td>4.636274</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>757</td>\n",
|
||||
" <td>4.582981</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>3</td>\n",
|
||||
" <td>564</td>\n",
|
||||
" <td>4.504107</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>4</td>\n",
|
||||
" <td>441</td>\n",
|
||||
" <td>4.488708</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>5</td>\n",
|
||||
" <td>379</td>\n",
|
||||
" <td>4.341830</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>5</th>\n",
|
||||
" <td>6</td>\n",
|
||||
" <td>475</td>\n",
|
||||
" <td>4.279482</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>6</th>\n",
|
||||
" <td>7</td>\n",
|
||||
" <td>43</td>\n",
|
||||
" <td>4.268454</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>7</th>\n",
|
||||
" <td>8</td>\n",
|
||||
" <td>19</td>\n",
|
||||
" <td>4.225248</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>8</th>\n",
|
||||
" <td>9</td>\n",
|
||||
" <td>748</td>\n",
|
||||
" <td>4.178329</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>9</th>\n",
|
||||
" <td>10</td>\n",
|
||||
" <td>64</td>\n",
|
||||
" <td>4.147735</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" Rank Item ID Aggregated Score\n",
|
||||
"0 1 543 4.636274\n",
|
||||
"1 2 757 4.582981\n",
|
||||
"2 3 564 4.504107\n",
|
||||
"3 4 441 4.488708\n",
|
||||
"4 5 379 4.341830\n",
|
||||
"5 6 475 4.279482\n",
|
||||
"6 7 43 4.268454\n",
|
||||
"7 8 19 4.225248\n",
|
||||
"8 9 748 4.178329\n",
|
||||
"9 10 64 4.147735"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"--- 3. Group Recommendation ---\")\n",
|
||||
"\n",
|
||||
"# Select a group and strategy\n",
|
||||
"selected_group_id = available_groups[0] # Let's use the first group as an example\n",
|
||||
"group_members = group_handler.parse_group_members(selected_group_id)\n",
|
||||
"aggregation_strategy = AggregationStrategy.AVG_PREDICTIONS # Use the simple average strategy\n",
|
||||
"top_k = 10\n",
|
||||
"\n",
|
||||
"print(f\"Generating Top-{top_k} recommendations for group: {selected_group_id}\")\n",
|
||||
"print(f\"👥 Group Members: {group_members}\")\n",
|
||||
"print(f\"📊 Aggregation Strategy: {aggregation_strategy.name}\")\n",
|
||||
"\n",
|
||||
"# --- Generate Recommendations ---\n",
|
||||
"# 1. Instantiate the GroupRecommender\n",
|
||||
"group_recommender = GroupRecommender(data=data)\n",
|
||||
"\n",
|
||||
"# 2. Setup the recommendation process\n",
|
||||
"group_recommender.setup_recommendation(\n",
|
||||
"    model=model,\n",
|
||||
"    members=group_members,  # type: ignore\n",
|
||||
"    data=data,\n",
|
||||
"    aggregation_strategy=aggregation_strategy,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 3. Get the final recommendation list\n",
|
||||
"recommended_items = group_recommender.get_group_recommendations(top_k=top_k)\n",
|
||||
"recommendation_scores = group_recommender.get_recommendation_scores()\n",
|
||||
"\n",
|
||||
"print(\"\\n✅ Recommendations generated successfully!\")\n",
|
||||
"\n",
|
||||
"# --- Display Results ---\n",
|
||||
"rec_data = [\n",
|
||||
"    {\n",
|
||||
"        \"Rank\": i + 1,\n",
|
||||
"        \"Item ID\": item_id,\n",
|
||||
"        \"Aggregated Score\": recommendation_scores.get(item_id, 0.0),\n",
|
||||
"    }\n",
|
||||
"    for i, item_id in enumerate(recommended_items)  # type: ignore\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"rec_df = pd.DataFrame(rec_data)\n",
|
||||
"print(f\"\\nTop {top_k} Recommended Items:\")\n",
|
||||
"display(rec_df)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6268a2ed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 4: Explanation (LORE4Groups)\n",
|
||||
"\n",
|
||||
"Finally, we generate an explanation for the recommended items using **LORE4Groups**, a local rule-based method. It:\n",
|
||||
"- builds a local neighborhood of similar items using tag profiles\n",
|
||||
"- trains a simple decision tree per item to predict 'like' vs 'not like'\n",
|
||||
"- extracts interpretable rules explaining why items were recommended.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "367063db",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--- 4. Local Rule-Based Explanation (LORE4Groups) ---\n",
|
||||
"Explanation Fidelity:\n",
|
||||
"25.00%\n",
|
||||
"--------------------\n",
|
||||
"Explanation Diversity (GILD):\n",
|
||||
"0.8871\n",
|
||||
"--------------------\n",
|
||||
"Items with explanations:\n",
|
||||
"['475', '43']\n",
|
||||
"--------------------\n",
|
||||
"Sample item: 475\n",
|
||||
"Decision Path (rules): ['nudity (rear) <= 0.50', 'twins <= 0.50']\n",
|
||||
"Group Factual Rules: {'unanimous': [], 'majority': [], 'minority': ['70mm <= 0.50 (1/5 members)', 'franchise <= 0.50 (1/5 members)', 'futuristmoviescom <= 0.50 (1/5 members)', 'nudity (rear) <= 0.50 (1/5 members)', 'owned <= 0.50 (1/5 members)', 'seen at the cinema <= 0.50 (1/5 members)', 'sequel <= 0.50 (1/5 members)', 'twins <= 0.50 (1/5 members)']}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"--- 4. Local Rule-Based Explanation (LORE4Groups) ---\")\n",
|
||||
"\n",
|
||||
"# 1) Build tag-based item profiles aligned with the ratings dataset\n",
|
||||
"# ---------------------------------------------------------------\n",
|
||||
"# Read tags file from config\n",
|
||||
"_tags_path = cfg.data.tags.tags_file\n",
|
||||
"if not os.path.exists(_tags_path):\n",
|
||||
"    raise SystemExit(f\"Tags file not found at: {_tags_path}\")\n",
|
||||
"\n",
|
||||
"# Read tags and align original -> consecutive item ids\n",
|
||||
"tags_df = pd.read_csv(_tags_path)\n",
|
||||
"consecutive_items = set(data.dataset[\"itemId\"].unique())\n",
|
||||
"original_to_consecutive = {}\n",
|
||||
"for item_consec in consecutive_items:\n",
|
||||
"    try:\n",
|
||||
"        item_orig = data.get_original_item_id(int(item_consec))\n",
|
||||
"        original_to_consecutive[item_orig] = int(item_consec)\n",
|
||||
"    except (ValueError, KeyError):\n",
|
||||
"        continue\n",
|
||||
"\n",
|
||||
"# Keep only tags for items present in ratings\n",
|
||||
"tags_df = tags_df[tags_df[\"movieId\"].isin(original_to_consecutive.keys())].copy()\n",
|
||||
"if len(tags_df) == 0:\n",
|
||||
"    raise SystemExit(\"No tag data matches items in ratings dataset.\")\n",
|
||||
"\n",
|
||||
"# Normalize labels (keep full label as tag, lowercase)\n",
|
||||
"tags_df[\"label\"] = tags_df[\"label\"].astype(str).str.lower().str.strip()\n",
|
||||
"# Map to consecutive ids\n",
|
||||
"tags_df[\"movieId\"] = tags_df[\"movieId\"].map(original_to_consecutive).astype(int)\n",
|
||||
"\n",
|
||||
"# Keep top-N most frequent labels to reduce sparsity\n",
|
||||
"_top_n = cfg.explainer.lore4groups.top_n_labels\n",
|
||||
"top_labels = (\n",
|
||||
"    tags_df[\"label\"].value_counts().nlargest(_top_n).index.tolist()\n",
|
||||
")\n",
|
||||
"tags_final = tags_df[tags_df[\"label\"].isin(top_labels)].copy()\n",
|
||||
"\n",
|
||||
"# Item profiles: {str(itemId): set(labels)}\n",
|
||||
"item_profiles = (\n",
|
||||
"    tags_final.groupby(\"movieId\")[\"label\"].apply(set).to_dict()\n",
|
||||
")\n",
|
||||
"item_profiles = {str(k): v for k, v in item_profiles.items()}\n",
|
||||
"\n",
|
||||
"# Item-label matrix (rows: itemId as str, cols: labels, values: 0/1)\n",
|
||||
"item_label_matrix = tags_final.assign(value=1).pivot_table(\n",
|
||||
"    index=\"movieId\", columns=\"label\", values=\"value\", fill_value=0\n",
|
||||
")\n",
|
||||
"item_label_matrix.index = item_label_matrix.index.astype(str)\n",
|
||||
"\n",
|
||||
"# 2) Prepare user history in required format\n",
|
||||
"# ------------------------------------------\n",
|
||||
"user_hist = {}\n",
|
||||
"for user_id_orig in group_members:\n",
|
||||
"    try:\n",
|
||||
"        user_id_consec = data.get_new_user_id(user_id_orig)\n",
|
||||
"        hist_items = set(\n",
|
||||
"            data.dataset[data.dataset[\"userId\"] == user_id_consec][\"itemId\"].astype(str)\n",
|
||||
"        )\n",
|
||||
"        user_hist[user_id_orig] = hist_items\n",
|
||||
"    except Exception:\n",
|
||||
"        user_hist[user_id_orig] = set()\n",
|
||||
"\n",
|
||||
"# Filter recommendations to those we can explain (must exist in profiles)\n",
|
||||
"explainable_recs = [str(i) for i in recommended_items if str(i) in item_profiles]\n",
|
||||
"if not explainable_recs:\n",
|
||||
"    print(\"⚠️ No recommended items have sufficient tag data for explanation.\")\n",
|
||||
"else:\n",
|
||||
"    # 3) Run LORE4Groups explainer\n",
|
||||
"    explainer = LORE4GroupsExplainer(\n",
|
||||
"        item_profiles=item_profiles,\n",
|
||||
"        item_label_matrix=item_label_matrix,\n",
|
||||
"        config=cfg,\n",
|
||||
"        genre_profiles=None,  # optional, omitted for toy example\n",
|
||||
"    )\n",
|
||||
"\n",
|
||||
"    results = explainer.find_explanation(\n",
|
||||
"        explainable_recs,\n",
|
||||
"        group_members,\n",
|
||||
"        user_hist,\n",
|
||||
"        data.dataset,\n",
|
||||
"        model=model,\n",
|
||||
"        data_reader=data,\n",
|
||||
"    )\n",
|
||||
"\n",
|
||||
"    fidelity = results.get(\"fidelity\", 0.0)\n",
|
||||
"    details = results.get(\"details\", {})\n",
|
||||
"\n",
|
||||
"    print(\"Explanation Fidelity:\")\n",
|
||||
"    print(f\"{fidelity:.2%}\")\n",
|
||||
"    print(\"-\" * 20)\n",
|
||||
"\n",
|
||||
"    # Compute GILD diversity like in the app\n",
|
||||
"    evaluator = ExplanationEvaluator()\n",
|
||||
"    metrics = evaluator.evaluate({\"fidelity\": fidelity, \"details\": details}, explainer_type=\"LORE4Groups\")\n",
|
||||
"    print(\"Explanation Diversity (GILD):\")\n",
|
||||
"    print(f\"{metrics.get('gild', 0.0):.4f}\")\n",
|
||||
"    print(\"-\" * 20)\n",
|
||||
"\n",
|
||||
"    print(\"Items with explanations:\")\n",
|
||||
"    print(list(details.keys()))\n",
|
||||
"    print(\"-\" * 20)\n",
|
||||
"\n",
|
||||
"    # Optionally preview one item's explanation summary if available\n",
|
||||
"    if details:\n",
|
||||
"        first_item, exp = next(iter(details.items()))\n",
|
||||
"        decision_path = exp.get(\"decision_path\", [])\n",
|
||||
"        group_factual = exp.get(\"group_factual_rule\", [])\n",
|
||||
"        print(f\"Sample item: {first_item}\")\n",
|
||||
"        print(\"Decision Path (rules):\", decision_path)\n",
|
||||
"        print(\"Group Factual Rules:\", group_factual)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "pygrex-exp-grs",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,575 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7fa3d250",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Imports"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "6b55c6e8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Local application/library specific imports\n",
|
||||
"from pygrex.config import cfg\n",
|
||||
"from pygrex.data_reader import DataReader, GroupInteractionHandler\n",
|
||||
"# from pygrex.evaluator import SlidingWindowEvaluator\n",
|
||||
"from pygrex.explain import SlidingWindowExplainer\n",
|
||||
"from pygrex.models import ALS\n",
|
||||
"from pygrex.recommender import GroupRecommender\n",
|
||||
"from pygrex.utils import SlidingWindow, AggregationStrategy\n",
|
||||
"from pygrex.evaluator import run_evaluation_with_proper_split\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"import time\n",
|
||||
"import pandas as pd\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "adbf9967",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"✅ Data preparation complete.\n",
|
||||
"\n",
|
||||
"--- Data Summary ---\n",
|
||||
"👥 Unique Users: 610\n",
|
||||
"📦 Unique Items: 9,724\n",
|
||||
"⭐ Total Ratings: 100,836\n",
|
||||
"👨👩👧👦 Number of Groups: 17\n",
|
||||
"\n",
|
||||
"Processed Ratings DataFrame Head:\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>userId</th>\n",
|
||||
" <th>itemId</th>\n",
|
||||
" <th>rating</th>\n",
|
||||
" <th>timestamp</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>964982703</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>964981247</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>5</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>964982224</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>43</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>964983815</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>46</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>964982931</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" userId itemId rating timestamp\n",
|
||||
"0 0 0 1 964982703\n",
|
||||
"1 0 2 1 964981247\n",
|
||||
"2 0 5 1 964982224\n",
|
||||
"3 0 43 1 964983815\n",
|
||||
"4 0 46 1 964982931"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Read the ratings file.\n",
|
||||
"data = DataReader(**cfg.data.test)\n",
|
||||
"data.make_consecutive_ids_in_dataset()\n",
|
||||
"data.binarize(binary_threshold=1)\n",
|
||||
"\n",
|
||||
"# Read the file with the group ids\n",
|
||||
"group_handler = GroupInteractionHandler(**cfg.data.groups)\n",
|
||||
"available_groups = group_handler.read_groups(\"groupsWithHighRatings5.txt\")\n",
|
||||
"print(\"✅ Data preparation complete.\\n\")\n",
|
||||
"\n",
|
||||
"# --- Display Data Summary ---\n",
|
||||
"print(\"--- Data Summary ---\")\n",
|
||||
"print(f\"👥 Unique Users: {data.num_user:,}\")\n",
|
||||
"print(f\"📦 Unique Items: {data.num_item:,}\")\n",
|
||||
"print(f\"⭐ Total Ratings: {len(data.get_raw_dataset()):,}\")\n",
|
||||
"print(f\"👨👩👧👦 Number of Groups: {len(available_groups):,}\")\n",
|
||||
"print(\"\\nProcessed Ratings DataFrame Head:\")\n",
|
||||
"display(data.dataset.head())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5fc94aef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 2: Model Training & Evaluation\n",
|
||||
"\n",
|
||||
"With the data prepared, we now select and train a recommendation model. We will use **Alternating Least Squares (ALS)**, a matrix factorization technique for implicit feedback. After training, we will evaluate its performance using a train/test split to measure its Hit Ratio and NDCG."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8c13c283",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--- 2.1 Model Training ---\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"c:\\Users\\usuar\\miniconda3\\envs\\pygrex-exp-grs\\Lib\\site-packages\\implicit\\cpu\\als.py:95: RuntimeWarning: OpenBLAS is configured to use 8 threads. It is highly recommended to disable its internal threadpool by setting the environment variable 'OPENBLAS_NUM_THREADS=1' or by calling 'threadpoolctl.threadpool_limits(1, \"blas\")'. Having OpenBLAS use a threadpool can lead to severe performance issues here.\n",
|
||||
" check_blas_config()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "4a2c30b182994b868f98ba8d9d2d7d8f",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0/10 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"✅ Model trained successfully in 1.08 seconds!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"--- 2.1 Model Training ---\")\n",
|
||||
"\n",
|
||||
"# Train the recommendation model\n",
|
||||
"model = ALS(**cfg.model.als)\n",
|
||||
"\n",
|
||||
"# Train the model\n",
|
||||
"start_time = time.time()\n",
|
||||
"model.fit(data)\n",
|
||||
"end_time = time.time()\n",
|
||||
"training_time = end_time - start_time\n",
|
||||
"\n",
|
||||
"print(f\"✅ Model trained successfully in {training_time:.2f} seconds!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "047fe521",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"\\n--- 2.2 Offline Model Evaluation ---\")\n",
|
||||
"# For evaluation, a new model instance must be created.\n",
|
||||
"# The evaluation function handles its own internal data splitting and training.\n",
|
||||
"eval_model = ALS(**cfg.model.als)\n",
|
||||
"\n",
|
||||
"# Define evaluation parameters\n",
|
||||
"test_size = 0.2\n",
|
||||
"top_n = 10\n",
|
||||
"\n",
|
||||
"print(f\"Running evaluation with a {test_size*100:.0f}% test split (Top-{top_n})...\")\n",
|
||||
"\n",
|
||||
"# Run the evaluation\n",
|
||||
"evaluation_scores = run_evaluation_with_proper_split(\n",
|
||||
" data_reader=data,\n",
|
||||
" model=eval_model,\n",
|
||||
" test_size=test_size,\n",
|
||||
" top_n=top_n,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Display evaluation results\n",
|
||||
"print(\"\\n--- Evaluation Results ---\")\n",
|
||||
"print(f\"Hit Ratio @{top_n}: {evaluation_scores.get('Hit Ratio', 0.0):.2%}\")\n",
|
||||
"print(f\"NDCG @{top_n}: {evaluation_scores.get('NDCG', 0.0):.4f}\")\n",
|
||||
"print(f\"Evaluation Time: {evaluation_scores.get('evaluation_time', 0):.1f}s\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "49cb2659",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 3: Group Recommendation\n",
|
||||
"\n",
|
||||
"Now that we have a trained model, we can generate recommendations for a group. We will select a group, choose an aggregation strategy to combine individual member preferences, and generate a Top-10 list of recommended items."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "0a138815",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--- 3. Group Recommendation ---\n",
|
||||
"Generating Top-10 recommendations for group: 522_385_234_452_594\n",
|
||||
"👥 Group Members: [522, 385, 234, 452, 594]\n",
|
||||
"📊 Aggregation Strategy: AVG_PREDICTIONS\n",
|
||||
"\n",
|
||||
"✅ Recommendations generated successfully!\n",
|
||||
"\n",
|
||||
"Top 10 Recommended Items:\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>Rank</th>\n",
|
||||
" <th>Item ID</th>\n",
|
||||
" <th>Aggregated Score</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>543</td>\n",
|
||||
" <td>4.636274</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>757</td>\n",
|
||||
" <td>4.582981</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>3</td>\n",
|
||||
" <td>564</td>\n",
|
||||
" <td>4.504107</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>4</td>\n",
|
||||
" <td>441</td>\n",
|
||||
" <td>4.488708</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>5</td>\n",
|
||||
" <td>379</td>\n",
|
||||
" <td>4.341830</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>5</th>\n",
|
||||
" <td>6</td>\n",
|
||||
" <td>475</td>\n",
|
||||
" <td>4.279482</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>6</th>\n",
|
||||
" <td>7</td>\n",
|
||||
" <td>43</td>\n",
|
||||
" <td>4.268454</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>7</th>\n",
|
||||
" <td>8</td>\n",
|
||||
" <td>19</td>\n",
|
||||
" <td>4.225248</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>8</th>\n",
|
||||
" <td>9</td>\n",
|
||||
" <td>748</td>\n",
|
||||
" <td>4.178329</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>9</th>\n",
|
||||
" <td>10</td>\n",
|
||||
" <td>64</td>\n",
|
||||
" <td>4.147735</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" Rank Item ID Aggregated Score\n",
|
||||
"0 1 543 4.636274\n",
|
||||
"1 2 757 4.582981\n",
|
||||
"2 3 564 4.504107\n",
|
||||
"3 4 441 4.488708\n",
|
||||
"4 5 379 4.341830\n",
|
||||
"5 6 475 4.279482\n",
|
||||
"6 7 43 4.268454\n",
|
||||
"7 8 19 4.225248\n",
|
||||
"8 9 748 4.178329\n",
|
||||
"9 10 64 4.147735"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"--- 3. Group Recommendation ---\")\n",
|
||||
"\n",
|
||||
"# Select a group and strategy\n",
|
||||
"selected_group_id = available_groups[0] # Let's use the first group as an example\n",
|
||||
"group_members = group_handler.parse_group_members(selected_group_id)\n",
|
||||
"aggregation_strategy = AggregationStrategy.AVG_PREDICTIONS # Use the simple average strategy\n",
|
||||
"top_k = 10\n",
|
||||
"\n",
|
||||
"print(f\"Generating Top-{top_k} recommendations for group: {selected_group_id}\")\n",
|
||||
"print(f\"👥 Group Members: {group_members}\")\n",
|
||||
"print(f\"📊 Aggregation Strategy: {aggregation_strategy.name}\")\n",
|
||||
"\n",
|
||||
"# --- Generate Recommendations ---\n",
|
||||
"# 1. Instantiate the GroupRecommender\n",
|
||||
"group_recommender = GroupRecommender(data=data)\n",
|
||||
"\n",
|
||||
"# 2. Setup the recommendation process\n",
|
||||
"group_recommender.setup_recommendation(\n",
|
||||
" model=model,\n",
|
||||
" members=group_members, # type: ignore\n",
|
||||
" data=data,\n",
|
||||
" aggregation_strategy=aggregation_strategy,\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# 3. Get the final recommendation list\n",
|
||||
"recommended_items = group_recommender.get_group_recommendations(top_k=top_k)\n",
|
||||
"recommendation_scores = group_recommender.get_recommendation_scores()\n",
|
||||
"\n",
|
||||
"print(\"\\n✅ Recommendations generated successfully!\")\n",
|
||||
"\n",
|
||||
"# --- Display Results ---\n",
|
||||
"rec_data = [\n",
|
||||
" {\n",
|
||||
" \"Rank\": i + 1,\n",
|
||||
" \"Item ID\": item_id,\n",
|
||||
" \"Aggregated Score\": recommendation_scores.get(item_id, 0.0),\n",
|
||||
" }\n",
|
||||
" for i, item_id in enumerate(recommended_items) # type: ignore\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"rec_df = pd.DataFrame(rec_data)\n",
|
||||
"print(f\"\\nTop {top_k} Recommended Items:\")\n",
|
||||
"display(rec_df)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6268a2ed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 4: Explanation (Sliding Window)\n",
|
||||
"\n",
|
||||
"Finally, we generate an explanation for one of the recommendations. We will use the **Sliding Window** method to find a counterfactual explanation. This method answers the question: *\"Which minimal set of items, if removed from the group's history, would cause our target item to disappear from the recommendation list?\"*\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "367063db",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--- 4. Counterfactual Explanation (Sliding Window) ---\n",
|
||||
"Generating explanation for recommended item: 543\n",
|
||||
"Sliding Window Size: 3\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "6c1d8da938db475ab005c2378f99feae",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0/10 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "221213362ff442f7b9e736536844c6b9",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0/10 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"If the group had not interacted with these items [np.int64(480)],\n",
|
||||
"the item of interest 543 would not have appeared on the recommendation list;\n",
|
||||
"instead, 303 would have been recommended.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"--- 4. Counterfactual Explanation (Sliding Window) ---\")\n",
|
||||
"\n",
|
||||
"# Select a target item from our recommendation list to explain\n",
|
||||
"target_item = recommended_items[0]\n",
|
||||
"# Configure the explainer\n",
|
||||
"window_size = 3\n",
|
||||
"# These weights determine how to rank items from the group's history\n",
|
||||
"# before attempting to remove them to find an explanation.\n",
|
||||
"ranking_weights = {\n",
|
||||
" \"popularity\": 1.0,\n",
|
||||
" \"intensity\": 1.0,\n",
|
||||
" \"rating\": 1.0,\n",
|
||||
" \"relevance\": 1.0,\n",
|
||||
" \"trend\": 1.0,\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"print(f\"Generating explanation for recommended item: {target_item}\")\n",
|
||||
"print(f\"Sliding Window Size: {window_size}\\n\")\n",
|
||||
"\n",
|
||||
"# --- Generate Explanation ---\n",
|
||||
"# 1. Get all items previously rated by the group\n",
|
||||
"items_rated_by_group = group_handler.get_rated_items_by_all_group_members(\n",
|
||||
" group=group_members, original_data=data\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# 2. Instantiate the explainer\n",
|
||||
"explainer = SlidingWindowExplainer(\n",
|
||||
" config=cfg, # Not needed for this explainer\n",
|
||||
" data=data,\n",
|
||||
" group_handler=group_handler,\n",
|
||||
" members=group_members,\n",
|
||||
" target_item=target_item,\n",
|
||||
" aggregation_strategy=aggregation_strategy,\n",
|
||||
" model=model,\n",
|
||||
" window_size=window_size,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# 3. Find the explanation\n",
|
||||
"explanations = explainer.find_explanation(\n",
|
||||
" items_rated_by_group=items_rated_by_group,\n",
|
||||
" group_predictions=group_recommender.get_individual_predictions(),\n",
|
||||
" top_recommendation=group_recommender.get_top_recommendation(),\n",
|
||||
" ranking_weights=ranking_weights,\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "pygrex-exp-grs",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||