What is DeepMatch?

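A recommender system is usually described as a pipeline: recall (match) -> coarse ranking (pre-rank) -> fine ranking (rank) -> re-ranking (rerank). 浅梦 (Qianmeng) of Alibaba developed one framework for each of the recall and ranking stages: deepmatch for recall and deepctr for ranking.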
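The staged pipeline can be pictured as a funnel: each stage scores the surviving candidates and keeps fewer of them than the stage before. The following is only a minimal sketch of that idea. The function `funnel` and both scorers are hypothetical stand-ins invented for illustration and are not part of deepmatch or deepctr.

```python
from typing import Callable, List, Tuple


def funnel(candidates: List[int],
           stages: List[Tuple[Callable[[int], float], int]]) -> List[int]:
    """Pass candidates through (score_fn, keep_n) stages, narrowing each time."""
    for score_fn, keep_n in stages:
        candidates = sorted(candidates, key=score_fn, reverse=True)[:keep_n]
    return candidates


def cheap_score(item: int) -> float:
    # Hypothetical stand-in for a fast recall model.
    return -abs(item - 50)


def precise_score(item: int) -> float:
    # Hypothetical stand-in for a slower, more accurate ranking model.
    return -abs(item - 47)


# 100 items -> 10 recalled -> 3 ranked
result = funnel(list(range(100)), [(cheap_score, 10), (precise_score, 3)])
print(result)
```

The cheap scorer runs over every item, while the expensive one only sees the ten survivors, which is the reason for splitting recall from ranking.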

How to install deepctr and deepmatch?

As of 2021-10-02, deepmatch supports only TensorFlow 1.x; TensorFlow 2.0.0 and later are not yet supported. deepmatch also depends on deepctr 0.8.2.

Install deepctr (pick the CPU or the GPU variant): pip install deepctr[cpu] or pip install deepctr[gpu]
Install tensorflow: pip install tensorflow==1.14.0
Install deepmatch: pip install -U deepmatch
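Because the setup above is tied to TensorFlow 1.x, it can help to fail early when the wrong version is installed. The sketch below is one way to do that. The helpers `parse_version` and `is_tf1` are hypothetical names introduced here, not part of any of the libraries.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.14.0' into (1, 14, 0); suffixes such as '-rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_tf1(version: str) -> bool:
    """True when the version string is below 2.0.0."""
    return parse_version(version) < (2, 0, 0)


# Intended use at the top of main.py (assumes tensorflow is installed):
#   import tensorflow as tf
#   assert is_tf1(tf.__version__), "this walkthrough assumes TensorFlow 1.x"
print(is_tf1("1.14.0"), is_tf1("2.0.0"))
```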

What is faiss?

faiss is a library for efficient similarity search and clustering of dense vectors, developed by Facebook AI Research. It has the following characteristics:

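1. It offers several search (index) methods.
2. It is fast.
3. Indexes can be kept in memory or stored on disk.
4. It is implemented in C++ and ships with a Python wrapper.
5. Most of its algorithms have GPU implementations.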
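To make "similarity search over dense vectors" concrete, here is a brute-force inner-product search written in plain NumPy. It is a sketch of the kind of exhaustive search a flat inner-product index performs, not faiss code. The function name `inner_product_search` and the toy vectors are made up for illustration.

```python
import numpy as np


def inner_product_search(index_vectors, queries, k):
    """Return (scores, ids) of the k items with the largest inner product."""
    scores = queries @ index_vectors.T              # shape: (n_queries, n_items)
    top_ids = np.argsort(-scores, axis=1)[:, :k]    # highest score first
    top_scores = np.take_along_axis(scores, top_ids, axis=1)
    return top_scores, top_ids


items = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], dtype="float32")
users = np.array([[1.0, 0.2]], dtype="float32")

D, I = inner_product_search(items, users, k=2)
print(I)  # ids of the two best-matching items for the single user
```

The cost of this version grows with the number of items times the number of queries, which is why a dedicated library becomes attractive once the item catalogue is large.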

Recall in practice

Next, we use deepmatch together with deepctr to build a recall example. The input data is the file below.
movielens_sample.txt

user_id,movie_id,rating,timestamp,title,genres,gender,age,occupation,zip
1,1193,5,978300760,One Flew Over the Cuckoo's Nest (1975),Drama,F,1,10,48067
1,661,3,978302109,James and the Giant Peach (1996),Animation|Children's|Musical,F,1,10,48067
1,914,3,978301968,My Fair Lady (1964),Musical|Romance,F,1,10,48067
1,3408,4,978300275,Erin Brockovich (2000),Drama,F,1,10,48067
1,2355,5,978824291,"Bug's Life, A (1998)",Animation|Children's|Comedy,F,1,10,48067
1,1197,3,978302268,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance,F,1,10,48067
1,1287,5,978302039,Ben-Hur (1959),Action|Adventure|Drama,F,1,10,48067
1,2804,5,978300719,"Christmas Story, A (1983)",Comedy|Drama,F,1,10,48067
1,594,4,978302268,Snow White and the Seven Dwarfs (1937),Animation|Children's|Musical,F,1,10,48067
1,919,4,978301368,"Wizard of Oz, The (1939)",Adventure|Children's|Drama|Musical,F,1,10,48067
1,595,5,978824268,Beauty and the Beast (1991),Animation|Children's|Musical,F,1,10,48067
1,938,4,978301752,Gigi (1958),Musical,F,1,10,48067
1,2398,4,978302281,Miracle on 34th Street (1947),Drama,F,1,10,48067
1,2918,4,978302124,Ferris Bueller's Day Off (1986),Comedy,F,1,10,48067
1,1035,5,978301753,"Sound of Music, The (1965)",Musical,F,1,10,48067
1,2791,4,978302188,Airplane! (1980),Comedy,F,1,10,48067
1,2687,3,978824268,Tarzan (1999),Animation|Children's,F,1,10,48067
1,2018,4,978301777,Bambi (1942),Animation|Children's,F,1,10,48067
1,3105,5,978301713,Awakenings (1990),Drama,F,1,10,48067
1,2797,4,978302039,Big (1988),Comedy|Fantasy,F,1,10,48067
1,2321,3,978302205,Pleasantville (1998),Comedy,F,1,10,48067
1,720,3,978300760,Wallace & Gromit: The Best of Aardman Animation (1996),Animation,F,1,10,48067
1,1270,5,978300055,Back to the Future (1985),Comedy|Sci-Fi,F,1,10,48067
1,527,5,978824195,Schindler's List (1993),Drama|War,F,1,10,48067
1,2340,3,978300103,Meet Joe Black (1998),Romance,F,1,10,48067
1,48,5,978824351,Pocahontas (1995),Animation|Children's|Musical|Romance,F,1,10,48067
1,1097,4,978301953,E.T. the Extra-Terrestrial (1982),Children's|Drama|Fantasy|Sci-Fi,F,1,10,48067
1,1721,4,978300055,Titanic (1997),Drama|Romance,F,1,10,48067
1,1545,4,978824139,Ponette (1996),Drama,F,1,10,48067
1,745,3,978824268,"Close Shave, A (1995)",Animation|Comedy|Thriller,F,1,10,48067
1,2294,4,978824291,Antz (1998),Animation|Children's,F,1,10,48067
1,3186,4,978300019,"Girl, Interrupted (1999)",Drama,F,1,10,48067
1,1566,4,978824330,Hercules (1997),Adventure|Animation|Children's|Comedy|Musical,F,1,10,48067
1,588,4,978824268,Aladdin (1992),Animation|Children's|Comedy|Musical,F,1,10,48067
1,1907,4,978824330,Mulan (1998),Animation|Children's,F,1,10,48067
1,783,4,978824291,"Hunchback of Notre Dame, The (1996)",Animation|Children's|Musical,F,1,10,48067
1,1836,5,978300172,"Last Days of Disco, The (1998)",Drama,F,1,10,48067
1,1022,5,978300055,Cinderella (1950),Animation|Children's|Musical,F,1,10,48067
1,2762,4,978302091,"Sixth Sense, The (1999)",Thriller,F,1,10,48067
1,150,5,978301777,Apollo 13 (1995),Drama,F,1,10,48067
1,1,5,978824268,Toy Story (1995),Animation|Children's|Comedy,F,1,10,48067
1,1961,5,978301590,Rain Man (1988),Drama,F,1,10,48067
1,1962,4,978301753,Driving Miss Daisy (1989),Drama,F,1,10,48067
1,2692,4,978301570,Run Lola Run (Lola rennt) (1998),Action|Crime|Romance,F,1,10,48067
1,260,4,978300760,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi,F,1,10,48067
1,1028,5,978301777,Mary Poppins (1964),Children's|Comedy|Musical,F,1,10,48067
1,1029,5,978302205,Dumbo (1941),Animation|Children's|Musical,F,1,10,48067
1,1207,4,978300719,To Kill a Mockingbird (1962),Drama,F,1,10,48067
1,2028,5,978301619,Saving Private Ryan (1998),Action|Drama|War,F,1,10,48067
1,531,4,978302149,"Secret Garden, The (1993)",Children's|Drama,F,1,10,48067
1,3114,4,978302174,Toy Story 2 (1999),Animation|Children's|Comedy,F,1,10,48067
1,608,4,978301398,Fargo (1996),Crime|Drama|Thriller,F,1,10,48067
1,1246,4,978302091,Dead Poets Society (1989),Drama,F,1,10,48067
2,1193,5,978298413,One Flew Over the Cuckoo's Nest (1975),Drama,M,56,16,70072
2,3105,4,978298673,Awakenings (1990),Drama,M,56,16,70072
2,2321,3,978299666,Pleasantville (1998),Comedy,M,56,16,70072
2,1962,5,978298813,Driving Miss Daisy (1989),Drama,M,56,16,70072
2,1207,4,978298478,To Kill a Mockingbird (1962),Drama,M,56,16,70072
2,2028,4,978299773,Saving Private Ryan (1998),Action|Drama|War,M,56,16,70072
2,1246,5,978299418,Dead Poets Society (1989),Drama,M,56,16,70072
2,1357,5,978298709,Shine (1996),Drama|Romance,M,56,16,70072
2,3068,4,978299000,"Verdict, The (1982)",Drama,M,56,16,70072
2,1537,4,978299620,Shall We Dance? (Shall We Dansu?) (1996),Comedy,M,56,16,70072
2,647,3,978299351,Courage Under Fire (1996),Drama|War,M,56,16,70072
2,2194,4,978299297,"Untouchables, The (1987)",Action|Crime|Drama,M,56,16,70072
2,648,4,978299913,Mission: Impossible (1996),Action|Adventure|Mystery,M,56,16,70072
2,2268,5,978299297,"Few Good Men, A (1992)",Crime|Drama,M,56,16,70072
2,2628,3,978300051,Star Wars: Episode I - The Phantom Menace (1999),Action|Adventure|Fantasy|Sci-Fi,M,56,16,70072
2,1103,3,978298905,Rebel Without a Cause (1955),Drama,M,56,16,70072
2,2916,3,978299809,Total Recall (1990),Action|Adventure|Sci-Fi|Thriller,M,56,16,70072
2,3468,5,978298542,"Hustler, The (1961)",Drama,M,56,16,70072
2,1210,4,978298151,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Romance|Sci-Fi|War,M,56,16,70072
2,1792,3,978299941,U.S. Marshalls (1998),Action|Thriller,M,56,16,70072
2,1687,3,978300174,"Jackal, The (1997)",Action|Thriller,M,56,16,70072
2,1213,2,978298458,GoodFellas (1990),Crime|Drama,M,56,16,70072
2,3578,5,978298958,Gladiator (2000),Action|Drama,M,56,16,70072
2,2881,3,978300002,Double Jeopardy (1999),Action|Thriller,M,56,16,70072
2,3030,4,978298434,Yojimbo (1961),Comedy|Drama|Western,M,56,16,70072
2,1217,3,978298151,Ran (1985),Drama|War,M,56,16,70072
2,434,2,978300174,Cliffhanger (1993),Action|Adventure|Crime,M,56,16,70072
2,2126,3,978300123,Snake Eyes (1998),Action|Crime|Mystery|Thriller,M,56,16,70072
2,3107,2,978300002,Backdraft (1991),Action|Drama,M,56,16,70072
2,3108,3,978299712,"Fisher King, The (1991)",Comedy|Drama|Romance,M,56,16,70072
2,3035,4,978298625,Mister Roberts (1955),Comedy|Drama|War,M,56,16,70072
2,1253,3,978299120,"Day the Earth Stood Still, The (1951)",Drama|Sci-Fi,M,56,16,70072
2,1610,5,978299809,"Hunt for Red October, The (1990)",Action|Thriller,M,56,16,70072
2,292,3,978300123,Outbreak (1995),Action|Drama|Thriller,M,56,16,70072
2,2236,5,978299220,Simon Birch (1998),Drama,M,56,16,70072
2,3071,4,978299120,Stand and Deliver (1987),Drama,M,56,16,70072
2,902,2,978298905,Breakfast at Tiffany's (1961),Drama|Romance,M,56,16,70072
2,368,4,978300002,Maverick (1994),Action|Comedy|Western,M,56,16,70072
2,1259,5,978298841,Stand by Me (1986),Adventure|Comedy|Drama,M,56,16,70072
2,3147,5,978298652,"Green Mile, The (1999)",Drama|Thriller,M,56,16,70072
2,1544,4,978300174,"Lost World: Jurassic Park, The (1997)",Action|Adventure|Sci-Fi|Thriller,M,56,16,70072
2,1293,5,978298261,Gandhi (1982),Drama,M,56,16,70072
2,1188,4,978299620,Strictly Ballroom (1992),Comedy|Romance,M,56,16,70072
2,3255,4,978299321,"League of Their Own, A (1992)",Comedy|Drama,M,56,16,70072
2,3256,2,978299839,Patriot Games (1992),Action|Thriller,M,56,16,70072
2,3257,3,978300073,"Bodyguard, The (1992)",Action|Drama|Romance|Thriller,M,56,16,70072
2,110,5,978298625,Braveheart (1995),Action|Drama|War,M,56,16,70072
2,2278,3,978299889,Ronin (1998),Action|Crime|Thriller,M,56,16,70072
2,2490,3,978299966,Payback (1999),Action|Thriller,M,56,16,70072
2,1834,4,978298813,"Spanish Prisoner, The (1997)",Drama|Thriller,M,56,16,70072
2,3471,5,978298814,Close Encounters of the Third Kind (1977),Drama|Sci-Fi,M,56,16,70072
2,589,4,978299773,Terminator 2: Judgment Day (1991),Action|Sci-Fi|Thriller,M,56,16,70072
2,1690,3,978300051,Alien: Resurrection (1997),Action|Horror|Sci-Fi,M,56,16,70072
2,3654,3,978298814,"Guns of Navarone, The (1961)",Action|Drama|War,M,56,16,70072
2,2852,3,978298958,"Soldier's Story, A (1984)",Drama,M,56,16,70072
2,1945,5,978298458,On the Waterfront (1954),Crime|Drama,M,56,16,70072
2,982,4,978299269,Picnic (1955),Drama,M,56,16,70072
2,1873,4,978298542,"Misérables, Les (1998)",Drama,M,56,16,70072
2,2858,4,978298434,American Beauty (1999),Comedy|Drama,M,56,16,70072
2,1225,5,978298391,Amadeus (1984),Drama,M,56,16,70072
2,515,5,978298542,"Remains of the Day, The (1993)",Drama,M,56,16,70072
2,442,3,978300025,Demolition Man (1993),Action|Sci-Fi,M,56,16,70072
2,2312,3,978299046,Children of a Lesser God (1986),Drama,M,56,16,70072
2,265,4,978299026,Like Water for Chocolate (Como agua para chocolate) (1992),Drama|Romance,M,56,16,70072
2,1408,3,978299839,"Last of the Mohicans, The (1992)",Action|Romance|War,M,56,16,70072
2,1084,3,978298813,Bonnie and Clyde (1967),Crime|Drama,M,56,16,70072
2,3699,2,978299173,Starman (1984),Adventure|Drama|Romance|Sci-Fi,M,56,16,70072
2,480,5,978299809,Jurassic Park (1993),Action|Adventure|Sci-Fi,M,56,16,70072
2,1442,4,978299297,Prefontaine (1997),Drama,M,56,16,70072
2,2067,5,978298625,Doctor Zhivago (1965),Drama|Romance|War,M,56,16,70072
2,1265,3,978299712,Groundhog Day (1993),Comedy|Romance,M,56,16,70072
2,1370,5,978299889,Die Hard 2 (1990),Action|Thriller,M,56,16,70072
2,1801,3,978300002,"Man in the Iron Mask, The (1998)",Action|Drama|Romance,M,56,16,70072
2,1372,3,978299941,Star Trek VI: The Undiscovered Country (1991),Action|Adventure|Sci-Fi,M,56,16,70072
2,2353,4,978299861,Enemy of the State (1998),Action|Thriller,M,56,16,70072
2,3334,4,978298958,Key Largo (1948),Crime|Drama|Film-Noir|Thriller,M,56,16,70072
2,2427,2,978299913,"Thin Red Line, The (1998)",Action|Drama|War,M,56,16,70072
2,590,5,978299083,Dances with Wolves (1990),Adventure|Drama|Western,M,56,16,70072
2,1196,5,978298730,Star Wars: Episode V - The Empire Strikes Back (1980),Action|Adventure|Drama|Sci-Fi|War,M,56,16,70072
2,1552,3,978299941,Con Air (1997),Action|Adventure|Thriller,M,56,16,70072
2,736,4,978300100,Twister (1996),Action|Adventure|Romance|Thriller,M,56,16,70072
2,1198,4,978298124,Raiders of the Lost Ark (1981),Action|Adventure,M,56,16,70072
2,593,5,978298517,"Silence of the Lambs, The (1991)",Drama|Thriller,M,56,16,70072
2,2359,3,978299666,Waking Ned Devine (1998),Comedy,M,56,16,70072
2,95,2,978300143,Broken Arrow (1996),Action|Thriller,M,56,16,70072
2,2717,3,978298196,Ghostbusters II (1989),Comedy|Horror,M,56,16,70072
2,2571,4,978299773,"Matrix, The (1999)",Action|Sci-Fi|Thriller,M,56,16,70072
2,1917,3,978300174,Armageddon (1998),Action|Adventure|Sci-Fi|Thriller,M,56,16,70072
2,2396,4,978299641,Shakespeare in Love (1998),Comedy|Romance,M,56,16,70072
2,3735,3,978298814,Serpico (1973),Crime|Drama,M,56,16,70072
2,1953,4,978298775,"French Connection, The (1971)",Action|Crime|Drama|Thriller,M,56,16,70072
2,1597,3,978300025,Conspiracy Theory (1997),Action|Mystery|Romance|Thriller,M,56,16,70072
2,3809,3,978299712,What About Bob? (1991),Comedy,M,56,16,70072
2,1954,5,978298841,Rocky (1976),Action|Drama,M,56,16,70072
2,1955,4,978299200,Kramer Vs. Kramer (1979),Drama,M,56,16,70072
2,235,3,978299351,Ed Wood (1994),Comedy|Drama,M,56,16,70072
2,1124,5,978299418,On Golden Pond (1981),Drama,M,56,16,70072
2,1957,5,978298750,Chariots of Fire (1981),Drama,M,56,16,70072
2,163,4,978299809,Desperado (1995),Action|Romance|Thriller,M,56,16,70072
2,21,1,978299839,Get Shorty (1995),Action|Comedy|Drama,M,56,16,70072
2,165,3,978300002,Die Hard: With a Vengeance (1995),Action|Thriller,M,56,16,70072
2,1090,2,978298580,Platoon (1986),Drama|War,M,56,16,70072
2,380,5,978299809,True Lies (1994),Action|Adventure|Comedy|Romance,M,56,16,70072
2,2501,5,978298600,October Sky (1999),Drama,M,56,16,70072
2,349,4,978299839,Clear and Present Danger (1994),Action|Adventure|Thriller,M,56,16,70072
2,457,4,978299773,"Fugitive, The (1993)",Action|Thriller,M,56,16,70072
2,1096,4,978299386,Sophie's Choice (1982),Drama,M,56,16,70072
2,920,5,978298775,Gone with the Wind (1939),Drama|Romance|War,M,56,16,70072
2,459,3,978300002,"Getaway, The (1994)",Action,M,56,16,70072
2,1527,4,978299839,"Fifth Element, The (1997)",Action|Sci-Fi,M,56,16,70072
2,3418,4,978299809,Thelma & Louise (1991),Action|Drama,M,56,16,70072
2,1385,3,978299966,Under Siege (1992),Action,M,56,16,70072
2,3451,4,978298924,Guess Who's Coming to Dinner (1967),Comedy|Drama,M,56,16,70072
2,3095,4,978298517,"Grapes of Wrath, The (1940)",Drama,M,56,16,70072
2,780,3,978299966,Independence Day (ID4) (1996),Action|Sci-Fi|War,M,56,16,70072
2,498,3,978299418,Mr. Jones (1993),Drama|Romance,M,56,16,70072
2,2728,3,978298881,Spartacus (1960),Drama,M,56,16,70072
2,2002,5,978300100,Lethal Weapon 3 (1992),Action|Comedy|Crime|Drama,M,56,16,70072
2,1784,5,978298841,As Good As It Gets (1997),Comedy|Drama,M,56,16,70072
2,2943,4,978298372,Indochine (1992),Drama|Romance,M,56,16,70072
2,2006,3,978299861,"Mask of Zorro, The (1998)",Action|Adventure|Romance,M,56,16,70072
2,318,5,978298413,"Shawshank Redemption, The (1994)",Drama,M,56,16,70072
2,1968,2,978298881,"Breakfast Club, The (1985)",Comedy|Drama,M,56,16,70072
2,3678,3,978299250,"Man with the Golden Arm, The (1955)",Drama,M,56,16,70072
2,1244,3,978299143,Manhattan (1979),Comedy|Drama|Romance,M,56,16,70072
2,356,5,978299686,Forrest Gump (1994),Comedy|Romance|War,M,56,16,70072
2,1245,2,978299200,Miller's Crossing (1990),Drama,M,56,16,70072
2,3893,1,978299535,Nurse Betty (2000),Comedy|Thriller,M,56,16,70072
2,1247,5,978298652,"Graduate, The (1967)",Drama|Romance,M,56,16,70072
3,2355,5,978298430,"Bug's Life, A (1998)",Animation|Children's|Comedy,M,25,15,55117
3,1197,5,978297570,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance,M,25,15,55117
3,1270,3,978298231,Back to the Future (1985),Comedy|Sci-Fi,M,25,15,55117
3,1961,4,978297095,Rain Man (1988),Drama,M,25,15,55117
3,260,5,978297512,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi,M,25,15,55117
3,3114,3,978298103,Toy Story 2 (1999),Animation|Children's|Comedy,M,25,15,55117
3,648,3,978297867,Mission: Impossible (1996),Action|Adventure|Mystery,M,25,15,55117
3,1210,4,978297600,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Romance|Sci-Fi|War,M,25,15,55117
3,1259,5,978298296,Stand by Me (1986),Adventure|Comedy|Drama,M,25,15,55117
3,2858,4,978297039,American Beauty (1999),Comedy|Drama,M,25,15,55117
3,480,4,978297690,Jurassic Park (1993),Action|Adventure|Sci-Fi,M,25,15,55117
3,1265,2,978298316,Groundhog Day (1993),Comedy|Romance,M,25,15,55117
3,590,4,978297439,Dances with Wolves (1990),Adventure|Drama|Western,M,25,15,55117
3,1196,4,978297539,Star Wars: Episode V - The Empire Strikes Back (1980),Action|Adventure|Drama|Sci-Fi|War,M,25,15,55117
3,1198,5,978297570,Raiders of the Lost Ark (1981),Action|Adventure,M,25,15,55117
3,593,3,978297018,"Silence of the Lambs, The (1991)",Drama|Thriller,M,25,15,55117
3,2006,4,978297757,"Mask of Zorro, The (1998)",Action|Adventure|Romance,M,25,15,55117
3,1968,4,978297068,"Breakfast Club, The (1985)",Comedy|Drama,M,25,15,55117
3,3421,4,978298147,Animal House (1978),Comedy,M,25,15,55117
3,1641,2,978298430,"Full Monty, The (1997)",Comedy,M,25,15,55117
3,1394,4,978298147,Raising Arizona (1987),Comedy,M,25,15,55117
3,3534,3,978297068,28 Days (2000),Comedy,M,25,15,55117
3,104,4,978298486,Happy Gilmore (1996),Comedy,M,25,15,55117
3,2735,4,978297867,"Golden Child, The (1986)",Action|Adventure|Comedy,M,25,15,55117
3,1431,3,978297095,Beverly Hills Ninja (1997),Action|Comedy,M,25,15,55117
3,3868,3,978298486,"Naked Gun: From the Files of Police Squad!, The (1988)",Comedy,M,25,15,55117
3,1079,5,978298296,"Fish Called Wanda, A (1988)",Comedy,M,25,15,55117
3,2997,3,978298147,Being John Malkovich (1999),Comedy,M,25,15,55117
3,1615,5,978297710,"Edge, The (1997)",Adventure|Thriller,M,25,15,55117
3,1291,4,978297600,Indiana Jones and the Last Crusade (1989),Action|Adventure,M,25,15,55117
3,653,4,978297757,Dragonheart (1996),Action|Adventure|Fantasy,M,25,15,55117
3,2167,5,978297600,Blade (1998),Action|Adventure|Horror,M,25,15,55117
3,1580,3,978297663,Men in Black (1997),Action|Adventure|Comedy|Sci-Fi,M,25,15,55117
3,3619,2,978298201,"Hollywood Knights, The (1980)",Comedy,M,25,15,55117
3,1049,4,978297805,"Ghost and the Darkness, The (1996)",Action|Adventure,M,25,15,55117
3,1261,1,978297663,Evil Dead II (Dead By Dawn) (1987),Action|Adventure|Comedy|Horror,M,25,15,55117
3,552,4,978297837,"Three Musketeers, The (1993)",Action|Adventure|Comedy,M,25,15,55117
3,1266,5,978297396,Unforgiven (1992),Western,M,25,15,55117
3,733,5,978297757,"Rock, The (1996)",Action|Adventure|Thriller,M,25,15,55117
3,1378,5,978297419,Young Guns (1988),Action|Comedy|Western,M,25,15,55117
3,1379,4,978297419,Young Guns II (1990),Action|Comedy|Western,M,25,15,55117
3,3552,5,978298459,Caddyshack (1980),Comedy,M,25,15,55117
3,1304,5,978298166,Butch Cassidy and the Sundance Kid (1969),Action|Comedy|Western,M,25,15,55117
3,2470,4,978297777,Crocodile Dundee (1986),Adventure|Comedy,M,25,15,55117
3,3168,4,978297570,Easy Rider (1969),Adventure|Drama,M,25,15,55117
3,2617,2,978297837,"Mummy, The (1999)",Action|Adventure|Horror|Thriller,M,25,15,55117
3,3671,5,978297419,Blazing Saddles (1974),Comedy|Western,M,25,15,55117
3,2871,4,978297539,Deliverance (1972),Adventure|Thriller,M,25,15,55117
3,2115,4,978297777,Indiana Jones and the Temple of Doom (1984),Action|Adventure,M,25,15,55117
3,1136,5,978298079,Monty Python and the Holy Grail (1974),Comedy,M,25,15,55117
3,2081,4,978298504,"Little Mermaid, The (1989)",Animation|Children's|Comedy|Musical|Romance,M,25,15,55117
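Two details of this file matter when reading it: titles that contain a comma are wrapped in double quotes, and the genres column packs several values separated by `|`. The sketch below parses one line copied from the data with the standard `csv` module. The variable names are illustrative only.

```python
import csv
import io

sample = '''user_id,movie_id,rating,timestamp,title,genres,gender,age,occupation,zip
1,2355,5,978824291,"Bug's Life, A (1998)",Animation|Children's|Comedy,F,1,10,48067
'''

rows = list(csv.DictReader(io.StringIO(sample)))
first = rows[0]
title = first["title"]                 # the quoted comma stays inside the title
genres = first["genres"].split("|")    # multi-valued field
print(title, genres)
```

`pd.read_csv`, which main.py uses, handles the quoted titles in the same way by default.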

main.py

import pandas as pd
from deepctr.feature_column import SparseFeat, VarLenSparseFeat
from preprocess import gen_data_set, gen_model_input
from sklearn.preprocessing import LabelEncoder
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.models import Model

from deepmatch.models import *
from deepmatch.utils import sampledsoftmaxloss

# Using MovieLens as the example, run the whole flow on a small sample of the data
data = pd.read_csv("./movielens_sample.txt")
sparse_features = ["movie_id", "user_id", "gender", "age", "occupation", "zip", ]
SEQ_LEN = 50
negsample = 0

# 1. Encode each feature as integer ids, then use `gen_data_set` and
#    `gen_model_input` to build feature data that carries each user's behavior history
features = ['user_id', 'movie_id', 'gender', 'age', 'occupation', 'zip']
feature_max_idx = {}
for feature in features:
    lbe = LabelEncoder()
    # ids start at 1 so that 0 stays free for sequence padding
    data[feature] = lbe.fit_transform(data[feature]) + 1
    feature_max_idx[feature] = data[feature].max() + 1

user_profile = data[["user_id", "gender", "age", "occupation", "zip"]].drop_duplicates('user_id')
item_profile = data[["movie_id"]].drop_duplicates('movie_id')
user_profile.set_index("user_id", inplace=True)
user_item_list = data.groupby("user_id")['movie_id'].apply(list)

train_set, test_set = gen_data_set(data, negsample)

train_model_input, train_label = gen_model_input(train_set, user_profile, SEQ_LEN)
test_model_input, test_label = gen_model_input(test_set, user_profile, SEQ_LEN)

# 2. Configure the feature columns the model needs, mainly the feature names
#    and the embedding vocabulary sizes
embedding_dim = 16

user_feature_columns = [SparseFeat('user_id', feature_max_idx['user_id'], embedding_dim),
                        SparseFeat("gender", feature_max_idx['gender'], embedding_dim),
                        SparseFeat("age", feature_max_idx['age'], embedding_dim),
                        SparseFeat("occupation", feature_max_idx['occupation'], embedding_dim),
                        SparseFeat("zip", feature_max_idx['zip'], embedding_dim),
                        VarLenSparseFeat(SparseFeat('hist_movie_id', feature_max_idx['movie_id'], embedding_dim,
                                                    embedding_name="movie_id"), SEQ_LEN, 'mean', 'hist_len'),
                        ]

item_feature_columns = [SparseFeat('movie_id', feature_max_idx['movie_id'], embedding_dim)]

# 3. Define a YoutubeDNN model, passing the user-side feature list `user_feature_columns`
#    and the item-side feature list `item_feature_columns`. Then configure the optimizer
#    and the loss function and start training.
K.set_learning_phase(True)

model = YoutubeDNN(user_feature_columns, item_feature_columns, num_sampled=5, user_dnn_hidden_units=(64, 16))
# model = MIND(user_feature_columns,item_feature_columns,dynamic_k=True,p=1,k_max=2,num_sampled=5,user_dnn_hidden_units=(64,16),init_std=0.001)

model.compile(optimizer="adagrad", loss=sampledsoftmaxloss)  # "binary_crossentropy")

history = model.fit(train_model_input, train_label,  # train_label,
                    batch_size=256, epochs=1, verbose=1, validation_split=0.0, )

# 4. After training: in production we would compute the user vector in real time from the
#    current user features and build an index over the item vectors for approximate
#    nearest-neighbor lookup. This is an offline simulation, so we simply export the
#    vectors of all test users and of all items.
test_user_model_input = test_model_input
all_item_model_input = {"movie_id": item_profile['movie_id'].values, "movie_idx": item_profile['movie_id'].values}

# The next two lines are the usual deepmatch way to obtain the user-vector model
# and the item-vector model
user_embedding_model = Model(inputs=model.user_input, outputs=model.user_embedding)
item_embedding_model = Model(inputs=model.item_input, outputs=model.item_embedding)

# Feed in the corresponding data to get the corresponding vectors
user_embs = user_embedding_model.predict(test_user_model_input, batch_size=2 ** 12)
# user_embs = user_embs[:, i, :]  i in [0,k_max) if MIND
item_embs = item_embedding_model.predict(all_item_model_input, batch_size=2 ** 12)

print(user_embs.shape)
print(item_embs.shape)

# 5. [Optional] If faiss is installed, build an index over the item vectors exported in the
#    previous step, search it with the user vectors, and evaluate the result
test_true_label = {line[0]: [line[2]] for line in test_set}

import numpy as np
import faiss
from tqdm import tqdm
from deepmatch.utils import recall_N

index = faiss.IndexFlatIP(embedding_dim)
# faiss.normalize_L2(item_embs)
index.add(item_embs)
# faiss.normalize_L2(user_embs)
D, I = index.search(user_embs, 50)
s = []
hit = 0
for i, uid in tqdm(enumerate(test_user_model_input['user_id'])):
    try:
        pred = [item_profile['movie_id'].values[x] for x in I[i]]
        filter_item = None
        recall_score = recall_N(test_true_label[uid], pred, N=50)
        s.append(recall_score)
        # test_true_label[uid] is a one-element list, so compare the item itself
        if test_true_label[uid][0] in pred:
            hit += 1
    except Exception:
        print(i)
print("recall", np.mean(s))
print("hr", hit / len(test_user_model_input['user_id']))
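The two numbers printed at the end of main.py are a recall score and a hit rate over the held-out item of each test user. The sketch below shows what such metrics compute, using toy data. `recall_at_n` and `hit_rate` are illustrative functions written for this note, and are not claimed to match the deepmatch implementation of `recall_N` line for line.

```python
def recall_at_n(true_items, predicted_items, n=50):
    """Share of the true items that appear in the top-n predictions."""
    top_n = set(predicted_items[:n])
    return len(top_n & set(true_items)) / len(true_items)


def hit_rate(true_by_user, predicted_by_user, n=50):
    """Share of users with at least one true item in their top-n predictions."""
    hits = sum(
        1 for uid, true_items in true_by_user.items()
        if set(true_items) & set(predicted_by_user.get(uid, [])[:n])
    )
    return hits / len(true_by_user)


# Toy data: one held-out item per user, three predictions per user
true_by_user = {1: [10], 2: [20]}
predicted_by_user = {1: [7, 10, 3], 2: [5, 6, 8]}

recalls = [recall_at_n(true_by_user[u], predicted_by_user[u], n=3) for u in true_by_user]
mean_recall = sum(recalls) / len(recalls)
hr = hit_rate(true_by_user, predicted_by_user, n=3)
print(mean_recall, hr)
```

With exactly one held-out item per user, as in this walkthrough, the mean recall and the hit rate coincide.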

preprocess.py

import random
import numpy as np
from tqdm import tqdm
from tensorflow.python.keras.preprocessing.sequence import pad_sequences


def gen_data_set(data, negsample=0):
    # Order interactions by time so each user's history is chronological
    data.sort_values("timestamp", inplace=True)
    item_ids = data['movie_id'].unique()

    train_set = []
    test_set = []
    for reviewerID, hist in tqdm(data.groupby('user_id')):
        pos_list = hist['movie_id'].tolist()
        rating_list = hist['rating'].tolist()

        if negsample > 0:
            candidate_set = list(set(item_ids) - set(pos_list))
            neg_list = np.random.choice(candidate_set, size=len(pos_list) * negsample, replace=True)
        for i in range(1, len(pos_list)):
            hist = pos_list[:i]
            # Every prefix predicts its next item; the user's last item goes to the test set
            if i != len(pos_list) - 1:
                train_set.append((reviewerID, hist[::-1], pos_list[i], 1, len(hist[::-1]), rating_list[i]))
                for negi in range(negsample):
                    train_set.append((reviewerID, hist[::-1], neg_list[i * negsample + negi], 0, len(hist[::-1])))
            else:
                test_set.append((reviewerID, hist[::-1], pos_list[i], 1, len(hist[::-1]), rating_list[i]))

    random.shuffle(train_set)
    random.shuffle(test_set)

    print(len(train_set[0]), len(test_set[0]))

    return train_set, test_set


def gen_data_set_sdm(data, seq_short_len=5, seq_prefer_len=50):
    data.sort_values("timestamp", inplace=True)
    train_set = []
    test_set = []
    for reviewerID, hist in tqdm(data.groupby('user_id')):
        pos_list = hist['movie_id'].tolist()
        genres_list = hist['genres'].tolist()
        rating_list = hist['rating'].tolist()
        for i in range(1, len(pos_list)):
            hist = pos_list[:i]
            genres_hist = genres_list[:i]
            # Split each history into a short (recent) part and a longer preference part
            if i <= seq_short_len and i != len(pos_list) - 1:
                train_set.append((reviewerID, hist[::-1], [0] * seq_prefer_len, pos_list[i], 1, len(hist[::-1]), 0,
                                  rating_list[i], genres_hist[::-1], [0] * seq_prefer_len))
            elif i != len(pos_list) - 1:
                train_set.append((reviewerID, hist[::-1][:seq_short_len], hist[::-1][seq_short_len:], pos_list[i], 1,
                                  seq_short_len, len(hist[::-1]) - seq_short_len, rating_list[i],
                                  genres_hist[::-1][:seq_short_len], genres_hist[::-1][seq_short_len:]))
            elif i <= seq_short_len and i == len(pos_list) - 1:
                test_set.append((reviewerID, hist[::-1], [0] * seq_prefer_len, pos_list[i], 1, len(hist[::-1]), 0,
                                 rating_list[i], genres_hist[::-1], [0] * seq_prefer_len))
            else:
                test_set.append((reviewerID, hist[::-1][:seq_short_len], hist[::-1][seq_short_len:], pos_list[i], 1,
                                 seq_short_len, len(hist[::-1]) - seq_short_len, rating_list[i],
                                 genres_hist[::-1][:seq_short_len], genres_hist[::-1][seq_short_len:]))

    random.shuffle(train_set)
    random.shuffle(test_set)

    print(len(train_set[0]), len(test_set[0]))

    return train_set, test_set


def gen_model_input(train_set, user_profile, seq_max_len):
    train_uid = np.array([line[0] for line in train_set])
    train_seq = [line[1] for line in train_set]
    train_iid = np.array([line[2] for line in train_set])
    train_label = np.array([line[3] for line in train_set])
    train_hist_len = np.array([line[4] for line in train_set])

    # Pad (or truncate) every history at the end to a fixed length, using 0 as the padding id
    train_seq_pad = pad_sequences(train_seq, maxlen=seq_max_len, padding='post', truncating='post', value=0)
    train_model_input = {"user_id": train_uid, "movie_id": train_iid, "hist_movie_id": train_seq_pad,
                         "hist_len": train_hist_len}

    for key in ["gender", "age", "occupation", "zip"]:
        train_model_input[key] = user_profile.loc[train_model_input['user_id']][key].values

    return train_model_input, train_label


def gen_model_input_sdm(train_set, user_profile, seq_short_len, seq_prefer_len):
    train_uid = np.array([line[0] for line in train_set])
    short_train_seq = [line[1] for line in train_set]
    prefer_train_seq = [line[2] for line in train_set]
    train_iid = np.array([line[3] for line in train_set])
    train_label = np.array([line[4] for line in train_set])
    train_short_len = np.array([line[5] for line in train_set])
    train_prefer_len = np.array([line[6] for line in train_set])
    short_train_seq_genres = np.array([line[8] for line in train_set])
    prefer_train_seq_genres = np.array([line[9] for line in train_set])

    train_short_item_pad = pad_sequences(short_train_seq, maxlen=seq_short_len, padding='post', truncating='post',
                                         value=0)
    train_prefer_item_pad = pad_sequences(prefer_train_seq, maxlen=seq_prefer_len, padding='post', truncating='post',
                                          value=0)
    train_short_genres_pad = pad_sequences(short_train_seq_genres, maxlen=seq_short_len, padding='post',
                                           truncating='post', value=0)
    train_prefer_genres_pad = pad_sequences(prefer_train_seq_genres, maxlen=seq_prefer_len, padding='post',
                                            truncating='post', value=0)

    train_model_input = {"user_id": train_uid, "movie_id": train_iid, "short_movie_id": train_short_item_pad,
                         "prefer_movie_id": train_prefer_item_pad, "prefer_sess_length": train_prefer_len,
                         "short_sess_length": train_short_len, 'short_genres': train_short_genres_pad,
                         'prefer_genres': train_prefer_genres_pad}

    for key in ["gender", "age", "occupation", "zip"]:
        train_model_input[key] = user_profile.loc[train_model_input['user_id']][key].values

    return train_model_input, train_label
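The core of `gen_data_set` is a leave-last-out split: every prefix of a user's chronological history becomes a training sample that predicts the next item, and the final item is held out for testing. The sketch below reproduces just that logic on a toy history, plus the end-padding used for the history sequences. `leave_last_out` and `pad_post` are simplified illustrations written for this note, not functions from preprocess.py or Keras.

```python
def leave_last_out(history):
    """Split one chronological history into (prefix, target) samples."""
    train, test = [], []
    for i in range(1, len(history)):
        # Prefix is reversed so the most recent item comes first
        sample = (history[:i][::-1], history[i])
        if i != len(history) - 1:
            train.append(sample)
        else:
            test.append(sample)   # the last interaction is held out
    return train, test


def pad_post(seq, maxlen, value=0):
    """Truncate or pad at the end so the sequence has exactly maxlen entries."""
    return (seq[:maxlen] + [value] * maxlen)[:maxlen]


train, test = leave_last_out([11, 22, 33, 44])
padded = pad_post(train[1][0], 4)
print(train, test, padded)
```

Note that the first item of a history is never a target, because there is no earlier behavior to predict it from.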
