Utopia Talk / Politics / UP Search Functionality
earthpig
GTFO HOer | Thu Mar 11 19:40:33
So, playing around. Windows users will need to Google "wget for windows" and "grep for windows".

    wget -r www.utopiaforums.com

This starts downloading everything wget can find by following links on www.utopiaforums.com into a folder called www.utopiaforums.com. You can play around more if you only want UP, or only UGT, or only sports. It may take a while. I'm doing that now, and playing around with what is already downloaded.

Then, to search every thread ever for a term:

    grep "a term" *

"a term" can be a date, a poster's name, or a phrase.

If you only want to search recent threads: make a folder called 'recent'. Sort the files by name in your favorite file manager; that groups them by forum. Threads are numbered in order, so larger thread numbers are more recent. A name sort is alphabetical, not numeric, though (thread=591 sorts after thread=31361), so go by the thread number itself rather than the bottom of the list. Copy the most recent 100 or 200 into 'recent', then grep away in that folder. Example:

[chris: ~/up/www.utopiaforums.com/recent]$ grep "faggot" *
boardthread?id=politics&thread=31361&showdeleted=true:Why this faggot hasn't been permabanned a long time ago is baffling to me. Who else would do this? it's cthulhu/asshole uper.
boardthread?id=politics&thread=31361&time=1268329717661:Why this faggot hasn't been permabanned a long time ago is baffling to me. Who else would do this? it's cthulhu/asshole uper.
boardthread?id=politics&thread=591&showdeleted=true:shit, faggot
boardthread?id=politics&thread=591&showdeleted=true:What a fucking faggot Poison is.
boardthread?id=politics&thread=7079&showdeleted=true:licker is a faggot
[chris: ~/up/www.utopiaforums.com/recent]$

Then you can paste the "boardthread?..." part after http://www.utopiaforums.com/ in your web browser to see that thread.
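A minimal shell sketch of the 'recent' folder step, picking threads by number instead of by file-manager order. It assumes a Unix-like shell run inside the www.utopiaforums.com folder, with file names like boardthread?id=politics&thread=31361&showdeleted=true (Windows builds of wget write @ instead of ? in these names, so the patterns would need adjusting there). recent_threads and copy_recent are made-up helper names.

```shell
# Print the N highest thread numbers for one forum id, smallest first.
# sort -n compares numerically, so 591 < 7079 < 31361 as it should be.
recent_threads() {
  n=${1:-200}            # how many threads to keep
  forum=${2:-politics}   # forum id as it appears in the file name
  ls | sed -n "s/^boardthread?id=$forum&thread=\([0-9]*\).*/\1/p" |
    sort -nu | tail -n "$n"
}

# Copy every saved file for those threads (plain, showdeleted=true,
# time=...) into ./recent.
copy_recent() {
  n=${1:-200}
  forum=${2:-politics}
  mkdir -p recent
  for t in $(recent_threads "$n" "$forum"); do
    # The quoted part is matched literally; only the trailing * is a wildcard,
    # and the & stops thread=1 from also matching thread=12.
    for f in "boardthread?id=$forum&thread=$t" "boardthread?id=$forum&thread=$t&"*; do
      if [ -e "$f" ]; then
        cp "$f" recent/
      fi
    done
  done
}
```

Usage: `copy_recent 200 politics`, then `cd recent` and grep as before.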
earthpig
GTFO HOer | Thu Mar 11 19:55:50
grep -li "term" * will just list the files (i.e. the threads) that contain that term, instead of every matching line. grep -li "hot rod" *politics* searches only the files with "politics" in their names, i.e. all of UP, for mentions of hot rod, hOt RoD, or any other capitalization, since -i makes the match case-insensitive.
Downloading the full archive just finished, btw: 1332 files, 40 MB.
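A small sketch that turns the file names grep -li prints back into thread URLs, one per thread rather than one per saved copy (plain, showdeleted=true, time=...). It assumes you're inside the www.utopiaforums.com folder and that file names look like the ones in the first post; threads_mentioning is a made-up helper, and its optional second argument is a forum id such as politics.

```shell
# Usage: threads_mentioning TERM [FORUM_ID]
# -l prints file names only, -i ignores case, -- lets TERM start with '-'.
# The sed step drops extras like &showdeleted=true or &time=... and
# prefixes the site address; sort -u collapses duplicates.
threads_mentioning() {
  grep -li -- "$1" "boardthread?id=${2}"* 2>/dev/null |
    sed -n 's|^\(boardthread?id=[^&]*&thread=[0-9]*\).*|http://www.utopiaforums.com/\1|p' |
    sort -u
}
```

For example, `threads_mentioning "hot rod" politics` prints each UP thread mentioning hot rod as a link you can open directly.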
Camaban
Moderator | Thu Mar 11 20:05:15
Wouldn't that only show shit that's currently linked and active?
earthpig
GTFO HOer | Thu Mar 11 20:29:28
Actually, yes, that appears to be correct. Thread 1 is downloaded (this one: http://www.utopiaforums.com/boardthread?id=politics&thread=1 ), but then several thousand are skipped: a few from the 8000s, then it jumps up to 15k.
And for later updates (after the first download), it's probably best to run wget like this so TC doesn't get pissed:

    wget -rN www.utopiaforums.com

so it only downloads a file again if it is newer than the version on your computer, or if it doesn't exist on your computer yet.
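One way around the skipped threads is to generate the thread URLs directly instead of relying on wget -r to find them through links. A minimal sketch, assuming thread numbers are sequential per forum and that missing numbers just come back as error pages; neither is confirmed here, and thread_urls is a made-up helper.

```shell
# Print one thread URL per line for forum id $1, threads $2 through $3.
thread_urls() {
  forum=$1
  t=$2
  while [ "$t" -le "$3" ]; do
    printf 'http://www.utopiaforums.com/boardthread?id=%s&thread=%d\n' "$forum" "$t"
    t=$((t + 1))
  done
}
```

Usage, run from inside the www.utopiaforums.com folder so existing files are recognized (31361 is just the highest thread number in the example from the first post):

    thread_urls politics 1 31361 | wget -nc --wait=1 -i -

-i - reads the URL list from standard input, -nc skips files you already have (so those won't pick up new posts), and --wait=1 pauses a second between requests to go easy on TC's server.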
earthpig
GTFO HOer | Thu Mar 11 20:35:51
Hrm. The -N flag doesn't seem to be working; wget prints " Last-modified header missing -- time-stamps turned off. " over and over again. -N works by comparing the server's Last-Modified header with your local copy's timestamp, so without that header wget can't tell what changed and downloads everything again.
Well, barring that, and to keep TC's bandwidth costs down: I suppose once a week or so I could download the archive, zip it, and post the .zip file on a server that TC isn't paying for. Folks could then download it from that third-party site: RapidShare or Dropbox or whatever. (I lost my Dropbox account's "public sharing" functionality when masonux got popular and I was hogging all of their bandwidth on my free account. Grrr.)
earthpig
GTFO HOer | Thu Mar 11 20:41:31
OK. It would be ridiculously expensive for TC to pay for all of us wgetting the entire website all the time. I'll do it once a week and RapidShare it; a much lighter burden on TC. Two things I'll ask:
- let me know if you download it, so I know someone is at least benefiting from my efforts.
- click an ad when you do.
earthpig
GTFO HOer | Thu Mar 11 20:41:42
http://rapidshare.com/files/362199846/11march2010.zip.html
MD5: 3F2D34434758BB02411A83D1EBAD9809
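To check that your copy of the archive downloaded intact, compare its MD5 with the one posted. A minimal sketch for a Unix-like shell with GNU coreutils' md5sum; verify_md5 is a made-up helper. On Windows, certutil -hashfile 11march2010.zip MD5 prints the hash to compare by eye, and on macOS md5 -q 11march2010.zip does the same.

```shell
# Usage: verify_md5 FILE EXPECTED_HEX
# md5sum prints lowercase hex and the posted hash is uppercase,
# so the expected value is lowercased before comparing.
verify_md5() {
  actual=$(md5sum "$1" | cut -d ' ' -f 1)
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $actual)" >&2
    return 1
  fi
}
```

For example: `verify_md5 11march2010.zip 3F2D34434758BB02411A83D1EBAD9809`. A mismatch means the download is incomplete or not the file that was posted.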
Still Well
Member | Thu Mar 11 20:45:40
I don't feel like working tomorrow, so maybe I'll put a search function on the sworddicks site.
earthpig
GTFO HOer | Thu Mar 11 20:54:32
Can you close/delete this thread, cama? I made another one that is more concise and won't cause a ridiculous spike in TC's expensive bandwidth use.
Turtle Crawler
Admin | Thu Mar 11 21:45:10
I'm not so worried about bandwidth, dude; uTorrent says I did 1.28 TB in the last 31 days. I could do an entire dump of the database if you really wanted it. It would be about 750 MB.