The end of Yahoo Groups - a few thoughts & stats
For the past four months I have been saving Yahoo Groups. It started with just me saving a few groups from favorite fandoms or ones I was tangentially interested in, and it sort of exploded, lol. By the end I was coordinating with not only the other members of the (Yahoo Groups) Fandom Rescue Project (aka Yahoo-Geddon), but also with the ArchiveTeam, a loose network of volunteers who try to save data in order to give it to the Internet Archive, best known for the Wayback Machine. (Please note, however: they are NOT actually affiliated with the Internet Archive! Too many people are confused on that point.)
My geek skills have also majorly improved. I've had at least one terminal window (more like five or ten, lol) open constantly for the entire time, and learned a slew of new commands to extract and manipulate data (such as extracting all the links from a set of Yahoo's data and pulling Yahoo group names out of that to find new and hidden groups to save). I've run more scripts in the past four months than in my entire life up to that point, and by now the thought of having to run a script doesn't faze me at all. (Python and Perl - and multiple very helpful volunteers from the ArchiveTeam - are to be thanked for their major part in making this project as successful as it was, lol.)
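For the curious, here's a rough idea of what the "pull group names out of links" step can look like in Python. This is a minimal sketch, not the actual script we ran: the function name `extract_group_names` is made up for illustration, and the pattern assumes group links look like `groups.yahoo.com/group/<name>` or `groups.yahoo.com/neo/groups/<name>`, which won't cover every link shape in the data.

```python
import re

# Assumption: group links look like groups.yahoo.com/group/<name>
# or groups.yahoo.com/neo/groups/<name>. Other shapes are not matched.
GROUP_URL = re.compile(
    r"groups\.yahoo\.com/(?:neo/groups|group)/([A-Za-z0-9_-]+)",
    re.IGNORECASE,
)


def extract_group_names(text):
    """Return the sorted, de-duplicated, lowercased group names found in text."""
    return sorted({match.group(1).lower() for match in GROUP_URL.finditer(text)})


if __name__ == "__main__":
    # Made-up sample input, just to show the call.
    sample = (
        "See http://groups.yahoo.com/group/Tolkien_Fic/ and "
        "https://groups.yahoo.com/neo/groups/sims2stuff/info"
    )
    for name in extract_group_names(sample):
        print(name)
```

Lowercasing and de-duplicating means the same group linked from a hundred different messages only shows up once in the list of groups to go check.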
Now that the last ID has come in, and there's no more point in searching for or joining groups, the project shifts from acquiring data to processing it. I've got ~2 TB of data on my hdd, and about 8 TB more coming in the future from other sources. Much of what's still coming isn't fandom - but enough of it is to make it worth getting it all so the fandom can be sorted out. (It's going to take months, oof. So much work. We have multiple hundreds of thousands of groups' data to manage.)
All of which necessitates the purchase of three 12 TB drives. (One to hold all the zipped data, one to sort into, one to serve as local backup of the sorting, because I'm not trusting that level of work to distant 'net backups of other project members who live multiple states away at minimum.) Each one's $200, ouch. I was not expecting to need to spend that kind of money; I work a couple of part-time jobs and live on the fairly low side of income, which means that'll be a chunk out of my savings.
So while I'm not going to ask outright, if it so happens that you wanted to help save Yahoo Groups and did not have the time to work on it, should you want to contribute some funds instead, I would definitely accept. And I'm not the only one who's been spending a lot on this project. I know several of the others sank considerable money of their own into it along the way to enable us to get as much as we did. Comment or PM if you'd like to help out in that way. (We may have a more organized donation method at some point, but all I have for now is a personal PayPal under my real name.)
A few numbers and random other bits of info:
~2 TB of fandom data saved (that I know of, for now)
~200,000 confirmed fandom groups saved in some fashion
~2,000 Sims groups saved*
Languages for which I saw Tolkien-related groups:
English, French, Spanish, Italian, Portuguese, Esperanto, Lithuanian, Indonesian, Turkish, Catalan, Polish, Bosnian, Hungarian, Finnish
A few categories that made me laugh (yes, these really exist):
Sneezing
Traffic Signs
Music That Sucks
Also, there were a ton of Anti- categories (Anti-Jennifer Lopez, Anti-Hentai, Anti-Eminem, etc.) and quite a variety of pr0n categories (I won't give examples there, lol).
*The only reason I know the Sims number is because I was tracking those groups on Google spreadsheets in order to find all of them and get volunteers to join them. For other fandoms it's impossible to give any sort of number at this point (although I know there was a ton of LOTR, HP, Buffy, and Westlife, lol). Yahoo's categorization was terrible and a group name doesn't always give good clues as to whether it's fandom/non-fandom. Getting that sort of data will take a good deal of time and work.