Tagging has begun (for real)!
May. 29th, 2022 09:16 am
At last, the database of Yahoo Groups metadata is ready, so tagging can begin in earnest - and be done properly this time, unlike last summer's attempt (I have kept the info from that attempt, so those efforts were not in vain).
Big pluses this time:
- Tabs are organized by one or more cat_id numbers, each of which is unique to the combination of category path, category id, and category name. This means that, on the whole, the groups on a tab should be roughly similar in theme/content, and in some cases, tabs can be limited to a specific fandom (generally super popular fandoms of the early 2000s such as LOTR, HP, Buffy, Sailor Moon, or Backstreet Boys).
- Tabs are separated by language, so the average volunteer doesn't have to deal with identifying languages at all, nor do they have to tag groups in a language other than English if they don't want to. (Of course, that means we are definitely eager for volunteers who ARE able and willing to tag groups in other languages!)
- Nonfandom tagging is done alongside the fandom tagging by the same volunteer, so there's no second pass that has to be done later. (There's a preset list of nonfandom categories to copy/paste from, to make it simple for taggers.)
- Each tab only needs to be looked at twice (once for initial tagging, once for checking). Checkers have a slightly different set of tasks, and should be able to get through most tabs more quickly than the initial taggers did.
If you're interested in helping, here are the relevant links:
Tagging guidelines: https://docs.google.com/document/d/1AWFSmXLH-KsVU7N1EGkmbrLyv1N_fYoRlWLCEWLxtX4/edit?usp=sharing
Category list with cat_id and groups count: https://mega.nz/file/EZ9xCY4b#N8l9_LTJ-mV4KsMRO0DT4J--ws1fackYRcLZHPJ370o
Discord server (Save Yahoo Groups): https://discord.gg/fqsNqdpF7r
You do NOT have to be on the server, if you don't want to do Discord. It is helpful, as I update the last processed cat_id so you know which ones are already on tabs and can be requested, but it's not necessary.
Especially needed:
- anyone who can read a language other than English (the first 1000+ cat_ids are all Italian category paths, for instance, and there are tons of groups later on in Spanish, French, German, Portuguese, Chinese, Turkish, Indonesian, Arabic, some Hebrew - and we will particularly need to find someone who can read Persian written with the Latin alphabet)
- anyone with specialist knowledge in particular areas, whether it's a specific fandom or general area of fandom you know well, or a nonfandom area like computers, biology, or various cultures
- anyone willing to download and import mbox files in order to identify language and/or fandom/category for groups on the "unknown" tabs (groups where the metadata is not sufficient for tagging); we've got a visual tutorial for a lightweight free software program, so it's not hard!
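For anyone comfortable with a bit of scripting, here is a minimal sketch of another way to peek inside an mbox file. It is not the tutorial's method, just an illustration using Python's standard `mailbox` module. The function name `sample_subjects` and the `limit` parameter are made up for this example. It only collects the first few subject lines, which is often enough to guess a group's language or fandom.

```python
import mailbox
import sys
from email.header import decode_header, make_header


def sample_subjects(mbox_path, limit=10):
    """Return up to `limit` decoded Subject lines from an mbox file.

    Hypothetical helper for illustration; not part of the project's tooling.
    """
    subjects = []
    # create=False: raise an error instead of silently creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            raw = message.get("Subject")
            if raw is None:
                continue  # skip messages with no subject header
            try:
                # Decode MIME-encoded subjects such as =?utf-8?q?...?=
                subject = str(make_header(decode_header(raw)))
            except Exception:
                # Fall back to the raw header if decoding fails
                subject = str(raw)
            subjects.append(subject)
            if len(subjects) >= limit:
                break
    finally:
        box.close()
    return subjects


if __name__ == "__main__":
    # Usage: python peek_mbox.py path/to/group.mbox
    for line in sample_subjects(sys.argv[1]):
        print(line)
```

Reading the printed subjects by eye is still the tagging step; the script only saves opening the file in a mail program.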