In case anyone's curious, all my other projects, except for a few curriculum-related ones, have been put on hold until I get the metadata sorting done and all tagging tabs prepped. I've not written any fic since last year, and probably won't for a long while. Even my fic db is on pause - I've got WIPs to update wordcount for, but YG has pretty much taken over my life, lol.
However, I have some neat stats to report. As of this evening, it's up to 12.39% sorted and 0.99% tagged.
Available tabs:
English: 386
Spanish: 8
Portuguese: 4
Italian: 83
German: 1
French: 1
Chinese: 1
Indonesian: 24
Arabic: 11
Persian: 4
Turkish: 11
Romanian: 2
Unknown: 44
Spam: 3
You'll notice some big languages (German, French, Chinese) have virtually no tabs so far, and Spanish and Portuguese still have very few, whereas Italian's got dozens and dozens. That's because I'm going in order of the cat_ids, and all the Italian categories were stacked at the beginning, followed by tons of English. I haven't yet gotten to the categories for any of the other languages. Indonesian, Arabic, and Turkish didn't have separate category structures for them, so they've been sprinkled throughout other categories, mainly English ones.
There are a lot more languages I've got groups for (over 300 Dutch groups, for instance), but each has few enough groups that I don't know how many tabs I'd get out of it by the end, so for now each language's groups are all shoved onto a single tab, which won't be available until I'm done sorting.
The Unknown tabs are, for anyone who's curious, a mix of: groups which are pretending not to be spam (but which have obvious patterns in their descriptions); groups with so little or such confusing info that someone will have to actually look inside their messages to get an idea of what they were; and groups where the info didn't make clear what the primary language or languages were, so they also need looking inside.
Languages
As for total languages seen, besides English and the languages listed above as having tabs ready to tag, I've seen around 85 so far (listed by broad region, in the order I found the first group in each language):
European: Esperanto, Albanian, Bosnian, Hungarian, Polish, Swedish, Greek, Danish, Croatian, Dutch, Estonian, Lithuanian, Norwegian, Finnish, Slovak, Czech, Slovenian, Latvian, Catalan, Icelandic, Welsh, Basque
Cyrillic (separated out because of the alphabet): Russian, Ukrainian, Bulgarian
Oceania: Māori
North American: Haitian Creole
African: Swahili, Afrikaans, Somali
Asian: Hindi/Urdu, Vietnamese, Filipino, Hebrew, Azerbaijani, Javanese, Tamil, Mongolian, Thai, Korean, Marathi, Japanese, Sundanese, Kurdish
Not counted in the above list were the nine different languages from the Kuki-Chin language family - a branch of the Tibeto-Burman one. (Telling *those* apart - given most aren't in Google Translate - has been a real exercise in pattern recognition and detective work.)
Then there were the languages I have only one or two groups for (once a language hits three, I give it its own tab, just in case):
European: Breton, Ido, Interlingua Romanica, Luxembourgish, Aromanian, Belarusian, Piedmontese, Middelsprake
Cyrillic: Macedonian, Tatar
Asian: Tausug, Tetun Dili, Malayalam, Kyrgyz, Armenian, Batak Toba, Meitei
African: Ukwuani, Kinyarwanda
Names for languages have generally been taken from Google Translate, or from Wikipedia when that fails. If it turns out one of the language names is incorrect, it can be easily fixed once everything's in the database.
Volunteers
A big call for volunteers won't go out until I have all tabs prepped and ready to go, but if you or someone you know is detail-oriented and likes the idea of helping out, a limited number of volunteers can be used right now. Indeed, a couple have been hard at work for months tagging tabs and flushing out any bugs in the tagging process.
In addition to business, computers, and mixed tabs of a whole jumble of things (some of which include fandom groups interspersed throughout), there are already fandom-only tabs available in the following areas:
- a few specific pop groups (BSB, Britney Spears, and NSYNC)
- celebrities (mainly actors)
- anime & manga
- comics
- cartoons
- Disney
Currently I'm working on fashion models, and when finished with those, will move on to the humanities categories - which almost immediately start into books and authors. :D
If you're interested and haven't already joined, you can find the Discord server here:
https://discord.gg/UyJdffhw2b