doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
In case anyone's curious, all my other projects, except for a few curriculum-related ones, have been put on hold until I get the metadata sorting done and all tagging tabs prepped. I've not written any fic since last year, and probably won't for a long while. Even my fic db is on pause - I've got WIPs to update wordcount for, but YG has pretty much taken over my life, lol.

However, I have some neat stats to report. As of this evening, it's up to 12.39% sorted and 0.99% tagged.

Available tabs:

English: 386
Spanish: 8
Portuguese: 4
Italian: 83
German: 1
French: 1
Chinese: 1
Indonesian: 24
Arabic: 11
Persian: 4
Turkish: 11
Romanian: 2
Unknown: 44
Spam: 3

You'll notice some big languages (German, French, Chinese) have virtually no tabs so far, and Spanish and Portuguese still have very few, whereas Italian's got dozens and dozens. That's because I'm going in order of the cat_ids, and all the Italian categories were stacked at the beginning, followed by tons of English. I haven't yet gotten to the categories for any of the other languages. Indonesian, Arabic, and Turkish didn't have separate category structures for them, so they've been sprinkled throughout other categories, mainly English ones.

There are a lot more languages I've got heaps for (over 300 Dutch groups, for instance), but each heap is small enough that I don't know how many tabs I'd get out of it by the end. For now, each of those languages is shoved onto its own single tab, which won't be available until I'm done sorting.

The Unknown, for anyone curious who doesn't know, is a mix of three things: groups which are pretending not to be spam (but which have obvious patterns in their descriptions), groups with so little or such confusing info that someone will have to actually look inside their messages to get an idea of what they were, and groups where the info didn't make clear what the primary language or languages were, so they also need looking inside.


Languages

As far as total languages seen, besides English and the languages listed above as having tabs ready to tag, I've seen just around 85 so far (listed in the order I found the first group in that language, by broad region):

European: Esperanto, Albanian, Bosnian, Hungarian, Polish, Swedish, Greek, Danish, Croatian, Dutch, Estonian, Lithuanian, Norwegian, Finnish, Slovak, Czech, Slovenian, Latvian, Catalan, Icelandic, Welsh, Basque

Cyrillic (separated out because of the alphabet): Russian, Ukrainian, Bulgarian

Oceania: Māori

North American: Haitian Creole

African: Swahili, Afrikaans, Somali

Asian: Hindi/Urdu, Vietnamese, Filipino, Hebrew, Azerbaijani, Javanese, Tamil, Mongolian, Thai, Korean, Marathi, Japanese, Sundanese, Kurdish

Not counted in the above list were the nine different languages from the Kuki-Chin language family - a branch of the Tibeto-Burman one. (Telling *those* apart - given most aren't in Google Translate - has been a real exercise in pattern recognition and detective work.)


Then there are the languages I have only one or two groups for (once a language hits three, I give it its own tab, just in case):

European: Breton, Ido, Interlingua Romanica, Luxembourgish, Aromanian, Belarusian, Piedmontese, Middelsprake

Cyrillic: Macedonian, Tatar

Asian: Tausug, Tetun Dili, Malayalam, Kyrgyz, Armenian, Batak Toba, Meitei

African: Ukwuani, Kinyarwanda


Names for languages have generally been taken from Google Translate, or Wikipedia when that fails. If it turns out one of the language names is incorrect, it can be easily fixed once everything's in the database.


Volunteers

A big call for volunteers won't go out until I have all tabs prepped and ready to go, but if you or someone you know is detail-oriented and likes the idea of helping out, limited numbers of volunteers can be used right now. Indeed, a couple of them have been hard at work for months tagging tabs and flushing out any bugs in the tagging process.

In addition to business, computers, and mixed tabs of a whole jumble of things (some of which include fandom groups interspersed throughout), there are already fandom-only tabs available in the following areas:

- a few specific pop groups (BSB, Britney Spears, and NSYNC)
- celebrities (mainly actors)
- anime & manga
- comics
- cartoons
- Disney

Currently I'm working on fashion models, and when finished with those, will move on to the humanities categories - which almost immediately start into books and authors. :D

If you're interested and haven't already joined, you can find the Discord server here: https://discord.gg/UyJdffhw2b

Page generated Aug. 1st, 2025 04:06 pm
Powered by Dreamwidth Studios