Gender Classification of Japanese Authors
David Edwards & Cybelle Smith

Gendered Speech in Japanese
Gender of speaker may be overtly marked:
- Gender-specific first-person pronouns: boku (male); ore (male); watashi (female or neutral)
Question: Does gender have less-overt effects on Japanese texts as well?
- Can word choice, morphology, or writing style indicate gender, even in noisy environments like fiction writing?

Corpora
Peace Corpus
- 29 personal essays by middle school students
- Topic: Peace
- 29 authors: 22 female, 7 male
Bookstudio Corpus
- 485 installments of online novels
- Genre: Fantasy
- 40 authors: 20 female, 20 male
- Also collected ~181 installments from authors of unknown gender (for future research)

Our Baseline - The Boku Test
Corpus       Male Accuracy   Female Accuracy   Overall Accuracy
Peace        .71             1.0               .93
Bookstudio   .91             .43               .67

Classifiers Used
Naive Bayes:
- Build conditional probabilities of features given gender
- Calculate probability of test data given a particular gender
- Select highest-probability gender
SVM:
- Used the free LIBSVM classification tool
- Find a dividing hyperplane in a space with one dimension per feature
  - Requires problem-specific parameters chosen via cross-validation
- Apply hyperplane to test data
Also attempted Logistic Regression

Chasen: Segmenter and POS-tagger
Columns: Stem | Pronunciation | Lemma | Part of Speech

Features
Stem     Pron     Lemma   POS
KURAki   kuraki   KURAi   adjective-independent

Features
- Kanji (Chinese character)
- Hiragana (phonetic)
- Katakana (phonetic, like italics)

Single-feature performance on Naive Bayes:
Feature    Male Accuracy   Female Accuracy   Overall Accuracy
Indic      .29             .51               .40
Stem       .67             .77               .72
Lem        .68             .78               .73
Pron       .70             .74               .72
POS        .80             .45               .63
Quot       .23             .33               .28
WS         .66             .85               .76
SPDWS1     .49             .81               .66
SPDWS2     .87             .68               .77

Multi-feature performance on Naive Bayes:
Candidate features per trial: Stem, Lem, Pron, POS, Quot, WS, SPDWS1, SPDWS2
[7 trials; the per-trial feature marks (X) are not recoverable here, so accuracy rows are listed in the order they appear.]
Male Acc.   Female Acc.   Overall Acc.
.63         .73           .68
.81         .73           .77
.70         .76           .73
.68         .76           .72
.68         .78           .73
.70         .70           .70
.70         .73           .71
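The three Naive Bayes steps listed under Classifiers Used can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the add-one smoothing, the skipping of unseen features, and the toy feature tokens `f1`..`f3` are all assumptions, since the slides give no implementation details.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count labels and per-label features from (features, label) pairs."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in docs:
        label_counts[label] += 1
        feature_counts[label].update(features)
        vocab.update(features)
    return label_counts, feature_counts, vocab


def log_score(model, features, label):
    """Log prior plus summed log P(feature | label), with add-one smoothing."""
    label_counts, feature_counts, vocab = model
    score = math.log(label_counts[label] / sum(label_counts.values()))
    denom = sum(feature_counts[label].values()) + len(vocab)
    for feat in features:
        if feat in vocab:  # assumption: features unseen in training are skipped
            score += math.log((feature_counts[label][feat] + 1) / denom)
    return score


def classify(model, features):
    """Select the label with the highest score."""
    return max(model[0], key=lambda label: log_score(model, features, label))


# Toy data, invented for illustration only (not taken from either corpus).
toy_docs = [
    (["f1", "f1", "f3"], "male"),
    (["f2", "f2", "f3"], "female"),
]
model = train(toy_docs)
print(classify(model, ["f1"]))
print(classify(model, ["f2", "f2"]))
```

Working in log space avoids numeric underflow when many feature probabilities are multiplied together, which matters for documents as long as novel installments.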
SVM Performance
Optimizations:
- Scaling counts to avoid swamping low-frequency features
- Selecting optimal error rate and kernel parameters

Accuracy
Features                           No Scaling   Scaling   Cross Validation (Training Set)   Cross Validation (Test Set)
All features (except quotations)   50.6%        48.5%     79.7%                             50.0%
Part of Speech                     50.9%        53.0%     68.0%                             47.3%
Wordshape                          50.6%        63.3%     75.2%                             50.6%
Pronunciation                      50.6%        64%       77.8%                             51.8%

Conclusion
- Without considering gendered pronouns, we achieved performance similar to the baseline
- Most-indicative feature: wordshape (use of kanji vs. hiragana vs. katakana, etc.), especially where multiple options exist
- Point of interest: male and female Japanese authors differ not just in the words they use, but in how they choose to write those words
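To make the wordshape idea concrete, here is one possible way to reduce a token to the scripts it is written in. The slides do not define the feature precisely, so everything here is an assumption: the class letters (`C` for kanji, `H` for hiragana, `K` for katakana, `O` for anything else), the Unicode block boundaries used as a rough approximation, the merging of repeated classes, and the example word (three spellings of "kurai").

```python
def script_class(ch: str) -> str:
    """Return a one-letter script class for a single character."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "H"  # Hiragana block
    if 0x30A0 <= code <= 0x30FF:
        return "K"  # Katakana block
    if 0x4E00 <= code <= 0x9FFF:
        return "C"  # CJK Unified Ideographs (kanji)
    return "O"      # Latin letters, digits, punctuation, and so on


def wordshape(token: str) -> str:
    """Collapse a token into its sequence of script classes, merging runs."""
    shape = []
    for ch in token:
        cls = script_class(ch)
        if not shape or shape[-1] != cls:
            shape.append(cls)
    return "".join(shape)


# The same word written three ways yields three different shapes.
for spelling in ["暗い", "くらい", "クライ"]:
    print(spelling, wordshape(spelling))
```

Under this sketch, a writer's choice between kanji and kana for the same word shows up as a different feature value even though stem, lemma, and pronunciation are unchanged.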