UPSI Digital Repository (UDRep)

Type: Thesis
Subject: TK Electrical engineering. Electronics. Nuclear engineering
Main Author: Noor Alhusna Madzlan
Title: Development of an automatic attitude recognition system: a multimodal analysis of video blogs
Place of Production: Tanjong Malim
Publisher: Fakulti Bahasa dan Komunikasi
Year of Publication: 2017
Corporate Name: Universiti Pendidikan Sultan Idris

Abstract:
Communicative content in human communication involves the expressivity of socio-affective states. Research in linguistics, social signal processing and, in particular, affective computing highlights the importance of affect, emotion and attitudes as sources of information for communicative content. Attitudes, considered as socio-affective states of speakers, are conveyed through a multitude of signals during communication, and understanding how speakers express them is essential for establishing successful communication. Taking an empirical approach to studying attitude expressions, the main objective of this research is to contribute to the development of an automatic attitude classification system through a fusion of multimodal signals expressed by speakers in video blogs. The study describes a new communicative genre of self-expression through social media, video blogging, which provides opportunities for interlocutors to disseminate information through a myriad of multimodal characteristics; it outlines the main features of this novel communication medium and focuses attention on its possible exploitation as a rich source of information for human communication. The dissertation describes manual annotation of attitude expressions from the vlog corpus, multimodal feature analysis, and the processes involved in developing an automatic attitude annotation system. An ontology of an attitude annotation scheme for speech in video blogs is elaborated, and five attitude labels are derived. Prosodic and visual feature extraction procedures are explained in detail. The discussion of developing an automatic attitude classification model includes analysis of automatic prediction of attitude labels from prosodic and visual features using machine-learning methods. The study also presents a detailed analysis of individual feature contributions and their predictive power in the classification task.
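
To make the pipeline the abstract describes concrete, here is a minimal illustrative Python sketch (not the thesis's actual code): feature-level fusion of prosodic and visual descriptors followed by supervised prediction of five attitude labels with scikit-learn. All data, feature counts, and feature names below are hypothetical placeholders, not values taken from the dissertation.

    # Minimal sketch of a multimodal attitude-classification pipeline:
    # early (feature-level) fusion of prosodic and visual descriptors,
    # then supervised prediction of five attitude labels.
    # All dimensions and data here are synthetic placeholders.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_clips = 200                               # hypothetical vlog segments
    prosodic = rng.normal(size=(n_clips, 6))    # e.g. F0 / intensity statistics
    visual = rng.normal(size=(n_clips, 10))     # e.g. facial / head-motion statistics
    labels = rng.integers(0, 5, size=n_clips)   # five attitude labels

    # Early fusion: concatenate the two modalities per clip.
    fused = np.hstack([prosodic, visual])

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    print("fused accuracy:", cross_val_score(clf, fused, labels, cv=5).mean())

    # Per-modality baselines, echoing the abstract's analysis of
    # individual feature contributions to the classification task.
    for name, block in [("prosodic", prosodic), ("visual", visual)]:
        print(name, "accuracy:", cross_val_score(clf, block, labels, cv=5).mean())

Comparing the fused model against each single-modality baseline in this way is one common means of quantifying what each modality contributes to the classification.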

