Research Topics

Natural Language Understanding / Information Extraction

Written Japanese has no spaces between words, so a text must be tokenized at the very beginning of linguistic processing. Thanks to progress in machine learning methods, state-of-the-art technologies achieve over 95% accuracy in so-called morphological analysis, which covers both tokenization and part-of-speech tagging, for well-formed Japanese such as newspaper articles, manuals, and textbooks.
It is, however, still challenging to handle ill-formed texts such as blogs, tweets, and bulletin board (BBS) posts. We aim to develop a shallow analyzer for these ill-formed texts.
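
As a rough illustration of the tokenization problem, the following is a minimal sketch of dictionary-based longest-match tokenization with part-of-speech lookup. It is not the machine-learning approach mentioned above and not our analyzer; the dictionary entries, tag names, and the function name `tokenize` are all hypothetical and chosen only for this example.

```python
# Toy dictionary mapping surface forms to part-of-speech tags.
# All entries are hypothetical and exist only for this illustration.
DICTIONARY = {
    "岡山": "noun",
    "の": "particle",
    "桃": "noun",
    "は": "particle",
    "甘い": "adjective",
}


def tokenize(text, dictionary=DICTIONARY):
    """Greedy longest-match tokenization with part-of-speech lookup.

    Characters not covered by the dictionary are emitted one by one
    with the tag "unknown".
    """
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            # No dictionary entry starts here: fall back to one character.
            tokens.append((text[i], "unknown"))
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize("岡山の桃は甘い"))
```

With this toy dictionary, the well-formed input is split into five known tokens, whereas a character outside the dictionary (as often happens with slang or unusual spellings in tweets and blogs) falls through to the "unknown" tag. That fallback is a simple picture of why ill-formed text is harder to analyze.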

Okayama Dialect Processing

Linked Data / Semantic Web

Natural language is the most flexible tool for sharing, storing, and manipulating information. This flexibility, at the same time, prevents computers from handling the information correctly. Linked data is a practical solution to this problem. We are conducting research on sharing information about the Okayama area as linked data (RDF/XML) in the tourism and disaster reduction domains.
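
To give a concrete picture of what an event described as RDF/XML might look like, here is a minimal sketch using only the Python standard library. The choice of the schema.org vocabulary, the `example.org` URI, the sample event values, and the function name `event_to_rdfxml` are assumptions made for illustration; they do not describe the vocabulary or data used in our system.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SCHEMA_NS = "http://schema.org/"  # assumed vocabulary, for illustration only


def event_to_rdfxml(uri, name, start_date, location):
    """Serialize one event as an RDF/XML string."""
    # Use readable prefixes instead of auto-generated ns0, ns1, ...
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("schema", SCHEMA_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:about identifies the resource that the statements describe.
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri}
    )
    # rdf:type states that the resource is an event.
    ET.SubElement(
        desc, f"{{{RDF_NS}}}type", {f"{{{RDF_NS}}}resource": SCHEMA_NS + "Event"}
    )
    ET.SubElement(desc, f"{{{SCHEMA_NS}}}name").text = name
    ET.SubElement(desc, f"{{{SCHEMA_NS}}}startDate").text = start_date
    ET.SubElement(desc, f"{{{SCHEMA_NS}}}location").text = location
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical event data.
    print(event_to_rdfxml(
        "http://example.org/event/1",
        "Sample Festival",
        "2024-08-01",
        "Okayama",
    ))
```

Because each event is identified by a URI and described with shared property names, data published by different organizations can be merged and queried without parsing free text.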

Experimental system (Okayama event search system)