The software was originally based on ChaSen and was developed under the name ChaSenTNG, but it has since been rewritten from scratch and is now developed independently of ChaSen. MeCab's analysis accuracy is comparable to that of ChaSen, and it is about 3–4 times faster.
MeCab segments a sentence into words and tags each word with its part of speech. Several dictionaries are available for MeCab, but, as with ChaSen, IPADIC is the most commonly used.
In 2007, Google used MeCab to generate n-gram data for a large corpus of Japanese text, which it published on its Google Japan blog.[3]
In addition to segmenting the text, MeCab lists each word's part of speech and, where applicable and present in the dictionary, its pronunciation. In the above example, the verb できる (dekiru, "to be able to") is classified as an ichidan (一段) verb (動詞) in its base form (基本形). The word でも (demo) is identified as an adverbial particle (副助詞). Because not every column applies to every word, an asterisk marks a column that does not apply; this allows everything after the word and the tab character to be formatted as comma-separated values.
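The layout described above (the word, a tab character, then comma-separated columns with an asterisk for columns that do not apply) can be read with a few lines of Python. The following is a minimal sketch, not part of MeCab itself: the function name `parse_line` and the English column labels in `FIELDS` are hypothetical, the column order assumes the IPADIC dictionary, and the sample line is hand-written to match the description of できる rather than captured from a real run.

```python
import csv

# Hypothetical English labels for the feature columns, assuming IPADIC's order.
FIELDS = (
    "pos", "pos_sub1", "pos_sub2", "pos_sub3",
    "conj_type", "conj_form", "base", "reading", "pronunciation",
)


def parse_line(line: str) -> dict:
    """Split one output line into the surface form and named feature columns."""
    # Everything before the first tab is the word as it appears in the text.
    surface, _, feature_str = line.rstrip("\n").partition("\t")
    # The csv module is used so that a quoted comma inside a column is not split.
    values = next(csv.reader([feature_str]), [])
    token = {"surface": surface}
    token.update(dict.fromkeys(FIELDS))  # columns missing from the line stay None
    for name, value in zip(FIELDS, values):
        # An asterisk means "this column does not apply to this word".
        token[name] = None if value == "*" else value
    return token


# Hand-written sample line (an assumption, not captured MeCab output).
sample = "できる\t動詞,自立,*,*,一段,基本形,できる,デキル,デキル"
token = parse_line(sample)
print(token["surface"], token["pos"], token["conj_type"], token["conj_form"])
```

Mapping the asterisk to `None` keeps "not applicable" distinct from an empty string, which may be convenient when the columns are later filtered or stored.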
MeCab also supports several output formats. One of them, chasen, outputs tab-separated values in a format that programs written for ChaSen can use. Another, yomi (from 読む yomu, "to read"), outputs the pronunciation of the input text in katakana,[6] as shown below.
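To illustrate what a reading-only format amounts to, the sketch below rebuilds a katakana reading from default-format output by concatenating the reading column of each word. It approximates the idea and is not MeCab's own implementation: the function name `to_yomi` is hypothetical, the reading is assumed to sit in the eighth feature column as in IPADIC, and the sample lines are hand-written rather than captured from a real run.

```python
import csv

READING_INDEX = 7  # assumed position of the reading column (IPADIC order)


def to_yomi(mecab_output: str) -> str:
    """Concatenate the katakana reading of every word in default-format output."""
    parts = []
    for line in mecab_output.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        surface, _, feature_str = line.partition("\t")
        features = next(csv.reader([feature_str]), [])
        has_reading = (
            len(features) > READING_INDEX and features[READING_INDEX] != "*"
        )
        # Fall back to the word itself when the dictionary gives no reading.
        parts.append(features[READING_INDEX] if has_reading else surface)
    return "".join(parts)


# Hand-written sample lines (an assumption, not captured MeCab output).
sample = (
    "誰\t名詞,代名詞,一般,*,*,*,誰,ダレ,ダレ\n"
    "でも\t助詞,副助詞,*,*,*,*,でも,デモ,デモ\n"
    "編集\t名詞,サ変接続,*,*,*,*,編集,ヘンシュウ,ヘンシュー\n"
    "できる\t動詞,自立,*,*,一段,基本形,できる,デキル,デキル\n"
    "EOS\n"
)
print(to_yomi(sample))  # prints ダレデモヘンシュウデキル
```

Falling back to the surface form keeps words without a dictionary reading, such as unknown Latin-script names, visible in the result instead of silently dropping them.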
"大規模テキスト処理を支える形態素解析技術(工藤拓氏・Google)" [(Lecture) Morphological analysis technology supporting large-scale text processing (Taku Kudo, Google)] (in Japanese). 2009-12-03. Retrieved 2009-12-03.
"iPhoneの仮名漢字変換はMeCabを利用" [iPhone uses MeCab for kana-kanji conversion] (in Japanese). 2009-12-03. Archived from the original on 2008-09-18. Retrieved 2009-12-03.