掲示板 Forums - Text Analyzer

マイコー

Level: 328

Please use this thread for all comments and bug reports. Before posting, please read the following notes/status.

1. The analyzer will never be 100% perfect. I am aiming for 95-98% accuracy.

2. There will be times where it simply cannot break apart a small piece of the text. Please report these for checking. In particular, I am most interested in verbs or adjectives that seem to be broken up. When submitting, please do the following:

A. Try submitting a single sentence of Japanese that contains the error.

B. Report with the entire sentence, as well as the location of the error (and expected results).

3. For now, words that were not found in the dictionary will appear RED. This will include errors (where words were broken up incorrectly) as well as words that are real, but not in renshuu's dictionary (such as names).

4. CURRENT ISSUES

- Verbs with って afterwards parse badly (Ex. 見つかったって)

Link: https://www.renshuu.org/index.... (It's not yet in the menu)
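To picture point 3, here is a minimal sketch of the "not in the dictionary means red" check. The `KNOWN_WORDS` set and the token list are made-up stand-ins for illustration, not renshuu's actual dictionary or code:

```python
# Hypothetical stand-in for the dictionary (illustration only).
KNOWN_WORDS = {"親", "が", "死ぬ", "家", "を"}

def flag_unknown(tokens):
    """Pair each token with True if it would be shown in red (not in the dictionary)."""
    return [(tok, tok not in KNOWN_WORDS) for tok in tokens]

flagged = flag_unknown(["家", "を", "ハロワ"])
```

A flagged token can be either a real word the dictionary lacks or a fragment from a bad split, which is why both cases show up the same way.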

4 years ago

マイコー

Level: 328

Link has been posted to the OP.

4 years ago

cmertb

Level: 391

In this sentence:

つい三時間ほど前までは住所不定ではない、

　ただの引きこもりベテランニートだったのだが、

　気付いたら親が死んでおり、

　引きこもっていて親族会議に出席しなかった俺はいないものとして扱われ、

　兄弟たちの奸計にハマり、見事に家を追い出された。

ベテランニート didn't get split up into ベテラン and ニート

4 years ago

cmertb

Level: 391

無視すると、弟が木製バットで命よりも大切なパソコンを破壊しやがった。

がった at the end was marked in red. Probably need special treatment for やがる.

4 years ago

cmertb

Level: 391

半狂乱で暴れてみたものの、兄は空手の有段者で、逆にぼっこぼこにされた。

ぼっこぼこ got misinterpreted.

What I'm seeing is

Ideally it should just mark ぼっこぼこ as unrecognized.

4 years ago

cmertb

Level: 391

ズキズキと痛む脇腹（多分肋骨が折れてる）を抑えながら、とぼとぼと町を歩く。

ながら didn't get handled correctly:

I guess because the dictionary entry is actually ～ながら?

4 years ago

cmertb

Level: 391

ハロワの場所なんかわかるわけもなし。

Probably should add ハロワ to the dictionary, and also なし turned into 成し (but I guess kuromoji does that often).

4 years ago

cmertb

Level: 391

Overall, I'm not seeing any underlines in my browser, so it's hard to tell without hovering which words are new to me. Was that a phone only feature?

4 years ago

cmertb

Level: 391

その時の写メは、いとも容易く学校中にバラまかれた。

Two issues here: 写メ got misinterpreted, and ばら撒く too.

4 years ago

gillianfaith

Level: 1322

I'm able to see underlines on unknown words on desktop, Windows 10, Firefox 97.

Looks like the parser struggles with the ~さそう ending. I only did a quick test, but it seems to consistently parse it as さ / 庄 :

input: 知らない方がよさそうだ。

expectation: 知らない / 方 / が / よさそう / だ / 。

result: 知らない / 方 / が / よ / さ / 庄 / だ / 。

input: 壁を開けるしかなさそうだね｡

expectation: 壁 / を / 開ける / しか / なさそう / だ / ね / ｡

result: 壁 / を / 開ける / しか / な / さ / 庄 / だ / ね / ｡

input: 長くいられるような場所じゃなさそうだ

expectation: 長く / いられる / よう / な / 場所 / じゃ / なさそう / だ (or maybe じゃなさそう, as a form of じゃない? not sure which is the more helpful way to parse)

result: 長く / いられる / よう / な / 場所 / じゃ / な / さ / 庄 / だ

Also noting that the ｡ in the second test was flagged as not in the dictionary, whereas the 。 in the first test was. I'm assuming because it's half-width?
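If the half-width guess is right, one possible workaround is to normalize punctuation before the text reaches the tokenizer. This is only a sketch under that assumption; the mapping table and function name are hypothetical and not taken from renshuu:

```python
# Map half-width Japanese punctuation to its full-width equivalent
# before tokenizing, so both forms hit the same dictionary entry.
HALFWIDTH_PUNCT = str.maketrans({
    "｡": "。",
    "､": "、",
    "｢": "「",
    "｣": "」",
    "･": "・",
})

def normalize_punctuation(text: str) -> str:
    """Return text with half-width punctuation replaced; everything else is untouched."""
    return text.translate(HALFWIDTH_PUNCT)

normalized = normalize_punctuation("壁を開けるしかなさそうだね｡")
```

A targeted table is used here rather than full Unicode NFKC normalization, since NFKC would also rewrite other characters (for example full-width ！ and ？ become ASCII).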

4 years ago

マイコー

Level: 328

This'll be all I can do tonight! (Note: it does not fix already-parsed readings)

1. さ (adjective > noun) form fixed

2. half-width period fixed

3. 写メ not in base dictionary (this is not renshuu's dictionary), impossible (for now)

4. ながら fixed

5. ぼっこぼこ not in base

6. ベテランニート fixed

7. がった fixed

Notes: The "base dictionary" is a layer below renshuu, part of a package called kuromoji. Although renshuu can work around it *sometimes*, if the word is not in this dictionary, it'll get broken up into smaller components that are very hard to stitch and put back together (on renshuu's layer). Looking into a way to inject terms into that base dictionary, but it's always a nightmare working with someone else's code library.

4 years ago

cmertb

Level: 391

ただの引きこもりベテランニートだったのだが、

Didn't notice this the first time without the underlines, but there is another issue in that sentence, 引きこもり got split up.

4 years ago

cmertb

Level: 391

無様に泣きじゃくって事無きをえようとしたら、着の身着のまま家から叩き出された。

A couple of issues here, although I suspect they both come from kuromoji and can't be fixed now.

1) 事なきを得る didn't get recognized as an expression

2) 家 was interpreted as か rather than いえ

4 years ago

cmertb

Level: 391

紹介された所に履歴書を持っていき、面接をうけるわけだ。

履歴書 got split up into 履歴 and 書.

4 years ago

|マルコ|

Level: 110

Original: そうそう、あの、おさななじみの発明好きな子……
Parsed: そう庄、彼の、幼馴染み乃発明好きな子……

the そうそう in the sentence above is split into そう(so, really, seeming) 庄(manor/villa)

4 years ago

マイコー

Level: 328

I'm just tossing this out here - if anyone is proficient in node/js and wants to help me reverse-engineer the dictionary file formats in the base layer that I am using, let me know! (https://github.com/takuyaa/kur...) Been trying to figure out how to add new entries to the underlying dictionary.

4 years ago

gillianfaith

Level: 1322

Came up on the parser not recognizing ~えば endings when they're palatalized, and also deleting some characters in the results.

input: どうすりゃいいだろう？

expectation: どう / すりゃ / いい / だろう / ？

result: どう / 磨り / いい / だろう / ？

Parser deletes the ゃ in すりゃ, and mistakes する for 磨る.

input: お客さんに、持って行かなきゃ！

expectation: お客さん / に / 、 / 持って行かなきゃ / ！

result: お / さん / に / 、 / 持って / 行 / なきゃ / ！

Parser deletes the 客 from お客さん, and splits 行かなきゃ into 行 & なきゃ, which are both unmatched.

input: 助けなきゃよかったかな。

expectation: 助けなきゃ / よかった / かな / 。

result: 助け / なきゃ / よかった / 佳な / 。

Parser splits away なきゃ and fails to match it, and mistakes かな ending for a な-adjective.

4 years ago

マイコー

Level: 328

1. Fixed そうそう

2. 履歴書 is not yet possible with current setup. kuromoji splits it, and both words are in the dictionary. This will be fixed by the future implementation of complex words, where a word can be marked as a subset of a larger word, and the system will try to match those together. 100% doable, just not yet.

3. So, it's marking そりゃ as a verb form of する. Any idea what that's called? I can easily code it into the system, but I'm not sure what it is or what the full rule set is.

4. 行かなきゃ fixed

5. かな fixed (interestingly, it had this as a form of the base "unit" か).

6. お客 partially fixed - but no お客さん (see #2)

7. 事なきを得る < see #2

8. 着の身着のまま家から <-- bad kuromoji marking. This may be tricky - we'll have to see how many more of these come out before we can consider a rule to overrule kuromoji.

I think the compound word issue is going to be the largest one, and one that I may need to implement sooner rather than later.
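For anyone curious what "matching those together" might look like, here is a rough sketch of one way to do it: greedily re-join adjacent tokens whose concatenation is a known larger word. The `COMPOUNDS` set, `MAX_PARTS` limit, and function name are all assumptions for illustration, not the planned implementation:

```python
# Hypothetical compound list; in a real system this would come from
# dictionary entries marked as containing smaller words.
COMPOUNDS = {"履歴書", "お客さん", "引きこもり"}
MAX_PARTS = 4  # assumed upper bound on how many tokens form one compound

def merge_compounds(tokens):
    """Greedily re-join adjacent tokens whose concatenation is a known compound."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so larger compounds win.
        for size in range(min(MAX_PARTS, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + size])
            if candidate in COMPOUNDS:
                merged.append(candidate)
                i += size
                break
        else:
            # No compound starts here; keep the token as-is.
            merged.append(tokens[i])
            i += 1
    return merged

result = merge_compounds(["履歴", "書", "を", "持っ", "て", "いき"])
```

With the sample input, 履歴 and 書 come back as a single 履歴書 token while the rest pass through unchanged. This only helps when the pieces survive tokenization intact; it cannot recover characters the base layer dropped.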

In order to give you all something to play with, though, I'll try to get the Actions panel set up soon so you can start exporting this stuff to lessons/schedules.

4 years ago

|マルコ|

Level: 110

1)
Original: ‥‥くそっ！ (damn!)
Parsed: ‥‥く素！

2)
Original: ‥‥摑まりたくねえ‥‥
ねえ gets recognized as "right?/don't you think?" instead of ない

3)
Original: こと
Original: 彼奴が遣ったこと
こと always gets the translation "particle indicating command, mild enthusiasm etc" instead of the more common 事

4)
Original: ‥‥そうだ、彼奴だ‥‥
Parsed: ‥‥庄だ、彼奴だ‥‥

5)
Original:（もう、何もかも　オシマイなんだぁ！）
Parsed: （もう、何もかも　御仕舞いな乃だぁ！）
If katakana is used for emphasis, foreign accents, robotic voice etc., maybe the Reader should try to find a way to link 御仕舞い under the hood but leave it displayed as オシマイ

6)
Original: あそこで叫んでるの
Parsed: 彼処で叫出る乃

Contracted Te-iru form turned into でる

7)
Original:‥‥死にたがってるわよ。
Parsed:‥‥死にたがってるわよ。
たがる is failing to be linked

8)
Original:矢張 (Yahari - family name)
Names are getting parsed incorrectly, this was split into 矢 and 張 - might be the right time to import the same Names Dictionary used by jisho :D

9)
Original: うわあ
Parsed: うわあ
This is being split into 2

10)
Original:言わなくちゃ
ちゃ is not recognized as the contraction of ては
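A rough sketch of the idea in 5), linking under the hood while keeping the original display: convert the katakana to hiragana for the lookup only, and show the surface text untouched. The `READING_INDEX` table and function names are hypothetical, just to illustrate:

```python
def kata_to_hira(text: str) -> str:
    """Shift katakana (ァ..ヶ) to the matching hiragana; leave other characters alone."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )

# Hypothetical reading index (illustration only).
READING_INDEX = {"おしまい": "御仕舞い"}

def link_token(surface: str):
    """Return (text to display, dictionary entry or None)."""
    entry = READING_INDEX.get(kata_to_hira(surface))
    return surface, entry

linked = link_token("オシマイ")
```

The display string stays オシマイ and only the lookup key changes, so the emphasis in the original text is preserved.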

4 years ago

gillianfaith

Level: 1322

On Feb 9 at 11:47, マイコー said:

3. So, it's marking そりゃ as a verb form of する. Any idea what that's called? I can easily code it into the system, but I'm not sure what it is or what the full rule set is.

*そりゃ is a blend of それは, not a form of する afaik. It has its own dictionary entry and as far as I've seen gets parsed correctly.

ichi.moe categorizes すりゃ the same as the "provisional" -eba form (link). I think the rule is to just replace れば with りゃ for verbs and ければ with きゃ for adjectives; すれば→すりゃ, なければ→なきゃ .
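As a sketch, that rule could be undone before lookup like this. It covers only the two substitutions named above and is my guess at the rule set, not anything confirmed; the function name and the `OWN_ENTRY` exception set are hypothetical:

```python
# Words that look like contractions but have their own dictionary entry
# (そりゃ is それは, not a verb form). Hypothetical exception list.
OWN_ENTRY = {"そりゃ"}

def expand_ba_contraction(word: str) -> str:
    """Undo the colloquial contraction: ...きゃ -> ...ければ, ...りゃ -> ...れば."""
    if word in OWN_ENTRY:
        return word
    if word.endswith("きゃ"):
        return word[:-2] + "ければ"
    if word.endswith("りゃ"):
        return word[:-2] + "れば"
    return word

expanded = [expand_ba_contraction(w) for w in ["すりゃ", "なきゃ", "行かなきゃ"]]
```

Checking for words with their own entry first matters, otherwise そりゃ would be wrongly expanded.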

4 years ago
