Forums - Text Analyzer


Page: 1 of 3



マイコー
Level: 328

Please use this thread for all comments and bug reports. Before posting, please read the following notes/status.

1. The analyzer will never be 100% perfect. I am aiming for 95-98% accuracy.

2. There will be times where it simply cannot break apart a small piece of the text. Please report these for checking. In particular, I am most interested in verbs or adjectives that seem to be broken up. When submitting, please do the following:

A. Try submitting a single sentence of Japanese that contains the error.

B. Report with the entire sentence, as well as the location of the error (and expected results).

3. For now, words that were not found in the dictionary will appear RED. This will include errors (where words were broken up incorrectly) as well as words that are real, but not in renshuu's dictionary (such as names).

4. CURRENT ISSUES

- Verbs with って afterwards parse badly (Ex. つかったって)


Link: https://www.renshuu.org/index.... (It's not yet in the menu)

3
4 years ago
マイコー
Level: 328

Link has been posted to the OP.

1
4 years ago
cmertb
Level: 391

In this sentence:

ついほどまではではない、
 ただのきこもりベテランニートだったのだが、
 いたらんでおり、
 きこもっていてしなかったはいないものとしてわれ、
 たちのにハマり、された。

ベテランニート didn't get split up into ベテラン and ニート

1
4 years ago
cmertb
Level: 391
すると、バットでよりもなパソコンをしやがった。

がった at the end was marked in red. Probably need special treatment for やがる.

0
4 years ago
cmertb
Level: 391
れてみたものの、で、にぼっこぼこにされた。

ぼっこぼこ got misinterpreted.

What I'm seeing is

[screenshot]

Ideally it should just mark ぼっこぼこ as unrecognized.

0
4 years ago
cmertb
Level: 391
ズキズキとれてる)をえながら、とぼとぼとく。

ながら didn't get handled correctly:

[screenshot]

I guess because the dictionary entry is actually ~ながら?

0
4 years ago
cmertb
Level: 391

ハロワのなんかわかるわけもなし。

Probably should add ハロワ to the dictionary, and also なし turned into し (but I guess kuromoji does that often).

[screenshot]
0
4 years ago
cmertb
Level: 391

Overall, I'm not seeing any underlines in my browser, so it's hard to tell without hovering which words are new to me. Was that a phone-only feature?

[screenshot]
0
4 years ago
cmertb
Level: 391
そのメは、いともにバラまかれた。

Two issues here: メ got misinterpreted, and ばらく too.

[screenshot]
0
4 years ago
gillianfaith
Level: 1322

I'm able to see underlines on unknown words on desktop, Windows 10, Firefox 97.

Looks like the parser struggles with the ~さそう ending. I only did a quick test, but it seems to consistently parse it as さ / :

input: らないがよさそうだ。
expectation: らない / / が / よさそう / だ / 。
result: らない / / が / よ / さ / / だ / 。

input: けるしかなさそうだね。
expectation: / を / ける / しか / なさそう / だ / ね / 。
result: / を / ける / しか / な / さ / / だ / ね / 。

input: くいられるようなじゃなさそうだ
expectation: く / いられる / よう / な / / じゃ / なさそう / だ (or maybe じゃなさそう, as a form of じゃない? not sure which is the more helpful way to parse)
result: く / いられる / よう / な / / じゃ / な / さ / / だ


Also noting that the 。 in the second test was flagged as not in the dictionary, whereas the 。 in the first test was recognized. I'm assuming that's because it's half-width?
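
If the half-width guess is right, one common way to handle it is to normalize the text before it reaches the tokenizer. The following is only a sketch under that assumption: it treats the stray character as the half-width ideographic full stop (U+FF61), it is not renshuu's actual fix, and `normalize_for_parsing` is a hypothetical helper name.

```python
import unicodedata

HALF_WIDTH_STOP = "\uff61"  # half-width ideographic full stop
FULL_WIDTH_STOP = "\u3002"  # regular ideographic full stop

def normalize_for_parsing(text: str) -> str:
    """Fold half-width and other compatibility characters to standard forms (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# A sentence ending in the half-width stop comes out ending in the regular one.
cleaned = normalize_for_parsing("だね" + HALF_WIDTH_STOP)
```

Note that NFKC also folds other compatibility characters (for example full-width ASCII punctuation such as ！ becomes !), so it may be broader than what is wanted here.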

0
4 years ago
マイコー
Level: 328

This'll be all I can do tonight! (Note: it does not fix readings that were already parsed.)

1. さ (adjective > noun) form fixed

2. half-width period fixed

3. メ not in base dictionary (this is not renshuu's dictionary), impossible (for now)

4. ながら fixed

5. ぼっこぼこ not in base

6. ベテランニート fixed

7. がった fixed

Notes: The "base dictionary" is a layer below renshuu, part of a package called kuromoji. Although renshuu can work around it *sometimes*, if the word is not in this dictionary, it'll get broken up into smaller components that are very hard to stitch and put back together (on renshuu's layer). Looking into a way to inject terms into that base dictionary, but it's always a nightmare working with someone else's code library.

1
4 years ago
cmertb
Level: 391

ただのきこもりベテランニートだったのだが、

Didn't notice this the first time without the underlines, but there is another issue in that sentence, きこもり got split up.

[screenshot]

0
4 years ago
cmertb
Level: 391
きじゃくってきをえようとしたら、のままからされた。

A couple of issues here, although I suspect they both come from kuromoji and can't be fixed now.

1) なきをる didn't get recognized as an expression

2) 家 was interpreted as か rather than いえ

[screenshot]
0
4 years ago
cmertb
Level: 391
されたっていき、をうけるわけだ。

got split up into and .

0
4 years ago
|マルコ|
Level: 110

Original: そうそう、あの、おさななじみのきな……
Parsed: そうの、きな……

The そうそう in the sentence above is split into そう (so, really, seeming) and 荘 (manor/villa).

0
4 years ago
マイコー
Level: 328

I'm just tossing this out here - if anyone is proficient in node/js and wants to help me reverse-engineer the dictionary file formats in the base layer that I am using, let me know! (https://github.com/takuyaa/kur...) Been trying to figure out how to add new entries to the underlying dictionary.

0
4 years ago
gillianfaith
Level: 1322

Came across the parser not recognizing ~えば endings when they're palatalized, and also deleting some characters in the results.


input: どうすりゃいいだろう?
expectation: どう / すりゃ / いい / だろう / ?
result: どう / / いい / だろう / ?

Parser deletes the ゃ in すりゃ, and mistakes する for る.

input: おさんに、ってかなきゃ!
expectation: さん / に / 、 / ってかなきゃ / !
result: お / さん / に / 、 / って / / なきゃ / !

Parser deletes the from おさん, and splits かなきゃ into & なきゃ, which are both unmatched.

input: けなきゃよかったかな。
expectation: けなきゃ / よかった / かな / 。
result: け / なきゃ / よかった / / 。

Parser splits away なきゃ and fails to match it, and mistakes the かな ending for a な-adjective.

0
4 years ago
マイコー
Level: 328

1. Fixed そうそう

2. is not yet possible with current setup. kuromoji splits it, and both words are in the dictionary. This will be fixed by the future implementation of complex words, where a word can be marked as a subset of a larger word, and the system will try to match those together. 100% doable, just not yet.

3. So, it's marking そりゃ as a verb form of する. Any idea what that's called? I can easily code it into the system, but I'm not sure what it is or what the full rule set is.

4. かなきゃ fixed

5. かな fixed (interestingly, it had this as a form of the base "unit" か).

6. お partially fixed - but no おさん (see #2)

7. なきをる < see #2

8. のままから <-- bad kuromoji marking. This may be tricky - we'll have to see how many more of these come out before we can consider a rule to overrule kuromoji.

I think the compound word issue is going to be the largest one, and one that I may need to implement sooner rather than later.
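
For anyone curious what the "complex words" matching from #2 could look like, here is a rough sketch. It rests on assumptions: the base tokenizer's output is taken to be a plain list of surface strings, `compounds` is a hypothetical set of known larger words, and `merge_compounds` is an illustrative name rather than anything in renshuu or kuromoji.

```python
def merge_compounds(tokens: list[str], compounds: set[str], max_span: int = 4) -> list[str]:
    """Greedily merge runs of adjacent tokens whose concatenation is a known compound."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so that longer compounds win.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in compounds:
                merged.append(candidate)
                i += span
                break
        else:
            # No compound starts at this position; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged

# Example: the two そう tokens are stitched back into そうそう.
result = merge_compounds(["そう", "そう", "、", "あの"], {"そうそう"})
```

A greedy longest-first pass is the simplest option; it will merge any adjacent tokens that happen to spell a listed compound, so a real version would probably also want to check part-of-speech information before merging.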

In order to give you all something to play with, though, I'll try to get the Actions panel set up soon so you can start exporting this stuff to lessons/schedules.

0
4 years ago
|マルコ|
Level: 110

1)
Original: ‥‥くそっ! (damn!)
Parsed: ‥‥く

2)
Original: ‥‥まりたくねえ‥‥
ねえ gets recognized as "right?/don't you think?" instead of ない

3)
Original: こと
Original: ったこと
こと always gets the translation "particle indicating command, mild enthusiasm, etc." instead of the more common

4)
Original: ‥‥そうだ、だ‥‥
Parsed: ‥‥だ、だ‥‥

5)
Original:(もう、もかも オシマイなんだぁ!)
Parsed: (もう、もかも  いなだぁ!)
If katakana is used for emphasis, foreign accents, robotic voices, etc., maybe the Reader could find a way to link it to い under the hood but still display it as オシマイ

6)
Original: あそこでんでるの
Parsed:

Contracted Te-iru form turned into でる

7)
Original:‥‥にたがってるわよ。
Parsed:‥‥たがってるわよ。
たがる is failing to be linked

8)
Original: (Yahari - family name)

Names are getting parsed incorrectly; this one was split into
and - might be the right time to import the same Names Dictionary used by jisho :D

9)
Original: うわあ
Parsed:
うわ
This is being split into 2

10)
Original:わなくちゃ

ちゃ is not recognized as the contraction of ては
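
The katakana idea in 5) could be sketched like this: convert the katakana surface form to hiragana for the dictionary lookup only, and keep the original text for display. This is a minimal sketch, not how the Reader actually works; `katakana_to_hiragana` and `lookup_key_and_display` are hypothetical names.

```python
def katakana_to_hiragana(text: str) -> str:
    """Shift katakana U+30A1..U+30F6 down by 0x60 to hiragana; leave other characters alone."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )

def lookup_key_and_display(surface: str) -> tuple[str, str]:
    """Return (key to use for the dictionary lookup, text to show the reader)."""
    return katakana_to_hiragana(surface), surface

# オシマイ would be looked up as おしまい but still displayed as オシマイ.
key, shown = lookup_key_and_display("オシマイ")
```

Real loanwords such as ベテラン should presumably still be looked up in katakana first, with the hiragana key used only as a fallback when that lookup fails.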

0
4 years ago
gillianfaith
Level: 1322

3. So, it's marking そりゃ as a verb form of する. Any idea what that's called? I can easily code it into the system, but I'm not sure what it is or what the full rule set is

*そりゃ is a blend of それは, not a form of する afaik. It has its own dictionary entry and as far as I've seen gets parsed correctly.

ichi.moe categorizes すりゃ the same as the "provisional" -eba form. I think the rule is to just replace れば with りゃ for verbs and ければ with きゃ for adjectives; すれば→すりゃ, なければ→なきゃ.
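
As a rough illustration of that rule run in reverse (turning the contracted form back into the -eba form so it can be matched), here is a small sketch. It only covers the two substitutions described above, and `expand_eba_contraction` is a made-up name, not part of renshuu or kuromoji.

```python
from typing import Optional

def expand_eba_contraction(word: str) -> Optional[str]:
    """Undo the colloquial ければ→きゃ / れば→りゃ contraction; return None if neither applies."""
    if word.endswith("きゃ"):
        return word[:-2] + "ければ"
    if word.endswith("りゃ"):
        return word[:-2] + "れば"
    return None

# すりゃ expands to すれば, and なきゃ expands to なければ.
expanded = [expand_eba_contraction(w) for w in ("すりゃ", "なきゃ")]
```

Since そりゃ also ends in りゃ but has its own dictionary entry, a check like this would need to run only after a normal dictionary lookup has failed.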

0
4 years ago