掲示板 Forums - Sentence parser problems

Top > renshuu.org > Bugs / Problems > Resolved Bugs

htatsuha

Level: 1147

When binding sentences to the dictionary, I've noticed some things about how the sentence parser works that are kinda annoying:
-It doesn't handle the particle は well at all. Other particles are parsed correctly at least 90% of the time, but about 99% of the time は is parsed as being connected to whatever kana immediately follows it. So if I have a sentence that should be parsed as "(...) (は) (た...)", it ends up as "(...) (はた) (...)"
-Issues with ・ symbol: when I made a sentence including the text 「ソード・アート・オンライン」, it wouldn't parse it as one (unmatched) word even after I added parentheses around the whole thing, instead parsing it as "(ソード) (アート・オンライン)". Not sure if using Japanese quotation marks could have caused it. Another problem is that even though the ・ symbol is punctuation, it appears in the word list as unmatched instead of being ignored like commas, periods, etc.
-Numbers not written out in kanji/kana aren't linked to vocab entries or ignored, but are instead listed in the word list as unmatched.
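For anyone curious how the ・, parentheses and number points could be handled, here is a minimal Python sketch of a pre-tokenizer that runs before dictionary matching. It is not renshuu's code. The punctuation set, the rule that half- or full-width parentheses mark a user-defined word, and the name `pre_tokenize` are all assumptions made for illustration.

```python
import re

# Assumption: characters treated as ignorable punctuation, including the
# katakana middle dot ・ (U+30FB).
PUNCTUATION = set("、。！？・「」『』!?,.")

# One match per step: a user-marked group in half- or full-width parentheses,
# a run of ASCII or full-width digits, or any single character.
TOKEN_RE = re.compile(r"[(（]([^)）]*)[)）]|([0-9０-９]+)|(.)", re.DOTALL)


def pre_tokenize(sentence: str) -> list[tuple[str, str]]:
    """Split a sentence into ("group" | "number" | "text", string) pieces.

    Parenthesized groups stay whole even if they contain ・, digit runs are
    flagged so a later step can skip them instead of reporting them as
    unmatched, and punctuation outside groups is dropped.
    """
    pieces: list[tuple[str, str]] = []
    buffer = ""  # plain text waiting to be handed to the dictionary matcher
    for match in TOKEN_RE.finditer(sentence):
        group, number, char = match.groups()
        if char is not None and char not in PUNCTUATION:
            buffer += char
            continue
        if buffer:  # any non-text match ends the current text run
            pieces.append(("text", buffer))
            buffer = ""
        if group is not None:
            pieces.append(("group", group))
        elif number is not None:
            pieces.append(("number", number))
        # otherwise the character was punctuation and is dropped
    if buffer:
        pieces.append(("text", buffer))
    return pieces


print(pre_tokenize("「(ソード・アート・オンライン)」を3回見た。"))
# [('group', 'ソード・アート・オンライン'), ('text', 'を'), ('number', '3'), ('text', '回見た')]
```

Only the "text" pieces would go on to dictionary matching; the group is treated as one word, and the number can be skipped instead of showing up as unmatched.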

6 years ago

マイコー

Level: 261

The は issue is a lot harder to fix, and would require a significant upgrade to the parser. The other issues - I think I can get them resolved sooner rather than later!

6 years ago

htatsuha

Level: 1147

Cool, thanks!

6 years ago

マイコー

Level: 261

The two smaller issues should be fixed!

6 years ago

マイコー

Level: 261

Can you give me a couple of examples of the は issue that you are running into? I know exactly what you're saying, and I have a cheap (programming-wise) idea about how to fix it most of the time, but I need genuine examples to test it against.

6 years ago

htatsuha

Level: 1147

Here are some sentences I've written on the site that by default have は misparsed:

一番好きな和食は多分鰻重です！
-(は)(たぶん) in kana gets parsed as (旗)(文)

大学の時、同級生は皆「難し過ぎ！」と言うような事を言ったが、私は余り難しくないと思った。
-(は)(みんな) in kana gets parsed as (食み)(ん)(な)

ケーキはもう全部食べて仕舞った。
-(は)(もう) gets parsed as (はもう)

私は未だ運転している。
-(は)(まだ) in kana gets parsed as (浜)(だ)

6 years ago

htatsuha

Level: 1147

Another issue I've just had: the parser cannot recognize ニンジン as "carrot", only にんじん.

6 years ago

マイコー

Level: 261

I see. Hmm... let me think on it. I usually come up with crazy good ideas if I stew on issues like this a bit.

6 years ago

htatsuha

Level: 1147

Good luck :)

6 years ago

マイコー

Level: 261

I would not call it battle-tested, but here is the new rule:

a) A word is found.
b) The rest of the sentence begins with は.
c) It checks two ways: once with は as part of the next word, and once with は separate, with the next character starting a new word.
d) It compares the words produced in each case and suggests the one that is more prevalent in the dictionary.
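The four steps above can be sketched in Python on top of a greedy longest-match segmenter. This is a guess at the shape of the rule, not renshuu's implementation: the tiny frequency table (with invented counts standing in for "prevalence in the dictionary"), the greedy segmenter, and the names `longest_match` and `tokenize` are all hypothetical.

```python
# Hypothetical frequency table standing in for the dictionary; the words are
# kana readings and the counts are invented for illustration only.
FREQ = {
    "わしょく": 10,  # 和食
    "は": 100,       # topic particle
    "はた": 5,       # 旗
    "ぶん": 40,      # 文
    "たぶん": 60,    # 多分
    "です": 100,
}


def longest_match(text: str, start: int) -> str:
    """Greedy step: the longest dictionary word starting at `start`,
    or a single unmatched character if nothing matches."""
    for end in range(len(text), start, -1):
        if text[start:end] in FREQ:
            return text[start:end]
    return text[start]


def tokenize(text: str) -> list[str]:
    tokens: list[str] = []
    i = 0
    while i < len(text):
        word = longest_match(text, i)  # reading 1: は may be glued to what follows
        # (a) a word has already been found and (b) the rest starts with は
        if tokens and text.startswith("は", i) and i + 1 < len(text):
            # (c) reading 2: は on its own, next word starting right after it
            alt = longest_match(text, i + 1)
            # (d) keep whichever competing word is more common
            if FREQ.get(alt, 0) > FREQ.get(word, 0):
                tokens.append("は")
                i += 1
                word = alt
        tokens.append(word)
        i += len(word)
    return tokens


print(tokenize("わしょくはたぶんです"))  # ['わしょく', 'は', 'たぶん', 'です']
```

Without steps (c) and (d), the greedy pass alone yields わしょく / はた / ぶん / です, the same (旗)(文) split reported for 一番好きな和食は多分鰻重です！. With the comparison, はた (5) loses to たぶん (60), so は is split off as a particle.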

6 years ago

Report Content

htatsuha

Level: 1147

Just had another problem binding a sentence. The parser marks (授業中に) as undefined when the whole phrase is wrapped in one set of parentheses, whether it is written all in kana or with kanji, even though there is a dictionary entry for it. It only finds a definition when it is parsed as (授業中)(に).

*Edit*: Just tried out a sentence using は followed by 砂漠, and it was parsed correctly, so I think the は parsing has definitely improved.

6 years ago

マイコー

Level: 261

Both word issues should be resolved.

6 years ago

htatsuha

Level: 1147

More issues:
-A recurring problem is that even though the standard for most animal/plant names is usually to write them in katakana, the parser will only match them to the correct dictionary entry when the name is written in hiragana or kanji.
-For めぐって, the only way to get the primary dictionary entry to be used is to write it in kanji (巡って); if written in kana it defaults to the entry for 回る. There doesn't appear to be a way to get a dropdown for the three entries for めぐる.
-For the phrase (Noun)をしている, the first dropdown suggestion for している looks identical to the second one, but using it results in the bound sentence showing すている as the reading for している.

6 years ago

マイコー

Level: 261

-Animal/plant names in katakana: A few examples, if you don't mind. Thanks!
-めぐって: Ok - this appears to be an error buried deep in code I haven't touched in years. Might take some time.
-している: I think this is fixed!

6 years ago

htatsuha

Level: 1147

For example, the recent word garden topic requires the word みみず. If it is written in katakana, the parser lists it as unmatched instead of linking it to the correct dictionary entry. This issue affects plant names as well: I previously wrote a sentence with the word ハナビシソウ, and the parser lists it as unmatched even though there is a dictionary entry for the word. Earlier in this thread I also posted that ニンジン was unmatched, even though there is a dictionary entry for it in katakana.

This might be related to another problem I've noticed in quizzes. Although vocab quiz prompts say to type in hiragana/katakana/romaji, and hiragana answers are accepted for terms normally written in katakana, the opposite is not always true. This seems to be limited to terms that have kanji: for example, if I am quizzed on 煙草(たばこ) and type タバコ, or on 缶 and type カン, the answer is marked incorrect.
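One common way to make kana script irrelevant to lookups and quiz checks is to fold katakana onto hiragana before comparing, since the Unicode katakana letters ァ (U+30A1) through ヶ (U+30F6) sit exactly 0x60 code points above their hiragana counterparts. Below is a minimal Python sketch of that idea. The entry list and the names `kata_to_hira`, `lookup` and `answer_matches` are hypothetical, and the thread doesn't say how the site actually stores readings. It also only covers matching a word once it has been isolated, not how the parser splits a sentence into words in the first place.

```python
def kata_to_hira(text: str) -> str:
    """Fold katakana ァ (U+30A1) .. ヶ (U+30F6) onto hiragana, which sits
    exactly 0x60 code points lower; everything else (including ー) is kept."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


# Hypothetical dictionary entries: (headword, reading).
ENTRIES = [("人参", "にんじん"), ("蚯蚓", "みみず"), ("煙草", "たばこ"), ("缶", "かん")]

# Index every entry under its script-folded reading.
READING_INDEX: dict[str, list[str]] = {}
for headword, reading in ENTRIES:
    READING_INDEX.setdefault(kata_to_hira(reading), []).append(headword)


def lookup(surface: str) -> list[str]:
    """Parser side: ニンジン and にんじん hit the same key."""
    return READING_INDEX.get(kata_to_hira(surface), [])


def answer_matches(answer: str, reading: str) -> bool:
    """Quiz side: accept an answer typed in either kana script."""
    return kata_to_hira(answer) == kata_to_hira(reading)


print(lookup("ニンジン"), answer_matches("タバコ", "たばこ"))  # ['人参'] True
```

Folding in one direction keeps a single index key per reading, so neither script has to be stored twice.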

6 years ago

htatsuha

Level: 1147

Another problem, with なる: for (いadjective)くなる, the dropdown shows two options, neither with kanji, but depending on what is chosen the result will be 生る or 為る; the entry for 成る does not appear. For (いadjective)くなっていく, the dropdown shows three options, and shows the kanji for 行く, but again doesn't show the kanji for なる, and again, none of the options link to 成る (two of three options are 生る and one is 為る).

6 years ago

マイコー

Level: 261

Thanks, I'll look into it soon. A preliminary search into the kata/hira option shows that it is most likely going to be quite difficult to implement without major changes, due to the way the site tries to break up the words in the sentence (it would take too long to explain here). I'll be thinking about it, but it might be difficult.

I cannot replicate the なる issue - can you give me the exact sentence you are trying to enter so I can make sure I see the problem as you see it? Thanks!

6 years ago

htatsuha

Level: 1147

The なる problem occurred for the following 2 haiku I made:

夏なのに
寒く為ってる。
秋が来る？

立ち込めて
濃く為って行く
秋の霧

(I originally wrote なる in hiragana)

6 years ago

htatsuha

Level: 1147

Aaaand now the parser no longer recognizes だから as a word. I can parse it as (だ)(から) and get definitions for each part separately, but (だから) is unmatched despite having a dictionary entry.

6 years ago

マイコー

Level: 261

I think I finally found the root of this issue (and hopefully didn't break something else). Let's see if the なる issue still pops up.

だから has always been unavailable in the parser. I just enabled it, but I do not plan on doing this for any other terms. There are a number of "complex" terms in the dictionary that don't appear in the parser because they are composed of smaller grammar elements (such as から in this situation). I wanted the sentences to be bound to the most basic grammar element, since だから, while in most dictionaries, is not a "word" in the sense that I want it to be represented.

Please let me know if you see some improvement with the parser!

6 years ago
