Forums - Sentence parser problems

Page: 1 of 8
htatsuha
Level: 1147
When binding sentences to the dictionary, I've noticed some things about how the sentence parser works that are kinda annoying:
-It doesn't handle the particle は well at all. Other particles are parsed correctly at least 90% of the time, but about 99% of the time は is parsed as being connected to whatever kana immediately follows it. So if I have a sentence that should be parsed as "(...) (は) (た...)", it ends up as "(...) (はた) (...)"
-Issues with ・ symbol: when I made a sentence including the text 「ソード・アート・オンライン」, it wouldn't parse it as one (unmatched) word even after I added parentheses around the whole thing, instead parsing it as "(ソード) (アート・オンライン)". Not sure if using Japanese quotation marks could have caused it. Another problem is that even though the ・ symbol is punctuation, it appears in the word list as unmatched instead of being ignored like commas, periods, etc.
-Numbers not written out in kanji/kana aren't linked to vocab entries or ignored, but are instead listed in the word list as unmatched.
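For illustration, the behaviour requested in the last two points (ignore ・ like other punctuation, and keep bare numbers out of the unmatched list) could be sketched roughly as below. This is only a sketch and not renshuu's actual code: the function names, the token list, and the `dictionary` set are all hypothetical.

```python
import unicodedata

def is_ignorable(token):
    """True if every character is punctuation (Unicode category P*), e.g. ・ 、 。"""
    return bool(token) and all(
        unicodedata.category(c).startswith("P") for c in token
    )

def is_number(token):
    """True for tokens made only of digits (ASCII or full-width)."""
    return token.isdigit()

def unmatched_words(tokens, dictionary):
    """Return the tokens that would be listed as unmatched.

    `dictionary` is a hypothetical set of known words; punctuation and
    bare numbers are skipped instead of being reported as unmatched.
    """
    return [
        t for t in tokens
        if t not in dictionary and not is_ignorable(t) and not is_number(t)
    ]

tokens = ["ソード", "・", "アート", "・", "オンライン", "2", "、"]
print(unmatched_words(tokens, {"アート"}))
```

With this filter, ・ and the digit never reach the word list; only the genuinely unknown words remain.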
6 years ago

マイコー
Level: 261
The は issue is a lot harder to fix, and would require a significant upgrade to the parser. The other issues - I think I can get them resolved sooner rather than later!
6 years ago

htatsuha
Level: 1147
Cool, thanks!
6 years ago

マイコー
Level: 261
The two smaller issues should be fixed!
6 years ago

マイコー
Level: 261
Can you give me a couple of examples of the は issue that you are running into? I know exactly what you're saying, and I have a cheap (programming-wise) idea about how to fix it most of the time, but I need genuine examples to test it against.
6 years ago

htatsuha
Level: 1147
Here are some sentences I've written on the site that by default have は misparsed:

きなです!
-(は)(たぶん) in kana gets parsed as ()()

ぎ!」とうようなったが、しくないとった。
-(は)(みんな) in kana gets parsed as (み)(ん)(な)

ケーキはもうべてった。
-(は)(もう) gets parsed as (はもう)

している。
-(は)(まだ) in kana gets parsed as ()(だ)
6 years ago

htatsuha
Level: 1147
Another issue I've just had: the parser cannot recognize ニンジン as "carrot", only にんじん.
6 years ago

マイコー
Level: 261
I see. Hmm.. let me think on it. I usually come up with crazy good ideas if I stew on issues like this a bit.
6 years ago

htatsuha
Level: 1147
Good luck :)
6 years ago

マイコー
Level: 261
I would not call it battle tested, but here is the new rule:

a) Word is found
b) Beginning of rest of sentence starts with は
c) It checks twice: once with は as part of the next word, and once with は separate and the next character starting a word.
d) It compares the two words that result in each case and suggests the one that is more prevalent in the dictionary.
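A rough sketch of how steps a) to d) might look in code. This is an illustration under assumptions, not the site's implementation: the `FREQ` table is a made-up stand-in for "prevalence in the dictionary", and `longest_word` and `resolve_ha` are hypothetical names.

```python
# Hypothetical prevalence scores; the real dictionary data is not known.
FREQ = {
    "は": 1000, "もう": 400, "はもう": 1,
    "たぶん": 300, "はた": 20,
    "はなし": 600, "なし": 30,
}

def longest_word(text, start):
    """Return the longest known word beginning at index `start`, or None."""
    for end in range(len(text), start, -1):
        if text[start:end] in FREQ:
            return text[start:end]
    return None

def resolve_ha(rest):
    """Split `rest` (the text after a found word, starting with は).

    Tries both readings: は attached to the next word, and は as a
    separate particle followed by a word. Returns the preferred split.
    """
    attached = longest_word(rest, 0)   # は as part of the next word
    separate = longest_word(rest, 1)   # は alone, next character starts a word
    if separate is None:
        return [attached] if attached else []
    if attached is None or attached == "は":
        return ["は", separate]
    # Suggest whichever candidate word is more prevalent.
    if FREQ[separate] >= FREQ[attached]:
        return ["は", separate]
    return [attached]

print(resolve_ha("はもう"))    # particle は + もう
print(resolve_ha("はなし"))    # kept as one word
```

Ties are resolved in favour of splitting は off here; that choice is arbitrary and only made for the sketch.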
6 years ago

htatsuha
Level: 1147
Just had another problem binding a sentence. The parser marks (に) as undefined when it is parsed all together in parentheses, whether it is written all in kana or with kanji, even though there is a dictionary entry for it. It only finds a definition when it is parsed as ()(に).

*Edit*: Just tried out a sentence using は followed by , and it was parsed correctly, so I think the は parsing has definitely improved.
6 years ago

マイコー
Level: 261
Both word issues should be resolved.
6 years ago

htatsuha
Level: 1147
More issues:
-A recurring problem is that even though the standard for most animal/plant names is usually to write them in katakana, the parser will only match them to the correct dictionary entry when the name is written in hiragana or kanji.
-For めぐって, the only way to get the primary dictionary entry to be used is to write it in kanji (って); if written in kana it defaults to the entry for る. There doesn't appear to be a way to get a dropdown for the three entries for めぐる.
-For the phrase (Noun)をしている, the first dropdown suggestion for している looks identical to the second one, but using it results in the bound sentence showing すている as the reading for している.
6 years ago

マイコー
Level: 261
  1. A few examples, if you don't mind. Thanks!
  2. Ok - this appears to be an error buried deep in code I haven't touched in years. Might take some time.
  3. I think this is fixed!
6 years ago

htatsuha
Level: 1147
  1. For example, the recent word garden topic requires the word みみず. If it is written in katakana, the parser lists it as unmatched, instead of linking it to the correct dictionary entry. This issue affects plant names as well: I wrote a sentence previously with the word ハナビシソウ, and the parser lists it as unmatched even though there is a dictionary entry for the word. Previously in this thread I posted that ニンジン was unmatched, even though there is a dictionary entry for it in katakana. This might be related to another problem I've noticed in quizzes. I've found that although vocab quiz prompts say to type in hiragana/katakana/romaji, and hiragana answers are accepted for terms normally written in katakana, the opposite is not always true. This seems to be limited to terms which have kanji, but I have noticed that if I am quizzed on (たばこ) and I type タバコ, or for if I type カン, the answer is marked incorrect.
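One common way to make a lookup insensitive to the kana script is to normalize katakana to hiragana before searching. The sketch below assumes a hypothetical dictionary keyed by hiragana reading (`ENTRIES`) and says nothing about how renshuu's parser actually segments or matches words.

```python
def kata_to_hira(text):
    """Convert katakana (ァ..ヶ) to hiragana; other characters are unchanged.

    In Unicode, each katakana in this range sits exactly 0x60 above its
    hiragana counterpart. The long-vowel mark ー is outside the range.
    """
    return "".join(
        chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c
        for c in text
    )

# Hypothetical entries keyed by hiragana reading.
ENTRIES = {"にんじん": "carrot", "みみず": "earthworm", "たばこ": "tobacco"}

def lookup(word):
    """Look up a word regardless of whether it was typed in katakana or hiragana."""
    return ENTRIES.get(kata_to_hira(word))

print(lookup("ニンジン"))
print(lookup("みみず"))
```

The same normalization could be applied to quiz answers before comparison, so that タバコ and たばこ are treated as the same reading.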
6 years ago

htatsuha
Level: 1147
Another problem, with なる: for (いadjective)くなる, the dropdown shows two options, neither with kanji, but depending on what is chosen the result will be る or る; the entry for る does not appear. For (いadjective)くなっていく, the dropdown shows three options, and shows the kanji for く, but again doesn't show the kanji for なる, and again, none of the options link to る (two of three options are る and one is る).
6 years ago

マイコー
Level: 261
Thanks, I'll look into it soon. A preliminary look into the kata/hira option shows that it will most likely be quite difficult to implement without major changes, due to the way the site tries to break up the words in the sentence. (It would take too long to explain here.) I'll be thinking about it, but it might be difficult.

I cannot replicate the なる issue - can you give me the exact sentence you are trying to enter so I can make sure I see the problem as you see it? Thanks!
6 years ago

htatsuha
Level: 1147
The なる problem occurred for the following 2 haiku I made:

なのに
ってる。
る?

めて
って


(I originally wrote なる in hiragana)
6 years ago

htatsuha
Level: 1147
Aaaand now the parser no longer recognizes だから as a word. I can parse it as (だ)(から) and get definitions for each part separately, but (だから) is unmatched despite having a dictionary entry.
6 years ago

マイコー
Level: 261
I think I finally found the root of this issue (and hopefully didn't break something else). Let's see if the なる issue still pops up.

だから has always been unavailable in the parser. I just enabled it, but I do not plan on doing this for any other terms. There are a number of "complex" terms in the dictionary that don't appear in the parser because they are composed of smaller grammar elements (such as から in this situation). I wanted the sentences to be bound to the most basic grammar element, since だから, while in most dictionaries, is not a "word" in the sense that I want it to be represented.

Please let me know if you see some improvement with the parser!
6 years ago