When binding sentences to the dictionary, I've noticed some things about how the sentence parser works that are kinda annoying:
- It doesn't handle the particle は well at all. Other particles are parsed correctly at least 90% of the time, but about 99% of the time は is parsed as being connected to whatever kana immediately follows it. So if I have a sentence that should be parsed as "(...) (は) (た...)", it ends up as "(...) (はた) (...)".
- Issues with the ・ symbol: when I made a sentence including the text 「ソード・アート・オンライン」, it wouldn't parse it as one (unmatched) word even after I added parentheses around the whole thing, instead parsing it as "(ソード) (アート・オンライン)". Not sure if using Japanese quotation marks could have caused it. Another problem is that even though the ・ symbol is punctuation, it appears in the word list as unmatched instead of being ignored like commas, periods, etc.
- Numbers not written out in kanji/kana aren't linked to vocab entries or ignored, but are instead listed in the word list as unmatched.
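To illustrate the last two points, here is a minimal sketch of how a token could be sorted into "ignore", "number", or "look up" before it reaches the word list. This is not the site's actual code: `classify` and its three labels are hypothetical names, and it assumes the parser already hands over individual tokens. It relies only on Python's standard `unicodedata` module, which puts ・ in the same punctuation class as 、 and 。.

```python
import unicodedata


def classify(token: str) -> str:
    """Hypothetical helper: decide what to do with one token.

    Returns "ignore" for pure punctuation, "number" for pure digits
    (ASCII or full-width), and "lookup" for everything else.
    """
    if not token:
        return "ignore"
    # Unicode general categories starting with "P" are punctuation.
    # ・ (U+30FB), 、 and 。 all fall into this group.
    if all(unicodedata.category(ch).startswith("P") for ch in token):
        return "ignore"
    # "Nd" is the decimal digit category; it covers 0-9 and ０-９.
    if all(unicodedata.category(ch) == "Nd" for ch in token):
        return "number"
    return "lookup"


if __name__ == "__main__":
    for tok in ["・", "、", "2024", "３", "ソード"]:
        print(tok, classify(tok))
```

With something like this, ・ would be dropped the same way commas and periods are, and digit-only tokens could be either ignored or routed to a number entry instead of showing up as unmatched.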
The は issue is a lot harder to fix, and would require a significant upgrade to the parser. The other issues - I think I can get them resolved sooner rather than later!
Can you give me a couple of examples of the は issue that you are running into? I know exactly what you're saying, and I have a cheap (programming-wise) idea about how to fix it most of the time, but I need genuine examples to test it against.
I would not call it battle tested, but here is the new rule:
a) A word is found.
b) The rest of the sentence starts with は.
c) The parser checks two ways: once with は as part of the next word, and once with は separate and the next character starting a word.
d) It compares the two words that occur in each situation, and suggests the one that is more prevalent in the dictionary.
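The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the parser's real implementation: `WORD_FREQ` is a made-up prevalence table, `longest_match` and `split_at_ha` are hypothetical names, longest-prefix matching is assumed for finding "the next word", and breaking ties in favor of the particle is my own choice.

```python
from typing import Dict, List, Optional

# Hypothetical prevalence scores standing in for the real dictionary.
WORD_FREQ: Dict[str, int] = {
    "はた": 50,     # e.g. 旗
    "たべる": 900,  # 食べる
    "はな": 800,    # 花
    "な": 10,
}


def longest_match(text: str, freq: Dict[str, int]) -> Optional[str]:
    """Return the longest prefix of text that is a known word, or None."""
    for end in range(len(text), 0, -1):
        if text[:end] in freq:
            return text[:end]
    return None


def split_at_ha(rest: str, freq: Dict[str, int]) -> List[str]:
    """Steps (b)-(d): rest is the remainder of the sentence after a found word."""
    if not rest.startswith("は"):
        raise ValueError("rest of sentence must start with は")
    # Reading 1: は belongs to the next word.
    attached = longest_match(rest, freq)
    if attached == "は":
        attached = None  # a bare は is the particle reading, not a word
    # Reading 2: は is a particle and the next character starts a word.
    separate = longest_match(rest[1:], freq)
    attached_score = freq[attached] if attached else 0
    separate_score = freq[separate] if separate else 0
    # Suggest whichever candidate word is more prevalent (ties go to the particle).
    if separate is not None and separate_score >= attached_score:
        return ["は", separate]
    if attached is not None:
        return [attached]
    return ["は"]


if __name__ == "__main__":
    print(split_at_ha("はたべる", WORD_FREQ))    # particle reading wins
    print(split_at_ha("はながさく", WORD_FREQ))  # attached reading wins
```

With the sample scores, はたべる is split as は + たべる because たべる is far more prevalent than はた, while はながさく keeps はな together. The rule can still guess wrong whenever the attached word happens to be the more common one, which fits "not battle tested".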
Just had another problem binding a sentence. The parser marks (授業中に) as undefined when it is parsed all together in parentheses, whether it is written all in kana or with kanji, even though there is a dictionary entry for it. It only finds a definition when it is parsed as (授業中)(に).
*Edit*: Just tried out a sentence using は followed by 砂漠, and it was parsed correctly, so I think the は parsing has definitely improved.
More issues:
- A recurring problem is that even though the standard for most animal/plant names is usually to write them in katakana, the parser will only match them to the correct dictionary entry when the name is written in hiragana or kanji.
- For めぐって, the only way to get the primary dictionary entry to be used is to write it in kanji (巡って); if written in kana it defaults to the entry for 回る. There doesn't appear to be a way to get a dropdown for the three entries for めぐる.
- For the phrase (Noun)をしている, the first dropdown suggestion for している looks identical to the second one, but using it results in the bound sentence showing すている as the reading for している.
For example, the recent word garden topic requires the word みみず. If it is written in katakana, the parser lists it as unmatched, instead of linking it to the correct dictionary entry. This issue affects plant names as well: I wrote a sentence previously with the word ハナビシソウ, and the parser lists it as unmatched even though there is a dictionary entry for the word. Previously in this thread I posted that ニンジン was unmatched, even though there is a dictionary entry for it in katakana. This might be related to another problem I've noticed in quizzes. I've found that although vocab quiz prompts say to type in hiragana/katakana/romaji, and hiragana answers are accepted for terms normally written in katakana, the opposite is not always true. This seems to be limited to terms which have kanji, but I have noticed that if I am quizzed on 煙草(たばこ) and I type タバコ, or for 缶 if I type カン, the answer is marked incorrect.
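For what it's worth, the lookup half of this could be handled by normalizing katakana to hiragana before comparing readings. Below is a minimal sketch, assuming the dictionary can be keyed by hiragana reading; `READINGS`, `to_hiragana`, and `lookup` are hypothetical names and not anything from the site. It only covers matching a word that has already been isolated, not how the sentence gets broken into words.

```python
from typing import Dict, Optional

# Hypothetical reading index standing in for the real dictionary.
READINGS: Dict[str, str] = {
    "みみず": "earthworm",
    "たばこ": "tobacco",
    "かん": "can",
}


def to_hiragana(text: str) -> str:
    """Convert katakana to hiragana; every other character is left alone.

    Katakana ァ..ヶ (U+30A1..U+30F6) sit exactly 0x60 code points above
    their hiragana counterparts ぁ..ゖ (U+3041..U+3096).
    """
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


def lookup(word: str) -> Optional[str]:
    """Look a word up by its reading, ignoring the kana script used."""
    return READINGS.get(to_hiragana(word))


if __name__ == "__main__":
    print(lookup("ミミズ"))  # same entry as みみず
    print(lookup("タバコ"))  # same entry as たばこ
```

The same normalization applied to both the typed answer and the expected reading would also let タバコ and カン be accepted in quizzes. The long vowel mark ー is outside the converted range, so it passes through unchanged.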
Another problem, with なる: for (いadjective)くなる, the dropdown shows two options, neither with kanji, but depending on what is chosen the result will be 生る or 為る; the entry for 成る does not appear. For (いadjective)くなっていく, the dropdown shows three options, and shows the kanji for 行く, but again doesn't show the kanji for なる, and again, none of the options link to 成る (two of three options are 生る and one is 為る).
Thanks, I'll look into it soon. A preliminary look at the katakana/hiragana matching suggests it will most likely be quite difficult to implement without major changes, because of the way the site breaks up the words in the sentence (it would take too long to explain here). I'll be thinking about it, but it might be difficult.
I cannot replicate the なる issue - can you give me the exact sentence you are trying to enter so I can make sure I see the problem as you see it? Thanks!
Aaaand now the parser no longer recognizes だから as a word. I can parse it as (だ)(から) and get definitions for each part separately, but (だから) is unmatched despite having a dictionary entry.
I think I finally found the root of this issue (and hopefully didn't break something else). Let's see if the なる issue still pops up.
だから has always been unavailable in the parser. I just enabled it, but I do not plan on doing this for any other terms. There are a number of "complex" terms in the dictionary that don't appear in the parser because they are composed of smaller grammar elements (such as から in this situation). I wanted the sentences to be bound to the most basic grammar element, since だから, while in most dictionaries, is not a "word" in the sense that I want it to be represented.
Please let me know if you see some improvement with the parser!