
Forums - [HELP] Japanese font caching

Top > renshuu.org > Developer corner



マイコー
Level: 265

The majority of renshuu's Japanese fonts come from a Japanese font maker's API called typesquare.com (English page available). They offer a free tier if you want to try setting it up on your own.

Except for the highest tier (which offers access to their WebAPI), everything goes through their JavaScript API.

Reference here: https://typesquare.com/service...

The API is designed for static websites: you have a page of text, you call the JavaScript API, and it scans the page for characters, then downloads just the part of the font needed for that page.

renshuu, on the other hand, front-loads this with a temporary div containing all of the Japanese characters, which results in about 4-5 calls to the API. You can easily see these in your browser's web inspector.

It then injects 4-5 style tags into the head of the document. (This forum isn't set up for code snippets, but you can find them by searching the document for elements with the ts-font class.)

So, here's what I'd like to do (I've already received permission from TypeSquare):

Once the fonts are fully loaded, I'd like a way to cache this font data in localStorage and use the cached copy when present. That would save a good chunk of loading time when the app/website is first opened each day.

However, I haven't been able to get these @font-face rules into a form that can be saved, much less combine the 4-5 of them into a single file.
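One way to get those rules into a savable form is to rewrite each remote `url(...)` as a base64 `data:` URI, after which the CSS from all 4-5 tags is a single plain string. A minimal TypeScript sketch, assuming the injected CSS consists of ordinary `@font-face` rules whose `src` points at font chunks via `url(...)`; the helper names and the `font/woff2` MIME type are hypothetical, and TypeSquare's actual output may differ.

```typescript
// Matches url(...) with optional single or double quotes; group 2 is the URL.
const URL_PATTERN = /url\(\s*(['"]?)([^'")]+)\1\s*\)/;

// Returns the distinct remote URLs referenced in the CSS, in order of first
// appearance. data: URIs are skipped because they are already inlined.
function extractFontUrls(cssText: string): string[] {
  // Fresh global regex per call so lastIndex state is never shared.
  const re = new RegExp(URL_PATTERN.source, "g");
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(cssText)) !== null) {
    const url = match[2].trim();
    if (!url.startsWith("data:") && !urls.includes(url)) urls.push(url);
  }
  return urls;
}

// Rewrites every url(...) whose target is in base64ByUrl as a data: URI,
// leaving any other url(...) untouched.
function inlineFontUrls(
  cssText: string,
  base64ByUrl: Map<string, string>,
  mimeType: string = "font/woff2", // assumption: adjust to the real font format
): string {
  return cssText.replace(
    new RegExp(URL_PATTERN.source, "g"),
    (whole: string, _quote: string, rawUrl: string) => {
      const b64 = base64ByUrl.get(rawUrl.trim());
      return b64 === undefined ? whole : `url("data:${mimeType};base64,${b64}")`;
    },
  );
}
```

In a browser, the input could be gathered with `Array.from(document.querySelectorAll(".ts-font"), (el) => el.textContent ?? "").join("\n")`, with each URL from `extractFontUrls` fetched and base64-encoded to build the map. Whether those chunks can be fetched cross-origin like this depends on the font service's CORS headers, which is not verified here.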


1 month ago
マイコー
Level: 265

Update: there is a new toggle in Settings > Experimental that enables fairly effective caching. Thanks to the plan I'm on, I was able to switch to a slightly different API from the font service that returns the font data in base64, which I can then save.

The only downside at the moment is that localStorage is capped at 5MB on most devices, which isn't enough for a whole font. I can still cache most of it, which drops the number of network calls from 5 to 1.
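A sketch of using localStorage as far as it will go: save each base64 piece under its own key and stop at the first write that fails, since `setItem` throws a `QuotaExceededError` once the quota is used up. On the next visit, only the missing pieces need a network request. TypeScript; the `KeyValueStore` interface (which `window.localStorage` satisfies), the key format, and the function names are assumptions for illustration.

```typescript
// The subset of the Web Storage API used here; window.localStorage fits it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Saves each base64 piece under "<prefix>:<index>", in order, and stops at the
// first write the store rejects. Returns how many pieces were saved.
function cachePieces(store: KeyValueStore, prefix: string, pieces: string[]): number {
  for (let i = 0; i < pieces.length; i++) {
    try {
      store.setItem(`${prefix}:${i}`, pieces[i]);
    } catch {
      return i; // quota reached: the remaining pieces stay network-only
    }
  }
  return pieces.length;
}

// Returns the cached pieces by index, plus the indices that still need fetching.
function loadPieces(
  store: KeyValueStore,
  prefix: string,
  total: number,
): { cached: Map<number, string>; missing: number[] } {
  const cached = new Map<number, string>();
  const missing: number[] = [];
  for (let i = 0; i < total; i++) {
    const value = store.getItem(`${prefix}:${i}`);
    if (value === null) missing.push(i);
    else cached.set(i, value);
  }
  return { cached, missing };
}
```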

I've looked into an IndexedDB implementation, but I suspect it could get a bit messy because reading the data back out is asynchronous.
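The async part of IndexedDB tends to stay manageable if each request is wrapped in a Promise once, so reads can then be `await`ed in sequence. A TypeScript sketch; `RequestLike` is a hypothetical structural stand-in for the browser's `IDBRequest` (which has the same `result`, `error`, `onsuccess`, and `onerror` members), so the helper can also be exercised outside a browser.

```typescript
// Minimal shape of an IndexedDB request; the browser's IDBRequest has these members.
interface RequestLike<T> {
  result: T;
  error: unknown;
  onsuccess: ((ev: any) => void) | null;
  onerror: ((ev: any) => void) | null;
}

// Resolves with request.result on success and rejects with request.error on failure,
// turning the callback-style request into something that can be awaited.
function requestToPromise<T>(request: RequestLike<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}
```

In the browser, something like `await requestToPromise(db.transaction("fonts").objectStore("fonts").get("base64"))` would resolve with the stored string (or `undefined` if nothing is saved yet). The database and object store names are placeholders, and the store has to be created in an `onupgradeneeded` handler the first time the database is opened.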

1 month ago

Some of these details are over my head, but it seems like you should be able to get by with a much smaller fraction of the font. The smallest font I could find on their website covers over 6000 characters, and it seems to me that the average renshuu user is unlikely to need more than 50% of them.

1 month ago
マイコー
Level: 265

That's very true. However, I cannot yet say how complicated/slow it would be to do the following:

1. Load up the "main characters" (maybe 50% of the font)

2. For everything that gets sent to the app/browser, check to see if it contains the less common characters. If so, send a flag that says "download the rest of it."

Off the top of my head, the biggest concern would be the character check itself. Rather than asking "are any of these 3000 characters in the text I'm about to send over?", you'd ideally want a regex on a Unicode range. To give a simple example, if the common characters were U+0000 to U+5000 and the rare ones U+5001 to U+9999, you'd scan the text for anything in the rare range. That's probably not particularly expensive in terms of time.
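The range check described here is a single character-class regex. A TypeScript sketch using the example's own hypothetical split, treating U+5001 to U+9999 as "rare"; those boundaries are placeholders, not a real frequency cut-off, and the function name is made up for illustration.

```typescript
// Hypothetical "rare" range from the example: U+5001 through U+9999.
// Both ends are in the Basic Multilingual Plane, so no "u" flag is needed.
const RARE_RANGE = /[\u5001-\u9999]/;

// True if the text contains at least one character in the rare range.
// A non-global regex with test() stops at the first match.
function containsRareRange(text: string): boolean {
  return RARE_RANGE.test(text);
}
```

The cost is at most one pass over the text, which is why a range check is cheap; the catch, as the next paragraph points out, is whether any range actually lines up with kanji frequency.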

However, my gut tells me that kanji are not arranged in the Unicode spec by frequency, so there's probably no good way to scan for rare kanji aside from a regex like /[3000 kanji here]/.


1 month ago
Anonymous123
Level: 1213

For the purpose of testing assumptions on kanji frequency:

This appears to be a list of kanji ordered by frequency (based on Japanese Wikipedia articles), with their Unicode values: https://github.com/scriptin/ka...

1 month ago
マイコー
Level: 265

Yeah, that's what I was afraid of: there's no order whatsoever :(. I'll need to do some test runs to see how much time the check actually takes.

Would be nice to only download 1/3 of the font, though.

1 month ago
Anonymous123
Level: 1213

One approach you could consider:

- Take the 1000 or so most frequent kanji from that list and put them in the div for front-loading. According to the graphs on the supporting site, that gives you 90%+ coverage.

- Keep a map/set with the Unicode values for those 1000 kanji.

- Do a linear scan of the document text, checking each character for membership in the set (linear time, so it should be fast).

- As soon as the scan encounters a kanji that isn't in the set (which should happen much less than 10% of the time), call the JavaScript API to scan the whole page and load the appropriate fonts.

That should be fast for most pages and only a bit slower for complex ones.

You could even increase the number of kanji in the div to reduce how often the scan finds a missing kanji.
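A TypeScript sketch of the set-based scan in the steps above. It assumes the front-loaded kanji list is available as a string (the real one would be the top-1000 list) and treats anything in CJK Unified Ideographs (U+4E00 to U+9FFF) or Extension A (U+3400 to U+4DBF) as kanji, so kana, Latin text, and punctuation are skipped. The function names are hypothetical.

```typescript
// True for code points in CJK Unified Ideographs or Extension A.
function isKanji(codePoint: number): boolean {
  return (
    (codePoint >= 0x4e00 && codePoint <= 0x9fff) ||
    (codePoint >= 0x3400 && codePoint <= 0x4dbf)
  );
}

// Builds the lookup set once from the front-loaded kanji string.
function buildCommonSet(commonKanji: string): Set<string> {
  return new Set(Array.from(commonKanji));
}

// Single left-to-right pass over the text (for...of yields whole code points).
// Returns true as soon as a kanji outside the common set is found,
// meaning the full font should be requested.
function needsFullFont(text: string, common: Set<string>): boolean {
  for (const ch of text) {
    if (isKanji(ch.codePointAt(0)!) && !common.has(ch)) return true;
  }
  return false;
}
```

When `needsFullFont` returns true, the fallback from the post applies: have the JavaScript API scan the whole page. Building the `Set` once and reusing it keeps each check to one pass over the text with constant-time lookups.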

1 month ago
マイコー
Level: 265

A possibility, yeah. Not sure if I'll get to it this week, but I can probably enable a quick scanner for users opted into the current experimental setting, then log how long those scans take. If it's under 0.05 seconds, that's fine.

1 month ago

The basic CJK block was originally organized by radical, with higher stroke counts getting higher values. As an experiment, taking the lower half would be easy to try.
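The suggested experiment is easy to sketch: split the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) at its midpoint, U+7700, and measure what share of a frequency-ordered kanji string falls in the lower half. TypeScript; the function names are hypothetical.

```typescript
const CJK_START = 0x4e00;
const CJK_END = 0x9fff;
// First code point of the upper half: 0x4e00 + (0x9fff - 0x4e00 + 1) / 2 = 0x7700.
const CJK_MID = CJK_START + (CJK_END - CJK_START + 1) / 2;

// True if the character is in the lower half of the basic CJK block.
function inLowHalf(ch: string): boolean {
  const cp = ch.codePointAt(0)!;
  return cp >= CJK_START && cp < CJK_MID;
}

// Fraction of the given kanji that fall in the lower half (0 for empty input).
function lowHalfCoverage(kanjiByFrequency: string): number {
  const chars = Array.from(kanjiByFrequency);
  if (chars.length === 0) return 0;
  return chars.filter(inLowHalf).length / chars.length;
}
```

Running `lowHalfCoverage` on, say, the first 1000 entries of the frequency list linked earlier would show directly whether the lower half is a useful approximation of the common kanji.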

1 month ago
マイコー
Level: 265

OK, so bad news/good news: PHP (which renshuu's server code is written in) doesn't have a good way to do the kind of character scanning we need, at least for multibyte strings. Regex is *way* too slow for this purpose.

Given the added complexity and the higher chance of errors from fonts not getting downloaded, I'm going back to the original plan of a full cache. However, I'm going to try switching from localStorage to IndexedDB, which should cover the amount of space we need.

1 month ago
Anonymous123
Level: 1213

Would using mb_str_split and iterating over the resulting array not work for scanning?

1 month ago
マイコー
Level: 265

The resulting HTML is WAY too long for that: it can be several thousand characters on some pages. Splitting it into an array and then iterating is too much.

Anyway, I just uploaded a new version that uses IndexedDB, and it seems to work really well.

1 month ago
パリパリ
Level: 139

If you ideally want the entire font available and are okay with a download of around 5MB (since you mention the 5MB localStorage limit), have you considered just serving the full font as a static file rather than using an API to fetch specific characters?

Noto Sans Japanese, for example, has a variable font that includes italics and all font weights at 5.6MB gzipped. If you don't need italics and multiple weights, the non-variable font is 3.2MB gzipped.

It's unclear to me what frontend framework you're using, but if you're doing server-side rendering with templates, you could conditionally template in the CSS to load whichever font the user has selected in their settings.

If you convert it to WOFF2 (which all major browsers support now that IE11 is dead), you can get it down to 4.1MB for the variable font or 2.2MB for the non-variable one.
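If self-hosting a freely licensed font such as Noto Sans JP, the server-side templating idea could look like this TypeScript sketch, which returns an `@font-face` rule for the font chosen in the user's settings. The setting keys, family names, and `/fonts/...` paths are hypothetical; `format("woff2")` is the hint browsers use to skip sources in formats they don't support.

```typescript
interface FontFile {
  family: string;
  url: string;
}

// Hypothetical mapping from a user's font setting to a self-hosted WOFF2 file.
const FONT_FILES = new Map<string, FontFile>([
  ["noto-sans", { family: "Noto Sans JP", url: "/fonts/NotoSansJP.woff2" }],
  ["noto-serif", { family: "Noto Serif JP", url: "/fonts/NotoSerifJP.woff2" }],
]);

// Returns the @font-face rule to template into the page <head>,
// or an empty string if the setting is unknown.
function fontFaceCss(fontSetting: string): string {
  const font = FONT_FILES.get(fontSetting);
  if (font === undefined) return "";
  return [
    "@font-face {",
    `  font-family: "${font.family}";`,
    `  src: url("${font.url}") format("woff2");`,
    "  font-display: swap;", // show fallback text while the font downloads
    "}",
  ].join("\n");
}
```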

1 month ago
マイコー
Level: 265

Serving it as a static file is against the license terms for TypeSquare, the service I use for most of the fonts.

So far, though, saving the base64 in IndexedDB is working out really well, so I feel like I have a good solution going.

1 month ago