UTF8 range for Chinese
Thread poster: Samuel Murray
Samuel Murray, Netherlands, Local time: 09:37, Member (2006), English to Afrikaans + ...
Hello everyone
I have a file in which some segments contain Chinese characters. I need to identify these segments, so I'm hoping I can use a search for the specific Unicode characters that are Chinese. Can anyone clarify for me what is the UTF8 character range for Chinese characters?
Thanks
Samuel
Added: found it, under "CJK scripts and symbols" here:
https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
However, I discovered that searching for the presence of all of these characters would be very inefficient, so instead I converted all my source text to one character per line and then removed duplicate lines, to get a list of all characters used in the source text. Then I just deleted non-Chinese characters, and thus had a much smaller list of characters to search (and no need to search hexadecimally either).
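The approach described above (list every distinct character, keep only the Chinese ones, then search for those) could be sketched roughly as follows. This is only an illustration in Python, not the AutoIt script mentioned later in the thread. The names `distinct_chars`, `is_cjk_ideograph` and the sample segments are hypothetical, and the automatic filter on U+3400–U+4DBF and U+4E00–U+9FFF is an assumption standing in for the manual deletion of non-Chinese characters.

```python
def distinct_chars(text: str) -> list[str]:
    """Return every distinct non-whitespace character in the text, sorted by code point."""
    return sorted({ch for ch in text if not ch.isspace()})


def is_cjk_ideograph(ch: str) -> bool:
    """Assumed filter: CJK Unified Ideographs plus Extension A (both in the BMP)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


# Hypothetical sample data standing in for the real source segments.
segments = ["Hello world", "你好 world", "Save 保存"]

# Step 1: collect the distinct characters used anywhere in the source text.
all_chars = distinct_chars("\n".join(segments))

# Step 2: keep only the Chinese ones (done by hand in the original post).
chinese_chars = [ch for ch in all_chars if is_cjk_ideograph(ch)]

# Step 3: flag every segment that contains at least one of them.
chinese_segments = [s for s in segments if any(ch in s for ch in chinese_chars)]

print(chinese_chars)
print(chinese_segments)
```

Because the search list only holds characters that actually occur in the file, it stays far shorter than the full block of more than 20,000 code points.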
[Edited at 2021-05-04 11:00 GMT]
esperantisto, Local time: 10:37, Member (2006), English to Russian + ..., SITE LOCALIZER
Here is the range that I use (even though you have already found one, maybe it will come in handy):
Samuel Murray, Netherlands, Local time: 09:37, Member (2006), English to Afrikaans + ..., TOPIC STARTER
@Esperantisto | May 4, 2021
esperantisto wrote:
Here is the range that I use (even though you have already found one, maybe it will come in handy)...
Thanks, I'll give that a try as well (then I can use regex).
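A regex version might look something like the sketch below. The range esperantisto posted did not survive in this thread, so the character class here is an assumption: it uses the two best-known BMP blocks, CJK Unified Ideographs (U+4E00–U+9FFF) and Extension A (U+3400–U+4DBF). Note that these are Unicode code points rather than UTF-8 byte values. The names `CJK_PATTERN` and `segments_with_chinese` are made up for the example.

```python
import re

# Assumed ranges (not necessarily the ones posted above):
#   U+3400-U+4DBF  CJK Unified Ideographs Extension A
#   U+4E00-U+9FFF  CJK Unified Ideographs
# Supplementary-plane extensions and CJK punctuation are deliberately left out.
CJK_PATTERN = re.compile(r"[\u3400-\u4DBF\u4E00-\u9FFF]")


def segments_with_chinese(segments: list[str]) -> list[str]:
    """Return the segments that contain at least one character in the assumed ranges."""
    return [s for s in segments if CJK_PATTERN.search(s)]


# Hypothetical sample segments.
sample = ["abc", "你好", "x 保存 y", "，"]
print(segments_with_chinese(sample))
```

The last sample segment is a fullwidth comma (U+FF0C), which falls outside both ranges, so a segment containing only Chinese punctuation would not be flagged by this pattern.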
As it happens, my source text contained only about 1000 distinct Chinese characters, so testing for each of them one by one across 2000 segments was doable and took only about 20 seconds (not including the time it took to script it in AutoIt, of course). I'm curious whether a regex approach would be quicker (not counting preprocessing time).
LIZ LI, China, Local time: 15:37, French to Chinese + ...
Here's a free conversion page between UTF8 & Chinese:
https://www.ip138.com/utf8/
Copy & paste the source for Chinese > UTF8 in the UPPER dialog box, then click the 1st green button below;
OR
Copy & paste the source for UTF8 > Chinese in the LOWER dialog box, then click the 2nd green button below.
If you want to do it manually, you may also try http://www.mytju.com/classcode/tools/encode_utf8.asp
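For anyone who prefers to do the conversion locally, here is a minimal Python sketch of going from Chinese text to UTF-8 bytes and back. The exact output format of the pages linked above is not shown in this thread, so the space-separated hex notation is an assumption, and `to_utf8_hex` and `from_utf8_hex` are hypothetical helper names.

```python
def to_utf8_hex(text: str) -> str:
    """Encode text as UTF-8 and show each byte as two uppercase hex digits."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def from_utf8_hex(hex_string: str) -> str:
    """Decode a string of hex byte values (spaces allowed) back into text."""
    return bytes.fromhex(hex_string).decode("utf-8")


print(to_utf8_hex("中"))           # E4 B8 AD
print(from_utf8_hex("E4 B8 AD"))   # 中
```

Each character in the common Chinese range takes three bytes in UTF-8, which is one reason searching by code point tends to be simpler than searching by byte values.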
[Edited at 2021-05-04 13:24 GMT]