Online text analysis tools
Thread poster: Daniel Frisano
Daniel Frisano
Daniel Frisano  Identity Verified
Italy
Local time: 20:17
Member (2008)
English to Italian
+ ...
Jan 4, 2023

For the sake of transparency towards my clients, I'd like to find some online tool for text analysis that returns wordcount, repetitions, fuzzy matches, etc., as in a report from memoQ or other CAT tools.

Any ideas?

(Tried and tested solutions welcome, some genius suggesting "Try googling/visiting/asking etc." NOT welcome)


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 20:17
Member (2006)
English to Afrikaans
+ ...
I can only Google, sorry Jan 5, 2023

Daniel Frisano wrote:
I'd like to find some online tool for text analysis that returns wordcount, repetitions, fuzzy matches, etc., as in a report from memoQ or other CAT tools.


For simple word counting only:

- https://www.kennistranslations.com/wordcount
Also counts repetitions. Only works for HTML, TXT and Office 2007+ files. Doesn't require logging in.

- https://docwordcounter.com/en/
Doesn't count repetitions. I'm unsure what formats are supported.

Since you want fuzzy match counting, a CAT tool might be a solution.

WFA (Wordfast Anywhere) is no longer free (your client would have to pay $1 for access, and would need to be able to follow instructions).

Both Matecat and Smartcat are free and offer TMX importing, so presumably you should be able to do fuzzy word counting with both of them, but I haven't tested it myself.

But are you sure most of your clients would be happy that you upload their files to a third-party site? You know, confidentiality and all that?

OmegaT is free but it's a large download (about 200 MB). Instructions for OmegaT are relatively simple (a recorded video should do it). The resulting word count can be pasted into Excel directly, but you'd still have to tell the client what weights you use for the various fuzzy match bands.
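To illustrate what "weights for the various fuzzy match bands" means in practice, here is a minimal sketch of a weighted word count. The band names, word counts and weights below are invented for the example and are not taken from OmegaT or any other tool; substitute the figures from your own report and your own rate scheme.

```python
# Hypothetical band names, counts and weights, for illustration only.
def weighted_word_count(band_counts, band_weights):
    """Return the sum of (words in band * weight of band) over all bands."""
    missing = set(band_counts) - set(band_weights)
    if missing:
        raise ValueError(f"no weight given for bands: {sorted(missing)}")
    return sum(words * band_weights[band] for band, words in band_counts.items())


# Raw word counts per match band, as they might be copied from an analysis report.
counts = {"repetitions": 200, "100%": 100, "75-99%": 300, "no match": 400}
# Fraction of the full rate charged for each band (an assumed scheme).
weights = {"repetitions": 0.25, "100%": 0.25, "75-99%": 0.5, "no match": 1.0}

payable = weighted_word_count(counts, weights)
print(payable)  # 625.0 weighted words out of 1000 raw words
```

Sending the client both the raw band counts and the weights lets them redo this arithmetic themselves.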

The only offline tool specifically meant for word counting that I know of is AnyCount, but beyond the 30-day trial they charge about $200 for it, and it doesn't offer fuzzy match counting anyway.

[Edited at 2023-01-05 11:31 GMT]


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 21:17
English to Russian
@Samuel Murray Jan 5, 2023

Samuel Murray wrote:
Google is your friend *duck*
Why do you do that? Genius suggesting "Try googling..." NOT welcome. Let the TS wait for a qualified answer.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 20:17
Member (2006)
English to Afrikaans
+ ...
@Stepan Jan 5, 2023

Although I used Google myself (this is what I meant), I did test all the options that I felt were possibly suitable. That's different from "suggesting that he uses Google". I realize that the English expression "tried and tested" usually implies that a product or procedure has been approved or has been shown to be a resounding success, but I believe my response passes the bar set by the OP.

[Edited at 2023-01-05 11:40 GMT]


 
Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
PDF Jan 5, 2023

Samuel Murray wrote:

- https://www.kennistranslations.com/wordcount
Also counts repetitions. Only works for HTML, TXT and Office 2007+ files.


Nice counter. I just tested it successfully with a PDF document.


 
Daryo
Daryo
United Kingdom
Local time: 19:17
Serbian to English
+ ...
MS word? Jan 5, 2023

MS Word is good enough for me.

The "Word Count" option under "Tools" gives me all I need (total number of characters, total number of words, etc.).

As for "fuzzy matches" and "repetitions" or anything else of the same kind, my immediate reaction to any mention of that is "please refrain from further wasting my time."


 
Daniel Frisano
Daniel Frisano  Identity Verified
Italy
Local time: 20:17
Member (2008)
English to Italian
+ ...
TOPIC STARTER
  Jan 6, 2023

Samuel Murray wrote:
But are you sure most of your clients would be happy that you upload their files to a third-party site? You know, confidentiality and all that?


Yes, it's mostly for patents, which are available to the public anyway.

I am already aware of all the solutions everybody has suggested, including the ubergenius that suggested MS Word (how brilliant indeed, surely nobody had ever thought of that before!)

Guess I'll have to find a way to include this functionality in my own website.
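For what it's worth, a rough sketch of such an analysis (total words, repetitions, internal fuzzies) could look something like the following. It is only an approximation under stated assumptions: segmentation is a naive split on sentence-ending punctuation and line breaks, similarity is Python's `difflib.SequenceMatcher` ratio on word lists, and the 0.75 threshold is arbitrary. Real CAT tools such as memoQ use their own segmentation rules and match algorithms, so the figures will not agree with theirs.

```python
import re
from difflib import SequenceMatcher


def analyse(text, fuzzy_threshold=0.75):
    """Count words per category: repetitions, internal fuzzy matches, new."""
    # Naive segmentation: split after ., ! or ? followed by whitespace, or on newlines.
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]
    seen = []  # tokenised unique segments encountered so far
    report = {"total": 0, "repetitions": 0, "fuzzy": 0, "new": 0}
    for segment in segments:
        words = segment.lower().split()
        count = len(words)
        report["total"] += count
        if words in seen:
            # Exact duplicate of an earlier segment.
            report["repetitions"] += count
            continue
        # Best similarity against any earlier unique segment (0.0 if there is none).
        best = max(
            (SequenceMatcher(None, words, earlier).ratio() for earlier in seen),
            default=0.0,
        )
        if best >= fuzzy_threshold:
            report["fuzzy"] += count
        else:
            report["new"] += count
        seen.append(words)
    return report


sample = (
    "Press the red button. Press the red button. "
    "Press the green button. Close the lid."
)
print(analyse(sample))
# {'total': 15, 'repetitions': 4, 'fuzzy': 4, 'new': 7}
```

Comparing each segment against every earlier one is quadratic, which is fine for a single patent but would need a smarter index for large batches.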


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 21:17
English to Russian
Lack of information in OP Jan 7, 2023

Daniel Frisano wrote:
I am already aware of all the solutions everybody has suggested, including the ubergenius that suggested MS Word (how brilliant indeed, surely nobody had ever thought of that before!)
This is how you say thank you in Italian, isn't it? But why didn't you list the solutions you were already aware of in your original post for others not to bother you with their obvious suggestions?


 
Daniel Frisano
Daniel Frisano  Identity Verified
Italy
Local time: 20:17
Member (2008)
English to Italian
+ ...
TOPIC STARTER
  Jan 8, 2023

Stepan Konev wrote:
But why didn't you list the solutions you were already aware of in your original post for others not to bother you with their obvious suggestions?


Because it would be a loooooong and boring list, and it would have gone to waste anyway, since none of the suggestions meets the criteria that I clearly listed. All the information you need is there: total wordcount, repetitions and fuzzies, with report generation.

MS Word actually meets none, except for a dumb wordcount, and it's such an obvious and trivial solution that no one should seriously consider suggesting it for longer than a nanosecond. Talk about uselessness.

Back to work now, hoping this thread will die peacefully. Cheers!


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 20:17
Member (2006)
English to Afrikaans
+ ...
MateCat Jan 8, 2023

Okay, I had a look at MateCat, and it goes a long way.

Obviously you'd have to give your client instructions on how to use MateCat (i.e. record a video), and you'd have to tell them to ignore the sales talk shown by MateCat (e.g. about how much you can save on the translation and how much is "payable"), since MateCat is trying to use the analysis to convince the user to use their own agency's services.

https://www.matecat.com

You don't need to be logged in to use MateCat to perform an analysis.

(In fact, you don't need to be logged in to use MateCat for the translation process either, if you intend to use MateCat for the translation work. It generates a very long random-looking URL, and anyone using that URL can access the translation. Logging in just gives you access to additional features.)

You do need to click on More Settings, and then untick the MyMemory option under both the "Translation Memory and Glossary" tab and the "Machine Translation" tab. If you want fuzzy analysis against your own TM, you can add your own TM in the "Translation Memory and Glossary" tab of the settings: click New Resource, then Confirm, then Import TMX, and then the analysis will use your own TM. Regardless of whether you add your own TM, the analysis includes internal fuzzy matches.

When you're done editing the settings, click the "x" (or click away), optionally select the source language, and then drag and drop the file. It is on this screen that MateCat will try to convince the user to use their own agency. Just use the Download Analysis Report link to download the analysis. Unfortunately, it uses spaces instead of tabs. You can also click the Show Details link underneath the file name, which shows the analysis on screen (unfortunately not in a format that you can copy/paste).
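If the spaces are a nuisance when pasting into a spreadsheet, a small script can convert the report to tab-separated text. This is only a sketch: it assumes that columns are separated by runs of two or more spaces and that single spaces occur only inside a cell (a file name, for instance). The exact layout of the downloaded report is not verified here, and the sample text below is invented.

```python
import re


def spaces_to_tabs(report_text):
    """Replace each run of two or more spaces with a single tab, line by line."""
    converted = []
    for line in report_text.splitlines():
        # Single spaces (e.g. inside "my doc.docx") are left untouched.
        converted.append(re.sub(r" {2,}", "\t", line.strip()))
    return "\n".join(converted)


# Invented sample in a space-aligned layout, not an actual MateCat report.
sample_report = "File name    Total   Payable\nmy doc.docx  1200    900"
print(spaces_to_tabs(sample_report))
```

The result can then be pasted into Excel, where each tab starts a new column.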

If you don't want your client to do all these steps, you can copy the URL of the analysis and share that with your client, and they'll be able to access this same information. (Of course, this assumes that your client trusts that you did not make changes to the file before uploading it for analysis.) But this helps if you can tell the client "here's an online CAT tool that returns an analysis very similar to the one I sent you previously".


 
Lieven Malaise
Lieven Malaise
Belgium
Local time: 20:17
Member (2020)
French to Dutch
+ ...
Asking for a friend. Jan 9, 2023

Daniel Frisano wrote:
I am already aware of all the solutions everybody has suggested, including the ubergenius that suggested MS Word (how brilliant indeed, surely nobody had ever thought of that before!)


Do you get a lot of invitations to parties?


 

