MemoQ - Why should it count less words than Trados? Thread poster: Tomás Cano Binder, BA, CT
I am evaluating MemoQ extensively and am very happy with the tool in general, but I spotted something that does not seem right: apparently, MemoQ counts 2-5% fewer words than Trados. Yes, this could be great news for agencies, but for freelancers it could have an important bottom-line impact.

I initially spotted this on TTX files, but a simulation done with an actual file reveals that, as far as I can see, MemoQ does grab everything from the presegmented TTX file: if I copy source to target, confirm everything, and export to TMX, the memory in Trados does translate the full TTX file in TagEditor, with some slight differences in segmentation. So it looks like I do get all segments to translate in MemoQ.

Now I am seeing that it happens with all sorts of files: MemoQ counts fewer words than Trados in medium to large files. Do the following experiment with freely accessible files:
- Go to any medium-to-large Wikipedia article. Save as an HTML file.
- Analyse with Trados.
- Add the file to a project in MemoQ, select it and use the Statistics function.

The difference with the files (TTX, Word, PowerPoint, HTML files) I have tried ranges from 2% to 5% approximately. For instance, for the "Spain" article in the English Wikipedia, I count 20,807 words in Trados, but only 19,275 words in MemoQ. That is a difference of 1,532 words, which is quite a lot.

For business reasons, this leaves me only the option of using Trados for counting, which adds a step to any project and makes me lose time. Is there a reason why the wordcount should be so different between Trados and MemoQ?
| | | Words with apostrophe | Sep 21, 2009 |
Tomás Cano Binder, CT wrote: Now I am seeing that it happens with all sorts of files: MemoQ counts fewer words than Trados in medium to large files.

MemoQ counts fewer words in very small files too. E.g. create a document with one short sentence like "I'm stuck." Word counts 2 words. Trados counts 3 words. MemoQ counts 2 words, like Word.

Tomás Cano Binder, CT wrote: Is there a reason why the wordcount should be so different between Trados and MemoQ?

Words with apostrophes are counted in a different way. IMHO Trados is more logical here.

Cheers
GG
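The "I'm stuck." example can be reproduced with a rough sketch. This is only an illustration of the two counting rules described in this post, not the actual algorithm of Word, MemoQ or Trados; both function names are hypothetical, and the assumption is simply that the Trados-style rule treats an apostrophe as a word boundary.

```python
import re

def count_whitespace_words(text: str) -> int:
    """Count runs of characters separated by whitespace (the Word/MemoQ-style result)."""
    return len(text.split())

def count_apostrophe_split_words(text: str) -> int:
    """Assumed Trados-style rule: an apostrophe also ends a word."""
    tokens = re.split(r"[\s'’]+", text)
    return len([token for token in tokens if token])

sentence = "I'm stuck."
print(count_whitespace_words(sentence))        # 2
print(count_apostrophe_split_words(sentence))  # 3
```

Under that assumption, every apostrophe inside a word adds one word to the second count, which is why the gap grows with the size of the file.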
[Edited at 2009-09-21 08:44 GMT]
| | | what is a word anyway? :) | Sep 21, 2009 |
Hello Tomas,

Grzegorz has already pointed out one of the many causes. The most fundamental problem is actually counting words, which in fact isn't trivial. The problem is not between Trados and MemoQ, but between basically any two pieces of software that count words. I don't think you can produce identical statistics in two translation tools for any average document. For example, "I'm" or "a/c" is considered one word in MemoQ, but I believe both are two words for Trados.

There is also the fact that the two tools might not extract exactly the same text content from documents. For example, Trados TagEditor skips number-only segments, which is very wrong in my opinion, because in many cases the source and the target language use different number formats. Tools may or may not import a generated table of contents, or an index. Import options and their defaults can also affect this.

Best regards,
Gergely Vandor
Kilgray
| | | OK, but there must be other reasons | Sep 21, 2009 |
Grzegorz Gryc wrote: Word counts 2 words. Trados counts 3 words. MemoQ counts 2 words, like Word.

I see! Thanks a lot for the note, Grzegorz. However, the "Spain" article in Wikipedia contains about 290 words with apostrophes, so there is still a difference of about 1,200 words unexplained. There must be other reasons.

I wonder whether it would be possible for MemoQ to emulate Trados' way of counting if we select the TRADOS-like radio button in the Statistics dialog? That way we would not need to use Trados to analyse files for our customers using Trados!
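For reference, the arithmetic behind these figures can be written out as a short sketch. The three input numbers come from the posts above; the variable names are illustrative, and the "unexplained" figure assumes that each word with an apostrophe accounts for exactly one extra word in the Trados count.

```python
trados_words = 20_807      # Trados analysis of the "Spain" article (from the thread)
memoq_words = 19_275       # MemoQ Statistics for the same file (from the thread)
apostrophe_words = 290     # approximate number of words with apostrophes (from the thread)

difference = trados_words - memoq_words
share_of_trados = 100 * difference / trados_words
# Assumption: one extra Trados word per word containing an apostrophe.
unexplained = difference - apostrophe_words

print(difference)                 # 1532
print(f"{share_of_trados:.1f}%")  # 7.4%
print(unexplained)                # 1242
```

For this particular file the gap comes to roughly 7.4% of the Trados count, and about 1,240 words remain unexplained once the apostrophes are set aside.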
Dashes, hyphens... | Sep 21, 2009 |
Tomás Cano Binder, CT wrote: Grzegorz Gryc wrote: Word counts 2 words. Trados counts 3 words. MemoQ counts 2 words, like Word. I see! Thanks a lot for the note, Grzegorz. However, the "Spain" article in Wikipedia contains about 290 words with apostrophes, so there is still a difference of about 1,200 words unexplained. There must be other reasons.

Dashes. E.g., the Catalan donar-te'ls-hi is 1 (one) word for Word and MemoQ and 4 (four) for Trados.

Tomás Cano Binder, CT wrote: I wonder whether it would be possible for MemoQ to emulate Trados' way of counting if we select the TRADOS-like radio button in the Statistics dialog? That way we would not need to use Trados to analyse files for our customers using Trados!

Just ask the developers to correct the bug. It should work...

Quoting the MemoQ help:

Word counts area:
· MemoQ: Check this checkbox to display MemoQ word counts. Note: In MemoQ, similarly to Microsoft® Excel®, every string or character that is between whitespaces is counted as a word. Therefore in MemoQ mode you always count numbers as a single word and hyphenated words like in-bound are also considered to be a single word.
· TRADOS-like: Check this checkbox to display Trados-like word counts. SDL Trados® is another CAT tool on the market that handles word counts differently. Note: In Trados®, numbers are only counted as words when they are within a segment, hyphenated words are counted as two words, and a number of other rules apply. In Trados®, segmentation is a factor in word count, i.e. you can get a different word count if the same text appears in one or two lines. Trados® segmentation rules are not public, therefore there is usually a small discrepancy between the word counts of Trados® and Trados-mode MemoQ. In most of the cases, this discrepancy does not exceed 1.5%. We suggest that you only use Trados-like word counts if your client explicitly requires you to do so.

Cheers
GG
| | | Looks like a bug indeed! | Sep 21, 2009 |
Grzegorz Gryc wrote: Just ask the developers to correct the bug. It should work...

Indeed I will. I certainly believe this is a bug in my edition: counting a large file in MemoQ style and in Trados style yields the same total number of words, whereas the two counts should differ in a medium-to-large file. Thank you so much, Grzegorz!
| | | It's about business, not about translation effort | Sep 21, 2009 |
Gergely Vandor wrote: Grzegorz has already pointed out one of the many causes. The most fundamental problem is actually counting words, which in fact isn't trivial. The problem is not between Trados and MemoQ, but between basically any two pieces of software that count words. I don't think you can produce identical statistics in two translation tools for any average document.

I entirely agree, Gergely, and I appreciate your information. To me this is really a business matter, as some of my customers want me to send them an analysis of the files. I would very much prefer to send them figures as Trados calculates them, which are higher and will match what my TRADOS-based customers get with their tool. If MemoQ could mimic Trados' counting practices as much as possible (when the TRADOS-like wordcount is enabled in the Statistics dialog box), it would help me, and probably a lot of other users who have TRADOS-based customers.
| | | A reply from Kilgray's support | Sep 21, 2009 |
I emailed Kilgray's support about this at 12:08 today. The reply just arrived at 13:08. That is one hour. Even if the matter is not resolved immediately, I think this is great support. Thank you very much!
Non breaking spaces... LOL... :) | Sep 21, 2009 |
Gergely Vandor wrote: Grzegorz has already pointed out one of the many causes. The most fundamental problem is actually counting words, which in fact isn't trivial. The problem is not between Trados and MemoQ, but between basically any two pieces of software that count words. I don't think you can produce identical statistics in two translation tools for any average document.

True. The wordcount may be different even in different versions of the same software.

PS. The case of non-breaking spaces is funny. In MemoQ and Trados 2006, a text like "Pussy cat" with an nbsp is reported as one word. Trados 2007 Suite already counts it as two words...

Cheers
GG
| | | does memoQ count less or more? | Sep 30, 2009 |
Hello Tomás,

Tomás Cano Binder, CT wrote: If MemoQ could mimic Trados' counting practices as much as possible (when the TRADOS-like wordcount is enabled in the Statistics dialog box), it would help me, and probably a lot of other users who have TRADOS-based customers.

You could use Trados for quoting if you absolutely prefer the Trados results. I've just seen an earlier complaint from a prospective customer about memoQ consistently counting more words than Word or Trados, for the material they tested.

Counting words is far from obvious; every tool does it differently. I'm not sure I see how a Trados-like count could be "better". MemoQ in fact takes a simplistic approach, and basically defines a word as a string of characters separated by whitespace. This can be less or more than what Trados counts. This leads to another question: isn't it way easier to measure effort by counting characters instead?

Also, I find it alarming how readily people accept these "CAT wordcounts" as a "precise" basis for a quotation, or payment to a translator. The quality of the TMs and terminology (if there is any) is just as important. Not to mention the quality of the source material: how many tags will I encounter? Is it over-formatted? Are there many typos in the source? Are there editing problems preventing sane segmentation? Is it easy to see where and how the segments will turn up in the final document?

Best regards,
Gergely
| | | counting characters indeed | Oct 1, 2009 |
Gergely Vandor wrote: This leads to another question: isn't it way easier to measure effort by counting characters instead?

Gergely,

Precisely, counting characters is obviously an easier and more equitable way of measuring effort (for one, words can have anything from 1 to 'n' characters and as such are not comparable units); a variation of this is exactly what we predominantly use in Poland - specifically the work calculation unit is a page (a started page usually counts as a full page - this problem could of course easily be avoided by applying character count alone) defined as a specific number of characters depending on the nature of the work (certified or not, specialist or general, etc.).

Cheers
Eryk
| | | Polish standard | Oct 2, 2009 |
EJZ wrote: Gergely Vandor wrote: This leads to another question: isn't it way easier to measure effort by counting characters instead? Precisely, counting characters is obviously an easier and more equitable way of measuring effort (for one, words can have anything from 1 to 'n' characters and as such are not comparable units); a variation of this is exactly what we predominantly use in Poland

Unless Trados is used... As Trados counts chars without spaces, it's more difficult for project managers to have a reliable char count. Of course, you may have an approximate count if you add the number of words, but the addition is a complex operation...

EJZ wrote: - specifically the work calculation unit is a page

According to the Polish Standard, 1800 chars (including blank chars).

EJZ wrote: (a started page usually counts as a full page

It's obligatory only for sworn translations. For "normal" ones, other solutions may be used (e.g. fractions of a page).

EJZ wrote: - this problem could of course easily be avoided by applying character count alone) defined as a specific number of characters depending on the nature of the work (certified or not,

For sworn translations, the 1135 chars page is used (defined by the law).

EJZ wrote: specialist or general, etc.).

Here, some translation offices try to "cheat" on the prices and impose some "irregular" pages, e.g. 1500 with spaces or 1800 without spaces. So, in the last case (or similar), a common joke is to deliver (or negotiate to deliver) the text without spaces.

The standard is clear enough and I don't think it's necessary to multiply page definitions. BTW, the rate per line (mainly 55 characters) used in some countries is based on a similar principle but the rounding is different (according to the Polish Standard, a standard typewritten page has 30 lines x 60 chars).

Cheers
GG
[Edited at 2009-10-02 08:44 GMT] | |
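The page-based calculation described in this post can be sketched as follows. The page sizes (1800 characters with spaces for a standard page, 1135 for a sworn page) and the "characters without spaces plus number of words" approximation are taken from the post as quoted, not checked against the standard itself; the function names and the sample figures of 10,000 characters and 2,000 words are made up for illustration.

```python
import math

STANDARD_PAGE_CHARS = 1800  # 30 lines x 60 chars, spaces included (as stated in the thread)
SWORN_PAGE_CHARS = 1135     # sworn-translation page, figure as quoted in the thread

def estimate_chars_with_spaces(chars_without_spaces: int, words: int) -> int:
    """Approximate the count with spaces by adding one space per word."""
    return chars_without_spaces + words

def fractional_pages(chars_with_spaces: int, page_chars: int = STANDARD_PAGE_CHARS) -> float:
    """Pages billed as fractions, rounded to two decimals."""
    return round(chars_with_spaces / page_chars, 2)

def started_pages(chars_with_spaces: int, page_chars: int = SWORN_PAGE_CHARS) -> int:
    """Pages billed so that every started page counts as a full page."""
    return math.ceil(chars_with_spaces / page_chars)

# Hypothetical Trados analysis: 10,000 characters without spaces, 2,000 words.
chars = estimate_chars_with_spaces(10_000, 2_000)
print(chars)                    # 12000
print(fractional_pages(chars))  # 6.67
print(started_pages(chars))     # 11
```

The two rounding rules are kept as separate functions because, as noted above, counting a started page as a full page is only obligatory for sworn translations.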
Honest quotations vs wordcounts... Trados... | Oct 2, 2009 |
Gergely Vandor wrote: Counting words is far from obvious; every tool does it differently. I'm not sure I see how a Trados-like count could be "better". MemoQ in fact takes a simplistic approach, and basically defines a word as a string of characters separated by whitespace. This can be less or more than what Trados counts. This leads to another question: isn't it way easier to measure effort by counting characters instead?

Seconded. See my previous post.

Gergely Vandor wrote: Also, I find it alarming how readily people accept these "CAT wordcounts" as a "precise" basis for a quotation, or payment to a translator. The quality of the TMs and terminology (if there is any) is just as important.

Seconded. No comments.

Gergely Vandor wrote: Not to mention the quality of the source material: how many tags will I encounter? Is it over-formatted?

I try to explain to my students that they should always take the formatting into account. IMHO the tags should be paid by default, but it's difficult to force the TOs to accept it...

Gergely Vandor wrote: Are there many typos in the source? Are there editing problems preventing sane segmentation? Is it easy to see where and how the segments will turn up in the final document?

Seconded. But the problem is the Trados wordcount became a de facto standard and some Trados compatibility is simply convenient for most of us.

Cheers
GG
| | | does Trados Studio count the same as Trados 2007 | Oct 3, 2009 |
Grzegorz Gryc wrote: But the problem is the Trados wordcount became a de facto standard and some Trados compatibility is simply convenient for most of us.

Which version of Trados is the standard? Does Studio count the same as 2007? Let's not forget that this "de facto standard" is officially dead now, if we are talking about the "old" Trados. All those that pretend nothing has happened and go on with the old Trados without a plan for the future are making a big mistake in my opinion.

Gergely
| | | So called Trados wordcount... | Oct 3, 2009 |
Gergely Vandor wrote: Grzegorz Gryc wrote: But the problem is the Trados wordcount became a de facto standard and some Trados compatibility is simply convenient for most of us. Which version of Trados is the standard? Does Studio count the same as 2007?

I dunno. Probably not. I can check it for you using some fancy examples. It may take some time (normally, when I'm posting here, it's not because I have nothing to do, I'm just trying to cope with my migraines... the flame wars are easier than translation...).

Gergely Vandor wrote: Let's not forget that this "de facto standard" is officially dead now, if we are talking about the "old" Trados.

Of course. We're talking about the "old" Trados. In my neighborhood, no translation office uses Trados 2009. But the Trados 2007 series will remain alive for a long time in the "translation ecosystem". Sorry for quoting SDL marketing language.

Gergely Vandor wrote: All those that pretend nothing has happened and go on with the old Trados without a plan for the future are making a big mistake in my opinion.

You're right, but reality bites. A "Trados like" wordcount (using apostrophes, hyphens, nbsp etc. as separators) is just convenient, and it will be convenient for at least the next 2-3 years. Then, we'll see. I agree, the Trados approach is obsolete, but you still have a bug in the "Trados like" wordcount.

Cheers
GG
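The idea of a count that uses apostrophes, hyphens and non-breaking spaces as separators can be illustrated with a small sketch in which the separator set is a parameter. This is a hypothetical illustration of the principle only, not a reimplementation of the MemoQ or Trados rules, which also depend on segmentation; `count_words` and the constant names are invented here. The scan is written by hand because Python's `str.split()` without arguments already treats a non-breaking space as whitespace, which would hide the difference discussed in the thread.

```python
BASIC_SEPARATORS = " \t\r\n"  # ordinary whitespace only
NBSP = "\u00a0"               # non-breaking space

def count_words(text: str, separators: str = BASIC_SEPARATORS) -> int:
    """Count maximal runs of characters that are not in `separators`."""
    count = 0
    in_word = False
    for char in text:
        if char in separators:
            in_word = False
        elif not in_word:
            in_word = True
            count += 1
    return count

catalan = "donar-te'ls-hi"
print(count_words(catalan))                           # 1
print(count_words(catalan, BASIC_SEPARATORS + "-'"))  # 4

joined = "Pussy" + NBSP + "cat"
print(count_words(joined))                            # 1
print(count_words(joined, BASIC_SEPARATORS + NBSP))   # 2
```

Changing a single character in the separator set is enough to move the total, which fits the observation that even two versions of the same tool can disagree.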