Importing multiple .csv's into one term base - help needed
Thread poster: Janneke Hopman
Janneke Hopman
Janneke Hopman
Netherlands
Local time: 20:08
English to Dutch
Sep 19, 2012

I'm trying to import several different term base exports (from MemoQ) with different language pairs into a new, combined multilingual MemoQ term base. The exports I use all have just one language pair, and I want to merge all of these into one multilingual term base.

What happens when I try it is that memoQ makes a new entry for every term of every export, even when the English 'source' term is already in the term base. I want MemoQ to add terms that have the same English source term to the same entry (they all have English as one of the two languages), but it won't. It just makes a new entry with the exact same English term, producing lots of duplicates.

Is there any way to avoid this, and just merge the different glossaries?

I'm using MemoQ 5.0.65 with a project manager license.

Many thanks in advance!


 
Tomás Cano Binder, BA, CT
Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 20:08
Member (2005)
English to Spanish
+ ...
Indeed it does not Sep 19, 2012

Indeed, memoQ cannot filter out duplicated entries when you import several CSV files into a termbase. Or at least, I do not know how to do it.

Hence, the solution must be outside of memoQ: if all CSV files have the same structure (same columns and same separator character, for instance a tab), you can easily merge them by opening them in Excel, manually copying them into a common worksheet, and then using Excel's filter function to exclude duplicates. Then you save the filtered result to a new Excel file, which you can save as a single CSV file to be imported into memoQ.

Edited to add this: Since each CSV file contains a different target language, make sure that, when you paste the contents of each CSV file to the common Excel file, you place each language in a different column, so that you can later have a column per language at the import stage in memoQ.

I hope this makes sense! If you need detailed steps, just shout and I will do my best!

[Edited at 2012-09-20 09:43 GMT]
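For anyone who prefers a script to manual copying, the stacking step described above can be sketched in Python. This is only an illustration under assumptions: the function name `stack_glossaries` and the sample terms are made up, each export is assumed to be already loaded as a list of (English, target) pairs, and nothing here relies on memoQ itself. It puts each language in its own column and drops rows that are exact duplicates, as Excel's duplicate filter would.

```python
from typing import Dict, List, Tuple


def stack_glossaries(
    glossaries: Dict[str, List[Tuple[str, str]]],
) -> List[List[str]]:
    """Stack bilingual (English, target) pairs into one sheet, one column per language."""
    languages = list(glossaries)  # column order follows the order of the dict
    seen = set()
    rows = [["English"] + languages]  # header row
    for lang, pairs in glossaries.items():
        col = languages.index(lang)
        for english, target in pairs:
            row = [english] + [""] * len(languages)
            row[1 + col] = target  # only this language's column is filled
            key = tuple(row)
            if key in seen:  # exact duplicate row: skip it
                continue
            seen.add(key)
            rows.append(row)
    return rows


# Hypothetical sample data, not taken from the real exports.
sample = {
    "Dutch": [("invoice", "factuur"), ("invoice", "factuur"), ("order", "bestelling")],
    "Spanish": [("invoice", "factura")],
}
for row in stack_glossaries(sample):
    print("\t".join(row))
```

Note that this only removes identical rows. The same English term coming from two different exports still ends up on two separate rows, so it does not yet merge translations into one entry.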


 
Janneke Hopman
Janneke Hopman
Netherlands
Local time: 20:08
English to Dutch
TOPIC STARTER
Manual copying is a challenge as well Sep 20, 2012

Thanks Tomás, manual copying it is then. But where I'm stuck is the fact that not all CSV's have the exact same English source terms (and not the exact same structure for that matter). Some languages have more than others, and some have almost no overlap with any other language at all. So instead of deleting duplicates, I need the different translations of English source terms merged within one entry (one row in Excel) and I haven't thought of a way to automate this, or make it quicker than manually copying and aligning all languages in Excel. Maybe that's because of my inexperience with Excel though.

Can anyone else think of a way?


 
Szabolcs Király
Szabolcs Király  Identity Verified
Hungary
Local time: 20:08
Portuguese to Hungarian
+ ...
Might be worth a try Sep 20, 2012

The problem is that MemoQ uses the ID column to create the database and identify the terms. None of the files have it, so MemoQ will create it; that is, it will create new IDs regardless of whether the term is the same.

If you have both files in the very same alphabetical order:
1) Import the first file
2) Open the second file in Excel and create a Primary key/ID column and fill it with the corresponding numbers.
3) Import the second file into MemoQ and choose the option to update columns with the same ID

Hope that helps.


 
Tomás Cano Binder, BA, CT
Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 20:08
Member (2005)
English to Spanish
+ ...
A doubt Sep 20, 2012

Szabolcs Király wrote:
2) Open the second file in Excel and create a Primary key/ID column and fill it with the corresponding numbers.

I am not sure this will work. How will you ensure that each source language term receives the same ID across all CSV files?


 
Szabolcs Király
Szabolcs Király  Identity Verified
Hungary
Local time: 20:08
Portuguese to Hungarian
+ ...
alphabetical order Sep 20, 2012

That's the important thing.
The 2 files should look like, e.g.:

EN1 - ES1
EN2 - ES2
...

ID1 - EN1 - IT1
ID2 - EN2 - IT2

MemoQ will assign the ID's in corresponding order: ID1: EN1 - ES1 and so on.
So if you create the ID column in Excel for the second file you can easily add another language.
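
Filling the ID column by row position only works when both files list the same terms in the same order. A rough Python sketch of an alternative is to key the ID on the English term itself, so a repeated term gets the same number in every file. The helper `assign_ids` and the sample terms are hypothetical, and the sketch assumes what is described above, namely that memoQ numbers the first import sequentially and updates entries with a matching ID; that behaviour is not verified here.

```python
from typing import Dict, List, Tuple


def assign_ids(
    id_by_term: Dict[str, int],
    pairs: List[Tuple[str, str]],
) -> List[Tuple[int, str, str]]:
    """Attach an ID to each (English, target) pair, reusing the ID of a known English term."""
    next_id = max(id_by_term.values(), default=0) + 1
    rows = []
    for english, target in pairs:
        if english not in id_by_term:  # first time this English term is seen
            id_by_term[english] = next_id
            next_id += 1
        rows.append((id_by_term[english], english, target))
    return rows


# The same dict is shared across files so that IDs stay consistent.
ids: Dict[str, int] = {}
spanish = assign_ids(ids, [("invoice", "factura"), ("order", "pedido")])
italian = assign_ids(ids, [("order", "ordine"), ("payment", "pagamento")])
print(spanish)
print(italian)
```

Here "order" keeps the same ID in both lists, while "payment", which only occurs in the second file, gets a new one.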


 
Tomás Cano Binder, BA, CT
Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 20:08
Member (2005)
English to Spanish
+ ...
Still not sure... Sep 20, 2012

Szabolcs Király wrote:
MemoQ will assign the ID's in corresponding order: ID1: EN1 - ES1 and so on.
So if you create the ID column in Excel for the second file you can easily add another language.

This would only be true and work if all CSV files contained the exact same source terms time and time again.

I suspect each CSV file contains a different set of terms and that what the OP wants is to make sure that those terms that are repeated chronologically in the set of CSV files get updated to the latest target term.

Maybe the OP can shed some light about whether all CSV files contain the exact same set of source terms.


 
Szabolcs Király
Szabolcs Király  Identity Verified
Hungary
Local time: 20:08
Portuguese to Hungarian
+ ...
correct Sep 20, 2012

That's true. I thought that this was the issue.

 
Tomás Cano Binder, BA, CT
Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 20:08
Member (2005)
English to Spanish
+ ...
For disparate files Sep 20, 2012

I understand that the files are all different from each other.

I understand that the files are all different from each other.

Hm... The situation looks a bit tricky indeed. There is this lookup function in Excel which could be useful, I guess. You could create a list of source terms only, sort them and remove duplicates, and then have one sheet of the Excel file for each of the languages. With the lookup function, you can look in the bilingual tables to complete the language's column by looking up possible translations from the bilingual sheets.

If you wish I can try to arrange it for you, since it takes longer to explain it than to do it! Just feel free to email me about it and I will try to prepare an Excel file you can easily export to a single multilingual CSV file.

[Edited at 2012-09-20 11:41 GMT]
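
The lookup idea above can also be sketched outside Excel. The following Python is a minimal illustration, not a tested memoQ workflow: `merge_by_lookup` and the sample terms are invented, each bilingual export is assumed to be loaded as (English, target) pairs, and the tab-separated output with an "English" header is an assumption about what the import will accept. It builds the sorted, de-duplicated list of source terms, then looks each term up in every language's table, leaving the cell empty where a language has no translation.

```python
import csv
import io
from typing import Dict, List, Tuple


def merge_by_lookup(glossaries: Dict[str, List[Tuple[str, str]]]) -> str:
    """Return tab-separated text with one row per English term and one column per language."""
    languages = list(glossaries)

    # One lookup table per language; like a spreadsheet lookup, the first match wins.
    lookup: Dict[str, Dict[str, str]] = {}
    for lang, pairs in glossaries.items():
        table: Dict[str, str] = {}
        for english, target in pairs:
            table.setdefault(english, target)
        lookup[lang] = table

    # Master list: every English term from every file, duplicates removed, sorted.
    master = sorted({english for pairs in glossaries.values() for english, _ in pairs})

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["English"] + languages)
    for english in master:
        writer.writerow([english] + [lookup[lang].get(english, "") for lang in languages])
    return out.getvalue()


# Hypothetical sample data.
sample = {
    "Dutch": [("invoice", "factuur"), ("order", "bestelling")],
    "Spanish": [("invoice", "factura"), ("payment", "pago")],
}
print(merge_by_lookup(sample))
```

Terms are matched by exact spelling, so "Invoice" and "invoice" would end up on separate rows unless they are normalised first. If a language has several translations for one English term, only the first is kept in this sketch.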


 
Janneke Hopman
Janneke Hopman
Netherlands
Local time: 20:08
English to Dutch
TOPIC STARTER
Not the same source terms Sep 20, 2012

Thanks Szabolcs and Tomás, for your thoughts.

Thanks Szabolcs and Tomás, for your thoughts.

The source terms are not the exact same set at all (see my previous post), that is indeed what's causing my problem. I'm sorry if I didn't make that clear before. There are some source terms that occur in multiple or all CSV files, but lots of them that don't, that are unique to their language pair. The CSV files have no real common denominator (like corresponding ID's or a list of source terms that can be ordered alphabetically) to align them against, or at least that's what I think.

I'll try again to explain what I want to do, because your suggestion, Tomás,
"what the OP wants is to make sure that those terms that are repeated chronologically in the set of CSV files get updated to the latest target term." doesn't sound quite right.

The people whom I got these CSV files from have all created terms as they were translating, and have made individual decisions as to which terms to include in the term base. So no two sets of source terms are the same, let alone across all of the languages. What I want is one glossary that contains all these translations. This means that not all English source terms will have a translation in every language, but some will have a translation in several languages and this should remain one entry, instead of separate ones like MemoQ does now.

ETA: This message may be irrelevant by the time it gets approved, it has been in limbo for a while and I wasn't aware my previous message also hadn't been approved yet, so it may seem a bit strange when it finally appears.


[Edited at 2012-09-20 14:57 GMT]


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 19:08
Member (2009)
Dutch to English
+ ...
Kilgray needs to improve data management in memoQ Sep 20, 2012

What we really need is for Kilgray to do some work on their TB and TM editors. memoQ is great, and I think it is the best CAT tool/TEnT on the market at the moment, but they need to stop adding new features for a few months and work on data management. There is a growing list of things that users have been asking for for a long time that never get added/fixed.

Michael


 
Tomás Cano Binder, BA, CT
Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 20:08
Member (2005)
English to Spanish
+ ...
Entirely agree! Sep 20, 2012

Michael Beijer wrote:
What we really need is for Kilgray to do some work on their TB and TM editors. memoQ is great, and I think it is the best CAT tool/TEnT on the market at the moment, but they need to stop adding new features for a few months and work on data management. There is a growing list of things that users have been asking for for a long time that never get added/fixed.

I could not agree more! As much as I like the new features... I still find it impossible to do quick, reliable edits in our sometimes rather large TMs and TBs...


 

