Developping for the translation industry



 Wednesday, 10 February 2010

I like to use images to help illustrate the theme or point of a blog post. It’s a proven “best practice” in blogging, and I highly recommend that every blogger do it.

One trick for easily finding and properly using images in your blog posts is to search the creative commons licensed photos on the photo sharing site Flickr.

So, what’s Creative Commons?

Creative Commons is a non-profit organization that has created a standardized set of tools for granting various levels of permission for people to use creative works freely. The author, or in this case the photographer, of a work designates a type of license, and Flickr then lets you sort through and find only photos that are free to be used in blog posts. I choose to use photos that carry the attribution/share alike license. This means that I may use the image here as long as I attribute the image to the Flickr user’s account where I found it. Here’s Flickr’s description of CC licenses.

So, here’s how to find and grab great images.

  1. Surf to the Flickr Creative Commons Search Page – all images you search for here are free to use with proper attribution
  2. Search for a specific phrase or concept and choose the image that fits
  3. Click on “all sizes” and choose the size you wish to post on your blog
  4. Right click the image and choose “copy image location” – use this path to paste into your blog post where you want the image to appear
  5. Somewhere in your post add the words – Image credit and the link to the Flickr account where you found the image (see at the bottom of the post)

To be a good photo user, make sure you add your own images and make them available through the proper CC license – you can make this a default Flickr account setting.

Image credit: dashitnow

Other Posts:

The Best Damn Web Marketing Checklist, Period!

What Are Customers Saying About You Online?

8 easy tips to drive traffic from search engines to your site

Wednesday, 10 February 2010 09:35:46 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
General
 Tuesday, 09 February 2010

This is a very funny video that I found a while ago… Thought I should share it with you all!

 

Other posts:

List of Crazy Laws in the United States

When CAPTCHA goes bad

Chuck Norris Programming facts and More Programming Chuck Norris facts

Remember Windows ME?

SQL Injection humor 

Tuesday, 09 February 2010 15:22:16 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
Humor

Here is a quick and easy way to remove multiple whitespaces from a string, leaving only one space character between tokens.

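CREATE FUNCTION dbo.CleanString (@string VARCHAR(50))
RETURNS VARCHAR(50)
AS
  BEGIN
    -- Strip leading and trailing spaces first.
    -- Note: input longer than 50 characters is truncated by the parameter type.
    SET @string = LTRIM(RTRIM(@string))

    -- Collapse pairs of spaces until no double space remains
    WHILE CHARINDEX('  ', @string) > 0
      SET @string = REPLACE(@string, '  ', ' ')

    RETURN @string
  END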
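
If you want to convince yourself that the loop really ends with single spaces only, the same logic is easy to try outside the database. Here is a minimal Python sketch of the function above; the name clean_string is mine, and it only mirrors the T-SQL, it is not a replacement for it.

```python
def clean_string(text: str) -> str:
    """Collapse runs of spaces to one space, mirroring dbo.CleanString."""
    # LTRIM(RTRIM(...)) equivalent: remove leading and trailing spaces only
    text = text.strip(" ")
    # Same loop as the T-SQL version: keep replacing pairs of spaces
    # until no double space is left in the string
    while "  " in text:
        text = text.replace("  ", " ")
    return text


print(clean_string("  one   two     three  "))
```

Each pass shortens every run of spaces, so a long run needs a few iterations before the loop stops.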

 

Other Posts:

How to generate random numbers with a T-SQL query

How to insert a file in an image column in SQL Server 2005

How to track the growth of your SQL Server database

SQL Server indexing best practices and guidelines

How to remove leading zeros from the results of an SQL Query

Tuesday, 09 February 2010 13:24:17 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
Code Snippet | SQL

Google is developing software for the first phone capable of translating foreign languages almost instantly — like the Babel Fish in The Hitchhiker’s Guide to the Galaxy.

By building on existing technologies in voice recognition and automatic translation, Google hopes to have a basic system ready within a couple of years. If it works, it could eventually transform communication among speakers of the world’s 6,000-plus languages.

The company has already created an automatic system for translating text on computers, which is being honed by scanning millions of multi-lingual websites and documents. So far it covers 52 languages, adding Haitian Creole last week.

Google also has a voice recognition system that enables phone users to conduct web searches by speaking commands into their phones rather than typing them in.

“We think speech-to-speech translation should be possible and work reasonably well in a few years’ time,” said Franz Och, Google’s head of translation services.

“Clearly, for it to work smoothly, you need a combination of high-accuracy machine translation and high-accuracy voice recognition, and that’s what we’re working on.

“If you look at the progress in machine translation and corresponding advances in voice recognition, there has been huge progress recently.”

Although automatic text translators are now reasonably effective, voice recognition has proved more challenging.

“Everyone has a different voice, accent and pitch,” said Och. “But recognition should be effective with mobile phones because by nature they are personal to you. The phone should get a feel for your voice from past voice search queries, for example.”

The translation software is likely to become more accurate the more it is used. And while some translation systems use crude rules based on the grammar of languages, Google is exploiting its vast database of websites and translated documents to improve the accuracy of its system.

“The more data we input, the better the quality,” said Och. There is no shortage of help. “There are a lot of language enthusiasts out there,” he said.

However, some experts believe the hurdles to live translation remain high. David Crystal, honorary professor of linguistics at Bangor University, said: “The problem with speech recognition is the variability in accents. No system at the moment can handle that properly.

“Maybe Google will be able to get there faster than everyone else, but I think it’s unlikely we’ll have a speech device in the next few years that could handle high-speed Glaswegian slang.

“The future, though, looks very interesting. If you have a Babel Fish, the need to learn foreign languages is removed.”

In the Hitchhiker’s Guide to the Galaxy, the small, yellow Babel Fish was capable of translating any language when placed in the ear. It sparked a bloody war because everyone became able to understand what other people were saying.

Source: Times Online

 

Other Posts:

Google Willing To Pay 500$ Bounty For Each Chrome Browser Bugs You Find

Silverlight Game Creation Tutorials

Facts and Figures about the Language Industry

Google Translator Hacked

Compendium of Dumb Laws in the United States

Tuesday, 09 February 2010 09:39:00 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
Language Industry | News
 Friday, 05 February 2010

You can place a robots.txt file in the root of your site to help inform search engines and other bots about the areas of your site that you don’t want them to access. For example, you may not want bots to access the content of your images folder: 

User-agent: *
Disallow: /images/
 

You can also provide instructions for particular bots. For example, to exclude Google image search from your entire site, use this: 

User-agent: Googlebot-Image
Disallow: /
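
A quick way to check what rules like these actually block is to feed them to a parser before you publish the file. The sketch below uses Python's standard urllib.robotparser module on the two rule sets shown so far; the example.com URLs and the "ExampleBot" agent name are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from this post, combined in one file
RULES = """\
User-agent: *
Disallow: /images/

User-agent: Googlebot-Image
Disallow: /
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


parser = build_parser(RULES)
checks = [
    ("ExampleBot", "http://example.com/index.html"),
    ("ExampleBot", "http://example.com/images/logo.png"),
    ("Googlebot-Image", "http://example.com/index.html"),
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```

Keep in mind that this parser follows the basic standard only, so it is a sanity check for plain User-agent and Disallow rules, not a simulation of any particular search engine.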
 

The robots.txt standard is unfortunately very limited; it only supports the User-agent and Disallow fields, and the only wildcard allowed is when you specify it by itself in User-agent, as in the previous example.

Google has introduced support for a couple of extensions to the robots.txt standard. First, you can use limited patterns in pathnames. You can also specify an Allow clause. Since those extensions are specific to Google, you should probably only use them with one of the Google user agents or with Googlebot, which all of its bots recognize.

For example, you can block PNG files from all Google user agents as follows: 

User-agent: Googlebot
Disallow: /*.png$
 

The asterisk matches any sequence of characters, much like a shell wildcard, and the dollar sign, as in regular expressions, matches the end of the string. Those are the only two pattern-matching characters that Google supports.

To disable all bots except for Google, use this: 

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

To exclude pages with sort as the first element of a query string that can be followed by any other text, use this:

User-agent: Googlebot
Disallow: /*?sort
 

This clause also works only with the Google bots.
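
To get a feel for how the asterisk and the dollar sign behave, here is a small Python sketch that translates such a pattern into a regular expression and tests a few paths. It only illustrates the matching rules described in this post; the function names are mine, it is not Google's implementation, and it ignores how Allow and Disallow rules are prioritized against each other.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern using * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with "any characters"
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the path matches the Disallow pattern."""
    # Rules apply from the start of the path, so match() is used, not search()
    return pattern_to_regex(pattern).match(path) is not None


for pattern, path in [
    ("/*.png$", "/images/logo.png"),
    ("/*.png$", "/images/logo.png?size=2"),
    ("/*?sort", "/list?sort=name"),
    ("/*?sort", "/list?page=2&sort=name"),
]:
    print(pattern, path, is_blocked(pattern, path))
```

The last case shows why the post says "sort as the first element of a query string": when sort comes after another parameter, the literal "?sort" is no longer present and the rule does not apply.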

 

Other posts:

White House new Robots.txt

8 easy tips to drive traffic from search engines to your site

Huge List of Dumb and Crazy Laws in the United States

Tools for Web developers

Friday, 05 February 2010 14:26:11 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
General

In January, Google went public with news that some of its systems had been hacked, along with those of a number of US-based companies. The attacks had targeted both accounts maintained by political activists and commercial code, and Google pointed the finger straight at China, vowing to change its entire approach to business in that country. But a report now suggests that the company is also looking to beef up its internal defenses to prevent a repeat of the attacks.

The Washington Post is reporting that Google has started negotiations with the US National Security Agency about a collaborative effort to analyze the attack and figure out how best to prevent a recurrence. The Post is citing confidential sources, as the deal isn't final and, even if it were, it's unlikely that Google would seek to publicize it.

For starters, both organizations have already been the target of many complaints by privacy advocates, the NSA for its domestic surveillance efforts, Google for its data retention policies. The combination of the two would clearly make the advocates far more uneasy, and might help them make their case with the wider public. Meanwhile, as the report notes, private companies have often been loath to share information about their proprietary systems with the government for a variety of reasons.

That may explain why the negotiations have been going slowly, as the NSA would clearly need access to and understanding of Google's infrastructure in order to fully evaluate the attacks and future risks. And that's precisely the sort of proprietary information that Google is presumably reluctant to provide anyone with—even a highly secretive organization like the NSA.

Other posts:

Google Willing To Pay 500$ Bounty For Each Chrome Browser Bugs You Find

Google Translator Hacked

BusinessWeek hit by SQL Injection attack

Password aren't a good defense?

Friday, 05 February 2010 10:30:04 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
News | Security
 Wednesday, 03 February 2010

The sys.objects catalog view records when each database object was created and last modified. As a quick example, the following script lists the user tables modified since the start of this year.

DECLARE  @date  AS DATETIME
SET @date = '2010-01-01'

SELECT   name, 
         type_desc,
         create_date,
         modify_date
FROM     sys.objects
WHERE    TYPE = 'U'
         AND modify_date >= @date
ORDER BY modify_date

This knowledge can help you manage your SQL Servers more easily. You could, for example, create a script that runs every night and emails you the list of objects modified during the previous day.
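
As a rough idea of what the reporting half of such a nightly script could look like, here is a Python sketch that turns the query's result rows into an email message. Everything specific in it is an assumption for illustration: the build_report name, the sample row, and the example.com addresses. Fetching the rows from SQL Server and actually sending the message (for example with smtplib) are left out.

```python
from datetime import datetime
from email.message import EmailMessage


def build_report(rows, since, sender, recipient):
    """Build the report email.

    rows: list of (name, type_desc, create_date, modify_date) tuples,
    in the same column order as the query above.
    """
    msg = EmailMessage()
    msg["Subject"] = f"{len(rows)} object(s) modified since {since:%Y-%m-%d}"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [
        f"{name:<30} {type_desc:<12} {modified:%Y-%m-%d %H:%M}"
        for name, type_desc, _created, modified in rows
    ]
    msg.set_content("\n".join(lines) or "No objects were modified.")
    return msg


# Hypothetical sample row, standing in for the real query result
sample_rows = [
    ("Customers", "USER_TABLE",
     datetime(2009, 5, 1, 9, 0), datetime(2010, 2, 2, 14, 30)),
]
report = build_report(sample_rows, datetime(2010, 2, 2),
                      "sql-agent@example.com", "dba@example.com")
print(report)
```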

The available type codes you can filter on are:

AF Aggregate function (CLR)
C CHECK constraint
D DEFAULT (constraint or stand-alone)
F FOREIGN KEY constraint
FN SQL scalar function
FS Assembly (CLR) scalar-function
FT Assembly (CLR) table-valued function
IF SQL inline table-valued function
IT Internal table
P SQL Stored Procedure
PC Assembly (CLR) stored-procedure
PG Plan guide
PK PRIMARY KEY constraint
R Rule (old-style, stand-alone)
RF Replication-filter-procedure
S System base table
SN Synonym
SQ Service queue
TA Assembly (CLR) DML trigger
TF SQL table-valued-function
TR SQL DML trigger
TT Table type
U Table (user-defined)
UQ UNIQUE constraint
V View
X Extended stored procedure

 

Other posts:

How to: Find The List Of Unused Tables Since The Last SQL Server Restart

Differences between temporary tables and table variables

Using derived tables to boost SQL performance

How to track the growth of your database

Wednesday, 03 February 2010 11:00:54 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
Code Snippet | SQL
 Monday, 01 February 2010

This script will return a list of user tables in the database, along with the last time a SELECT was executed against them (including any views that include data from that table). This can be used to determine if a table is not in use anymore.

Statistics are reset when the SQL Server service is restarted, so this query will only return activity since that time. Also, just because there's no activity doesn't mean it's safe to remove an object - some tables may only be used during monthly or annual processes, and so they wouldn't show any activity except during those brief intervals.

WITH lastactivity(objectid, lastaction)
     AS (SELECT object_id      AS tablename,
                last_user_seek AS lastaction
         FROM   sys.dm_db_index_usage_stats u
         WHERE  database_id = DB_ID(DB_NAME())
         UNION
         SELECT object_id      AS tablename,
                last_user_scan AS lastaction
         FROM   sys.dm_db_index_usage_stats u
         WHERE  database_id = DB_ID(DB_NAME())
         UNION
         SELECT object_id        AS tablename,
                last_user_lookup AS lastaction
         FROM   sys.dm_db_index_usage_stats u
         WHERE  database_id = DB_ID(DB_NAME()))
SELECT   OBJECT_NAME(so.object_id) AS tablename,
         MAX(la.lastaction)        AS lastselect
FROM     sys.objects so
         LEFT JOIN lastactivity la
           ON so.object_id = la.objectid
WHERE    so.TYPE = 'U'
         AND so.object_id > 100
GROUP BY OBJECT_NAME(so.object_id)
ORDER BY OBJECT_NAME(so.object_id)
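
The part of this query that matters most is the combination of the UNION, the LEFT JOIN and MAX: every user table appears in the result, and a table with no recorded seek, scan or lookup comes back with NULL. Here is the same idea as a small Python sketch, using made-up table names and timestamps in place of the real DMV rows.

```python
from datetime import datetime


def last_select_per_table(tables, usage_rows):
    """Return {table: latest read timestamp or None}, sorted by table name.

    tables: iterable of table names (the sys.objects side of the join).
    usage_rows: iterable of (table, last_seek, last_scan, last_lookup)
    tuples, where any timestamp may be None.
    """
    # LEFT JOIN: start with every table, even those with no usage rows
    latest = {name: None for name in tables}
    for table, *stamps in usage_rows:
        if table not in latest:
            continue
        # UNION + MAX: keep the most recent of all non-null timestamps
        for stamp in stamps:
            if stamp is not None and (latest[table] is None or stamp > latest[table]):
                latest[table] = stamp
    return dict(sorted(latest.items()))


# Hypothetical data: one row per index, as the DMV would report it
usage = [
    ("Orders", datetime(2010, 1, 5), None, datetime(2010, 1, 20)),
    ("Orders", None, datetime(2010, 1, 28), None),
]
print(last_select_per_table(["Orders", "Archive2008"], usage))
```

A table that shows None here is only a candidate for removal, for the reasons given above.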

 

Other posts:

How to generate random numbers within a T-SQL query

SQL Server SPACE function

SQL Injection humor

Differences between temporary tables and table variables

How to track the growth of your database

How to get the total number of rows in a database

Using derived tables to boost SQL performance

Monday, 01 February 2010 10:36:17 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
Code Snippet | SQL

This is a pretty interesting development from Google, and it seems to be becoming much more common now: companies openly offering payments for bugs and vulnerabilities discovered in their software. They already used that strategy last year with their Native Client Security hacking contest. This time they offer $500 for most vulnerabilities and $1,337 for 'particularly clever' flaws. You can see the blog post on the Chromium blog here.

It’s a chance for the white-hat guys to earn a few bucks, but honestly I don’t think it’s going to change anything. Especially not when we’re talking $500 per vulnerability because a serious browser 0-day exploit that can allow execution of malware will go for 100 times that much on the black market. Even for the particularly severe or clever bugs worth $1,337, that’s still peanuts compared to what they can sell the exploit for on the black market.

I hope it helps, though, and gives some legitimate security researchers a little more incentive to focus on Chrome. The bad guys won’t pay much attention, as Chrome is still a relatively small player in the browser world.

From the article at Network World

“We are hoping that … this program will encourage new individuals to participate in Chromium security,” said Evans. “The more people involved in scrutinizing Chromium’s code and behavior, the more secure our millions of users will be.”

“Internet Explorer, Safari, Firefox…those browsers have been out there for a long time,” said Pedram Amini, manager of the security research team at 3com’s Austin, Tex.-based TippingPoint, which operates Zero Day Initiative (ZDI), one of the two best-known bug-bounty programs. “But Chrome, and now Chrome OS, need researchers. Google needs people to put eyes on the target.”

Google’s new bounty program isn’t the first from a software vendor looking for help rooting out vulnerabilities in its own code, but it’s the largest company to step forward, Amini said. Microsoft, for example, has traditionally dismissed any calls that it pay for vulnerabilities. “This will be beneficial to Google,” Amini added. “There are actually very few vendors who play in the bounty market, but Google doing it is definitely interesting.”

I don’t realistically expect any groundbreaking bugs to come out of this initiative, but I think a few people might bust out their browser fuzzing tools and see what they can find.

It’s worth a bit of effort if you can find 10 decent bugs in a couple of hours and net yourself $5,000 USD.

You can see the different severity rankings for bugs on the Chromium project's severity guidelines page.

 

Other posts:

Google Translator Hacked

Some tips to enhance your SQL Server security

What is LDAP injection?

How to generate random numbers within a T-SQL query

Monday, 01 February 2010 10:22:17 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
News | Security
 Friday, 29 January 2010

With all of the news about hacked e-mail accounts, it isn’t a big surprise that other Google services can be manipulated, too. Yesterday, politicking or pranking Russian translators forced a Google Translate mistranslation of four segments — “USA is to blame,” “Russia is to blame,” “Obama is to blame,” and “Medvedev is to blame” into English from Russian (click here to see a screenshot).

Google Переводчик (Translator) made the U.S. and President Obama blameless in the Russian translation (“USA is not to blame” and “Obama is not to blame”), while placing blame on Russia and President Medvedev. Naturally, soon after the news went up, Google fixed the translations.

According to Moscow News:

The same is true if the word combination is translated into Ukrainian and Belorussian. However, if the output translation is set to Spanish, French, German, and other European languages, it is translated correctly. […]

"These are translation bombs," said Alla Zabrovskaya, Google's Russian Public Relations Director. "We are not always able to weed them out, and it is good that our users find them, and let us know about them."

But the question that remains is: how many more of those mistranslations (or “translation bombs” as they call them) exist in the Google translation engine (or in any other automatic translation engine)? Some of them should be very easy to spot (such as translating “White House” as “Visit myspamsite.com”), but others will be spotted only through careful analysis of the translation.

The lessons learned here:

  • Crowdsourcing applications need protection against malicious manipulation because the wisdom of the crowd will more and more reflect the politics of its members.
  • Online translation applications are only as reliable as the crowd that feeds them. You should therefore never use those applications to translate important documents or messages. “Machine translation” is only useful to grasp the general meaning of a piece of content but nothing more.
  • If you need real professional translations, you should work with a real translation provider. Unlike automatic translation engines, they are able to guarantee that your message will be the same in every language.

 

Other posts:

In the news: Bing translator now supports Haitian Creole

Facts and Figures about the Language Industry

Some tips to enhance your SQL Server security

Big news in the translation industry

Friday, 29 January 2010 10:47:40 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
Language Industry | News
 Tuesday, 26 January 2010

Bing Translator now comes with support for Haitian Creole, also referred to as Creole or Kreyòl, one of the two official languages in Haiti, along with French. Vikram Dendi, senior product manager, Microsoft Translator, notes that Microsoft has worked in order to introduce support for Haitian Creole at the request of the community involved in Haitian relief efforts. In this regard, Microsoft Research unveiled at the end of the past week an experimental machine translation system designed to allow users to translate to and from Haitian Creole.

“This is an experimental system put together in record time. While our typical approach to adding new languages involves significantly larger amounts of training, a higher threshold for quality testing – we decided that the upside warranted making the system available to the community at the earliest, and continue improving it subsequently. We are working diligently to keep improving the quality, but bear with us if you encounter problems. You can always contact us at mtcont at microsoft.com with feedback,” Dendi stated.

Machine translation for Haitian Creole is available not only in Bing Translator, but also via additional Microsoft Translator technologies, including services and application programming interfaces. One example is the Messenger Translation Bot, which can now speak Haitian Creole. All that users need to do is add mtbot@hotmail.com to their buddy list in Windows Live Messenger, and they will be able to talk with Kreyòl speakers.

“The Haitian Creole translator is now part of the Microsoft Translator web service enabling many of the user scenarios powered by the service. Users can access the service through the Microsoft Translator web site. Developers would be interested in looking at our APIs – and choose from SOAP or HTTP (Support for Haitian in our AJAX API will be rolled out in the coming days),” Dendi added.

The Microsoft Translator API, along with the other machine translation technology and services from Microsoft, including Bing Translator, can be accessed and used completely free of charge. Developers can leverage the application programming interface in order to build apps or integrate translation services into websites with support for Haitian Creole.

“In the coming days expect to see support for Haitian Creole added to even more of our scenarios (Translator widget, Office etc) as well as the AJAX API. Known issues and announcements can also be found on our forums. We hope that this contribution proves useful to the various humanitarian efforts underway, and please stay tuned to this blog for further news on the Haitian Creole language support,” Dendi explained.

Source: Softpedia

Tuesday, 26 January 2010 10:06:45 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
Language Industry | News
 Friday, 22 January 2010

Seattle-based Amazon.com said late Wednesday evening that it has launched a new software development kit for its Kindle electronic reader.

According to Amazon, its new Kindle Development Kit allows developers to create active content for the Kindle, including mobile games, links to web site data, and more. Among firms developing content for the Kindle are EA Mobile, the mobile games arm of Electronic Arts; Handmark, which is developing an active Zagat guide that pulls Zagat's online ratings and reviews onto the Kindle; and Sonic Boom, which is building word games and puzzles for the electronic reader.

Amazon said it will be providing a limited beta to participants who want to access the development kit, which includes sample code, documentation, and a Kindle Simulator which runs on Mac, PC, and Linux desktops.

 

Other posts:

Huge List of Dumb and Crazy Laws in the United States

Some tips to enhance your SQL Server security

Facts and Figures about the Language Industry

Non-Latin internet addresses

How-To: hiring and managing geeks

Friday, 22 January 2010 11:04:16 (Eastern Standard Time, UTC-05:00)  #    Comments [0] -
News

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2017
Stanislas Biron