Cross-Lingual Captioning of Mongolian and English Video Contents, Pt I

Author: Natso Baatarkhuu, Communications Coordinator of ACMS

The ACMS UB Office has been holding biweekly public lectures in English since 2004, with few interruptions. While this is our flagship outreach activity, designed to promote academic research on Mongolia, making the lectures accessible through Mongolian interpretation has always been a resource-intensive challenge, much like making Mongolian academic contributions accessible internationally.

But thanks to current AI tools, the lectures and panels of our now-virtual Speaker Series (VSS) can be held alternately in English and Mongolian, with translated captions in the other language available on our YouTube channel shortly afterward.

The effort requires a considerable amount of time and resources, and we’re continually experimenting to make the format as accessible as possible to speakers of both languages. However, dual-language accessibility is key to inclusive and informed discourse in the field of Mongolian studies. Moreover, virtually any field that needs near real-time translation can use these techniques.

In the first part of this three-part series, I’ll share my experience of subtitling our first Mongolian-language Virtual Speaker Series panel, primarily using the Chimege app. In the second part, I’ll share the reverse experience: creating Mongolian subtitles for an English-language video, with the help of model. In the final part, I’ll share a new, experimental approach that uses a Raspberry Pi and the Media Translation AI to provide live subtitles.

Part I. Creating English subtitles for Mongolian video content


Step 1. Prepare the speakers to enunciate and be concise, and ensure good audio quality.

“The History of the Mongols’ Sedentary Cultures” panel of December 18, 2020, was the first virtual panel to be held in Mongolian. We told our speakers beforehand that their presentations would be transcribed to create English subtitles. We also asked them to avoid, as much as possible, trailing off or making elliptical statements. Fortunately, they were all well-spoken and concise, and most of the flubs were in my own speech as moderator. One thing to improve for future panels is checking the microphones beforehand to ensure clean audio, since audio quality directly affected the transcription quality.
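For the microphone check, one option is to record a short test clip from each speaker and measure its levels before the event. The sketch below is not part of our workflow and makes several assumptions:

- The test clip is saved as a 16-bit PCM WAV file.
- Clipping is flagged at a peak of -1 dBFS and above.
- "Too quiet" means an RMS level of -40 dBFS or below.

Both thresholds are illustrative guesses, and the function names are hypothetical.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def to_dbfs(amplitude: float) -> float:
    """Convert a linear sample amplitude to decibels relative to full scale."""
    if amplitude <= 0:
        return float("-inf")
    return 20 * math.log10(amplitude / FULL_SCALE)


def wav_levels(path: str) -> tuple[float, float]:
    """Return (peak_dbfs, rms_dbfs) for a 16-bit PCM WAV file, pooling all channels."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        raw = w.readframes(w.getnframes())
    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    if len(samples) == 0:
        raise ValueError("the recording contains no audio")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_dbfs(peak), to_dbfs(rms)


def check_recording(path: str, clip_db: float = -1.0, quiet_db: float = -40.0) -> list[str]:
    """Return human-readable warnings; the default thresholds are assumptions."""
    peak, rms = wav_levels(path)
    warnings = []
    if peak >= clip_db:
        warnings.append(f"possible clipping: peak {peak:.1f} dBFS")
    if rms <= quiet_db:
        warnings.append(f"very quiet: RMS {rms:.1f} dBFS")
    return warnings
```

Calling `check_recording("mic_test.wav")` returns an empty list when neither threshold is crossed. Either way, listening to the clip is still the final test.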

Step 2. Edit the event recording and create an audio-only file.

This can be done in any video-editing software, or skipped if the recording is ready to be transcribed. The important part is to create a separate audio file from the final version of your video, ideally in m4a, mp3, or wav format. You can do this in the same video-editing software (Adobe Premiere Pro in my case), but if you’re not editing your video, you can use VLC Media Player to create it.
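As a scriptable alternative to VLC, the audio track can also be extracted with FFmpeg. This is a tool swapped in for illustration; it is not what I used. The sketch assumes `ffmpeg` is installed and on your PATH. The encoder settings per format are assumptions, and mp3 output requires an FFmpeg build with `libmp3lame`. The function names are hypothetical.

```python
import os
import subprocess

# Illustrative encoder settings per output format (assumptions, not requirements).
CODEC_ARGS = {
    ".mp3": ["-c:a", "libmp3lame", "-q:a", "2"],
    ".m4a": ["-c:a", "aac", "-b:a", "192k"],
    ".wav": ["-c:a", "pcm_s16le"],
}


def extract_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and encodes the audio."""
    ext = os.path.splitext(audio_path)[1].lower()
    if ext not in CODEC_ARGS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    # -y overwrites an existing output file; -vn disables video output.
    return ["ffmpeg", "-y", "-i", video_path, "-vn", *CODEC_ARGS[ext], audio_path]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if it fails."""
    subprocess.run(extract_audio_cmd(video_path, audio_path), check=True)
```

For example, `extract_audio("panel.mp4", "panel.wav")` writes a WAV file from the same final video you will upload. That keeps the transcription aligned with the YouTube audio track.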

Step 3. Transcribe the audio file with Chimege Writer.

Before using Chimege Writer, I tried Google Cloud Platform’s (GCP) Speech-to-Text AI. It is currently free for up to 60 minutes of audio, and new users receive $200 in credit, but its accuracy was significantly lower than that of Chimege’s proprietary model. Below is a comparison of a sample paragraph transcribed by GCP (top) and Chimege (bottom).

Incidentally, Chimege announced recently that they have achieved 97% accuracy in transcribing Mongolian, and it shows.
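One common way to put a number on comparisons like the one above is word error rate (WER). WER counts the word-level substitutions, deletions, and insertions needed to turn a machine transcript into a human-checked reference, divided by the number of reference words. "Accuracy" is often reported as 1 − WER, but Chimege’s announcement does not say how its 97% figure was measured. The sketch below is a plain dynamic-programming edit distance; the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] holds the distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            current[j] = min(
                previous[j] + 1,  # deletion
                current[j - 1] + 1,  # insertion
                previous[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        previous = current
    return previous[-1] / len(ref)
```

Tokens are compared exactly, so you may want to lowercase both texts and strip punctuation first. Otherwise, missing capitalization alone counts as errors.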

Chimege Writer’s sign-up process is outlined below:

  1. Download the app from Google Play or App Store.
  2. Create an account and verify.
  3. Go to
  4. Click on “Get Plan”. They currently charge 19,800₮ (~$7) for 1 hour, but their monthly, quarterly, and annual plans are slightly discounted.
  5. Top up your Bolor Wallet by paying the company (i.e., by wiring the money to their Khan Bank account).
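To budget a recording from the price quoted above (19,800₮ per hour), a quick estimate helps. I don’t know whether usage is rounded up to whole hours or pro-rated by the minute, so the hypothetical helper below supports both. Plan discounts are ignored.

```python
import math

PRICE_PER_HOUR_MNT = 19_800  # pay-as-you-go price quoted above (late 2020)


def estimate_cost_mnt(audio_minutes: float, bill_whole_hours: bool = True) -> int:
    """Estimate transcription cost in tugrik.

    bill_whole_hours=True assumes you buy 1-hour allowances in whole units;
    False assumes the price is pro-rated per minute. Both are assumptions.
    """
    if audio_minutes < 0:
        raise ValueError("audio length cannot be negative")
    if bill_whole_hours:
        return math.ceil(audio_minutes / 60) * PRICE_PER_HOUR_MNT
    return round(audio_minutes / 60 * PRICE_PER_HOUR_MNT)
```

For a 90-minute panel, this gives 39,600₮ if billed in whole hours, or 29,700₮ if pro-rated.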

Once your plan is activated, the web app lets you upload your audio file and transcribe it, as seen below. Note that transcription is not instantaneous; it takes a while, depending on the length of the audio.

Step 4. Run the in-house spellcheck and download the transcription.

Because Chimege was developed by Bolorsoft, you can run Bolorsoft’s in-house spellcheck on the transcription. Then click “Download” and choose “Word” to download the transcription as a Microsoft Word document.

Step 5. Upload the event video on YouTube and set the language to Mongolian.

This step is straightforward, but make sure the transcribed audio file is identical to the audio track of the uploaded video. Click “Show More”. Under “Language, subtitles, and closed captions (CC)”, open the “Video language” dropdown menu and choose Mongolian.

Step 6. Go to Subtitles in YouTube Studio and open a Mongolian caption track.

Click the “To manage other languages, go to subtitles.” link under Video language (see above). Alternatively, go to YouTube Studio, select Subtitles in the left sidebar, and choose your video, which should take you to the subtitle configuration window. Click “Edit” in the Subtitles row to launch the Subtitle Editor, which should look like this.


Step 7. Clean the transcription and paste the subtitles sentence-by-sentence.

This is the most time-consuming part of the process: copying and pasting the transcription onto the YouTube Subtitle Editor, one sentence at a time.

Since these Mongolian subtitles will later be translated into English via Google Translate, the sentences should be simple or compound rather than complex. While Chimege Writer’s AI model is accurate, it does not capitalize proper nouns, and its automatic punctuation is not always perfect. Industry-specific terms and scientific jargon are also likely to be mis-transcribed. Therefore, the sentences need some light editing to fix these issues.
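Some of this cleanup can be prepared before pasting. The sketch below does two things:

- It applies a glossary of recurring corrections, such as capitalizing proper nouns.
- It splits the transcript into one chunk per sentence, wrapping any sentence that is still too long.

The glossary entries and the 84-character limit are illustrative assumptions, not YouTube requirements. The timing still has to be set by hand in the Subtitle Editor, and the function names are hypothetical.

```python
import re
import textwrap

# Hypothetical glossary of recurring fixes: transcribed form -> corrected form.
GLOSSARY = {
    "чингис хаан": "Чингис хаан",
    "монгол улс": "Монгол Улс",
}


def apply_glossary(text: str, glossary: dict[str, str]) -> str:
    """Apply whole-word replacements; Python's word boundaries handle Cyrillic."""
    for wrong, right in glossary.items():
        # A function replacement avoids backslash escapes being interpreted.
        text = re.sub(rf"\b{re.escape(wrong)}\b", lambda _match: right, text)
    return text


def caption_chunks(text: str, max_chars: int = 84) -> list[str]:
    """Split a transcript into sentences, wrapping any sentence longer than max_chars."""
    text = re.sub(r"\s+", " ", text).strip()
    chunks = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if sentence:
            chunks.extend(textwrap.wrap(sentence, width=max_chars))
    return chunks
```

Running `caption_chunks(apply_glossary(transcript, GLOSSARY))` gives a list you can copy from one item at a time. Splitting on final punctuation only works as well as Chimege’s punctuation, so the chunks still need a read-through.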

Recommended (keyboard) shortcuts include:

  1. “Ctrl+Shift+Arrow” to quickly highlight the sentence from the word processor file.
  2. Letting the video play until the end of a sentence and then clicking on “Add new caption” to adjust the caption length to the audio.
  3. “Ctrl+C” and “Ctrl+V”, naturally.

Step 8. Publish the subtitles and download the Mongolian .srt file.

Click “Publish”. Congratulations! You have successfully made Mongolian subtitles for your video. Now download the captions from YouTube in .srt format, which gives you a timestamped text file.
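An .srt (SubRip) file is plain text made of numbered cues. Each cue has an index line, a timing line such as `00:00:01,000 --> 00:00:03,500`, one or more lines of caption text, and a blank line. If you want to inspect or post-process the download, the minimal parser below reads that layout. It assumes well-formed input, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def timestamp_to_ms(ts: str) -> int:
    """Convert an SRT timestamp like 00:01:02,345 to milliseconds."""
    match = TIMESTAMP.fullmatch(ts.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {ts!r}")
    h, m, s, ms = map(int, match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(content: str) -> list[Cue]:
    """Parse SubRip text: index line, timing line, then caption text."""
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", content):
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # skip stray fragments
        start, end = lines[1].split("-->")
        # Some files append positioning after the end time; keep only the timestamp.
        end_ts = end.split()[0]
        cues.append(Cue(int(lines[0]), timestamp_to_ms(start),
                        timestamp_to_ms(end_ts), "\n".join(lines[2:])))
    return cues
```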

Step 9. Upload and edit the Mongolian subtitle file in the Online Subtitle Translator and Editor.

Go to and upload your .srt file into the “Drop file” field.

This web app, developed by Syed G. Akbar, is preferable to passing the file through Google Translate or using YouTube’s “Auto-Translate” function in several ways.

First, it handles large amounts of text without disrupting the subtitle timing. Second, it lets you edit the subtitles in a tabular format, with each translated line next to its original, which makes the translation much easier to review. Third, clicking the pencil icon lets you make global edits: a change to a word or phrase is applied throughout the whole file.
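The global-edit idea can also be reproduced locally on an .srt file. The sketch below shows what such a find-and-replace has to respect: only caption text changes, while cue numbers and timing lines are left alone. It is not the web app’s code, and the function name is hypothetical.

```python
import re

TIMING_LINE = re.compile(r"^\d+:\d{2}:\d{2},\d{3}\s*-->\s*\d+:\d{2}:\d{2},\d{3}")


def global_replace(srt_text: str, old: str, new: str) -> str:
    """Replace old with new in caption text only, leaving cue numbers and timings untouched."""
    out = []
    previous_blank = True  # the first line of the file starts a cue
    for line in srt_text.lstrip("\ufeff").splitlines():
        stripped = line.strip()
        is_cue_number = previous_blank and stripped.isdigit()
        if is_cue_number or TIMING_LINE.match(stripped):
            out.append(line)
        else:
            out.append(line.replace(old, new))
        previous_blank = stripped == ""
    return "\n".join(out) + "\n"
```

For instance, `global_replace(srt, "Chinggis Khan", "Chinggis Khaan")` standardizes a spelling across the whole file in one step.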


Step 10. Download the .srt file and upload it back to YouTube.

Click “Save as” to download the English subtitles, then upload them in YouTube Studio. Make sure to choose English (United States) as the language to match Google Translate’s default spelling. That’s it!
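Before uploading, you can confirm that the English file still lines up with the Mongolian one. A translation step that merges or drops cues would shift the captions against the audio. The sketch below simply compares the timing lines of the two .srt files and reports any differences. The function names are hypothetical.

```python
import re

TIMING = re.compile(r"(\d+:\d{2}:\d{2},\d{3})\s*-->\s*(\d+:\d{2}:\d{2},\d{3})")


def cue_timings(srt_text: str) -> list[tuple[str, str]]:
    """Return (start, end) timestamp pairs in file order."""
    return TIMING.findall(srt_text)


def timing_mismatches(original: str, translated: str) -> list[str]:
    """List differences in cue count or timing between two SRT texts."""
    a, b = cue_timings(original), cue_timings(translated)
    problems = []
    if len(a) != len(b):
        problems.append(f"cue count differs: {len(a)} vs {len(b)}")
    for number, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            problems.append(f"cue {number}: {x[0]} --> {x[1]} became {y[0]} --> {y[1]}")
    return problems
```

Read both files with `encoding="utf-8-sig"` so a leading byte-order mark is dropped. An empty result means the translated captions keep the original timing.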


The Mongolian subtitles in the VSS would not have been possible without the vision of our new Executive Director, Dr. Bolortsetseg Minjin, to overcome the language barrier and reach a wider audience in Mongolia. “The History of the Mongols’ Sedentary Cultures” would not have been possible without our four panelists, Dr. Iderkhangai T, Dr. Erdenebat U, Ochbayar G, and Aldarsaikhan G, as well as the help of our colleagues Gantungalag T, Baigalmaa B, and Tuvshinzaya T. The Virtual Speaker Series is supported by ACMS members, the U.S. Department of State, and the U.S. Department of Education.
