What Nikkei learnt from building its own Japanese AI chatbot

The new tool, which Nikkei created by building its own model, is embedded in articles and suggests questions to spark conversations with readers

Japanese media company Nikkei has joined the stream of financial publications that have decided to build their own chatbot. However, unlike its global counterparts, Nikkei had to build its own model to retrieve articles from its database, as external large language models (LLMs) are not yet good enough at processing Japanese text. Nikkei combines the articles retrieved by its own model with summaries produced by an external LLM.

The company’s chatbot is called Ask! NIKKEI, and its user experience is similar to that of ChatGPT or Google Gemini. Users can ask the chatbot whatever they want to know and it will respond, as long as the answer is in Nikkei’s vast database. The chatbot draws on articles published since 2020, typically from the last 18 months. Available only to subscribers, it is embedded in articles to help readers understand complex financial topics, and it even suggests questions for them to ask.

I spoke to Yosuke Suzuki, Engineering Manager at Nikkei, about this chatbot, content copyright, hallucinations, and language. Our conversation was edited for clarity and length. 

Q. Why did you and your team decide to build something like this?

A. A year and a half ago, when AI started to become a trend, we discussed this idea with outside consultants, who suggested building the system we have now. Our system is quite good compared to others. Basically, we use only our own content to provide good-quality summaries that answer user queries.

Q. How is it trained? 

A. In terms of training, we use a large language model, similar to OpenAI’s ChatGPT or Google Gemini. We have switched between several large language models, because providers sometimes launch a new one that might be cheaper, faster and better.

We use data from our own articles to build our own model, which retrieves all the articles in our database related to the user’s query. So when a user asks a question, we basically combine a number of those articles into a prompt and instruct the LLM not to use any knowledge from outside them. Then the LLM generates an answer.

Q. Generative AI chatbots are known to ‘hallucinate’ or make up information when they do not know the answer to queries. Since your chatbot is trained on very specific data related to financial topics Nikkei has published about, what happens when you ask the chatbot questions it doesn’t know the answer to? 

A. In such cases, we don't provide an answer. If the retrieval process cannot find a related article, the chatbot does not respond.
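To illustrate the retrieve-then-answer flow Suzuki describes, here is a minimal Python sketch. It is not Nikkei’s implementation: the `Article` type, the keyword-overlap `retrieve` function, the stopword list, the sample articles and the prompt wording are all hypothetical, and the sketch stops at building the prompt rather than calling any real LLM API. It also splits text on spaces, which works for English but not for Japanese, which is written without spaces between words; that is one illustration of why Japanese text needs dedicated tooling.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for an article record; not Nikkei's schema.
@dataclass
class Article:
    title: str
    body: str


# Assumption: a tiny English stopword list so words like "the" never count
# as matches. Japanese text would need a proper tokenizer instead.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "on", "is",
             "why", "what", "how", "did", "does"}


def tokenize(text: str) -> set[str]:
    """Lower-case, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def retrieve(query: str, articles: list[Article], top_k: int = 3) -> list[Article]:
    """Return up to top_k articles sharing at least one keyword with the query."""
    query_terms = tokenize(query)
    scored = []
    for article in articles:
        overlap = len(query_terms & tokenize(article.title + " " + article.body))
        if overlap > 0:
            scored.append((overlap, article))
    # Highest keyword overlap first; sort on the score only.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored[:top_k]]


def build_prompt(query: str, retrieved: list[Article]) -> Optional[str]:
    """Build a prompt grounded in the retrieved articles, or None if there are none."""
    if not retrieved:
        return None  # No related article: decline to answer.
    context = "\n\n".join(
        f"[{i}] {a.title}\n{a.body}" for i, a in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using ONLY the articles below. "
        "Do not use any knowledge from outside these articles.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical sample data for illustration only.
ARTICLES = [
    Article("Bank of Japan raises interest rates",
            "The central bank raised its policy rate for the first time in years."),
    Article("Toyota reports quarterly profit",
            "The carmaker said strong sales lifted earnings."),
]

if __name__ == "__main__":
    question = "Why did the Bank of Japan raise rates?"
    print(build_prompt(question, retrieve(question, ARTICLES)))
    off_topic = "What is the weather on Mars?"
    print(build_prompt(off_topic, retrieve(off_topic, ARTICLES)))
```

Returning `None` instead of letting the model fall back on its general knowledge mirrors the behaviour Suzuki describes: no retrieved article, no answer.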

Q. There has been a lot of discussion about how AI companies use content from news organizations, often without permission, to train their chatbots and LLMs. What does it mean for you to build your own chatbot using your own content?

A. This is very important for us because we think we should protect our copyright. We use only our content or log system.

We developed our own machine learning model to check whether a piece of content is or isn’t owned by us. For example, our legal department decided that we don’t hold clear copyright over interview articles, in which one of our journalists interviews a famous person or the CEO of a company: the copyright is split half-and-half between us and the interviewee. So we eliminated interview articles from the RAG system of Ask! NIKKEI.

Q. How challenging was it to integrate the Japanese language and its characters in the training of the model?

A. It was very important to make our own Japanese model for our system. For English, there are already a lot of good-quality models, but for Japanese there are not many, which is why we built our own machine learning model for this system. As I mentioned, we use our own content, which is in Japanese, so we had to scrap and rebuild the model several times before we launched this service. Our core engineer developed a new model three or four times. It was not easy.

Q. This chatbot is only available to subscribers of Nikkei. Are audiences engaging with the chatbot? Are they using it? 

A. Yes, but so far we only provide this system on the web, while most of our users use the mobile app. In two or three months, we are going to launch a version of it in our own mobile app.

I don’t know if other publishers offer the kind of pre-made questions we suggest to readers, but I think this is very important. Last year we interviewed a number of our users, and we found that it is very difficult for them to come up with questions about articles.

In terms of attracting and retaining subscribers, it is a very important tool. New subscribers who are not very familiar with financial terms can either ask a question or click on one of our pre-made questions. I have been working for this company for 24 years, but sometimes I don’t know technical financial terms, so I can quickly look up keywords I don’t know much about, and the chatbot gives me a very good-quality summary.

Q. What are the next steps for the chatbot? 

A. We plan to use the RAG technology we built for this chatbot in other new products. [RAG, or retrieval-augmented generation, is an AI framework that retrieves facts from an external knowledge base to ground large language models on the most accurate, up-to-date information and to give users insight into LLMs' generative process.]

Q. What's unique in Nikkei's value proposition?

A. Our latest news and columns can differentiate our products from tech platforms. We cover business news, including breaking business news. Wider coverage and faster news delivery are important parts of our value proposition.



Meet the authors

Gretel Kahn

I am a digital journalist with the Reuters Institute's editorial team, mainly focusing on reporting and writing pieces on the state of journalism today. Additionally, I help manage the Institute’s digital channels, including our daily...