In our last blog on BI trends to watch, we touched on the exciting prospect of Natural Language Processing (NLP) becoming a tool for building queries in business intelligence. Advances in NLP are giving rise to potentially groundbreaking technology, and the subject deserves more than a cursory glance. In this blog we’ll explore some of the pitfalls, challenges, and barriers that have kept NLP out of BI, as well as some of the exciting possibilities opening up as developers begin to integrate NLP into BI tools.
If search engines like Google have been using NLP technology for the past couple of years, why is BI only beginning to make use of this incredible tool? Clearly it’s not for lack of interest, demand, or potential applications. Bridging the chasm between human language and machine language is far more complicated than one might imagine. Computer scientists have been racking their brains over this problem since the middle of the last century, and every new innovation has surfaced a new challenge. But with advances in deep neural networks, deep learning, and various machine learning models, NLP has improved enough to support commercial applications, including improvements in BI functionality and usability.
Common Language Analytics: One application of NLP already in practice in BI, as mentioned in the previous blog, is the translation of analytical results into common language, making the information accessible to a broader audience. Narrative Science has connected its Quill platform to Microsoft’s Power BI through its API, Narratives for Power BI. The Natural Language Generation (NLG) engine translates visual representations of analytical output into descriptive text. The generated text can be customized to meet the needs and preferences of the user: adjustments can be made to verbosity, making the text more simplified or more illustrative, and the user can choose to view the output as a bulleted list or a continuous paragraph.
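To make the idea concrete, here is a minimal sketch of how a narrative generator might turn a chart’s underlying data series into descriptive text with an adjustable verbosity setting. This is purely illustrative and is not how Quill or Narratives for Power BI actually work; every name and rule below is hypothetical.

```python
# Toy narrative generation from chart data. Illustrative only; NOT the
# Narrative Science/Quill API. All names and thresholds are hypothetical.

def describe_series(label, values, verbose=False):
    """Turn a numeric series into a one- or two-sentence narrative."""
    total = sum(values)
    trend = "rose" if values[-1] > values[0] else "fell"
    sentence = f"{label} {trend} over the period, totaling {total}."
    if verbose:
        # Higher verbosity appends extra detail, mimicking the
        # "more illustrative" setting described above.
        sentence += f" The peak was {max(values)} and the low was {min(values)}."
    return sentence

print(describe_series("Quarterly sales", [120, 135, 150, 170]))
print(describe_series("Quarterly sales", [120, 135, 150, 170], verbose=True))
```

A real NLG engine layers far richer grammar, templates, and statistical insight selection on top of this basic pattern, but the core idea (data in, prose out, tunable detail) is the same.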
This may seem like a superfluous function to data nerds like me, who have been trained to interpret charts and graphs, but it’s quite valuable. Not everyone has that background, yet everyone in an organization brings an important perspective to the table. This feature gives individuals across a much broader spectrum of education, training, and learning styles access to information. It even has the potential to help individuals with disabilities, such as visual impairments and visual processing deficits, interact with information in a novel way. With text-to-speech software, an individual with no eyesight can now hear an audio rendering of a pictorial analytical output. NLG provides access to data that can improve decision making for individuals at all levels, across an enterprise.
Common Language Queries: A second application briefly mentioned in the previous blog was the use of NLP as a means to translate common sentences into usable queries. In a piece for wired.com, Stephen F. DeAngelis stated,
“Most analysts appear to agree that the next big thing in IT is going to involve semantic search. It’s going to be a big thing because it will allow non-subject matter experts to obtain answers to their questions using only natural language to pose their queries. The magic will be contained in the analysis that goes into the search that leads to answers that are both relevant and insightful.”
We all do this regularly when we talk to the virtual assistants in our mobile devices. We don’t need a special language or structure when interacting with Alexa, Cortana, or Siri. We speak, they listen, and more often than not the program works correctly and delivers the requested information. The simplicity makes the feature available to any user the instant they interact with the device, without any special training.
That accessibility is precisely what makes natural language query builders so valuable. An individual no longer needs to know the right language to communicate with the analytics software. End users will have ever more access to information, and can offer novel insights on data analysis.
Wizdee is one company offering Natural Language Understanding (NLU) to translate text or voice prompts and questions into machine language. The company asserts that with their software “business intelligence becomes a part of every team member’s skillset.”
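To show what natural-language-to-query translation involves at its simplest, here is a toy pattern-based translator. Real NLU engines such as Wizdee’s are vastly more sophisticated (semantic parsing, schema awareness, ranking of interpretations); this sketch, with a hypothetical `facts` table, only illustrates the shape of the problem.

```python
import re

# Minimal sketch of natural-language-to-SQL translation. Illustrative
# only; the table name "facts" and the single pattern are hypothetical.

def to_sql(question):
    """Map a simple English question onto a SQL template, or return None."""
    m = re.match(r"total (\w+) by (\w+)", question.lower())
    if m:
        measure, dimension = m.groups()
        return f"SELECT {dimension}, SUM({measure}) FROM facts GROUP BY {dimension};"
    return None  # The question didn't match any known pattern.

print(to_sql("Total sales by region"))
# SELECT region, SUM(sales) FROM facts GROUP BY region;
```

The hard part, of course, is everything this sketch ignores: synonyms, ambiguous column references, and the thousands of ways a human can phrase the same question.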
Unstructured Data: Another exciting use for NLP in BI is to tap the vast amounts of unstructured data made available by the ever-increasing use of social media, online reviews and news, and IoT-enabled devices. Allisa Lorentz, VP of Creative, Marketing and Design at Augify, gave the following explanation.
“We’ve become extremely proficient at collecting data – be it from enterprise systems, documents, social interactions, or e-mail and collaboration services. The expanding smorgasbord of data collection points are turning increasingly portable and personal, including mobile phones and wearable sensors, resulting in a data mining gold rush that will soon have companies and organizations accruing Yottabytes (10^24) of data.”
Until recently we had no way of searching, sorting, and analyzing all this text data for content and correlation. Lorentz offers that the information locked in unstructured data provides context to our structured data, vastly increasing its value. NLP is opening up this wealth of information for analysis by the powerful BI tools already available.
In December 2016, IBM announced its Watson Discovery Service, which allows users to find, standardize, and analyze unstructured data. IBM’s Luke Palamara described the importance of this innovation as follows.
“The analysis of structured content – numbers, dates, organized groupings of words, which tell us WHAT is happening – has been largely conquered with traditional analytics systems; however, the analysis of unstructured content presents continuing challenges. But it’s precisely unstructured content, like product reviews, social media and images, that tells us WHY things are happening.”
And that analysis of unstructured data—finding the ‘why’—is precisely what Watson Discovery Service, and others who follow, can provide to BI users.
Pitfalls, Challenges, and Barriers
Clearly the ‘why’ is important information, so why, then, has it taken so long to deliver products that can analyze unstructured data? At this point, most of the challenges surrounding functional NLP are rooted in the complexity of human language. Think about how much more difficult it is to interpret the meaning of a text or email when subtle (or not-so-subtle) cues like body language, tone, inflection, facial expression, and volume are omitted from the equation. Simple two-word phrases like ‘I know’, ‘Thank you’, or ‘Oh really?’ can take on different meanings depending on all the aforementioned factors. When you strip away the element of contextual interpretation, only the words remain, and it is easy to misinterpret the meaning behind them. Layer onto that the vast syntactical complexities that exist within languages, and differ between languages, and the barriers seem nearly insurmountable.
In their concise NLP primer, Nadkarni, Ohno-Machado, and Chapman provide a humorous example of the challenges presented by the complexity of machine-based language translation.
“Early simplistic approaches, for example, word-for-word Russian-to-English machine translation, were defeated by homographs—identically spelled words with multiple meanings—and metaphor, leading to the apocryphal story of the Biblical, ‘the spirit is willing, but the flesh is weak’ being translated to ‘the vodka is agreeable, but the meat is spoiled.’”
While this example is one of natural language to natural language translation performed by a computer, it highlights the type of challenges encountered when translating between natural language and machine language. The authors go on to explain some of the syntactical challenges surrounding NLP for medical language. While some of the issues that they describe are unique to medical language, many exist in common language as well.
Sentence boundary detection: For NLP to properly assign meaning to strings of words, the computer needs to know where a single idea begins and ends. That seems like a simple enough task. As you read this blog you can see a capitalized letter at the beginning of each sentence and a period, or other delimiting punctuation mark, at the end. If capital letters, periods, question marks, and exclamation points were used only as delimiters, coding computers to parse blocks of text would be simple. But what happens when I write about Ms. Jennifer I. Jones, M.D., or quote her as having asked, “How will NLP affect my life?” The challenge is in building processors that allow computers to understand which capital letters and punctuation marks signify the beginning and end of a sentence and which provide other information about the words within it.
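A few lines of code make the problem vivid. The naive approach below splits on every sentence-final punctuation mark and badly over-segments the Ms. Jones example; a second pass using a small abbreviation list does better. This is a sketch, not a production tokenizer, and the abbreviation set is deliberately tiny and illustrative.

```python
import re

# Sketch of why naive sentence splitting fails. The abbreviation list
# is tiny and illustrative, not a real tokenizer's lexicon.

ABBREVIATIONS = {"ms.", "mr.", "dr.", "m.d.", "i."}

def naive_split(text):
    # Split after every period, question mark, or exclamation point.
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

def smarter_split(text):
    # Merge fragments whose last token is a known abbreviation or initial.
    out = []
    for piece in naive_split(text):
        if out and out[-1].split()[-1].lower() in ABBREVIATIONS:
            out[-1] += " " + piece
        else:
            out.append(piece)
    return out

text = "I spoke with Ms. Jennifer I. Jones, M.D. yesterday. She was helpful."
print(len(naive_split(text)))    # over-splits on the abbreviations
print(len(smarter_split(text)))  # the two actual sentences
```

Even this improved version fails on abbreviations it hasn’t seen, which is why real systems learn boundary cues statistically rather than relying on hand-built lists.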
POS Tagging: Once a string of words has been identified, another task an NLP system must undertake in order to assign meaning to common language is identifying the parts of speech in that string. Admit it, sentence diagrams confound you. Beyond subject, noun, verb, adjective, and adverb you’re lost. One might think it would be much simpler to assign rules so that a computer could easily handle this task, but consider the number of homographs in written language and homophones in speech. When a sentence contains the word ‘set’, is it a noun or a verb? Consider the word ‘considering’: is it being used as a verb, an adverb, a preposition, or a conjunction? In an audio file, is the speaker saying to, too, or two? Gerunds (verb forms ending in -ing used as nouns) present a similar challenge, as in ‘Running is my favorite exercise’ or ‘We went dancing’. While clearly not impossible, as evidenced by the advances in speech-to-text and text-to-speech capabilities over the last several years, POS tagging remains challenging.
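The ‘set’ example can be sketched in a few lines. The single hand-written rule below (a determiner right before the word suggests a noun) is purely illustrative; real taggers learn thousands of such contextual signals from data using statistical or neural models rather than hard-coded rules.

```python
# Toy contextual disambiguation of the homograph "set". Illustrative
# only; real POS taggers are statistical, not hand-written rules.

def tag_set(sentence):
    """Guess whether 'set' is a noun or a verb from its left neighbor."""
    words = sentence.lower().split()
    i = words.index("set")
    # A determiner right before "set" suggests a noun ("a set", "the set").
    if i > 0 and words[i - 1] in {"a", "the", "this", "that"}:
        return "noun"
    return "verb"

print(tag_set("Please set the table"))        # verb
print(tag_set("She bought a set of knives"))  # noun
```

Notice how little it takes to break the rule: “the set designer arrived” follows a determiner yet uses ‘set’ as a modifier, which is exactly why context windows of one word are nowhere near enough.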
Respondents on a Quora thread provide further insight into the challenges surrounding the implementation of NLP.
Entity Recognition: One user identified the challenge of determining whether ‘George Washington’ refers to a name or a place. I would take that example a step further with just ‘Washington’. That one word could refer to a state, any number of counties, a city, any number of streets (or boulevards, avenues, etc.); it could be the surname of a handful of historical or contemporary figures, or part of the name of an airport or sports team. While this challenge is similar to that of homographs and homophones, it adds another layer to deriving meaning. Once an NLP system has identified a proper noun, it must then determine just who, what, or where that noun references. Add to that complexity the ways in which we give substitute and slang names to people, places, and things. An NLP system will need to discern JLo from J. Lowe, and JLaw from J. Law. Does a sentence refer to Jennifer Lawrence or Jude Law? When referring to Ike, is the sentence describing a former president, an infamous pop star, or the I-290 freeway into and out of Chicago?
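One way to picture this layer is as a lookup followed by disambiguation. The sketch below uses a toy gazetteer (a lookup table of surface forms to possible entity types) with made-up entries, plus one crude context rule; real NER systems score candidates against the full surrounding context with learned models.

```python
# Toy gazetteer-based entity lookup and disambiguation. The entries and
# the single context rule are hypothetical, for illustration only.

GAZETTEER = {
    "washington": ["person", "state", "city", "street", "airport", "team"],
    "ike": ["person", "highway"],
}

def candidate_entities(token):
    """Every entity type a surface form could plausibly refer to."""
    return GAZETTEER.get(token.lower(), [])

def disambiguate(token, preceding_words):
    # Crude context rule: a title word just before implies a person.
    if preceding_words and preceding_words[-1].lower() in {"president", "mr.", "general"}:
        return "person"
    candidates = candidate_entities(token)
    return candidates[0] if candidates else "unknown"

print(candidate_entities("Washington"))     # six possible readings
print(disambiguate("Ike", ["President"]))   # person
```

The gap between this sketch and a working system is exactly the gap the Quora respondents describe: resolving ‘Ike’ correctly requires modeling the whole sentence, and often the whole document, not just the word before.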
Hardware Limitations: Another Quora user points to limitations in hardware architecture and content-addressable memory as additional barriers to effective NLP. He argues that the complex neural structure of the human brain allows for language analysis that simultaneously considers millions of variables, and that current hardware isn’t arranged to mimic that infrastructure. He further contends that human memory is linked by relationships, so a single word can activate multiple linked concepts; computers are limited to accessing the particular information associated with an encoded address, and can’t assign the same contextual relationships to unstructured data. Future advances in NLP will need to address challenges such as these.
While we’re on the verge of major advances in NLP that have the potential to revolutionize BI in profound ways, I caution you to have patience and to understand the complexities involved in translating between machine language and natural language. When you understand what went into the design of new extensions and apps as they roll out, you can truly appreciate the magnitude of the technology you’re using, and also be forgiving of early glitches. These challenges and barriers also give hope for future innovations. Programmers and engineers have already solved numerous problems along the path to the current wave of NLP; that ingenuity is far from over. As we build hardware and software better equipped to address these challenges, new uses could emerge that we have yet to dream of. One of the greatest challenges with big data is making sense of the vast volumes of information available. How do we find, organize, and analyze only what we need out of an ever-growing sea of information? The future of NLP may hold a key that allows us to communicate with our technology in a way that taps the seemingly infinite potential contained in the dataverse.