Natural Language Processing

The Patient Voice in the NHS


Our aim is to develop a full understanding of the meaning of patient feedback as provided in the form of recorded comments, suggestions, and complaints. Rather than NHS colleagues attempting to read, understand and report on hundreds if not thousands of such comments, we use Artificial Intelligence (AI) methods along with Microsoft Power BI to automate the process.



Our solution requires a detailed understanding of both the general environment and the specific environments within which the analysis will be conducted. For example, the general environment might be Mental Health, with the specific environments being Learning Difficulties, Eating Disorders, Autistic Spectrum Disorders, etc. Ideally, clients will provide the feedback in the form of either Word documents or CSV files in English. If English is not practical, we can use machine translation to convert feedback provided in other languages into English before we start the process.



As will be appreciated, different people will use different forms of the language to explain their experiences and concerns. To simplify the various comments, our first step is to identify and remove unnecessary words. Although this results in a stilted form of language that reads as stiff and unnatural to humans, it is much easier for a computer program to process.
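As an illustration, this step can be sketched in a few lines of Python. The stop-word list below is a small illustrative subset; a full pipeline would typically use a complete list such as NLTK's stopwords corpus.

```python
# Illustrative stop-word removal. STOP_WORDS is a tiny hand-picked
# subset for demonstration only; real pipelines use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "i", "my",
              "on", "to", "and", "of"}

def remove_stop_words(text: str) -> list:
    """Lower-case the text, split it into words, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

comment = "The staff on my ward are kind and helpful"
print(remove_stop_words(comment))  # ['staff', 'ward', 'kind', 'helpful']
```

The surviving words carry most of the meaning of the original comment, which is exactly the "stilted but machine-friendly" form described above.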



Next, we need to break the feedback document down into its separate, individual sentences so that we can clearly understand what each sentence means. We can then identify the main words which together represent the meaning of each sentence. In addition to understanding the sentences, we also need to summarise all of the words used in the feedback document, so we break the entire document down into its individual words. This allows us to examine the words used in terms of what they are telling us: where the issues raised exist, and whether each suggests a positive or negative issue.
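A simplified sketch of this sentence-and-word breakdown is shown below. The regular expressions here are deliberately basic; NLTK's punkt sentence tokenizer and word tokenizers handle abbreviations and punctuation far more robustly.

```python
import re
from collections import Counter

def split_sentences(text: str) -> list:
    # Simplified splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_words(text: str) -> list:
    # Keep only alphabetic word characters, lower-cased.
    return re.findall(r"[A-Za-z']+", text.lower())

feedback = "The nurses were excellent. However, the waiting time was far too long!"
sentences = split_sentences(feedback)
words = split_words(feedback)
print(sentences)                       # two separate sentences
print(Counter(words).most_common(3))   # most frequent words in the document
```

The sentence list feeds the per-sentence analysis, while the word counts feed document-level summaries such as word clouds.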



At this stage we have a number of documents derived from the original feedback document. There are various types, as generated by the different AI algorithms, providing text datasets to facilitate our range of modelling and analysis. We then run a process aimed at standardising the sentences and words: where appropriate, we convert each word to a standard version, so that fine, better and excellent all become good, and so on.
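The standardisation step can be sketched as a simple lookup. The mapping below is a tiny hand-built example; in practice it would be produced by a lemmatiser combined with a synonym table.

```python
# Illustrative standardisation table: map word variants to one
# standard form. A real table would be far larger and built with
# the help of a lemmatiser and synonym resources.
STANDARD_FORMS = {
    "fine": "good",
    "better": "good",
    "excellent": "good",
    "awful": "bad",
    "terrible": "bad",
}

def standardise(words: list) -> list:
    return [STANDARD_FORMS.get(w.lower(), w.lower()) for w in words]

print(standardise(["The", "food", "was", "excellent"]))
# ['the', 'food', 'was', 'good']
```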



Using our various input text datasets with appropriate AI routines, we identify and define the specific characteristics of each sentence to clearly understand the main point or meaning of the sentence.



Again, using our various AI-derived input text datasets, we now look to build a selection of features that will help us filter the various sentences and words in accordance with a particular theme, organisation or location. This facilitates the SIA methodology, which requires a single Power BI model accommodating issues relating to the general environment, Mental Health, as well as those relating to the specific mental health conditions contained within the model.



Given the understanding developed so far, we are now in a position to define the various segments of interest as identified through the data, theme always being the most important element.



To facilitate the analysis, we need to identify and define the various themes contained within the data, along with the other filters necessary to provide the level of analysis required. These are usually demographics, locations, job roles, people, professions, specialties, and so on.

Given the various filters and themes, we next look to perform a sentiment analysis on the finalised list of sentences whilst maintaining a link with the relevant filters.

The sentiment analysis applies at the level of the individual sentence, but by using the filters we can select only those sentences which apply to any given filter, including themes. Similarly, we can produce a master word cloud which can also be modified in accordance with the application of any given filter.
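The combination of filters and sentence-level sentiment can be sketched as follows. The tiny lexicon and the example records are purely illustrative; a real deployment might use a trained analyser such as NLTK's VADER instead.

```python
# Illustrative lexicon-based sentiment scoring over filtered sentences.
# The word lists and records below are hypothetical examples.
POSITIVE = {"good", "kind", "excellent", "helpful", "clean"}
NEGATIVE = {"bad", "long", "rude", "dirty", "late"}

def sentiment(sentence: str) -> int:
    """Positive score = positive tone; negative score = negative tone."""
    score = 0
    for word in sentence.lower().split():
        word = word.strip(".,!?")
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score

# Each record links a sentence to its filters (theme and location).
records = [
    {"sentence": "The staff were kind and helpful",
     "theme": "Staff", "location": "Ward A"},
    {"sentence": "The waiting time was far too long",
     "theme": "Waiting", "location": "Ward A"},
]

# Apply a filter first, then score only the matching sentences.
for r in (r for r in records if r["location"] == "Ward A"):
    print(r["theme"], sentiment(r["sentence"]))
```

Because each sentence keeps its filter attributes, the same scores can be sliced by theme, location or any other filter in the Power BI model.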

In the more advanced versions of this methodology, we can extend the analysis to create a range of AI-generated extracts and interpretations from the original feedback document. These can also be visualised using Power BI's text analytics and visualisation functionality.



Although not perfect, our Power BI model will provide a good understanding of what is going on in the organisation from its patients' feedback, along with what the main issues are and whether they are good or bad. It will also provide this intelligence against an agreed time frame, allowing the tracking of issues over time. It allows the organisation to identify and share best practice, and where there are problems, they can be addressed and resolved. After all, when patients and family members go to the trouble of sending in their complaints and comments, this is what they expect.



Natural Language Processing Techniques:


Tokenization breaks raw text into small chunks: words and/or sentences, called tokens. These tokens help in understanding the context and in developing the model for the NLP.



Stop words are a set of commonly used words in a language. Examples of stop words in English are “a,” “the,” “is,” “are,” etc. Stop words are commonly used in Text Mining and Natural Language Processing (NLP) to eliminate words that are so widely used that they carry very little useful information.



Lemmatization in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form. For instance, the word "better" has "good" as its lemma.

Tokenization: Breaks text into individual words or sentences.

Stemming: Reduces words to their root or base form.

Lemmatization: Similar to stemming but reduces words to their lemma or dictionary form.
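The difference between stemming and lemmatization can be shown with a toy example. The suffix rules and lemma table below are illustrative only; NLTK's PorterStemmer and WordNetLemmatizer are the usual choices in practice.

```python
# Toy stemmer: crude suffix stripping, which can produce non-words.
def toy_stem(word: str) -> str:
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Toy lemmatiser: a lookup of irregular forms, falling back to stemming.
LEMMAS = {"better": "good", "ran": "run", "feet": "foot"}

def toy_lemma(word: str) -> str:
    return LEMMAS.get(word, toy_stem(word))

print(toy_stem("waiting"))   # 'wait'  (stemming strips the suffix)
print(toy_lemma("better"))   # 'good'  (lemmatization knows the dictionary form)
```

Note that stemming alone could never map "better" to "good"; that is exactly what the lemma lookup adds.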


Part-of-Speech (POS) 

Part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. In simplified terms, it is the identification of words as nouns, verbs, adjectives, adverbs, etc.
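A minimal sketch of the idea, assuming a small hand-built tag lexicon with a noun fallback; a trained tagger such as NLTK's `nltk.pos_tag` uses context and the full Penn Treebank tag set instead.

```python
# Illustrative lexicon-based POS tagging. TAG_LEXICON is a hypothetical
# mini-dictionary; unknown words default to NOUN, a common baseline.
TAG_LEXICON = {
    "the": "DET", "a": "DET",
    "was": "VERB", "is": "VERB", "helped": "VERB",
    "very": "ADV",
    "kind": "ADJ", "clean": "ADJ",
}

def tag(words: list) -> list:
    return [(w, TAG_LEXICON.get(w.lower(), "NOUN")) for w in words]

print(tag(["The", "nurse", "was", "very", "kind"]))
# [('The', 'DET'), ('nurse', 'NOUN'), ('was', 'VERB'),
#  ('very', 'ADV'), ('kind', 'ADJ')]
```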



Chunking is the process in natural language processing of grouping the words of a sentence into short phrases, such as noun phrases, using their parts of speech. The everyday idea is similar: chunks can be practical – foods within a group such as proteins, cars that are similar in some way like hatchbacks, consumers of certain products, and so on. Chunks can also be based on our personal associations – all the people I know who also like cake.
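As a sketch, noun-phrase chunking can be done by collecting runs of determiner, adjective and noun tags from a POS-tagged sentence. NLTK's RegexpParser does this properly with a grammar such as `"NP: {<DT>?<JJ>*<NN>}"`; the tag names below are the simplified ones from the tagging example, not Penn Treebank tags.

```python
# Illustrative noun-phrase chunking over POS-tagged tokens: runs of
# DET/ADJ/NOUN tags are collected into chunks; any other tag ends one.
def noun_phrase_chunks(tagged: list) -> list:
    chunks, current = [], []
    for word, tag in tagged:
        if tag in ("DET", "ADJ", "NOUN"):
            current.append(word)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

tagged = [("The", "DET"), ("friendly", "ADJ"), ("nurse", "NOUN"),
          ("helped", "VERB"), ("my", "DET"), ("mother", "NOUN")]
print(noun_phrase_chunks(tagged))
# ['The friendly nurse', 'my mother']
```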


Named Entity Recognition (NER):

NER is a sub-task of information extraction in Natural Language Processing (NLP) that classifies named entities into predefined categories such as person names, organisations, locations, medical codes, time expressions, quantities, monetary values, and more.
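A toy pattern-and-gazetteer sketch of NER is shown below. The entity lists and example sentence are hypothetical; trained models such as spaCy's NER or NLTK's `ne_chunk` are used in practice.

```python
import re

# Illustrative gazetteers (lookup lists) for two entity types.
LOCATIONS = {"Leeds", "Manchester"}
ORGS = {"NHS"}

def find_entities(text: str) -> list:
    """Classify tokens by lookup lists plus a simple time pattern."""
    entities = []
    for token in re.findall(r"[A-Za-z0-9:]+", text):
        if token in LOCATIONS:
            entities.append((token, "LOCATION"))
        elif token in ORGS:
            entities.append((token, "ORGANISATION"))
        elif re.fullmatch(r"\d{1,2}:\d{2}", token):
            entities.append((token, "TIME"))
    return entities

print(find_entities("I waited at the Leeds NHS clinic until 14:30"))
# [('Leeds', 'LOCATION'), ('NHS', 'ORGANISATION'), ('14:30', 'TIME')]
```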



A natural language parser is a program that figures out which groups of words go together (as "phrases") and which words are the subject or object of a verb. The NLP parser separates a series of text into smaller pieces based on grammar rules. A sentence that cannot be parsed may contain grammatical errors.


Word Sense Disambiguation (WSD):

Word Sense Disambiguation (WSD) is a subfield of Natural Language Processing (NLP) that deals with determining the intended meaning of a word in a given context. It is the process of identifying the correct sense of a word from a set of possible senses, based on the context in which the word appears.
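A simplified version of the classic Lesk algorithm illustrates the idea: choose the sense whose dictionary gloss shares the most words with the surrounding context. The two-sense mini dictionary below is hypothetical; NLTK's `lesk()` runs the same idea against WordNet glosses.

```python
# Illustrative simplified Lesk WSD. SENSES is a hypothetical
# two-sense dictionary for the ambiguous word "discharge".
SENSES = {
    "discharge": {
        "release": "release of a patient from hospital care",
        "fluid": "fluid that flows out from a wound",
    }
}

def lesk(word: str, context: str) -> str:
    """Pick the sense whose gloss overlaps the context the most."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(lesk("discharge", "the hospital arranged my discharge after care"))
# 'release' — "hospital" and "care" overlap with that sense's gloss
```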


