Understanding Data Science Terms

by SVGC | Oct 26, 2021

As discussed in our Data Science: An Untapped Resource within the Public Sector blog, data science is the bedrock for innovation and vital for growth and development within the public sector. However, the terms used can be confusing and hard to understand.

In this post we’ve provided a brief introduction to some of the most popular terms used within data science, to help you understand the different approaches and techniques used to extract vital insight from data. If you need any further guidance on understanding data science terms, get in touch with our data scientists today.

Correlation Analysis

Correlation analysis studies the relationship between a variable of interest and an explanatory variable. For example, a variable of interest could be consumption, while an explanatory variable could be GDP (Gross Domestic Product) per capita. If the relationship proves to be statistically significant, the explanatory variable is said to be related to, or associated with, the variable of interest. Measures such as r-squared and the p-value are used to assess the relationship: r-squared indicates its strength, while the p-value indicates whether it is statistically significant. At SVGC, we use Correlation Analysis to identify sensitivities in everything from documents to programme risks; an example of this can be seen in our work with FCDO Services.
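As a rough illustration of the r-squared measure mentioned above, the sketch below computes the Pearson correlation coefficient (r) by hand and squares it. The GDP and consumption figures are invented purely for the example, and the helper name `pearson_r` is our own. The p-value is left out because it needs a statistical distribution table or a statistics library.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical figures, invented for illustration only.
gdp_per_capita = [20.0, 25.0, 30.0, 35.0, 40.0]   # explanatory variable
consumption = [11.0, 13.5, 15.0, 18.5, 20.0]      # variable of interest

r = pearson_r(gdp_per_capita, consumption)
r_squared = r ** 2
print(f"r = {r:.3f}, r-squared = {r_squared:.3f}")
```

An r-squared close to 1 suggests that most of the variation in consumption moves in step with GDP per capita in this toy dataset. It says nothing about cause and effect.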

(Multivariable) Regression Analysis

Regression analysis is a more general form of correlation analysis, where the relationships between one variable of interest and several explanatory variables are measured. For example, the variable of interest could be consumption while the explanatory variables could be GDP per capita, commodity prices, new product launches and so on. Regression analysis helps to understand how changes in explanatory variables affect the variable of interest. It is widely used for predictions and forecasts.
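To make the idea concrete, here is a minimal sketch of a regression with two explanatory variables, fitted by ordinary least squares in plain Python. The data is hypothetical: consumption is generated from a known formula so that the fitted weights can be checked by eye. The function names `solve_linear_system` and `fit_ols` are our own. In practice a statistics library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations apply to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back-substitution, starting from the last row.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x

def fit_ols(rows, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical data: (GDP per capita, commodity price) for five periods.
explanatory = [(20, 3), (25, 4), (30, 2), (35, 5), (40, 3)]
# Consumption generated as 2 + 0.5 * GDP - 1.0 * price (an assumption for the demo).
consumption = [2 + 0.5 * g - 1.0 * p for g, p in explanatory]

coefficients = fit_ols(explanatory, consumption)
print([round(c, 3) for c in coefficients])
```

The fitted coefficients should come out very close to 2, 0.5 and -1.0, the values used to generate the data. Each coefficient describes how the variable of interest changes when that explanatory variable changes and the others are held fixed.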

Multivariate Regression

Multivariate regression analysis studies the relationship between several variables of interest and several explanatory variables. For example, the variables of interest could be consumption of various items, while the explanatory variables could be GDP per capita, commodity prices, new product launches, population demographics and so on. Multivariate regression analysis helps to understand how different changes in explanatory variables affect the variables of interest.

At SVGC, we use Regression Analysis for our work on Net Assessment of international defence and security capabilities.

Supply Chain Inventory Modelling

Supply chain inventory modelling is a method of data analysis used to quantify the impact of item characteristics on item consumption. In simple terms, supply chain inventory modelling gives weights to the different factors that affect item consumption. The weights can be determined using, for example, multivariable regression modelling.

Decision Tree Analysis

Decision analysis is a general name given to techniques that analyse every possible outcome of a decision. A decision tree is a diagram that visualises the outcomes and can be easily interpreted. Decision trees can help to understand and evaluate risks and uncertainties. They can also help answer questions such as: What are the factors that affect the consumption of an item the most? Can we predict an outcome having made a change? At SVGC, we use Decision Tree Analysis for our work on Net Assessment of international defence and security capabilities.
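One simple way to evaluate the outcomes on a decision tree is to weight each outcome's payoff by its probability and compare the totals. The sketch below does this for two options. The options, probabilities and payoffs are all hypothetical, and the expected-value rule shown is only one of several ways such a tree can be assessed.

```python
# Each decision branch lists its possible outcomes as (probability, payoff).
# All options, probabilities and payoffs are hypothetical.
decision_tree = {
    "hold extra stock": [(0.7, 120.0), (0.3, -40.0)],
    "order on demand": [(0.5, 150.0), (0.5, -60.0)],
}

def expected_value(outcomes):
    """Probability-weighted average payoff of one decision branch."""
    return sum(prob * payoff for prob, payoff in outcomes)

values = {choice: expected_value(outs) for choice, outs in decision_tree.items()}
best_choice = max(values, key=values.get)

for choice, value in values.items():
    print(f"{choice}: expected payoff {value:.1f}")
print("Preferred option:", best_choice)
```

With these made-up numbers, holding extra stock has the higher expected payoff (about 72 against 45), even though ordering on demand has the larger best-case payoff.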

CHAID Analysis (Chi Squared Automatic Interaction Detector)

CHAID is a type of decision tree algorithm that determines relationships between the variable of interest (for example, the number of demands of a particular item) and the independent variables (for example, certain characteristics such as environment, installation vehicle and usage). CHAID automatically creates the decision tree based on the trends and patterns within the data. It can then help understand an outcome having made a change to something and is often used for item segmentation.
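The core idea behind CHAID can be sketched with the chi-squared statistic it is named after: for each independent variable, measure how strongly it is associated with the variable of interest, and split the tree on the strongest one. The example below is a deliberate simplification. It compares raw chi-squared statistics for two variables with the same number of categories, whereas a full CHAID implementation compares adjusted p-values and also merges similar categories. The records and the function name `chi_squared` are hypothetical.

```python
from collections import Counter

def chi_squared(categories, outcomes):
    """Pearson chi-squared statistic for two categorical sequences."""
    n = len(categories)
    observed = Counter(zip(categories, outcomes))
    cat_totals = Counter(categories)
    out_totals = Counter(outcomes)
    stat = 0.0
    for cat, cat_n in cat_totals.items():
        for out, out_n in out_totals.items():
            # Count expected if the two variables were unrelated.
            expected = cat_n * out_n / n
            stat += (observed[(cat, out)] - expected) ** 2 / expected
    return stat

# Hypothetical records for eight items.
environment = ["desert", "desert", "desert", "desert",
               "temperate", "temperate", "temperate", "temperate"]
usage = ["heavy", "light", "heavy", "light",
         "heavy", "light", "heavy", "light"]
demand = ["high", "high", "high", "high",
          "low", "low", "low", "high"]  # variable of interest

scores = {
    "environment": chi_squared(environment, demand),
    "usage": chi_squared(usage, demand),
}
first_split = max(scores, key=scores.get)
print(scores, "-> split first on", first_split)
```

In this toy dataset, environment is far more strongly associated with demand than usage is, so it would be chosen for the first split of the tree.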

Cluster Analysis

Cluster analysis is an exploratory data analysis method that helps identify meaningful structures within data. It defines areas/groups/segments of data that share similarities across several measures. In the marketing industry, cluster analysis is often used to identify item segments. CHAID is also often used for item segmentation, but is a very different algorithm to cluster analysis. Cluster analysis treats all the variables in the data uniformly, while CHAID analysis recognises the variable of interest and independent variables and treats them differently. At SVGC, we use Cluster Analysis to help us to structure unstructured data for Big Data projects.
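As an illustration, the sketch below applies k-means, one common clustering algorithm, to a handful of items described by two measures. Note that it is only one of many clustering methods, and the text above does not say which one is used in practice. The item data, the starting centroids and the function name `k_means` are assumptions made for the example.

```python
def k_means(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared distance treats every measure uniformly.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical items described by (monthly demand, unit cost), both rescaled to 0-10.
items = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (8.0, 9.0), (9.0, 8.0), (8.5, 8.5)]

# Starting centroids are picked by hand here; real tools usually choose them automatically.
centroids, clusters = k_means(items, centroids=[items[0], items[3]])
print(centroids)
print(clusters)
```

Unlike CHAID, no variable of interest is named here: both measures are treated in the same way, and the two segments emerge purely from how close the items are to one another.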

Bayesian Modelling

There are two different statistical approaches to gaining insights from data: frequentist (or classical) and Bayesian. The frequentist approach builds a model based only on the data observed, while the Bayesian approach allows some subjective beliefs about the model to be incorporated with the observations. At SVGC, we use Bayesian modelling to optimise Operational Management for our clients by predicting the best allocation of tasks, enhanced by local knowledge. Evidence of this work can be seen in our work with the DNO.
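The difference between the two approaches can be sketched with a simple example: estimating the proportion of tasks completed on time. The frequentist estimate uses only the observed data, while the Bayesian estimate blends the data with a prior belief, expressed here as a Beta distribution. The prior, the observations and the function names are all hypothetical, and the Beta-Binomial model is just one convenient choice for illustration.

```python
def frequentist_estimate(successes, trials):
    """Observed proportion only: no prior belief is used."""
    return successes / trials

def bayesian_posterior_mean(prior_a, prior_b, successes, trials):
    """Mean of the Beta posterior after updating a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)

# Hypothetical local knowledge: roughly 80% of tasks finish on time,
# encoded as a Beta(8, 2) prior (equivalent to 10 earlier observations).
prior_a, prior_b = 8, 2

# Hypothetical new observations: 3 of 10 tasks finished on time.
on_time, tasks = 3, 10

freq = frequentist_estimate(on_time, tasks)
bayes = bayesian_posterior_mean(prior_a, prior_b, on_time, tasks)
print(f"frequentist: {freq:.2f}, Bayesian: {bayes:.2f}")
```

Here the frequentist estimate is 0.30, while the Bayesian estimate is 0.55, pulled towards the prior belief. As more observations arrive, the data increasingly outweighs the prior and the two estimates move closer together.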

Prediction Interval / Confidence Interval

A confidence interval is a range of values that is likely to contain an unknown value of a variable. A prediction interval is a type of confidence interval that can be used for values that are yet to be observed. For example, suppose the level of demand is the variable of interest and the forecast demand is 100. If we know from experience that actual demand is rarely exactly 100, but falls within plus or minus 15 of the forecast 95% of the time, then we would say that we are 95% confident that the demand level will be between 85 and 115.
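The "from experience" reasoning in this example can be sketched in code: take past forecast errors, find the margin that covered 95% of them, and apply that margin to the new forecast. The error history below is invented so that it reproduces the plus or minus 15 figure, and the function name `empirical_margin` is our own. This is an empirical shortcut, not the formula-based interval a statistics package would produce.

```python
from math import ceil

def empirical_margin(errors, coverage_percent=95):
    """Smallest margin that covers the given share of past absolute errors."""
    ranked = sorted(abs(e) for e in errors)
    # Position of the error that the required share of past cases fall at or below.
    index = ceil(coverage_percent * len(ranked) / 100) - 1
    return ranked[index]

# Hypothetical history of (actual demand - forecast demand) over 20 periods.
past_errors = [-15, 12, -3, 7, 0, -9, 14, 2, -6, 10,
               5, -11, 8, -1, 22, 4, -13, 6, -8, 3]

forecast = 100
margin = empirical_margin(past_errors)
interval = (forecast - margin, forecast + margin)
print(f"95% prediction interval: {interval}")
```

With this history, 19 of the 20 past errors were no larger than 15, so the interval for a forecast of 100 is 85 to 115.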

Machine Learning

Machine learning is a form of Artificial Intelligence (AI) and is a method of data analysis that iteratively ‘learns’ from data as it arrives without human intervention. Machine learning can analyse large amounts of data quickly to enable smarter decisions in real time and deliver insights into complex behaviours. A subset of machine learning is Deep Learning, which uses neural networks with many layers to build more advanced computer models. At SVGC, we use Machine Learning for a variety of tasks including our Digital Sensitivity Review projects with FCDO Services.
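The idea of a model that learns iteratively as data arrives can be shown with a very small sketch. The model below has a single weight and nudges it after every new observation to reduce its prediction error (a technique known as stochastic gradient descent). The data stream, the learning rate and the hidden relationship (y is twice x) are all assumptions chosen for the demonstration.

```python
def update_weight(weight, x, y, learning_rate=0.05):
    """One learning step: nudge the weight to reduce the error on (x, y)."""
    prediction = weight * x
    error = y - prediction
    return weight + learning_rate * error * x

# Hypothetical stream of observations where y is always 2 * x.
stream = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0] * 10]

weight = 0.0  # the model starts with no knowledge
for x, y in stream:
    weight = update_weight(weight, x, y)  # learn from each item as it arrives

print(f"learned weight: {weight:.3f}")
```

After 30 observations the learned weight should be very close to 2, the relationship hidden in the data, without that value ever having been programmed in.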

Big Data

Big Data is commonly defined by the three V’s: Volume, Velocity and Variety. The following datasets could be considered Big Data: vehicle usage data at the point of use, item demand patterns across all held inventory, or social media data. These types of data can help deliver insights that allow businesses to react to their issues in real time (e.g. data strategies, supply chain adjustments). Big Data requires new technologies, such as Hadoop and Spark, for storage and processing. At SVGC, we use Big Data analysis techniques for challenges including topic modelling and identification of similarity. Evidence of this work can be seen in our work with FCDO Services.
