This repository was archived by the owner on Jul 3, 2024. It is now read-only.
smruthi33 edited this page Mar 13, 2019 · 4 revisions

Short title

Analysis on Startups

Long title

Scrape, Analyze and Visualize insights on Startups using Watson Studio

Author

URLs

Github repo

Other URLs

  • Video URL
  • Demo URL

Summary

We are in the age of startups, and the number of companies providing skilled services is growing rapidly. We can scrape information about such companies and gauge their success from the number of articles or live use cases that appear on news portals. For example, if we identify a startup that provides ML services in the healthcare domain, we want to see whether it has managed to create some noise and appear in a few articles on a popular tech and business portal such as Economic Times.

Our application provides a tool that extracts live unstructured data about companies and their impact on the industry with the help of Watson Natural Language Understanding. The results are fed into IBM SPSS Predictive Analytics to produce meaningful insights and predictions, and finally into Cognos Dashboard, which presents insights and visualisations of that data.

Technologies

  • Analytics
  • Artificial Intelligence

Description

Tell the story of your code pattern: describe the problem and who might encounter it; why is your pattern the right way to overcome the challenge? Highlight interesting code features and wherever possible, describe real-world situations where a developer will benefit from using the pattern. DO NOT include detailed technical steps, instructions, and commands; they will be provided in the readme file for your code.

Write 3-4 paragraphs.

The number of companies providing skilled services is growing rapidly. We can scrape information about such companies and gauge their success from the number of articles or live use cases that appear on news portals. For example, if we identify a startup that provides ML services in the healthcare domain, we want to see whether it has managed to create some noise and appear in a few articles on a popular tech and business portal such as Economic Times.

Our application provides a tool that extracts live unstructured data about companies and their impact on the industry with the help of Watson Natural Language Understanding. The results are fed into IBM SPSS Predictive Analytics to produce meaningful insights and predictions, and finally into Cognos Dashboard, which presents insights and visualisations of that data.

The output provides the following views:

  • Company's Score based on Relevance: A view in which the most popular companies are drawn at a larger size than the less popular ones.
  • Total number of articles on the web about a Company: A view showing the factors affecting the popularity of a startup on the web (News Articles, Tech Blogs, Social Media and so on).
  • News Concept Relevance: Gives a broad overview of the main topics of the articles across the companies, by percentage relevance.
  • News Sentiment Analysis by Company: Gives an overall analysis of the tone in which the articles were written, to understand the impact (positive, negative or neutral) a given company has on the industry.

Flow

  1. The user creates and runs a Python Notebook on Watson Studio.
  2. The Notebook scrapes the latest news on startups.
  3. The scraped information is sent to Watson Natural Language Understanding to extract keywords, entities and sentiment, along with their respective confidence scores.
  4. The NLU results are compiled into a CSV file, which is then converted to a table in DB2 Warehouse.
  5. SPSS ingests the table, runs analytics on it and returns a score for each company. The updated table is then saved back to DB2 Warehouse.
  6. Finally, Cognos ingests the final table from DB2 Warehouse and produces insightful visualisations.
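
As a rough illustration of step 4, the sketch below flattens one NLU-style result into CSV rows using only the Python standard library. It is not the notebook's actual code: the function names, the column layout, the company name and the URL are all hypothetical, and the input dictionary only assumes the general shape of a Watson NLU response that requested `keywords` and `sentiment`.

```python
import csv
import io

# Hypothetical column layout; the real notebook may use different columns.
FIELDS = ["company", "url", "keyword", "relevance",
          "sentiment_label", "sentiment_score"]


def nlu_to_rows(company, url, nlu_result):
    """Flatten one NLU-style result dict into one row per keyword."""
    # Document-level sentiment is repeated on every keyword row.
    sentiment = nlu_result.get("sentiment", {}).get("document", {})
    rows = []
    for kw in nlu_result.get("keywords", []):
        rows.append({
            "company": company,
            "url": url,
            "keyword": kw["text"],
            "relevance": kw["relevance"],
            "sentiment_label": sentiment.get("label"),
            "sentiment_score": sentiment.get("score"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample standing in for a real NLU response.
    sample = {
        "sentiment": {"document": {"score": 0.62, "label": "positive"}},
        "keywords": [
            {"text": "machine learning", "relevance": 0.93},
            {"text": "healthcare", "relevance": 0.81},
        ],
    }
    rows = nlu_to_rows("ExampleStartup", "https://example.com/article", sample)
    print(rows_to_csv(rows))
```

Keeping one row per keyword, with the document sentiment repeated, gives a flat table that can be loaded into a database without further reshaping; whether the pattern uses that layout is an assumption here.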

Instructions

  1. Data Preparation
  2. Analyze using SPSS
  3. Visualize in Cognos

Components and services

List all components and services that play a prominent role in your pattern. Components are IBM products, any open source project, or solutions that are NOT IBM Cloud Services. Services are services available in the IBM Cloud (public) Catalog.

To view all components see http://developer.ibm.com/components.

To view services, see https://console.bluemix.net/catalog/

  • IBM SPSS Modeler
  • IBM Cognos Dashboard
  • IBM Watson Studio
  • IBM DB2 Warehouse
  • Cloud Object Storage

Related IBM Developer content

List any IBM Developer resources that are closely related to this pattern, such as other patterns, blog posts, tutorials, etc.

Related links

Provide any non-IBM Developer resources that you need to link to that are NOT components or services

Announcement

Every pattern must have an announcement post that introduces it. The announcement should explain why the pattern is important or useful. The announcement is an invitation to try the pattern; you can expand on why you created the pattern, discuss any challenges that you overcame, or expand on the technologies that you're using.

We are in the age of startups, and the number of companies providing skilled services is growing rapidly. We can scrape information about such companies and gauge their success from the number of articles or live use cases that appear on news portals.

Suppose we want to understand the current startups in a particular technology, say Machine Learning. This code pattern evaluates their impact on the industry on the basis of:

  • How many times the startup has appeared in the news
  • Whether it has a Wikipedia page
  • Whether it has tech blogs
  • Whether it is active on social media (Twitter, Medium, etc.)

Once scraped (that is, extracted from the web), this unstructured data is processed through Watson NLU and converted to structured data. The structured data is fed to SPSS, which is used to explore it and to run analytics that check which of the factors listed above are present for a company, thereby computing a popularity score. Once the analytics are complete, this code pattern also provides a user-friendly, interactive dashboard visualisation of the data, offering insights that simplify the decision-making process.
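
To make the idea of a popularity score concrete, here is a minimal sketch that combines the four factors into a single number. In the pattern itself the score is computed in SPSS, so this is only a stand-in: the `CompanySignals` class, the `popularity_score` function and the equal 0.5/0.5 weighting are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CompanySignals:
    """Hypothetical per-company factors gathered by the scraper."""
    name: str
    news_articles: int
    has_wikipedia: bool
    has_tech_blog: bool
    on_social_media: bool


def popularity_score(signals, max_articles):
    """Return a score in [0, 1]; the weights are illustrative assumptions."""
    # Normalize article count against the most-covered company in the set.
    article_part = signals.news_articles / max_articles if max_articles else 0.0
    # Fraction of the three presence factors that are true.
    presence_part = sum([signals.has_wikipedia,
                         signals.has_tech_blog,
                         signals.on_social_media]) / 3
    return round(0.5 * article_part + 0.5 * presence_part, 3)


if __name__ == "__main__":
    # Made-up companies used only to exercise the function.
    companies = [
        CompanySignals("StartupA", 10, True, True, True),
        CompanySignals("StartupB", 5, True, False, False),
    ]
    top = max(c.news_articles for c in companies)
    for c in companies:
        print(c.name, popularity_score(c, top))
```

Normalizing the article count against the most-covered company keeps the score comparable across companies within one run; a different weighting or a trained model could replace this formula.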
