Home
Analysis on Startups
Scrape, Analyze and Visualize insights on Startups using Watson Studio
- Smruthi Raj Mohan [email protected]
- Srikanth Manne [email protected]
- Video URL
- Demo URL
We are in the age of startups, and the number of companies providing skilled services is increasing rapidly. We can scrape information about such companies and evaluate their success based on the number of articles or live use cases that have appeared on news portals. For example, if we identify a startup that provides ML services in the healthcare domain, we want to see whether it has managed to create some noise and appear in a few articles on a popular tech and business portal such as Economic Times.
Our application aims to provide a tool that extracts live unstructured data about companies and their impact on the industry with the help of Watson Natural Language Understanding. The extracted data is fed into IBM SPSS Predictive Analytics to get meaningful insights and predictions, and finally into a Cognos Dashboard, which provides insights and visualisations from that input.
- Analytics
- Artificial Intelligence
Tell the story of your code pattern: describe the problem and who might encounter it; why is your pattern the right way to overcome the challenge? Highlight interesting code features and wherever possible, describe real-world situations where a developer will benefit from using the pattern. DO NOT include detailed technical steps, instructions, and commands; they will be provided in the readme file for your code.
Write 3-4 paragraphs.
The number of companies providing skilled services is increasing rapidly. We can scrape information about such companies and evaluate their success based on the number of articles or live use cases that have appeared on news portals. For example, if we identify a startup that provides ML services in the healthcare domain, we want to see whether it has managed to create some noise and appear in a few articles on a popular tech and business portal such as Economic Times.
Our application aims to provide a tool that extracts live unstructured data about companies and their impact on the industry with the help of Watson Natural Language Understanding. The extracted data is fed into IBM SPSS Predictive Analytics to get meaningful insights and predictions, and finally into a Cognos Dashboard, which provides insights and visualisations from that input.
The output provides the following views:
- Company's Score based on Relevance: A view that shows more popular companies at a larger size than less popular ones.
- Total Number of Articles on the Web for a Company: A view showing the factors that affect a startup's popularity on the web (among news articles, tech blogs, social media and so on).
- News Concept Relevance: Gives a broad overview of the main topics of the articles across the companies, by percentage relevance.
- News Sentiment Analysis by Company: Gives an overall analysis of the tone in which the articles were written, to understand the impact (positive, negative or neutral) a given company has on the industry.
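As a rough illustration of what the sentiment view summarises, the sketch below averages per-article sentiment scores for each company and assigns a tone label. It is not the Cognos implementation. The row keys (`company`, `sentiment_score`), the function name and the `neutral_band` threshold are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def sentiment_by_company(rows, neutral_band=0.05):
    """Average article sentiment per company and attach a tone label.

    `rows` is an iterable of dicts with the hypothetical keys
    "company" and "sentiment_score" (a value between -1 and 1).
    `neutral_band` is an assumed cut-off, not a value from the pattern.
    """
    scores = defaultdict(list)
    for row in rows:
        scores[row["company"]].append(float(row["sentiment_score"]))

    summary = {}
    for company, values in scores.items():
        average = mean(values)
        if average > neutral_band:
            label = "positive"
        elif average < -neutral_band:
            label = "negative"
        else:
            label = "neutral"
        summary[company] = {
            "articles": len(values),
            "avg_sentiment": round(average, 3),
            "label": label,
        }
    return summary


if __name__ == "__main__":
    sample = [
        {"company": "Acme Health", "sentiment_score": 0.5},
        {"company": "Acme Health", "sentiment_score": 0.25},
        {"company": "Beta Labs", "sentiment_score": -0.6},
    ]
    for name, stats in sentiment_by_company(sample).items():
        print(name, stats)
```

The company names in the sample are made up; in the pattern, the rows would come from the table built from the NLU results.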
- The user creates and runs a Python Notebook on Watson Studio.
- The Notebook scrapes the latest news on startups.
- The scraped information is sent to Watson Natural Language Understanding (NLU) to extract keywords, entities and sentiments, along with their respective confidence scores.
- The NLU results are compiled into a CSV file, which is then converted to a table in DB2 Warehouse.
- The resulting table is ingested into SPSS, which runs analytics and returns a score for each company. The updated table is then saved back to DB2 Warehouse.
- Finally, Cognos ingests the final table generated in DB2 Warehouse and produces insightful visualisations.
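The step that compiles NLU results into a CSV file could look roughly like the sketch below. It assumes the result is a dictionary shaped like a Watson NLU analyze response, with `keywords` and `entities` lists and a `sentiment.document` object. The column names and the two helper functions are hypothetical and are not taken from the pattern's notebook.

```python
import csv
import io

# Hypothetical column layout for the table later loaded into DB2 Warehouse.
FIELDS = ["company", "kind", "text", "relevance",
          "sentiment_score", "sentiment_label"]


def flatten_nlu_result(company, result):
    """Turn one NLU-style result dict into flat rows, one per keyword/entity."""
    document = result.get("sentiment", {}).get("document", {})
    rows = []
    for kind in ("keywords", "entities"):
        for item in result.get(kind, []):
            rows.append({
                "company": company,
                "kind": kind,
                "text": item.get("text", ""),
                "relevance": item.get("relevance"),
                "sentiment_score": document.get("score"),
                "sentiment_label": document.get("label"),
            })
    return rows


def rows_to_csv(rows):
    """Serialise the flat rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up result for illustration only.
    sample_result = {
        "sentiment": {"document": {"score": 0.42, "label": "positive"}},
        "keywords": [{"text": "machine learning", "relevance": 0.91}],
        "entities": [{"text": "Acme Health", "type": "Company",
                      "relevance": 0.88}],
    }
    print(rows_to_csv(flatten_nlu_result("Acme Health", sample_result)))
```

Keeping one row per keyword or entity leaves the table easy to aggregate later, by company or by term.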
- Data Preparation
- Analyze using SPSS
- Visualize in Cognos
List all components and services that play a prominent role in your pattern. Components are IBM products, any open source project, or solutions that are NOT IBM Cloud Services. Services are services available in the IBM Cloud (public) Catalog.
To view all components see http://developer.ibm.com/components.
To view services, see https://console.bluemix.net/catalog/
- IBM SPSS Modeler
- IBM Cognos Dashboard
- IBM Watson Studio
- IBM DB2 Warehouse
- Cloud Object Storage
List any IBM Developer resources that are closely related to this pattern, such as other patterns, blog posts, tutorials, etc.
Provide any non-IBM Developer resources that you need to link to that are NOT components or services
Every pattern must have an announcement post that introduces it. The announcement should explain why the pattern is important or useful. The announcement is an invitation to try the pattern; you can expand on why you created the pattern, discuss any challenges that you overcame, or expand on the technologies that you're using.
We are in the age of startups, and the number of companies providing skilled services is increasing rapidly. We can scrape information about such companies and evaluate their success based on the number of articles or live use cases that have appeared on news portals.
Suppose we want to understand the current startups in a particular technology, say Machine Learning. This code pattern evaluates each startup's impact on the industry on the basis of:
- How many times has it appeared in the news?
- Does it have a Wikipedia page?
- Does it have tech blogs?
- Is it active on social media (Twitter, Medium, etc.)?
Once this unstructured data has been scraped (extracted from the web), it is processed through Watson NLU and converted to structured data. The structured data is fed to SPSS, which can be used to understand the data and perform analytics to determine which of the factors listed above are present for a company, and from that compute a popularity score. Once the analytics are complete, this code pattern also provides a user-friendly, interactive dashboard visualisation that gives insights into the data and simplifies decision making.
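To make the idea of a popularity score concrete, here is a minimal sketch that combines the four factors above into a single number between 0 and 1. It is a plain Python stand-in, not the SPSS model used in the pattern. The `CompanySignals` class, the equal weighting and the `mention_cap` value are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CompanySignals:
    """Hypothetical container for the factors collected per company."""
    name: str
    news_mentions: int
    has_wikipedia_page: bool
    has_tech_blog: bool
    active_on_social_media: bool


def popularity_score(signals, mention_cap=20):
    """Combine news volume and web presence into a score in [0, 1].

    Assumed weighting: half of the score comes from news mentions
    (capped at `mention_cap`), half from the three presence flags.
    """
    news_part = min(signals.news_mentions, mention_cap) / mention_cap
    presence = [
        signals.has_wikipedia_page,
        signals.has_tech_blog,
        signals.active_on_social_media,
    ]
    presence_part = sum(presence) / len(presence)
    return round(0.5 * news_part + 0.5 * presence_part, 3)


if __name__ == "__main__":
    # Made-up company for illustration only.
    startup = CompanySignals("Acme Health", news_mentions=10,
                             has_wikipedia_page=True, has_tech_blog=False,
                             active_on_social_media=False)
    print(startup.name, popularity_score(startup))
```

Capping the mention count keeps one heavily covered company from dominating the scale. A real model would fit the weights to the data instead of fixing them by hand.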