Twitter-Data-Analysis

This application analyzes Twitter data.

Download the dataset from URL

###### The dataset contains Twitter data collected over a short time period and saved in the Tweet JSON format.

Convert all alphabetic characters to their lowercase form.
Use the TweetTokenizer class from nltk.tokenize to tokenize the text.
Remove stopwords using the nltk.corpus.stopwords module.
Remove the following tokens: ‘rt’, ‘via’, and ‘...’
Remove any token that is purely numeric.
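
The preprocessing steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code. It assumes English stopwords, treats "purely numeric" as "digits only", and also drops the single-character ellipsis `…` in case the dataset uses it. The names `filter_tokens` and `build_preprocessor` are hypothetical. The NLTK stopword list must be downloaded once with `nltk.download("stopwords")`.

```python
# Tokens the task asks to drop explicitly. Including the Unicode
# ellipsis is an assumption; the task text only lists '...'.
EXTRA_REMOVALS = {"rt", "via", "...", "\u2026"}


def filter_tokens(tokens, stop_words):
    """Drop stopwords, the extra tokens, and digit-only tokens."""
    return [
        tok
        for tok in tokens
        if tok not in stop_words
        and tok not in EXTRA_REMOVALS
        and not tok.isdigit()  # assumption: "purely numeric" means digits only
    ]


def build_preprocessor(language="english"):
    """Return a function mapping raw tweet text to a cleaned token list."""
    # Imported here so filter_tokens stays usable without NLTK installed.
    from nltk.corpus import stopwords
    from nltk.tokenize import TweetTokenizer

    tokenizer = TweetTokenizer()
    stop_words = set(stopwords.words(language))

    def preprocess(text):
        # Lowercase first, then tokenize, then filter.
        return filter_tokens(tokenizer.tokenize(text.lower()), stop_words)

    return preprocess
```

Building the tokenizer and stopword set once and reusing them avoids reloading the stopword list for every tweet.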

For any question that involves the time, you are not required to convert the timestamps to our local time zone. That is, after verifying that the timestamps are all recorded using the same time-zone, you may directly use the timestamps recorded in the dataset without worrying about conversion. Answer the following questions.
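
One way to verify that all timestamps share the same time zone is to collect the distinct UTC offsets. The sketch below assumes the `created_at` field uses the classic Tweet JSON layout such as `Wed Oct 10 20:19:24 +0000 2018`. The helper names are hypothetical.

```python
from datetime import datetime

# Assumed layout of the Tweet JSON "created_at" field.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


def distinct_utc_offsets(created_at_values):
    """Return the set of UTC offsets seen across all timestamps."""
    return {parse_created_at(v).utcoffset() for v in created_at_values}
```

If the returned set has exactly one element, the timestamps can be compared directly without conversion.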

How many data records are in your dataset?
How many of the tweets in your dataset were newly created? How many tweets were deleted? For the remainder of the questions in this section, only consider the newly created tweets.
What is the timestamp for the earliest tweet in your dataset? What is the timestamp for the latest tweet in your dataset?
Create and display a frequency distribution table for the number of hashtags contained in each tweet.
Create and display a frequency distribution table for the users mentioned in each tweet. Display the results of this table for the 30 most frequently mentioned users.
Create and display a frequency distribution table for the words used in the text of each tweet. Display the results of this table for the 30 most frequently used words.
Create and display histograms for the previous frequency distribution table using both a standard scale and a log scale. For these histograms, expand the results to include the 1000 most frequently used words. Ensure that the data is sorted in descending order. Instead of labeling the horizontal axis using the actual words, enumerate the data points from 1 to 1000, where 1 corresponds to the most frequently appearing word.
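
Several of the questions above can be answered with the standard library alone. The following is a hedged sketch, not the notebook's implementation. It assumes one JSON object per line, that newly created tweets carry a `created_at` field, that deletion notices carry a top-level `delete` key, and that hashtags and mentions live under `entities`. The function names `load_records`, `summarize`, and `rank_frequency` are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"  # assumed created_at layout


def load_records(lines):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize(records):
    """Count records, created and deleted tweets, and build frequency tables."""
    created = [r for r in records if "created_at" in r]
    deleted = [r for r in records if "delete" in r]
    times = [datetime.strptime(r["created_at"], TWITTER_TIME_FORMAT) for r in created]
    # Frequency table: number of hashtags per tweet -> number of tweets.
    hashtags_per_tweet = Counter(
        len(r.get("entities", {}).get("hashtags", [])) for r in created
    )
    # Frequency table: mentioned screen name -> number of mentions.
    mentions = Counter(
        m["screen_name"]
        for r in created
        for m in r.get("entities", {}).get("user_mentions", [])
    )
    return {
        "records": len(records),
        "created": len(created),
        "deleted": len(deleted),
        "earliest": min(times) if times else None,
        "latest": max(times) if times else None,
        "hashtags_per_tweet": hashtags_per_tweet,
        "top_mentions": mentions.most_common(30),
    }


def rank_frequency(token_lists, top_n=1000):
    """Return ranks 1..n and word counts sorted in descending order."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    ranked = counts.most_common(top_n)
    return list(range(1, len(ranked) + 1)), [count for _, count in ranked]
```

The ranks and counts from `rank_frequency` can be passed to a plotting library. With matplotlib, for instance, the same data can be drawn once on a linear axis and once after calling `plt.yscale("log")`.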

For this final question, only include tweets whose text includes at least one word from the list of the 30 most frequently used words found above. Group these tweets by the time that they were created using bin widths of 1 minute. Create and display a time series using this resampling.
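
The 1-minute grouping can be sketched without third-party libraries by truncating each timestamp to the minute. This is an illustrative sketch under the assumption that each tweet has already been reduced to a `(created_at, tokens)` pair. The function name `tweets_per_minute` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def tweets_per_minute(tweets, top_words):
    """Count tweets per 1-minute bin, keeping only tweets with a top word.

    tweets: iterable of (created_at datetime, list of tokens) pairs.
    top_words: the most frequent words found earlier.
    """
    top = set(top_words)
    bins = Counter()
    for created_at, tokens in tweets:
        if top.intersection(tokens):
            # Truncate to the start of the minute to form the bin key.
            bins[created_at.replace(second=0, microsecond=0)] += 1
    return sorted(bins.items())
```

Minutes with no matching tweets do not appear in the result. A resampling tool such as pandas `resample` would report those bins as zero instead, which may be preferable for plotting.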
