Analyzing Data Using Amazon QuickSight on AWS: Hands-On Steps
Hey Readers,
My name is Akshay, and I work as a Senior Developer at Luxoft India. In this article I will walk through the basic steps to start analyzing datasets with Amazon QuickSight. This work was done at Luxoft for one of our projects, which required extensive analytical visualization, and QuickSight served the purpose effectively.
This article takes you through one of the best use cases of AWS in the field of Business Intelligence: analyzing a dataset visually. As a prerequisite, you should understand the basic concepts of QuickSight and S3, which are covered on the AWS website and on other learning platforms. We’ll be using two AWS services, Amazon S3 and Amazon QuickSight, together with a dataset provided by Bright Data that lists 50,000 best-selling products on amazon.com.
By following these steps, we can use AWS services to create visualizations from large datasets.
- We will start by downloading a dataset of 50,000 best-selling Amazon products.
- We will store this data in an Amazon S3 bucket.
- Finally, we will connect the S3 bucket to Amazon QuickSight to create a few interesting visualizations.
Step 1> Downloading the datasets:
Amazon-BestSeller-Dataset: https://drive.google.com/file/d/1ipEzelWC2sGZ_SavNm0bn0FZ3zj36sGm/view?usp=drive_link
Manifest.JSON: https://drive.google.com/file/d/1xIV2K2H0bSdDyxF2L42eG8W10sX67vkc/view?usp=drive_link
Note: Please leave a comment in the comment section to get access to the datasets.
Step 2> Storing the datasets in an Amazon S3 bucket:
In the AWS Management Console, type S3 in the search bar. Once you’re on the S3 console page, click Create bucket. You can give the bucket any name you like, as long as it follows the S3 naming rules: only lowercase letters, numbers, hyphens, and dots (no spaces or capital letters), and the name must be globally unique. I named mine medium-amazon-project. Keep the rest of the settings as default and select Create bucket.
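A bucket name with spaces or capital letters is rejected by S3, so it can help to check a name locally before typing it into the console. The sketch below is a simplified version of the general-purpose bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starts and ends with a letter or digit; no consecutive dots). The function name is hypothetical, and the check does not cover every rule, such as reserved prefixes or global uniqueness.

```python
import re

# Simplified pattern: 3-63 characters, lowercase letters, digits, dots and
# hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes a simplified S3 bucket-name check."""
    if ".." in name:  # consecutive dots are not allowed
        return False
    return _BUCKET_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("medium-amazon-project", "medium Amazon Project"):
        print(candidate, "->", looks_like_valid_bucket_name(candidate))
```

Passing this check does not guarantee the name is available, since another AWS account may already own it.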
Now click on the bucket you’ve just created and choose Upload. Select the CSV file and then press Upload.
Next, we need to make a small change to the manifest.json file. Open the file and replace the bucket name in it with the name of your own bucket. Since my bucket is named medium-amazon-project, I entered that name.
Once Amazon bestseller dataset.csv has been uploaded, upload the manifest.json file as well. The bucket should now show our two objects.
Tip: Duplicate the current browser tab so that the S3 bucket page stays open. We will come back to it later to copy the URL of the manifest file.
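The contents of manifest.json are not shown in this article, so the following is only a sketch of what such a file typically looks like, based on the QuickSight S3 manifest format (a fileLocations list plus optional globalUploadSettings). The bucket name, the object key amazon-bestseller-dataset.csv, and the function name are placeholders chosen for illustration; the downloaded manifest may use different values.

```python
import json


def build_manifest(bucket: str, key: str) -> dict:
    """Build a minimal QuickSight-style manifest pointing at one CSV object."""
    return {
        "fileLocations": [
            {"URIs": [f"s3://{bucket}/{key}"]}
        ],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "containsHeader": "true",
        },
    }


if __name__ == "__main__":
    # Placeholder values: replace them with your own bucket and file name.
    manifest = build_manifest("medium-amazon-project", "amazon-bestseller-dataset.csv")
    print(json.dumps(manifest, indent=2))
```

Whichever way the file is produced, the bucket name inside the URI has to match the bucket that actually holds the CSV file.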
Step 3> Now we are ready to move on to Amazon QuickSight so that we can start visualizing the data we uploaded.
Clicking QuickSight in the AWS console redirects you to your QuickSight account. If you don’t have a QuickSight account yet, the page will ask you to sign up for QuickSight, so just click Sign up.
On the next page, start a free trial of the Enterprise edition and press Continue at the bottom. Remember to cancel the subscription before the 30 days are over to avoid unnecessary charges. Then provide the account info, such as your preferred account name and email address.
Scroll down a bit and make sure to select the S3 bucket that you want to link to QuickSight. Tick your S3 bucket, keep the rest of the settings as default, and click Finish.
Your account will be created in a couple of minutes.
Click Go to Amazon QuickSight and you’ll see the QuickSight interface. Next, we need to import our S3 dataset into QuickSight. Click Datasets.
Next, click New dataset on the page and then choose S3. This is where our manifest file comes into the picture. Go back to the S3 tab, click on the manifest.json object, and then click Copy S3 URL at the top right.
Paste the URL into the field for the manifest file; the manifest tells QuickSight which CSV file to import. Provide a data source name of your choice and click Connect.
Now click Visualize.
Select the interactive sheet option and give it a few minutes for the dataset to finish importing.
Once the import finishes, the column names are fetched automatically: availability, brand, categories, price, and so on. Now we’re ready to start creating our first visualization.
Scenario 1: Suppose we want to find out which brands are the most popular in this list of 50,000 best-selling Amazon products.
Drag the brand field into the graph and you can see that it counts how many times each brand appears in the dataset. Name this sheet “most_popular_brands”. The bars are not ordered by count at first, so sort them in descending order: click on Brand in the graph and select Sort in the options. Scrolling to the top, we can observe that Standard Motor Products appears the most.
Similarly, if we go back to the fields list, we can try any combination to compare things like price, brand, availability, seller ID, and so on.
Feel free to play around and create your own visualizations. QuickSight lets you create many different types of graphs, such as bar charts, donut charts, pie charts, and even word clouds.
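To sanity-check what the bar chart shows, the same count can be reproduced locally on the downloaded CSV. This is a minimal sketch, not what QuickSight runs internally: it assumes the file has a header row with a column named brand, and the function name and the tiny inline sample are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_brands(csv_file, column="brand"):
    """Count how often each value of `column` appears, most frequent first."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if value:  # skip rows where the brand is missing
            counts[value] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Made-up sample rows standing in for the real dataset.
    sample = io.StringIO("brand,price\nAcme,10\nBolt,5\nAcme,7\n")
    for brand, n in count_brands(sample):
        print(brand, n)
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to the function instead of the sample.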
Other scenarios you can try with the same dataset:
- Compare the prices of the best-selling products.
- Analyze the top-selling product titles.
- Identify which sellers have the most best-selling products.
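As a starting point for the price comparison, here is a rough local sketch that summarizes prices per brand. It assumes the columns are named brand and price and that prices are plain numbers; rows whose price cannot be parsed are skipped. The function name and sample rows are hypothetical, and the real dataset may need extra cleaning (currency symbols, for example).

```python
import csv
import io
import statistics
from collections import defaultdict


def price_summary_by_brand(csv_file, brand_col="brand", price_col="price"):
    """Return count, min, median and max price for each brand."""
    prices = defaultdict(list)
    for row in csv.DictReader(csv_file):
        brand = (row.get(brand_col) or "").strip()
        try:
            price = float(row.get(price_col) or "")
        except ValueError:
            continue  # skip rows with a missing or non-numeric price
        if brand:
            prices[brand].append(price)
    return {
        brand: {
            "count": len(values),
            "min": min(values),
            "median": statistics.median(values),
            "max": max(values),
        }
        for brand, values in prices.items()
    }


if __name__ == "__main__":
    # Made-up sample rows standing in for the real dataset.
    sample = io.StringIO("brand,price\nAcme,10\nAcme,30\nBolt,abc\nBolt,5\n")
    for brand, stats in price_summary_by_brand(sample).items():
        print(brand, stats)
```

The same grouping idea works for the seller scenario by swapping the brand column for the seller ID column and keeping only the count.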
I hope you liked the article and that it will be helpful in creating your own visualizations. Please let me know in the comments if you have any queries or concerns. Do share your appreciation as well. Thank you for reading.