#import the modules/libraries we need
import json
import pandas as pd
#JSON data.
= '''
json_data #paste your JSON data here.
'''
#load JSON data
= json.loads(json_data)
data
#accessing chartColumns and chartData
= data['chartColumns']
chart_columns = data['chartData']
chart_data
#creating a dataframe from chartData
= pd.DataFrame.from_dict(chart_data, orient='index', columns=chart_columns)
df
#exporting dataframe to an Excel file
= 'output_data.xlsx' #change this to your desired filename
excel_filename df.to_excel(excel_filename)
Data scraping from interactive dashboards
APK-Inform is an agribusiness consulting agency in the Commonwealth of Independent States (CIS). They provide an interactive dashboard of the weekly and monthly prices of several agricultural goods, which is available here: https://www.apk-inform.com/en/prices
What if we need the raw data behind the intercative chart to work with? What can we do about it?
Most interactive charts use JavaScript, and the JavaScript code on the webpage will make a request to the server or load a static file, like a JSON
file, containing the data to be visualized. This request can be made using AJAX
(Asynchronous JavaScript and XML) or other web APIs like Fetch
. The location of the JSON
file might be hard-coded in the JavaScript or dynamically specified.
Once the JSON
file is fetched, the JavaScript library parses the JSON
data, which is essentially a text-based data format easily readable by humans and machines, and uses it to generate the chart. This involves converting the data into a format that the charting library can use to plot points, lines, bars, etc., and then rendering them on the page.
First, we specify the parameters for the data we need. I chose all the available goods in the weekly prices section, from 17/02/2020 until 26/02/2024. You can change this to anything else you want.
Then, we check for the JSON
file. First, right-click on the page, and choose Inspect
Then, in the Network
tab, choose Fetch/XHR
to filter out the asynchronous requests that fetch data using either of XHR
or Fetch
. XHR
is a browser API in the form of an object that is used to interact with servers. It can be used to retrieve data from a URL without having to do a full page refresh. This is the traditional way of making asynchronous requests in web applications.
Fetch
is a modern, promise-based API that provides a more powerful and flexible feature set to make asynchronous requests. It is part of the Window
and Worker
GlobalScopes
, and it is recommended for new applications due to its simpler syntax and greater control over requests.
The request was to get get-prices
as shown in the photo. The file contains chartColumns
and chartData
. The later contains the dates and the weekly prices, while the former contains everything else.
Now, we parse the JSON
file data:
Now, the data is ready to work with in Excel. You can export in any file format you desire of course.