Data scraping from interactive dashboards

APK-Inform is an agribusiness consulting agency in the Commonwealth of Independent States (CIS). They provide an interactive dashboard of the weekly and monthly prices of several agricultural goods, which is available here: https://www.apk-inform.com/en/prices

What if we need the raw data behind the intercative chart to work with? What can we do about it?

Most interactive charts use JavaScript, and the JavaScript code on the webpage will make a request to the server or load a static file, like a JSON file, containing the data to be visualized. This request can be made using AJAX (Asynchronous JavaScript and XML) or other web APIs like Fetch. The location of the JSON file might be hard-coded in the JavaScript or dynamically specified.

Once the JSON file is fetched, the JavaScript library parses the JSON data, which is essentially a text-based data format easily readable by humans and machines, and uses it to generate the chart. This involves converting the data into a format that the charting library can use to plot points, lines, bars, etc., and then rendering them on the page.

First, we specify the parameters for the data we need. I chose all the available goods in the weekly prices section, from 17/02/2020 until 26/02/2024. You can change this to anything else you want.

Then, we check for the JSON file. First, right-click on the page, and choose Inspect

Then, in the Network tab, choose Fetch/XHR to filter out the asynchronous requests that fetch data using either of XHR or Fetch. XHR is a browser API in the form of an object that is used to interact with servers. It can be used to retrieve data from a URL without having to do a full page refresh. This is the traditional way of making asynchronous requests in web applications.

Fetch is a modern, promise-based API that provides a more powerful and flexible feature set to make asynchronous requests. It is part of the Window and Worker GlobalScopes, and it is recommended for new applications due to its simpler syntax and greater control over requests.

The request was to get get-prices as shown in the photo. The file contains chartColumnsand chartData. The later contains the dates and the weekly prices, while the former contains everything else.

Now, we parse the JSON file data:

#import the modules/libraries we need
import json
import pandas as pd

#JSON data. 
json_data = '''
    #paste your JSON data here.
'''

#load JSON data
data = json.loads(json_data)

#accessing chartColumns and chartData
chart_columns = data['chartColumns']
chart_data = data['chartData']

#creating a dataframe from chartData
df = pd.DataFrame.from_dict(chart_data, orient='index', columns=chart_columns)

#exporting dataframe to an Excel file
excel_filename = 'output_data.xlsx'  #change this to your desired filename
df.to_excel(excel_filename)

Now, the data is ready to work with in Excel. You can export in any file format you desire of course.