A Novel Approach to Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE

Authors

  • Mujtaba Husnain Department of Computer Science & IT, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan.
  • Malik Muhammad Saad Missen Department of Computer Science & IT, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan.
  • Shahzad Mumtaz Department of Computer Science & IT, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan.
  • Muhammad Muzzamil Luqman L3i, La Rochelle University, Avenue Michel Crépeau, 17000 La Rochelle, France.
  • Mickael Coustaty L3i, La Rochelle University, Avenue Michel Crépeau, 17000 La Rochelle, France.
  • Jean-Marc Ogier L3i, La Rochelle University, Avenue Michel Crépeau, 17000 La Rochelle, France.

DOI:

https://doi.org/10.9734/bpi/ctmcs/v2/2338F

Keywords:

Dimension reduction, multidimensional information visualization, Euclidean distance, embedding algorithms, pattern classification

Abstract

We applied t-distributed stochastic neighbor embedding (t-SNE) to visualize Urdu handwritten numerals (or digits). The data set used consists of 28 × 28 images of handwritten Urdu numerals. The data set was created by inviting authors from different categories of native Urdu speakers. One of the challenging and critical issues for the correct visualization of Urdu numerals is the shape similarity between some of the digits. This issue was resolved using t-SNE, by exploiting the local and global structures of the large data set at different scales. The global structure consists of geometrical features, and the local structure is the pixel-based information for each class of Urdu digits. We introduce a novel approach that allows the fusion of these two independent spaces using Euclidean pairwise distances in a highly organized and principled way. The fusion matrix embedded with t-SNE helps to locate each data point in a two- (or three-) dimensional map in a very different way. Furthermore, our proposed approach focuses on preserving the local structure of the high-dimensional data while mapping to a low-dimensional plane. The novelty of our approach lies in the fact that we embed Euclidean distances in standard t-SNE in order to successfully visualize high-dimensional data represented in multiple independent observations. The visualizations produced by t-SNE outperformed other classical techniques like principal component analysis (PCA) and auto-encoders (AE) on our handwritten Urdu numeral dataset.
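
As a rough illustration of the pairwise-fusion idea described in the abstract, the following sketch computes a Euclidean distance matrix for each of two feature spaces and combines them into a single matrix. The abstract does not state the exact fusion rule, so the weighted sum of max-normalized matrices used here is an assumption, as are the function names `pairwise_euclidean` and `fuse_distances`, the weight `alpha`, and the random stand-in data.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the (n, n) matrix of Euclidean distances between the rows of X."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)   # clip tiny negatives caused by round-off
    np.fill_diagonal(d2, 0.0)     # a point is at distance 0 from itself
    return np.sqrt(d2)


def fuse_distances(D_a, D_b, alpha=0.5):
    """Hypothetical fusion rule (not taken from the chapter):
    convex combination of the two max-normalized distance matrices."""
    D_a = D_a / D_a.max()
    D_b = D_b / D_b.max()
    return alpha * D_a + (1.0 - alpha) * D_b


# Random stand-in data; the real features would come from the numeral images.
rng = np.random.default_rng(0)
n = 50
pixels = rng.random((n, 784))    # assumed flattened pixel vectors (local structure)
geometry = rng.random((n, 8))    # assumed geometrical features (global structure)

D_fused = fuse_distances(pairwise_euclidean(pixels), pairwise_euclidean(geometry))
```

A fused matrix of this kind could then be handed to any t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE(metric="precomputed", init="random")`. Normalizing each matrix before combining keeps either feature space from dominating purely because of its scale.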

Published

2021-06-12

How to Cite

Mujtaba Husnain, Malik Muhammad Saad Missen, Shahzad Mumtaz, Muhammad Muzzamil Luqman, Mickael Coustaty, & Jean-Marc Ogier. (2021). A Novel Approach to Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE. Current Topics on Mathematics and Computer Science Vol. 2, 89–108. https://doi.org/10.9734/bpi/ctmcs/v2/2338F