I am a PhD candidate in the MAPi Joint Doctoral Program in Computer Science offered by the University of Minho, the University of Aveiro, and the University of Porto. I am also a researcher at the Laboratory of Artificial Intelligence and Decision Support (LIAAD), INESC TEC.
My current research interests focus on Natural Language Processing and Machine Learning. I am also interested in creating language resources for low-resource languages, especially Hausa. I received a Master's degree from the University of Manchester, UK, and a Bachelor's degree from Bayero University, Kano, Nigeria. I am also a faculty member at the Faculty of Computer Science and Information Technology, Bayero University, Kano, Nigeria. In my spare time, I read books and play table tennis.
PhD in Computer Science, 2018 - 2022
University of Porto, Portugal
MSc in Computer Science, 2013
University of Manchester, UK
BSc in Computer Science, 2008
Bayero University, Kano, Nigeria
Data scientists often spend about 80% of the data analysis process on cleaning and preparing data. Worse still, cleaning and preparing data is an iterative process. Hadley Wickham refers to this process of cleaning and preparing data as data tidying: structuring datasets to facilitate analysis.
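To make this concrete, here is a minimal tidying sketch using pandas (the country/year table and column names below are invented for the example):

```python
# A minimal data-tidying sketch with pandas; the data are made up.
import pandas as pd

# "Messy" wide format: one column per year, values spread across columns.
messy = pd.DataFrame({
    "country": ["Nigeria", "Portugal"],
    "2019": [100, 80],
    "2020": [120, 95],
})

# Tidy (long) format: one row per observation, one column per variable.
tidy = messy.melt(id_vars="country", var_name="year", value_name="cases")
print(tidy)
```

Each row of `tidy` now holds a single observation, which is the structure most analysis tools expect.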
With the growing body of literature on movie revenue prediction using machine learning techniques in recent years, a systematic review can strengthen the understanding of this research domain. This article therefore aims to determine the sources of data, the techniques, the features, and the evaluation metrics used in movie revenue prediction. We selected 36 relevant articles based on defined inclusion and exclusion criteria. The review found that US cinema attracted the highest number of publications, followed by Chinese, Korean, and Indian cinema, in that order. We also found that regression, classification, and clustering data mining approaches were used in the reviewed articles, with regression and classification accounting for the largest share. Furthermore, we observed that cast, number of screens, and genre are the most widely used features in movie revenue prediction. We also identified multiple linear regression and support vector machines as the most commonly used prediction algorithms, while mean absolute percentage error, root-mean-square error, and average percentage hit rate are the most frequently used evaluation metrics. Finally, our review identified open problems and research directions in movie revenue prediction.
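For reference, two of the error metrics named above are straightforward to compute; the sketch below uses invented revenue figures purely for illustration:

```python
# Hedged sketch of MAPE and RMSE, two metrics named in the review.
# The revenue figures are invented for illustration.
import numpy as np

actual = np.array([120.0, 75.0, 310.0])      # actual box-office revenue
predicted = np.array([100.0, 90.0, 280.0])   # hypothetical model predictions

mape = np.mean(np.abs((actual - predicted) / actual)) * 100
rmse = np.sqrt(np.mean((actual - predicted) ** 2))

print(f"MAPE: {mape:.2f}%")  # average relative error in percent
print(f"RMSE: {rmse:.2f}")   # error in the same units as revenue
```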
Sentiment lexicons play a vital role in lexicon-based sentiment analysis. The lexicon-based method is often preferred because it yields more explainable results than many machine learning-based methods. However, the semantic orientation of a word depends on its domain, so a general-purpose sentiment lexicon may give sub-optimal performance compared with a domain-specific lexicon. Yet manually generating a domain-specific sentiment lexicon for each domain is challenging, and generating a complete sentiment lexicon for a domain from a single corpus is impractical. To this end, we propose an approach to automatically generate a domain-specific sentiment lexicon using a vector model enriched by weights. Importantly, we propose an incremental approach for updating an existing lexicon for either the same domain or a different domain (domain adaptation). Finally, we discuss how to incorporate sentiment lexicon information into neural models (word embeddings) for better performance.
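As a rough illustration of the general idea (a toy sketch, not the exact method proposed in the paper), a word's semantic orientation can be scored by comparing its embedding with positive and negative seed vectors:

```python
# Toy sketch: score words against sentiment seed vectors.
# The 2-d vectors and vocabulary are invented for illustration.
import numpy as np

vectors = {                       # hypothetical domain word embeddings
    "excellent": np.array([0.9, 0.1]),
    "faulty":    np.array([-0.8, 0.2]),
    "battery":   np.array([0.1, 0.9]),
}
pos_seeds = [np.array([1.0, 0.0])]   # e.g. the vector for "good"
neg_seeds = [np.array([-1.0, 0.0])]  # e.g. the vector for "bad"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Orientation = mean similarity to positive seeds minus mean similarity
# to negative seeds; the sign suggests the word's polarity in the domain.
for word, vec in vectors.items():
    score = (np.mean([cosine(vec, s) for s in pos_seeds])
             - np.mean([cosine(vec, s) for s in neg_seeds]))
    print(f"{word}: {score:+.2f}")
```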
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem that goes beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets and MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution.
Sentiment analysis is a relatively new field of study at the intersection of computer science and linguistics that aims to identify the opinion expressed in a text. It has received a surge of interest in both academia and industry. This paper provides an overview of the two basic approaches to the sentiment analysis task: the machine learning-based approach and the lexicon-based approach. The machine learning-based approach trains models on corpora annotated with polarity information, while the lexicon-based approach relies on a sentiment lexicon. Recently, hybrid approaches have been employed to leverage the strengths of both.
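A toy scorer illustrates the lexicon-based approach (the four-word lexicon below is invented for the example):

```python
# Minimal lexicon-based sentiment sketch with an invented lexicon.
lexicon = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}

def lexicon_sentiment(text: str) -> str:
    # Sum the polarity of each known word; unknown words score zero.
    score = sum(lexicon.get(tok, 0.0) for tok in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_sentiment("the movie was great"))              # positive
print(lexicon_sentiment("the acting was bad and terrible"))  # negative
```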
The Viterbi algorithm is a maximum-likelihood decoding algorithm. It is used to decode convolutional codes in several wireless communication systems, including Wi-Fi. The standard Viterbi algorithm gives just one decoded output, which may be correct or incorrect. Incorrect packets are normally discarded, necessitating retransmission and hence resulting in considerable energy loss and delay. Some real-time applications, such as Voice over Internet Protocol (VoIP) telephony, do not tolerate excessive delay, which makes the conventional Viterbi decoding strategy sub-optimal. In this regard, a modified approach involving a form of List Viterbi decoding of the convolutional code is investigated. The technique combines the bit-error correction capabilities of both the Viterbi algorithm and the Cyclic Redundancy Check (CRC) procedure. It first uses a form of List Viterbi Algorithm (LVA) to generate a list of possible decoded output candidates after the trellis search; the CRC check is then used to identify the correct outcome, if present. Simulation results show considerable improvement in bit-error performance compared with the classical approach.
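For background, the standard algorithm this work builds on can be sketched as a hard-decision Viterbi decoder for a rate-1/2, constraint-length-3 convolutional code (a common textbook configuration; this is not the paper's List Viterbi + CRC scheme):

```python
# Hard-decision Viterbi decoding sketch for a rate-1/2, K=3 convolutional
# code with generators 7 and 5 (octal). Illustrative only.
G1, G2 = 0b111, 0b101  # generator polynomials
N_STATES = 4           # 2^(K - 1) encoder states

def parity(x: int) -> int:
    return bin(x).count("1") % 2

def encode(bits):
    """Encode a bit sequence; each input bit yields two coded bits."""
    state, out = 0, []
    for b in bits:
        reg = (b << 2) | state                 # register = [newest, s1, s0]
        out += [parity(reg & G1), parity(reg & G2)]
        state = (b << 1) | (state >> 1)        # shift the new bit in
    return out

def viterbi(received):
    """Return the most likely input bits for the received coded bits."""
    INF = float("inf")
    metrics = [0] + [INF] * (N_STATES - 1)     # encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new_metrics = [INF] * N_STATES
        new_paths = [[] for _ in range(N_STATES)]
        for state in range(N_STATES):
            if metrics[state] == INF:
                continue
            for b in (0, 1):                   # try both input bits
                reg = (b << 2) | state
                expected = [parity(reg & G1), parity(reg & G2)]
                dist = sum(e != x for e, x in zip(expected, r))
                nxt = (b << 1) | (state >> 1)
                if metrics[state] + dist < new_metrics[nxt]:
                    new_metrics[nxt] = metrics[state] + dist  # survivor
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    return paths[metrics.index(min(metrics))]

msg = [1, 0, 1, 1, 0]
coded = encode(msg)
coded[3] ^= 1                  # flip one bit to simulate channel noise
print(viterbi(coded) == msg)   # True: the single error is corrected
```

A List Viterbi decoder would instead keep the best L survivor paths per state, and the CRC would then select the first candidate on the list that verifies.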