Natural Language Processing
Mohammad Javad Nasri-Lowshani; Javad Salimi Sartakhti; Hossein Ebrahimpour-Komole
Abstract
Background and Objectives: Developing efficient task-oriented dialogue systems capable of handling multilingual interactions is a growing area of research in natural language processing (NLP). In this paper, we propose SenSimpleDS, a deep reinforcement learning-based joint task-oriented dialogue system designed for multilingual conversations.
Methods: The system uses a deep Q-network and the SBERT model to represent the dialogue environment. We introduce two variants, SenSimpleDS+ and SenSimpleDS-NSP, which modify the ε-greedy method and use next sequence prediction (NSP) with BERT to refine the reward function. These methods are evaluated on datasets in English, Persian, Spanish, and German, and compared with baseline methods such as SimpleDS and SCGSimpleDS.
Results: Our experimental results demonstrate that the proposed methods outperform the baselines in terms of average collected rewards and require fewer learning steps to achieve optimal dialogue policies. Notably, the incorporation of NSP significantly improves performance by optimizing reward collection. The multilingual SenSimpleDS further showcases the system's ability to function across languages, using a random forest classifier for language detection and MPNet for environment construction. In addition to system evaluations, we introduce a new Persian dataset for task-oriented dialogue in the restaurant domain, expanding the resources available for developing dialogue systems in low-resource languages.
Conclusion: SenSimpleDS, a deep reinforcement learning-based joint task-oriented dialogue system, demonstrates superior performance over baseline methods by leveraging deep Q-networks and SBERT. The integration of next sequence prediction (NSP) significantly enhances reward optimization, enabling faster convergence to optimal dialogue policies. This work establishes a foundation for future research in multilingual dialogue systems, with potential applications across diverse service domains.
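For readers unfamiliar with the ε-greedy method mentioned in the abstract, the following is a minimal sketch of the standard textbook form, in which the agent explores with probability ε and otherwise takes the action with the highest Q-value. It does not reproduce the modification used in SenSimpleDS+, which the abstract does not describe. The function names, the linear decay schedule, and all default values are illustrative assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Standard epsilon-greedy: explore with probability epsilon, else exploit.

    q_values is a list of estimated action values (hypothetical input; in a
    DQN these would come from the network's output layer).
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def linear_epsilon(step, start=1.0, end=0.05, decay_steps=10_000):
    """Assumed schedule: decay epsilon linearly from start to end."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```

With `epsilon=0.0` the function always returns the greedy action, and with `epsilon=1.0` it always samples at random; a decaying schedule moves the agent from the second regime toward the first as training proceeds.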
Artificial Intelligence
Mahdi Shahbazi Khojasteh; Armin Salimi Badr
Abstract
Background and Objectives: Unmanned Aerial Vehicles (UAVs) face significant challenges in navigating narrow passages within GPS-denied environments due to sensor and computational limitations. While deep reinforcement learning (DRL) has improved navigation, many methods rely on costly sensors such as depth cameras or LiDAR. This study addresses these issues using a vision-based DRL framework with a monocular camera for autonomous UAV navigation.
Methods: We propose a DRL-based navigation system utilizing Proximal Policy Optimization (PPO). The system processes a stack of grayscale monocular images to capture short-term temporal dependencies, approximating the state of the partially observable environment. A custom reward function encourages trajectory optimization by assigning higher rewards for staying near the passage center while penalizing greater distances from it. The navigation system is evaluated in a 3D simulation environment under a GPS-denied scenario.
Results: The proposed method achieves a high success rate, surpassing 97% in challenging narrow passages. The system demonstrates superior learning efficiency and robust generalization to new configurations compared to baseline methods. Notably, using stacked frames mitigates computational overhead while maintaining policy effectiveness.
Conclusion: Our vision-based DRL approach enables autonomous UAV navigation in GPS-denied environments with reduced sensor requirements, offering a cost-effective and efficient solution. The findings highlight the potential of monocular cameras paired with DRL for real-world UAV applications such as search and rescue and infrastructure inspection. Future work will extend the framework to obstacle avoidance and general trajectory planning in dynamic environments.
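The reward shaping described in the Methods can be illustrated with a minimal sketch. The abstract does not give the actual reward formula, so the linear falloff, the collision penalty, and the names `center_reward`, `offset`, and `half_width` below are all assumptions chosen only to show the idea of rewarding proximity to the passage center.

```python
def center_reward(offset, half_width, collision_penalty=-1.0):
    """Hypothetical centering reward for a narrow passage.

    offset: lateral distance of the UAV from the passage center line.
    half_width: distance from the center line to the passage wall.
    Returns 1.0 at the center, falling linearly toward 0.0 at the wall,
    and a fixed penalty once the wall is reached or crossed.
    """
    distance = abs(offset)
    if distance >= half_width:
        # Assumed behavior: touching the wall counts as a collision.
        return collision_penalty
    return 1.0 - distance / half_width
```

Under these assumptions, a UAV on the center line receives the maximum reward at every step, so a policy that maximizes return is pushed toward trajectories that stay centered.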
Machine Learning
S. Khonsha; M. A. Sarram; R. Sheikhpour
Abstract
Background and Objectives: Stock recommender systems (SRSs) based on deep reinforcement learning (DRL) have garnered significant attention within the financial research community. A robust DRL agent aims to consistently allocate some amount of cash to a combination of high-risk and low-risk stocks, with the ultimate objective of maximizing returns and balancing risk. However, existing DRL-based SRSs focus on one or, at most, two sequential trading agents that operate within the same or a shared environment, and they often make mistakes in volatile or variable market conditions. In this paper, a robust Concurrent Multiagent Deep Reinforcement Learning-based Stock Recommender System (CMSRS) is proposed.
Methods: The proposed system introduces a multi-layered architecture that includes feature extraction at the data layer to construct multiple trading environments, so that DRL agents fed by different environments can robustly recommend assets to the trading layer. The proposed CMSRS uses a variety of data sources, including Google stock trends, fundamental data, and technical indicators along with historical price data, so that multiple agents can concurrently select and recommend suitable stocks to buy or sell. To optimize hyperparameters during the validation phase, we employ the Sharpe ratio as a risk-adjusted return measure. Additionally, we address liquidity requirements by defining a precise reward function that dynamically manages cash reserves. We also penalize the model for failing to maintain a reserve of cash.
Results: The empirical results on real U.S. stock market data show the superiority of our CMSRS, especially in volatile markets and on out-of-sample data.
Conclusion: The proposed CMSRS demonstrates significant advancements in stock recommendation by effectively leveraging multiple trading agents and diverse data sources. The empirical results underscore its robustness and superior performance, particularly in volatile market conditions. This multi-layered approach not only optimizes returns but also efficiently manages risks and liquidity, offering a compelling solution for dynamic and uncertain financial environments. Future work could further refine the model's adaptability to other market conditions and explore its applicability across different asset classes.
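The Sharpe ratio used for hyperparameter selection is the mean excess return divided by its standard deviation, commonly annualized by the square root of the number of periods per year. The sketch below shows that conventional definition only; the abstract does not state which variant, risk-free rate, or annualization factor the authors used, so the defaults here (zero risk-free rate, 252 trading days) are assumptions.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period returns.

    returns: per-period simple returns (at least two values).
    risk_free_rate: per-period risk-free return (assumed 0.0 by default).
    periods_per_year: annualization factor (252 assumes daily data).
    """
    excess = [r - risk_free_rate for r in returns]
    std = statistics.stdev(excess)  # sample standard deviation
    if std == 0:
        # Assumed convention: a constant return series yields a ratio of 0.
        return 0.0
    return statistics.mean(excess) / std * math.sqrt(periods_per_year)
```

During validation, each candidate hyperparameter setting would produce a return series, and the setting with the highest ratio would be kept.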