Eric (Yunjae) Soderquist

Building @ Speakease AI | Data Science & Analytics Intern @ AGCO | Brain & Cognitive Science @ Illinois

LinkedIn GitHub Email

Visit Speakease AI

About Me

Innovative Software Engineer, Data Scientist, and AI Specialist with a focus on:

Large Language Models (LLMs)
Cloud Infrastructure
Real-time AI Solutions

Proven expertise in both technical programming and strategic AI development, including:

Data Pipeline Orchestration
Dataset Curation
RLHF Fine-tuning
Prompt Engineering

Spearheaded high-impact projects, from automating AWS cloud deployments and web scraping solutions to building scalable AI-powered platforms for cross-linguistic communication.

Adept at translating complex neural architectures into real-world applications, such as BCI systems and multimodal translation models, with a passion for creating user-centric, data-driven solutions that bridge technology and human interaction.

Experience

Founder, CEO

Speakease AI • Jan 2023 - Present

Founded Speakease AI to develop a free, AI-driven platform providing real-time voice translation across 50+ languages, addressing the need for accessible cross-linguistic communication tools.
Led all coding and machine learning development, creating a custom multimodal language model utilizing Reinforcement Learning from Human Feedback (RLHF) and divergence training, resulting in high-quality, context-aware translations.
Architected and maintained cloud-native infrastructure using TypeScript, Next.js, Supabase, and Azure; deployed models on Azure OpenAI service and secured cloud funding through a Microsoft partnership.
Implemented low-latency, real-time interactions by leveraging Azure's infrastructure to ensure high availability and scalability for seamless user experiences.
Managed end-to-end engineering, data science, and data engineering efforts, from model training and fine-tuning to infrastructure optimization, enabling translations adaptive to sociocultural and emotional contexts.
Reached more than 1,000 active users, providing accessible, real-time translation support to immigrants, ESL learners, travelers, and language learners.
Pioneered innovations in AI translation using advanced machine learning techniques, creating a scalable, cloud-native solution that democratizes access to high-quality language tools.

Generative AI

Natural Language Processing (NLP)

Large Language Models (LLM)

Machine Learning

Neural Machine Translation (NMT)

Data Engineering

View Speakease AI's Website

Data Science & Analytics Intern

AGCO Corporation • Jan 2024 - Present

Provided technical solutions to streamline operations, addressing challenges in cloud permissions management, competitive data acquisition, and processing warranty datasets too large for Excel.
Developed an AWS Permissions Automation system, transforming an 8-hour daily manual task into a 1-second automated process by creating an optimized script leveraging AWS API integrations, significantly enhancing operational productivity and cloud security management.
Engineered an Automated Competitive Data Acquisition platform, building a scalable, distributed web-crawling system that reduced data acquisition time from months to approximately 2 hours and 30 minutes. Overcame obstacles like IP blocks, geolocation restrictions, rate limiting, and device tracking to provide AGCO with real-time insights for data-driven decision-making.
Created a Warranty Data Processing Tool, designing and implementing a GUI-based internal application to compare millions of warranty serial numbers and equipment IDs. This tool turned a two-week manual process into an automated run of about 1 minute 30 seconds, optimizing workflow efficiency and eliminating delays.
Enhanced CRM functionalities by integrating Salesforce customizations for a Yanzhou, China dealer warranty project, improving market understanding through advanced CRM insights and aiding strategic improvements in dealer support and customer relationships.
Deployed, automated, and optimized ETL pipelines in AWS and Azure, with outputs stored in S3 buckets and visualized via a VistaMap frontend, providing comprehensive, up-to-date data visualizations.
Contributed to significant operational cost savings and efficiency gains, enabling faster, data-driven decisions and improving market coverage strategies through frequent data updates at no additional expense.

Web Scraping

Amazon Web Services (AWS)

Automation

Data Engineering

Python (Programming Language)

Artificial Intelligence (AI)

Large Language Models (LLM)

Microsoft Azure

Java

Extract, Transform, Load (ETL)

View AGCO Corporation's Website

Data Scientist

Scale AI • Jan 2024 - Jun 2024

Contributed to the Google PaLM 2 (Bard) → Gemini transition team, aiming to dramatically improve the Gemini model's performance on key benchmarks such as MMLU and HumanEval, which assess language understanding and coding abilities.
Enhanced Gemini Ultra to outperform PaLM 2, focusing on significantly improving multilingual and coding task capabilities as measured by industry benchmarks.
Led the implementation of divergence-based RLHF (Reinforcement Learning from Human Feedback) to refine model alignment for both language and coding tasks.
Collaborated on fine-tuning Gemini Ultra, optimizing performance in real-world tasks including multilingual translation and complex coding challenges.
Conducted extensive benchmarking and optimization, specifically targeting improvements on MMLU and HumanEval.
Elevated MMLU performance from 78% (PaLM 2) to 90.04% with Gemini Ultra, showcasing significant advancements in language understanding.
Boosted HumanEval benchmark from 37.6% to 74.4%, positioning Gemini Ultra as a leading model in coding and language tasks.

Python (Programming Language)

Artificial Intelligence (AI)

Neural Machine Translation (NMT)

Generative AI

Large Language Models (LLM)

Data Science

Prompt Engineering

View Scale AI's Website

Software Engineer

Scale AI • Jan 2023 - Dec 2023

Contributed to the OpenAI ChatGPT team, focusing on improving infrastructure, data integrity, and preprocessing pipelines to ensure a smooth transition from GPT-3 to GPT-4.
Led the development and optimization of large-scale data preprocessing pipelines, achieving a 35% increase in data processing efficiency and accelerating the training of GPT-4, the most advanced language model at the time.
Developed validation and sanitization protocols to enhance data quality, reducing model hallucinations and improving overall reliability.
Assisted in the systematic transition of infrastructure from GPT-3 to GPT-4, leveraging improved Azure-based AI infrastructure to handle complex, real-world tasks like reasoning and coding.
Integrated feedback mechanisms from Reinforcement Learning from Human Feedback (RLHF) to fine-tune GPT-4's responses, making the model 82% less likely than GPT-3.5 to produce disallowed content and improving factual accuracy by 40%.
Enhanced model safety and alignment features, improving performance on key benchmarks such as TruthfulQA and adversarial factuality evaluations.
Supported GPT-4's deployment across applications, including ChatGPT Plus, API integrations, and partnerships with Duolingo and Be My Eyes, delivering safer and more useful responses to users.

Natural Language Processing (NLP)

Front-End Development

Python (Programming Language)

Artificial Intelligence (AI)

Neural Machine Translation (NMT)

Generative AI

Programming

Back-End Web Development

Large Language Models (LLM)

DevOps

Prompt Engineering

View Scale AI's Website

Research Assistant, Machine Learning & Brain Computer Interfaces

University of Illinois, Beckman Institute for Advanced Science and Technology, Cognitive Neuroimaging Lab • Jan 2022 - Dec 2022 · 1 yr

Developed deep learning architectures for multi-modal neuroimaging fusion (MRI/EEG/fNIRS) to support real-time Brain-Computer Interface (BCI) applications, enhancing neural decoding accuracy for user control.
Optimized neural decoding algorithms to enable real-time BCI feedback, improving the responsiveness and reliability of user interactions.
Designed and implemented parallelized preprocessing pipelines for high-dimensional neurophysiological data, increasing efficiency and throughput while reducing computational overhead.
Integrated cross-platform BCI signal processing workflows, ensuring seamless interoperability between data collection platforms and visualization systems.
Implemented advanced spatiotemporal data visualization techniques, facilitating clear representation of complex neural dynamics in closed-loop neurofeedback systems.
Significantly improved neural decoding accuracy in real-time BCI applications, leading to more reliable and responsive user control.
Enhanced data processing efficiency through parallelized workflows, allowing for faster real-time analysis and reducing computational delays.

Brain-computer Interfaces

Python (Programming Language)

R (Programming Language)

Machine Learning

Data Visualization

Quantitative Research

Data Analysis

Optical Imaging

EEG

Programming

Neuroimaging

Data Engineering

Pattern Recognition

MRI

Data Science

Computer Science

View University of Illinois, Beckman Institute for Advanced Science and Technology, Cognitive Neuroimaging Lab's Website

Education

University of Illinois Urbana-Champaign

Bachelor of Science - BS, Brain and Cognitive Science • 2025

Focusing on:

Machine Learning Theory
Data Structures & Algorithms
Artificial Neural Networks
Generative Artificial Intelligence
- Mixture-of-experts
- Quantization
- End-to-end multimodal speech + vision large language models

Projects

Speakease AI

Jan 2023 - Present

A free, AI-driven platform providing real-time voice translation across 50+ languages.

Led all coding and machine learning development, creating a custom multimodal language model utilizing Reinforcement Learning from Human Feedback (RLHF) and divergence training, resulting in high-quality, context-aware translations.
Architected and maintained cloud-native infrastructure using TypeScript, Next.js, Supabase, and Azure; deployed models on Azure OpenAI service and secured cloud funding through a Microsoft partnership.
Implemented low-latency, real-time interactions by leveraging Azure's infrastructure to ensure high availability and scalability for seamless user experiences.
Managed end-to-end engineering, data science, and data engineering efforts, from model training and fine-tuning to infrastructure optimization, enabling translations adaptive to sociocultural and emotional contexts.
Reached more than 1,000 active users, providing accessible, real-time translation support to immigrants, ESL learners, travelers, and language learners.
Pioneered innovations in AI translation using advanced machine learning techniques, creating a scalable, cloud-native solution that democratizes access to high-quality language tools.

Generative AI

Natural Language Processing (NLP)

Large Language Models (LLM)

Machine Learning

Neural Machine Translation (NMT)

Data Engineering

View Project

Ericflix: A Self-Hosted Media Streaming Platform with Advanced Features (25+ MAU)

Aug 2021 - Present

Self-hosted media streaming infrastructure with advanced features:

Content-based recommender systems using unsupervised clustering and NLP
Metadata-rich UI for enhanced discoverability
Automated content ingestion pipelines
User-driven content acquisition portal
Adaptive bitrate streaming with client-side network analysis
High-throughput GPU-accelerated transcoding
Premium audio-visual format support (Dolby Atmos, Dolby Vision, HDR10+)
IPTV integration
Automated polyglot subtitle and audio track retrieval

25+ Monthly Active Users, exceeding commercial streaming quality benchmarks

Data Engineering

Machine Learning

SQL

Back-End Web Development

Scripting

Automation

Front-End Development

Database Administration

DevOps

Natural Language Processing (NLP)

Advanced Autoencoder Architectures for Unsupervised Feature Learning and Dimensionality Reduction

Oct 2023 - Dec 2023

Development of unsupervised neural network models focusing on autoencoder architectures employing backpropagation algorithms for efficient data encoding and reconstruction tasks.

Implemented both single-layer and two-layer autoencoders in Python, optimizing algorithmic efficiency and scalability within high-dimensional data contexts
Conducted in-depth exploration of non-linear dimensionality reduction techniques and feature extraction methodologies, utilizing sigmoid activation functions and gradient descent optimization
Engaged in rigorous hyperparameter tuning to enhance model convergence and minimize reconstruction loss, adhering to computational standards in unsupervised learning paradigms
Leveraged Python's scientific computing libraries including NumPy for high-performance numerical operations, Scikit-learn for machine learning utilities, and Matplotlib for data visualization
Contributed to advanced neural network modeling within the context of cognitive psychology and neuroscience, aligning with the academic objectives of PSYC 489: Neural Network Modeling Lab
Underscored the applications of autoencoders in dimensionality reduction, anomaly detection, and feature learning, enriching the discourse in unsupervised machine learning and data representation
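
The single-layer autoencoder described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes one sigmoid hidden layer, a sigmoid output layer, a squared-error reconstruction loss, and plain batch gradient descent. The function names (train_autoencoder, encode), the hyperparameter values, and the toy one-hot dataset are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer autoencoder with backpropagation.

    Loss: 0.5 * sum of squared reconstruction errors, averaged over samples.
    Returns the learned parameters and the per-epoch loss history.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = X.shape
    W1 = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
    b2 = np.zeros(n_inputs)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode, then reconstruct.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        err = Y - X
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Backward pass: chain rule through both sigmoid layers.
        dY = err * Y * (1.0 - Y)
        dH = (dY @ W2.T) * H * (1.0 - H)
        # Gradient descent step on the sample-averaged gradients.
        W2 -= lr * (H.T @ dY) / n_samples
        b2 -= lr * dY.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n_samples
        b1 -= lr * dH.mean(axis=0)
    return (W1, b1, W2, b2), losses


def encode(X, params):
    """Map inputs to their low-dimensional hidden representation."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)


# Toy data (an assumption for illustration): eight one-hot vectors
# compressed into three hidden units.
X = np.eye(8)
params, losses = train_autoencoder(X, n_hidden=3)
codes = encode(X, params)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}, code shape: {codes.shape}")
```

A two-layer variant would add a second encoder/decoder pair and extend the backward pass by one more chain-rule step per added layer.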

Artificial Neural Networks

Data Engineering

Machine Learning

Unsupervised Learning

Backpropagation

Dimensionality Reduction

Computational Neuroscience

View Project

Hopfield Networks for Associative Memory Modeling

Aug 2023 - Oct 2023

Construction and analysis of an advanced Hopfield Network framework to simulate associative memory and pattern retrieval in neural systems, employing recurrent neural network architectures with symmetrically weighted connections and energy minimization principles to achieve stable memory states.

Focused on accurate simulation of network dynamics based on the original formulations by John Hopfield, exploring the network's capacity for content-addressable memory and its applications in optimization problems
Implemented a class-based structure in Python for modularity and extensibility, incorporating methods for network training on multiple memory patterns, state updates with synchronous and asynchronous options, and precise energy calculations to analyze convergence behaviors
Leveraged NumPy for efficient numerical computations and Matplotlib for visualizing energy landscapes and network states
Enhanced understanding of cognitive functions and contributed to computational neuroscience studies within the context of PSYC 489: Neural Network Modeling Lab
Offered insights into memory models, pattern recognition, and the underlying mechanics of neural associative processes
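
A class-based structure of the kind described above might look like the following sketch. It assumes bipolar (+1/-1) units, Hebbian outer-product training, and the standard energy function E = -1/2 s^T W s; the class and method names and the two toy patterns are hypothetical and are not taken from the project itself.

```python
import numpy as np


class HopfieldNetwork:
    """Minimal Hopfield network over bipolar (+1/-1) states."""

    def __init__(self, n_units):
        self.n_units = n_units
        self.weights = np.zeros((n_units, n_units))

    def train(self, patterns):
        """Hebbian rule: sum of outer products, with no self-connections."""
        patterns = np.asarray(patterns, dtype=float)
        self.weights = patterns.T @ patterns / self.n_units
        np.fill_diagonal(self.weights, 0.0)

    def energy(self, state):
        """E = -1/2 * s^T W s; lower energy means a more stable state."""
        state = np.asarray(state, dtype=float)
        return -0.5 * state @ self.weights @ state

    def update(self, state, synchronous=False, rng=None):
        """Run one sweep over all units, synchronously or asynchronously."""
        state = np.array(state, dtype=float)
        if synchronous:
            return np.where(self.weights @ state >= 0, 1.0, -1.0)
        if rng is None:
            rng = np.random.default_rng(0)
        # Asynchronous: visit units in random order, each seeing the
        # most recent values of the others.
        for i in rng.permutation(self.n_units):
            state[i] = 1.0 if self.weights[i] @ state >= 0 else -1.0
        return state

    def recall(self, state, max_sweeps=20, seed=0):
        """Repeat asynchronous sweeps until the state stops changing."""
        rng = np.random.default_rng(seed)
        state = np.array(state, dtype=float)
        for _ in range(max_sweeps):
            new_state = self.update(state, rng=rng)
            if np.array_equal(new_state, state):
                break
            state = new_state
        return state


# Two orthogonal toy patterns (an assumption for illustration).
patterns = np.array([
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
])
net = HopfieldNetwork(n_units=8)
net.train(patterns)

noisy = patterns[0].copy()
noisy[0] *= -1  # corrupt one unit of the first stored pattern
recalled = net.recall(noisy)
print("recovered first pattern:", np.array_equal(recalled, patterns[0]))
print("energy before/after:", net.energy(noisy), net.energy(recalled))
```

Asynchronous updates with symmetric weights and a zero diagonal never increase the energy, which is why the recall loop settles into a stored pattern rather than oscillating.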

Data Engineering

Machine Learning

Artificial Neural Networks

Recurrent Neural Networks (RNN)

Pattern Recognition

Python (Programming Language)

View Project

Sentiment Analysis on the IMDB Review Dataset Using Optimized Recurrent Neural Networks

Aug 2022 - Dec 2022

Sentiment classification on large-scale movie review corpus using optimized recurrent neural architectures.

Employed sequential memory modeling for temporal dependency capture in natural language
Performed hyperparameter tuning via grid and stochastic search methodologies for performance maximization
Leveraged distributed representations and transfer learning from pre-trained word embeddings
Implemented within TensorFlow/Keras ecosystem with auxiliary data manipulation via Pandas and Scikit-learn
Contributed to affective computing research in human-computer interaction and opinion mining domains

Data Engineering

Machine Learning

NLP Libraries

NLTK

Python (Programming Language)

Programming

Computer Science

Artificial Neural Networks

Data Science

Recurrent Neural Networks (RNN)

View Project

A Comprehensive Analytical Framework for Multiplayer Yahtzee: From Simulation to Statistical Insights

Jan 2022 - May 2022

Stochastic simulation framework for multi-agent decision-making in constrained combinatorial environments.

Leveraged Monte Carlo methods for strategy optimization in dice-based games
Conducted statistical inference on high-dimensional gameplay metrics via hypothesis testing and multivariate regression
Extracted insights from gameplay data through exploratory data analysis and advanced visualization techniques
Implemented in Python with NumPy/Pandas for numerical computing and Matplotlib/Seaborn for graphical representation
Contributed to computational game theory and strategic decision analysis in uncertain, rule-bound domains
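
As a rough illustration of the Monte Carlo approach mentioned above, the sketch below estimates the probability of rolling a Yahtzee (five of a kind) within three rolls. It is a simplified single-player stand-in, not the project's framework: the keep-the-most-common-face strategy, the function names, and the trial count are all assumptions made for the example.

```python
import random
from collections import Counter


def yahtzee_within_three_rolls(rng):
    """Simulate one turn: an initial roll plus up to two rerolls.

    Strategy (an assumption): always keep the most common face and
    reroll every other die.
    """
    dice = [rng.randint(1, 6) for _ in range(5)]
    for _ in range(2):
        face, count = Counter(dice).most_common(1)[0]
        if count == 5:
            return True
        dice = [face] * count + [rng.randint(1, 6) for _ in range(5 - count)]
    return len(set(dice)) == 1


def estimate_probability(trials, seed=0):
    """Monte Carlo estimate: fraction of simulated turns ending in a Yahtzee."""
    rng = random.Random(seed)
    hits = sum(yahtzee_within_three_rolls(rng) for _ in range(trials))
    return hits / trials


estimate = estimate_probability(20_000)
print(f"Estimated P(Yahtzee within three rolls): {estimate:.4f}")
```

With enough trials the estimate should land near the commonly cited figure of roughly 4.6%. The same loop structure extends to comparing strategies by swapping out the keep rule and recording scores instead of a single success flag.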

Data Engineering

Data Analysis

Python (Programming Language)

Regression Analysis

Programming

Computer Science

Statistics

Data Visualization

View Project

Skills & Expertise

Natural Language Processing (NLP)

DevOps

Neural Machine Translation (NMT)

Artificial Neural Networks

Generative AI

Python

TypeScript

Next.js

Azure

AWS

Machine Learning

Data Science

TensorFlow

Keras

SQL

Data Engineering

Unsupervised Learning

Recurrent Neural Networks (RNN)

Data Analysis

Regression Analysis

Programming

Statistics

Data Visualization

Computer Science

Brain-computer Interfaces

R (Programming Language)

Quantitative Research

Optical Imaging

EEG

Neuroimaging

Pattern Recognition

MRI
