📄 Rediscovering orbital mechanics with machine learning

Quicklook:
Lemos, Pablo et al. (2023) · Machine Learning: Science and Technology
Reads: 405 · Citations: 81
DOI: 10.1088/2632-2153/acfa63

🔗 https://ui.adsabs.harvard.edu/abs/2023MLS&T...4d5002L/abstract

#Astronomy #Astrophysics #SolarPhysics #ScientificDiscovery #SymbolicRegression


We present an approach for using machine learning to automatically discover the governing equations and unknown properties (in this case, masses) of real physical systems from observations. We train a 'graph neural network' to simulate the dynamics of our Solar System's Sun, planets, and large moons from 30 years of trajectory data. We then use symbolic regression to correctly infer an analytical expression for the force law implicitly learned by the neural network, which our results showed is equivalent to Newton's law of gravitation. The key assumptions our method makes are translational and rotational equivariance, and Newton's second and third laws of motion. It did not, however, require any assumptions about the masses of planets and moons or physical constants, but nonetheless, they, too, were accurately inferred with our method. Naturally, the classical law of gravitation has been known since Isaac Newton, but our results demonstrate that our method can discover unknown laws and hidden properties from observed data.
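The assumptions listed in the abstract (translational and rotational equivariance, Newton's second and third laws) can be made concrete with a small NumPy sketch. It is not the authors' code. The function names are hypothetical, and the pairwise force magnitude is Newton's inverse-square law plugged in by hand for illustration. In the paper, that part is learned by the graph neural network and then distilled by symbolic regression.

```python
import numpy as np

G = 6.674e-11  # gravitational constant, SI units (m^3 kg^-1 s^-2)

def newton_magnitude(r, m_i, m_j):
    """Illustrative pairwise force magnitude (Newton's law, plugged in by hand).
    In the paper this role is played by a learned function."""
    return G * m_i * m_j / r**2

def accelerations(positions, masses, force_magnitude=newton_magnitude):
    """Accelerations of N bodies from pairwise central forces.

    positions: (N, 3) array in metres; masses: (N,) array in kg.
    Each pair force depends only on the separation vector (translational
    invariance) and points along it (rotational equivariance); the force on
    j from i is minus the force on i from j (Newton's third law).
    """
    n = len(masses)
    forces = np.zeros_like(positions, dtype=float)
    for i in range(n):
        for j in range(i + 1, n):
            d = positions[j] - positions[i]          # separation vector i -> j
            r = np.linalg.norm(d)
            f_ij = force_magnitude(r, masses[i], masses[j]) * d / r  # pulls i toward j
            forces[i] += f_ij
            forces[j] -= f_ij                        # equal and opposite
    return forces / masses[:, None]                  # Newton's second law: a = F / m
```

Because each pair contributes equal and opposite forces, the summed force is zero by construction, whatever magnitude function is plugged in. Only that scalar function has to be learned or discovered.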

ADS

Symbolic regression was first developed in the 1970s and reshaped in the early 1990s by John Koza. Its innovation is that, instead of fitting the parameters of a predefined equation (model) to the data, the algorithm constructs the equation itself from scratch.
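The contrast between fitting a fixed model and choosing the equation can be illustrated with a deliberately tiny Python sketch. It is only a toy. Instead of building expressions from scratch, it tries a short, hypothetical menu of candidate forms, fits one scale constant c for each by least squares, and keeps whichever fits the data best.

```python
import numpy as np

# Hypothetical menu of candidate forms: each model is y ≈ c * g(x).
CANDIDATES = {
    "c*x": lambda x: x,
    "c*x**2": lambda x: x**2,
    "c/x": lambda x: 1.0 / x,
    "c/x**2": lambda x: 1.0 / x**2,
    "c*exp(-x)": lambda x: np.exp(-x),
    "c*log(x)": lambda x: np.log(x),
}

def best_expression(x, y):
    """Return (form, c, mse) for the candidate form with the lowest error."""
    results = []
    for name, g in CANDIDATES.items():
        basis = g(x)
        c = basis @ y / (basis @ basis)          # closed-form least-squares scale
        mse = np.mean((y - c * basis) ** 2)
        results.append((name, c, mse))
    return min(results, key=lambda item: item[2])

x = np.linspace(1.0, 10.0, 50)
print(best_expression(x, 3.0 / x**2))
```

Real symbolic regression replaces the fixed menu with a search over expressions assembled from operators, variables, and constants, which is where the vast search space comes from.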

To do so, most symbolic regression implementations use genetic algorithms (genetic programming) to traverse a vast search space of candidate equations. The resulting models can capture complex relationships while remaining interpretable, which may contribute to new scientific discoveries.
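A minimal pure-Python sketch of genetic-programming symbolic regression follows. Expressions are trees and fitness is mean squared error. Each generation keeps an elite and breeds the rest by tournament selection, subtree crossover, and subtree mutation. The tree encoding, the operator set (+, -, *), the rates, and all function names are illustrative assumptions. This is not PySR's algorithm, and production tools add constant optimization, simplification, and complexity penalties.

```python
import math
import operator
import random

# Expression trees are nested tuples: ("x",), ("const", value), or (op, left, right).
BINARY = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
SYMBOL = {"add": "+", "sub": "-", "mul": "*"}

def random_tree(rng, max_depth):
    """Grow a random expression with at most max_depth operator levels."""
    if max_depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.5 else ("const", rng.uniform(-2.0, 2.0))
    op = rng.choice(list(BINARY))
    return (op, random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))

def evaluate(tree, x):
    if tree[0] == "x":
        return x
    if tree[0] == "const":
        return tree[1]
    return BINARY[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if tree[0] in BINARY else 0

def to_str(tree):
    if tree[0] == "x":
        return "x"
    if tree[0] == "const":
        return f"{tree[1]:.3g}"
    return f"({to_str(tree[1])} {SYMBOL[tree[0]]} {to_str(tree[2])})"

def mse(tree, xs, ys):
    """Mean squared error; non-finite results count as infinitely bad."""
    total = 0.0
    for x, y in zip(xs, ys):
        d = evaluate(tree, x) - y
        total += d * d
    err = total / len(xs)
    return err if math.isfinite(err) else math.inf

def mutate(tree, rng):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if tree[0] not in BINARY or rng.random() < 0.2:
        return random_tree(rng, 3)
    if rng.random() < 0.5:
        return (tree[0], mutate(tree[1], rng), tree[2])
    return (tree[0], tree[1], mutate(tree[2], rng))

def random_subtree(tree, rng):
    while tree[0] in BINARY and rng.random() < 0.7:
        tree = tree[rng.choice((1, 2))]
    return tree

def crossover(a, b, rng):
    """Subtree crossover: replace a random subtree of a with a random subtree of b."""
    if a[0] not in BINARY or rng.random() < 0.3:
        return random_subtree(b, rng)
    if rng.random() < 0.5:
        return (a[0], crossover(a[1], b, rng), a[2])
    return (a[0], a[1], crossover(a[2], b, rng))

def evolve(xs, ys, pop_size=100, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((mse(t, xs, ys), t) for t in population), key=lambda s: s[0])
        elite = [t for _, t in scored[: pop_size // 10]]  # best 10% survive unchanged

        def tournament():
            return min(rng.sample(scored, 5), key=lambda s: s[0])[1]

        children = []
        while len(children) < pop_size - len(elite):
            if rng.random() < 0.7:
                child = crossover(tournament(), tournament(), rng)
            else:
                child = mutate(tournament(), rng)
            if depth(child) <= 6:  # crude guard against bloat
                children.append(child)
        population = elite + children
    best = min(population, key=lambda t: mse(t, xs, ys))
    return best, mse(best, xs, ys)

if __name__ == "__main__":
    xs = [i / 10 for i in range(-10, 11)]
    ys = [x * x + x for x in xs]
    best, err = evolve(xs, ys)
    print(to_str(best), err)
```

Run on samples of y = x² + x, it prints the best expression found and its error. With so small a population and no constant optimization, the result is an approximation rather than a guaranteed exact recovery.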

Despite the inherent difficulty of searching such vast spaces efficiently, experiments have yielded very promising results in rediscovering complex, previously known mathematical equations.

GPU acceleration and more sophisticated search strategies have further expanded the capabilities of this method.

Link to the Python PySR package
https://pypi.org/project/pysr/

Wikipedia article:
https://en.wikipedia.org/wiki/Symbolic_regression

#datascience #ml #ai #python #rstats #regression #symbolicregression

Weekly Update at the Open Journal of Astrophysics – 15/03/2025

The Ideas of March are come, so it’s time for another update of papers published at the Open Journal of Astrophysics. Since the last update we have published two papers, which brings the number in Volume 8 (2025) up to 27 and the total so far published by OJAp up to 262.

The first paper to report is “Dark Energy Survey Year 6 Results: Point-Spread Function Modeling” by Theo Schutt and 59 others distributed around the world, on behalf of the DES Collaboration. It was published on Wednesday March 12th 2025 in the folder Cosmology and NonGalactic Astrophysics. It discusses the improvements made in modelling the Point Spread Function (PSF) for weak lensing measurements in the latest Dark Energy Survey (6-year) data and prospects for the future.

Here is the overlay:

 

You can read the officially accepted version of this paper on arXiv here.

The other paper published this week is “Exploring Symbolic Regression and Genetic Algorithms for Astronomical Object Classification” by Fabio Ricardo Llorella (Universidad Internacional de la Rioja, Spain) & José Antonio Cebrian (Universidad Laboral de Córdoba, Spain), which came out on Thursday 13th March. This one is in the folder marked Astrophysics of Galaxies, and it discusses the classification of astronomical objects in the Sloan Digital Sky Survey SDSS-17 dataset using a combination of Symbolic Regression and Genetic Algorithms.

The overlay can be seen here:

You can find the “final” version on arXiv here.

That’s it for this week. I’ll have more papers to report next Saturday.

#arXiv250105781v2 #arXiv250309220v1 #AstronomicalObjectClassification #AstrophysicsOfGalaxies #CosmologyAndNonGalacticAstrophysics #DarkEnergySurvey #DES #DiamondOpenAccess #GeneticAlgorithms #OpenAccessPublishing #SloanDigitalSkySurvey #SymbolicRegression #TheOpenJournalOfAstrophysics #weakGravitationalLensing

The Open Journal of Astrophysics

The Open Journal of Astrophysics is an arXiv overlay journal providing open access to peer-reviewed research in astrophysics and cosmology.

What can Abzu do? Last year we published a paper showing how combining our #symbolicregression tech with the Cox proportional hazards model improves the ability to predict death due to #heartfailure. The Cox method is available on Abzu's new platform. Early access is free.

https://tinyurl.com/5jn7w8z7
#cardiovascular #AcademicMastodon #academicchatter

Combining symbolic regression with the Cox proportional hazards model improves prediction of heart failure deaths - BMC Medical Informatics and Decision Making

Background: Heart failure is a clinical syndrome characterised by a reduced ability of the heart to pump blood. Patients with heart failure have a high mortality rate, and physicians need reliable prognostic predictions to make informed decisions about the appropriate application of devices, transplantation, medications, and palliative care. In this study, we demonstrate that combining symbolic regression with the Cox proportional hazards model improves the ability to predict death due to heart failure compared to using the Cox proportional hazards model alone.

Methods: We used a newly invented symbolic regression method called the QLattice to analyse a data set of medical records for 299 Pakistani patients diagnosed with heart failure. The QLattice identified non-linear mathematical transformations of the available covariates, which we then used in a Cox model to predict survival.

Results: An exponential function of age, the inverse of ejection fraction, and the inverse of serum creatinine were identified as the best risk factors for predicting heart failure deaths. A Cox model fitted on these transformed covariates had improved predictive performance compared with a Cox model on the same covariates without mathematical transformations.

Conclusion: Symbolic regression is a way to find transformations of covariates from patients’ medical records which can improve the performance of survival regression models. At the same time, these simple functions are intuitive and easy to apply in clinical settings. The direct interpretability of the simple forms may help researchers gain new insights into the actual causal pathways leading to deaths.
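The two-step recipe in the abstract (transform the covariates, then fit an ordinary Cox model on them) can be sketched in NumPy. The transforms follow the abstract's description. The age_scale constant, the function names, and the plain gradient-ascent fit are assumptions, written out only to keep the example self-contained. This is neither the QLattice nor the authors' code, and a maintained survival-analysis library is the sensible choice in practice.

```python
import numpy as np

def transform_covariates(age, ejection_fraction, serum_creatinine, age_scale=10.0):
    """Transforms named in the abstract: an exponential of age and the inverses
    of ejection fraction and serum creatinine. age_scale is a hypothetical
    rescaling that keeps exp() in range; it is not taken from the paper."""
    return np.column_stack([
        np.exp(age / age_scale),
        1.0 / ejection_fraction,
        1.0 / serum_creatinine,
    ])

def fit_cox(X, time, event, lr=0.1, n_iter=2000):
    """Cox proportional hazards coefficients by gradient ascent on the partial
    log-likelihood, assuming no tied event times. Covariates are standardized
    first, so each coefficient is a log hazard ratio per standard deviation."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    order = np.argsort(-np.asarray(time), kind="stable")  # longest follow-up first
    Xs, died = X[order], np.asarray(event)[order].astype(bool)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = Xs @ beta
        w = np.exp(eta - eta.max())              # shift for numerical stability
        # Cumulative sums give each subject's risk set {j : time_j >= time_i}.
        risk_w = np.cumsum(w)
        risk_wx = np.cumsum(w[:, None] * Xs, axis=0)
        expected_x = risk_wx[died] / risk_w[died][:, None]
        grad = (Xs[died] - expected_x).sum(axis=0)
        beta += lr * grad / len(Xs)
    return beta
```

With arrays from a heart-failure data set, usage would be along the lines of `beta = fit_cox(transform_covariates(age, ef, creatinine), follow_up_time, death_event)`, where the variable names are placeholders.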

BioMed Central
A flexible symbolic regression method for constructing interpretable clinical prediction models - npj Digital Medicine https://www.nature.com/articles/s41746-023-00833-8 #machinelearning #bioinformatics #geneticprogramming #symbolicregression
A flexible symbolic regression method for constructing interpretable clinical prediction models - npj Digital Medicine

Machine learning (ML) models trained for triggering clinical decision support (CDS) are typically either accurate or interpretable but not both. Scaling CDS to the panoply of clinical use cases while mitigating risks to patients will require many ML models be intuitively interpretable for clinicians. To this end, we adapted a symbolic regression method, coined the feature engineering automation tool (FEAT), to train concise and accurate models from high-dimensional electronic health record (EHR) data. We first present an in-depth application of FEAT to classify hypertension, hypertension with unexplained hypokalemia, and apparent treatment-resistant hypertension (aTRH) using EHR data for 1200 subjects receiving longitudinal care in a large healthcare system. FEAT models trained to predict phenotypes adjudicated by chart review had equivalent or higher discriminative performance (p < 0.001) and were at least three times smaller (p < 1 × 10⁻⁶) than other potentially interpretable models. For aTRH, FEAT generated a six-feature, highly discriminative (positive predictive value = 0.70, sensitivity = 0.62), and clinically intuitive model. To assess the generalizability of the approach, we tested FEAT on 25 benchmark clinical phenotyping tasks using the MIMIC-III critical care database. Under comparable dimensionality constraints, FEAT’s models exhibited higher area under the receiver-operating curve scores than penalized linear models across tasks (p < 6 × 10⁻⁶). In summary, FEAT can train EHR prediction models that are both intuitively interpretable and accurate, which should facilitate safe and effective scaling of ML-triggered CDS to the panoply of potential clinical use cases and healthcare practices.

Nature

Well, no. GP is still the right solution for:

1. Specific problems like #SymbolicRegression and #HyperHeuristics.
2. Gradual automated improvement via test cases and objective function.
3. Use cases where regurgitation / copyright infringement has to be avoided at all costs.

@nmc I guess you mean #GeneticProgramming #SymbolicRegression? I do a lot of that too, but I've never seen this behaviour before. Overfitting, yes, but not a sudden drop in training fitness. Has it been described before? It's definitely interesting, because the mechanism we saw should indeed be possible in a space of trees. Ours was in bitstrings, but that was a special case.

Interested in the automated discovery of physical equations from data? Check out Julius's talk on DataDrivenDiffEq: a #julialang #sciml #symbolicregression #SymbolicAI package that takes time series data and returns the equations that generated it. It even generates the LaTeX!
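DataDrivenDiffEq.jl is a Julia package, and its API is not reproduced here. To keep one language across this page, the sketch below is in Python. It implements sequentially thresholded least squares (STLSQ), the sparse-regression step of SINDy, which is one family of methods DataDrivenDiffEq.jl offers. The candidate library, the threshold, the function names, and the synthetic damped-oscillator data are illustrative assumptions.

```python
import numpy as np

TERMS = ["1", "x", "y", "x^2", "x*y", "y^2"]

def library(X):
    """Candidate terms for a 2-D state (x, y): 1, x, y, x^2, x*y, y^2."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small coefficients, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], dxdt[:, k], rcond=None)[0]
    return xi

# Synthetic time series from a damped oscillator: x' = y, y' = -x - 0.2 y.
t = np.linspace(0.0, 20.0, 20001)
omega = np.sqrt(0.99)
x = np.exp(-0.1 * t) * np.cos(omega * t)
y = np.exp(-0.1 * t) * (-0.1 * np.cos(omega * t) - omega * np.sin(omega * t))
X = np.column_stack([x, y])
dXdt = np.gradient(X, t, axis=0)      # finite-difference derivatives from the samples
xi = stlsq(library(X), dXdt)

for k, name in enumerate(["x'", "y'"]):
    terms = [f"{xi[i, k]:+.3f} {TERMS[i]}" for i in range(len(TERMS)) if xi[i, k] != 0]
    print(name, "=", " ".join(terms))
```

On this noise-free example the surviving terms should be y for x' and x and y for y', with coefficients close to 1, -1, and -0.2. Real measurements usually need noise-robust derivative estimates and a tuned threshold.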

https://www.youtube.com/watch?v=Cn5HO78Q2XA

DataDrivenDiffEq.jl- Data driven modeling in Julia | 2022 DigiWell Julia Seminar

YouTube