Woke up this morning to find my battery had gone offline and we'd been running "on grid" since 2am.
I tried restarting the BMS with the power button: it would do quick alternating red-green flashes for about a second, then show a single red alarm light. Power cycled the whole system, still no dice. Finally, in desperation, I power cycled the BMS again and it came back to life. As I write, all seems to be operating normally.
Error log shows Bat Volt Low at just after 2am, right at the time the battery went offline. I think there was another error before that as well, but my power cycling filled the error log and lost it. Of course none of these errors are logged on FoxCloud, despite the system having been offline for 4 hours.
This is not the first or even the second time I've seen this issue, probably the fourth or fifth. However, it is the first time I've seen it since I upgraded the BMS and battery firmware to the latest versions. Whilst this has improved their performance considerably in other areas, I'm gutted to see this problem rearing its ugly head again, since I'd previously put it down to a BMS firmware problem.
In all other respects the system performs pretty well, but occasionally (usually in the wee small hours!) it throws this error. Previously it has always come back after a simple BMS power cycle; the fact it took 20 minutes of trying this time makes me a little nervous that the problem is somehow getting worse.
For what it's worth, the battery voltage trace in Home Assistant shows no abnormalities; it simply flatlines once the BMS goes into the error state.
Anyone got any ideas about what might be causing this and what I could / should do about it? In my view equipment like this needs to be utterly reliable, and its recurrence after moving to the latest firmware starts to suggest the presence of an intermittent hardware fault?
BMS weirdness
Checking the pics on my phone it looks like this last happened in early February, when it was Bat Volt Fault, immediately followed by Bat Volt Low.
-
- Posts: 1305
- Joined: Thu Oct 13, 2022 7:21 pm
It's not obvious from that why it would throw a bat volt fault. The battery voltage for the stack looks good; translating that to cell voltages, it's probably around the 3.3V point and so 60-70% SoC.
I think we need to delve a little deeper to see if we can narrow things down. The message implies that one of the cells is too low, which is not easy to see. Can you plot your BMS cell mv high and BMS cell mv low for that same time period? I guess for completeness we should also have a look at BMS cell temp high and low. We're looking for imbalance, so it's a bigger difference between the high and low which is bad.
It would be useful if you could log onto your agent account and look at the individual cell voltages when the battery is at about 50%. You'll see the cell volts from 01-16 (17 and 18 will be blank), which is pack No1; it then repeats from 19-34 (35 and 36 blank) for pack No2, etc. You'll have individual cell voltages (say 3.25) and they should all be within a small tolerance of that. At 50% you'd probably expect the tolerance to be no more than 20mV. Do any of the cell volts stand out? It could be a single cell that is badly balanced, or a range of cells near the floor in the coldest pack.
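If you copy the cell readings out by hand, a rough way to apply the check described above is sketched below in Python. It assumes the layout from the post (18 slots per pack, the last two blank) and the 20mV tolerance; the function names, the use of the median as the reference value, and the example readings are all made up for illustration and are not taken from any Fox tool.

```python
from statistics import median

CELLS_PER_PACK = 16   # cells 01-16, 19-34, ... as described in the post
SLOTS_PER_PACK = 18   # each pack occupies 18 slots; the last two are blank
TOLERANCE_MV = 20     # rough tolerance suggested for about 50% SoC

def split_packs(slots):
    """Group a flat list of slot readings (mV, None for blank) into packs of 16."""
    return [slots[start:start + CELLS_PER_PACK]
            for start in range(0, len(slots), SLOTS_PER_PACK)]

def outliers(slots, tolerance_mv=TOLERANCE_MV):
    """Return (pack_no, cell_no, mv) for cells further than the tolerance
    from the median of all non-blank cells."""
    packs = split_packs(slots)
    reference = median(mv for pack in packs for mv in pack if mv is not None)
    flagged = []
    for pack_no, pack in enumerate(packs, start=1):
        for cell_no, mv in enumerate(pack, start=1):
            if mv is not None and abs(mv - reference) > tolerance_mv:
                flagged.append((pack_no, cell_no, mv))
    return flagged

# Hypothetical readings: two packs sitting at 3250 mV with one low cell in pack 2.
pack1 = [3250] * 16 + [None, None]
pack2 = [3250] * 16 + [None, None]
pack2[4] = 3215
example = pack1 + pack2
print(outliers(example))
```

With these invented numbers the only cell flagged is cell 5 of pack 2, which sits 35mV below the rest.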
The belt and braces attempt to correct balance is to allow the batteries to discharge to 10% and leave them like that for at least a couple of hours (the charger will cycle occasionally to maintain).
The next step is to charge your batteries from grid to 80% then reduce your max charge current to 4A and grid charge to 100%, and leave it with charging on for a couple of hours after it reaches 100%.
It will take some time to do this manually; it is actually easier just to set the low charge (4A) and leave it like that for the whole night. The most important thing is giving it time to float charge when it is at 10% and 100%: this gives the batteries time to absorb charge, and the shunts time to dissipate excess cell volts, all of which helps with balance.
Hi Dave
Having read your post (for which many thanks), I was kind of hoping to see something very obvious. At this scale, maybe not so much, but... It looks like a relatively wide gap compared to the rest of the trace, at least when the battery is in a quiescent state? Right at the point of the error the difference is 30-31mV (the logging isn't frequent enough to say exactly):
I'll swap the WiFi modules over later this evening when the SoC should be about right and see what I can see. Once again thanks - how the heck you would troubleshoot this as an "ordinary user" (i.e. without logging or visibility of the various system parameters) I have no idea. Although I suppose you can always just do the battery balancing thing anyway and hope...
Edit: I did try to go back to look at the same trace for the February occurrence to compare, but I was using the LAN port for Modbus then so I didn't have any of the BMS data logged.
I checked the SoC for the same time period and it was 71%, for reference.
I'll have another look later when the SoC is lower and hopefully the load will be minimal.
-
- Posts: 1305
- Joined: Thu Oct 13, 2022 7:21 pm
They are slightly out of balance for a pack that isn't charging. It's not a huge amount, but in the scale of things 36mV can make a 10% difference in the capacity measurement.
The rest of the information you posted all looks to be fine.
I would suggest you add a sensor to track the cell imbalance. Tony has written something to do that in the sensors section, so I'll go and grab it and paste it here (thanks Tony), and that way you can easily see the imbalance at any time. Normally you would expect about 0.15%, but it goes up during charging (>3%) as it approaches 100%; the main thing is it should spend a lot of its time below 0.5%.
This is my battery 'im'-balance
I have heard a few people say after the latest firmware upgrade the BMS threw a Bat volt fault, but a slow charge sorted it out - so I think the next thing to do is to give the BMS a chance to re-balance the cells with that and then monitor your cell imbalance.
-
- Posts: 1305
- Joined: Thu Oct 13, 2022 7:21 pm
This is the imbalance sensor that Tony has shared; it will make it much easier to see how things are.
Code:
# calculate cell imbalance as % using the difference between min and max cell voltage
- name: "battery_cell_imbalance"
  unit_of_measurement: "%"
  state: >
    {% set cell_high = states('sensor.bms_cell_mv_high') | float(default=0) %}
    {% set cell_low = states('sensor.bms_cell_mv_low') | float(default=0) %}
    {% set imbalance = (cell_high-cell_low) / (cell_high + cell_low) * 200 if cell_low > 0.0 else 0.0 %}
    {% set imbalance = 0.0 if cell_high < cell_low else imbalance %}
    {{ imbalance | round(2) }}
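To get a feel for what the sensor reports, here is the same formula as a small Python function: the high/low gap divided by the mean of the two readings, as a percentage. The function name and the example cell voltages are assumptions chosen for illustration (cells sitting near 3.3V); only the formula itself comes from the template above.

```python
def cell_imbalance(cell_high_mv, cell_low_mv):
    """Imbalance in %: (high - low) relative to the mean of high and low.

    Returns 0.0 for missing or inconsistent readings, like the template does.
    """
    if cell_low_mv <= 0 or cell_high_mv < cell_low_mv:
        return 0.0
    return round((cell_high_mv - cell_low_mv) / (cell_high_mv + cell_low_mv) * 200, 2)

# Assumed readings: a 5 mV gap and a 30 mV gap around 3.3 V
print(cell_imbalance(3300, 3295))  # 0.15
print(cell_imbalance(3300, 3270))  # 0.91
```

So with cells around 3.3V, a gap of roughly 5mV gives the "normal" 0.15%, while a 30mV gap like the one seen at the time of the error works out at about 0.9%, above the 0.5% mark.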
Turns out it's actually really hard to get the battery bank down to Min SoC when even on a mostly cloudy day the panels are generating 10-15kWh.
-
- Posts: 1305
- Joined: Thu Oct 13, 2022 7:21 pm
Yep, I know that problem well. I've been trying to document the battery canbus protocol and it's not easy to get logs when there's charge, discharge, charge again.
Interestingly I've done a couple of things in between. I did a watt hour / % SoC counter for Ryan (it's in the sensors section), which shows quite clearly that when the batteries are charging from solar, the BMS gradually loses track of the SoC and you're waiting for it to correct as it gets to full, sometimes from as low as 85%. It behaves differently (more predictably) when being charged from grid (I don't understand why yet).
It's obviously using something else to measure SoC rather than watts in / out, so I think it's fair to say that it's not a fault, as you don't lose any capacity; perhaps it's just one of the isms of LFP batteries.
The other thing is the BMS canbus protocol 'sniffing' I've been doing. In the messages I've found the entire pack statistics (temp, voltage, SoC) but also the individual battery pack status messages, and the SoCs can often be up to 2% different as it discharges. Not too surprisingly, it was the colder packs, which were 'only a couple of degrees' different.
So I guess that's another variable the BMS has to manage: different packs have slightly different capacities, and you see them diverge/converge as you charge or discharge, and that's possibly another thing that leads to SoC inaccuracy.
I'm beginning to respect the BMS a lot more, I just wish it could communicate better.
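For anyone wondering what a "watts in / out" estimate looks like, a minimal Python sketch follows. It is not the sensor from the sensors section: the function name, the fixed sample interval, the nominal capacity and the absence of any charge-efficiency correction are all assumptions made purely for illustration.

```python
def soc_change_from_power(samples_w, interval_s, capacity_wh):
    """Estimate SoC change in % by summing battery power samples.

    samples_w   -- battery power readings in W (positive = charging)
    interval_s  -- assumed fixed time between samples, in seconds
    capacity_wh -- assumed usable capacity of the stack, in Wh
    """
    energy_wh = sum(samples_w) * interval_s / 3600  # W * s -> Wh
    return energy_wh / capacity_wh * 100

# Assumed example: 1 kW of charge for an hour (60 one-minute samples)
# into a 10 kWh stack should move the SoC by about 10%.
delta = soc_change_from_power([1000] * 60, 60, 10000)
print(round(delta, 1))
```

Comparing a running total like this against the SoC the BMS reports is one way to see the two drift apart during solar charging.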
-
- Posts: 3
- Joined: Tue May 16, 2023 11:14 am
A poor connection between the BMS and inverter data lines can cause this fault, which does not recover until the system is power-cycled.
Check that the Ethernet cable is solidly locked in place on both ends.